Nov 25, 2024
Assessing Language Model Cybersecurity Capabilities with Feature Steering
Stefan Jones
Summary
Searched for the most highly activated weights on cybersecurity questions, then adjusted these weights to see whether doing so affects multiple-choice question-answering performance.
Cite this work:
@misc{
  title={Assessing Language Model Cybersecurity Capabilities with Feature Steering},
  author={Stefan Jones},
  date={11/25/24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}