Nov 25, 2024
Assessing Language Model Cybersecurity Capabilities with Feature Steering
Stefan Jones
Summary
Searched for the features most highly activated by cybersecurity questions, then adjusted (steered) these features to see whether they affect multiple-choice question-answering performance.
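The two-step procedure in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code or any particular steering API. It assumes the feature activations are already available as a question × feature matrix, and that "most highly activated" means highest mean activation across the cybersecurity questions. It also assumes a steered model can be called through a hypothetical `answer_fn(question, steering)`. The names `top_activated_features`, `mcq_accuracy`, `steering_sweep` and `toy_answer` are all hypothetical. The toy model exists only so the example runs.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical type: feature id -> steering strength (>0 amplifies, <0 suppresses).
Steering = Dict[int, float]


def top_activated_features(activations: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the ids of the k features with the highest mean activation.

    Assumes activations[q][f] is the activation of feature f on question q,
    collected over a set of cybersecurity questions.
    """
    n_features = len(activations[0])
    means = [mean(row[f] for row in activations) for f in range(n_features)]
    # Rank feature ids by mean activation, highest first; ties go to the lower id.
    ranked = sorted(range(n_features), key=lambda f: (-means[f], f))
    return ranked[:k]


def mcq_accuracy(answer_fn: Callable[[str, Steering], str],
                 questions: Sequence[str], answers: Sequence[str],
                 steering: Steering) -> float:
    """Fraction of multiple-choice questions answered correctly under `steering`."""
    correct = sum(answer_fn(q, steering) == a for q, a in zip(questions, answers))
    return correct / len(questions)


def steering_sweep(answer_fn: Callable[[str, Steering], str],
                   questions: Sequence[str], answers: Sequence[str],
                   feature_ids: Sequence[int],
                   strengths: Sequence[float]) -> Dict[float, float]:
    """Apply the same strength to every selected feature and record accuracy."""
    results = {}
    for s in strengths:
        steering = {f: s for f in feature_ids}
        results[s] = mcq_accuracy(answer_fn, questions, answers, steering)
    return results


# Toy usage with made-up activations and a stand-in model.
acts = [[0.1, 0.9, 0.3], [0.2, 0.8, 0.5], [0.0, 0.7, 0.4]]
feats = top_activated_features(acts, k=2)  # features 1 and 2

questions = ["q1", "q2", "q3", "q4"]
answers = ["A", "C", "B", "D"]
answer_key = dict(zip(questions, answers))


def toy_answer(question: str, steering: Steering) -> str:
    # Hypothetical stand-in for a steered model: answers correctly unless any
    # steered feature is pushed below zero, in which case it always says "A".
    if any(v < 0 for v in steering.values()):
        return "A"
    return answer_key[question]


print(steering_sweep(toy_answer, questions, answers, feats, [-0.5, 0.0, 0.5]))
# {-0.5: 0.25, 0.0: 1.0, 0.5: 1.0}
```

In a real experiment, `answer_fn` would wrap whatever model and steering interface was actually used. Comparing each steered accuracy against the unsteered baseline (strength 0.0) then shows whether the selected features matter for the task.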
Cite this work:
@misc{jones2024assessing,
  title={Assessing Language Model Cybersecurity Capabilities with Feature Steering},
  author={Stefan Jones},
  date={2024-11-25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}