Summary
This project set out to build LLM agents that can perform mechanistic interventions on other LLMs. A few experiments were conducted, ranging from an agent returning a mechanistically steered model to a neutral state ("unsteering") to an agent applying mechanistic edits to produce a custom LLM that meets a user's requirements. The agents' actions were built on Goodfire's API and its predefined functions.
Cite this work:
@misc {
title={Let LLM Agents Perform LLM Surgery},
author={Sharat Jacob Jacob},
date={11/25/24},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}