Nov 25, 2024
Let LLM Agents Perform LLM Surgery
Sharat Jacob Jacob
This project aimed to create and use LLM agents that perform mechanistic interventions on other LLMs. The experiments ranged from an agent unsteering a mechanistically steered model back to a neutral state to an agent performing mechanistic edits to build a custom LLM that meets a user's requirements. The agents' actions were built on Goodfire's API and its predefined functions.
Jaime Raldua
This is a very interesting direction. There seems to be some existing work on meta mech. interp., so a more thorough literature review would help clarify how this project fits into the current landscape. The first results seem promising.
Jason Schreiber
This is a creative research idea that deserves more exploration. I would love to see some quantification of the results here!
Tom McGrath
This is a cool idea - getting an agent to edit the internals of another agent to perform the task. As the authors observe, this is pretty difficult (even for frontier models) but as far as I can tell they had some successes. Quantifying these successes/failures can be difficult, especially at a relatively small scale and with time constraints - I'd encourage the authors to report a few transcripts of success & failure so we can get a feel for how well this performs and where steering agents succeeds and fails.
Cite this work
@misc {
title={Let LLM Agents Perform LLM Surgery},
author={Sharat Jacob Jacob},
date={11/25/24},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}