This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
ARENA 4.0 Interpretability Hackathon
September 15, 2024
Accepted at the ARENA 4.0 Interpretability Hackathon research sprint on September 15, 2024

Finding Circular Features in Gemma 2 2B

1. We began with the ARENA exercise that found circular features in GPT2-Small, to understand the methodology by which these features were found.
2. We manually found anchor features by giving Neuronpedia a prompt containing the days of the week.
3. We then applied a clustering algorithm to find the relevant features.
4. We then applied this to Gemma 2.
5. We used sparse autoencoders (SAEs) from GemmaScope and found somewhat circular features for the days of the week.
6. We then created a custom dataset based specifically on the days of the week.
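The steps above can be illustrated with a minimal sketch on synthetic data. It is not the authors' code: the cosine-similarity threshold is a simple stand-in for the unspecified clustering algorithm, the "decoder" matrix and day-of-week activations are randomly generated rather than taken from GemmaScope, and all function names (`select_cluster`, `pca_2d`, `is_cyclic_order`) are hypothetical. The idea is to gather features near a hand-picked anchor, project per-day activations onto their top two principal components, and check whether the days appear in cyclic order around the origin.

```python
import numpy as np

def select_cluster(decoder, anchor_ids, threshold=0.8):
    """Indices of decoder rows whose cosine similarity to any anchor row
    is at least `threshold` (a stand-in for a real clustering step)."""
    unit = decoder / np.linalg.norm(decoder, axis=1, keepdims=True)
    sims = unit @ unit[anchor_ids].T          # shape: (n_features, n_anchors)
    return np.flatnonzero(sims.max(axis=1) >= threshold)

def pca_2d(points):
    """Project points onto their top two principal components."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def is_cyclic_order(points_2d):
    """True if rows 0..n-1 appear in cyclic order (either direction)
    when sorted by angle around the origin."""
    angles = np.arctan2(points_2d[:, 1], points_2d[:, 0])
    order = np.argsort(angles)
    start = int(np.where(order == 0)[0][0])
    rolled = np.roll(order, -start)           # rotate so row 0 comes first
    n = len(order)
    forward = np.array_equal(rolled, np.arange(n))
    backward = np.array_equal(rolled, (-np.arange(n)) % n)
    return forward or backward

rng = np.random.default_rng(0)
d_model = 64  # hypothetical hidden size, chosen only for the demo

# Synthetic "decoder": rows 1 and 2 are near-copies of the anchor in row 0.
decoder = rng.normal(size=(50, d_model))
decoder[1] = decoder[0] + 0.1 * rng.normal(size=d_model)
decoder[2] = decoder[0] + 0.1 * rng.normal(size=d_model)
cluster = select_cluster(decoder, anchor_ids=[0])

# Synthetic per-day activations: seven points on a circle in a random
# 2D plane of the d_model-dimensional space, plus small noise.
basis, _ = np.linalg.qr(rng.normal(size=(d_model, 2)))
theta = 2 * np.pi * np.arange(7) / 7
day_acts = (np.cos(theta)[:, None] * basis[:, 0]
            + np.sin(theta)[:, None] * basis[:, 1]
            + 0.01 * rng.normal(size=(7, d_model)))

circular = is_cyclic_order(pca_2d(day_acts))
# Assigning the days to the wrong positions should break the cyclic order.
shuffled = is_cyclic_order(pca_2d(day_acts[[0, 2, 4, 6, 1, 3, 5]]))
```

With real data, `decoder` would be replaced by an SAE decoder matrix and `day_acts` by mean activations per day-of-week token, reconstructed from the selected features. The cyclic-order check is deliberately coarse: it tests only ordering, not how evenly the days are spaced, which fits the "somewhat circular" result described above.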

By Leo, Misha
