This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
ApartSprints
Accepted at the Interpretability Hackathon research sprint

Caught Red-Bandit

A multi-armed bandit problem is one in which an agent repeatedly chooses among several actions, each of which yields some reward, raising the question of when to explore and when to exploit. We created a multi-armed bandit gym environment to help people understand how a transformer makes strategic decisions in such a scenario. Whereas previous transformer interpretability research has focused on how models understand language, our work is novel in that it examines the model's strategic decision-making process.
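To make the explore-or-exploit trade-off described above concrete, here is a minimal sketch of a bandit and a simple epsilon-greedy player. It is an illustration only and is not the gym environment or the transformer from this project. The class `BernoulliBandit`, the function `epsilon_greedy`, the arm probabilities, and all parameter values are hypothetical names and numbers chosen for the example.

```python
import random


class BernoulliBandit:
    """Hypothetical k-armed bandit: arm i pays 1.0 with probability probs[i], else 0.0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def step(self, arm):
        # Sample a reward for the chosen arm.
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


def epsilon_greedy(bandit, n_steps=2000, epsilon=0.1, seed=1):
    """Play the bandit, exploring with probability epsilon and exploiting otherwise."""
    rng = random.Random(seed)
    k = len(bandit.probs)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # running mean reward per arm
    total = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                         # explore: random arm
        else:
            arm = max(range(k), key=lambda a: values[a])   # exploit: best estimate so far
        reward = bandit.step(arm)
        counts[arm] += 1
        # Incremental update of the mean reward estimate for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return counts, values, total


if __name__ == "__main__":
    bandit = BernoulliBandit([0.2, 0.5, 0.8])  # assumed payout probabilities
    counts, values, total = epsilon_greedy(bandit)
    print("pulls per arm:", counts)
    print("estimated values:", [round(v, 2) for v in values])
    print("total reward:", total)
```

A higher `epsilon` spends more pulls gathering information about every arm, while a lower one commits sooner to whichever arm currently looks best. A learned policy faces the same tension, which is what makes the setting useful for studying decision making.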

By Víctor Levoso, Alejandro González, Richard Annilo & Theresa Thoraldson