This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Interpretability Hackathon research sprint.
Mechanisms of Causal Reasoning

Causal reasoning is a crucial part of how we humans safely and robustly think about the world. Can we identify whether LLMs perform causal reasoning? Marius Hobbhahn and Tom Lieberum (2022, Alignment Forum) approached this question with probing. For this hackathon, we follow up on that work by exploring a mechanistic interpretability analysis of causal reasoning in the 80 million parameters of GPT-2 Small, using Neel Nanda's Easy Transformer package.
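
A minimal sketch of this setup, assuming the easy_transformer package is installed; the prompt and the " broke" continuation below are hypothetical illustrations, not necessarily the test cases used in the project:

```python
# Sketch: load GPT-2 Small with Easy Transformer and check how much
# probability it assigns to a causally plausible continuation.
from easy_transformer import EasyTransformer

model = EasyTransformer.from_pretrained("gpt2")  # GPT-2 Small

prompt = "The glass fell off the table, so the glass"
tokens = model.to_tokens(prompt)

logits = model(tokens)                 # shape: [batch, seq_len, d_vocab]
probs = logits[0, -1].softmax(dim=-1)  # next-token distribution

# Token id for the continuation " broke" (hypothetical example)
broke_id = model.tokenizer.encode(" broke")[0]
print(f"P(' broke' | prompt) = {probs[broke_id].item():.4f}")
```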

By Ben Sturgeon, Jacy Reese Anthis, Mark Chimes, Sky Cope