This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
ApartSprints
Computational Mechanics Hackathon!
June 3, 2024

Accepted at the Computational Mechanics Hackathon! research sprint.

Looking forward to posterity: what past information is transferred to the future?

I used mechanistic interpretability techniques to investigate what information the provided Random-Random-XOR (RRXOR) transformer uses when making predictions, by manually examining its attention heads. I find that earlier layers pay more attention to the previous two tokens, which is exactly the information needed to compute XOR, than later layers do. This seems to contradict the common finding that more complex computation typically occurs in later layers.
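
The measurement described above can be illustrated with a minimal sketch (the actual model, tokenization, and data loading are not shown, and the tensor shapes are assumptions): given per-layer attention patterns of shape [heads, seq, seq], compute the average attention mass each layer places on the previous two token positions.

```python
import torch

def prev_two_attention_share(attn_patterns):
    """For each layer, average the attention mass that query positions place
    on the two immediately preceding key positions (t-1 and t-2).

    attn_patterns: list of tensors, one per layer, each [n_heads, seq, seq],
    where rows are query positions, columns are key positions, rows sum to 1.
    Returns a list of floats, one per layer.
    """
    shares = []
    for layer_attn in attn_patterns:
        n_heads, seq_len, _ = layer_attn.shape
        mass = []
        # Start at position 2 so that both t-1 and t-2 exist.
        for t in range(2, seq_len):
            mass.append(layer_attn[:, t, t - 1] + layer_attn[:, t, t - 2])
        # Average over heads and query positions.
        shares.append(torch.stack(mass).mean().item())
    return shares

# Toy usage with random, causally masked, softmax-normalized patterns.
if __name__ == "__main__":
    torch.manual_seed(0)
    n_layers, n_heads, seq_len = 4, 4, 10
    patterns = []
    for _ in range(n_layers):
        scores = torch.randn(n_heads, seq_len, seq_len)
        mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
        scores = scores.masked_fill(~mask, float("-inf"))
        patterns.append(torch.softmax(scores, dim=-1))
    print(prev_two_attention_share(patterns))
```

Comparing this per-layer share across layers is one way to quantify the qualitative observation that earlier layers attend more strongly to the previous two tokens than later layers do.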

By Zmavli Caimle
