Jun 3, 2024
Looking forward to posterity: what past information is transferred to the future?
Zmavli Caimle
I used mechanistic interpretability techniques to investigate what information the provided Random XOR transformer attends to when making predictions, by manually examining its attention heads. I find that earlier layers pay more attention than later layers to the previous two tokens, which are the inputs needed to compute XOR. This seems to contradict the common finding that more complex computation typically occurs in later layers.
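As a minimal sketch of the kind of measurement described above (an illustration under stated assumptions, not the author's actual code), the function below averages, per layer, the attention mass each position places on its two immediately preceding tokens. It assumes per-layer attention tensors of shape (batch, heads, seq, seq), as returned e.g. by a HuggingFace-style model called with output_attentions=True; all names here are illustrative.

```python
# Minimal sketch (assumed interfaces, not the author's code): score how much
# each layer attends to the two immediately preceding tokens.
import torch

def attention_to_prev_two(attentions):
    """Per-layer mean attention mass on key positions t-1 and t-2.

    attentions: sequence of tensors, one per layer, each of shape
        (batch, heads, seq, seq), with rows = query positions and
        columns = key positions (e.g. the `attentions` field of a
        HuggingFace model output when output_attentions=True).
    Returns one scalar per layer; higher means more attention to the
    previous two tokens, the inputs needed to compute XOR.
    """
    scores = []
    for layer_attn in attentions:
        _, _, seq, _ = layer_attn.shape
        if seq < 3:  # need at least two preceding tokens
            scores.append(float("nan"))
            continue
        total = 0.0
        for t in range(2, seq):
            # Attention from query position t to key positions t-1 and t-2,
            # averaged over the batch and all heads in this layer.
            total += (layer_attn[:, :, t, t - 1] + layer_attn[:, :, t, t - 2]).mean().item()
        scores.append(total / (seq - 2))
    return scores
```

Plotting these per-layer scores would make the earlier-versus-later contrast described above directly visible.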
Cite this work
@misc{caimle2024posterity,
  title={Looking forward to posterity: what past information is transferred to the future?},
  author={Zmavli Caimle},
  date={2024-06-03},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


