Understanding Addition in Transformers

This paper analyzes a single-layer Transformer trained on n-digit addition. The analysis shows that the model processes digits in parallel and applies position-specific algorithms, and it explains the scenarios in which loss is high. The findings, validated through testing and modeling, contribute to model interpretability.
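To make the idea of digit-wise processing concrete, the sketch below decomposes ordinary n-digit addition into per-position quantities that depend only on one digit pair, plus a carry bit passed between positions. It illustrates the arithmetic only and is not the model's learned algorithm; the function name `digitwise_add` and the helper lists `base`, `makes_carry`, and `makes_nine` are hypothetical names chosen for this example.

```python
def digitwise_add(a: str, b: str) -> str:
    """Add two equal-length decimal strings; returns a string one digit longer."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of digits")
    n = len(a)

    # Step 1: quantities that depend only on the digit pair at each position.
    # These are independent across positions, so they could be computed in parallel.
    pair_sums = [int(x) + int(y) for x, y in zip(a, b)]
    base = [s % 10 for s in pair_sums]          # answer digit if no carry arrives
    makes_carry = [s >= 10 for s in pair_sums]  # position generates a carry itself
    makes_nine = [s == 9 for s in pair_sums]    # position forwards an incoming carry

    # Step 2: resolve carries from the least significant digit upward.
    # This is the only part that couples positions to one another.
    carry_in = [0] * n
    carry = 0
    for i in range(n - 1, -1, -1):
        carry_in[i] = carry
        carry = 1 if (makes_carry[i] or (makes_nine[i] and carry)) else 0

    # Step 3: combine each position's base digit with its incoming carry.
    digits = [(base[i] + carry_in[i]) % 10 for i in range(n)]
    return str(carry) + "".join(str(d) for d in digits)


if __name__ == "__main__":
    print(digitwise_add("00123", "00456"))
    print(digitwise_add("99999", "00001"))
```

In this decomposition, only the carry resolution in step 2 depends on other positions; an input such as 99999 + 00001 forces a single carry to travel through every position.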
