Understanding Addition in Transformers

This paper analyzes a single-layer Transformer trained on n-digit addition. The analysis shows that the model processes digits in parallel and applies position-specific algorithms, and it explains the scenarios in which the model's loss is high. The findings, validated through testing and modeling, advance model interpretability.
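To make "digit-wise" concrete, here is a minimal Python sketch of how n-digit addition splits into per-position sub-tasks: a sum for each digit pair, which does not depend on the other positions, followed by carry resolution. It illustrates the arithmetic only and is not the algorithm the paper attributes to the model; the function name `digitwise_add` and its signature are hypothetical.

```python
def digitwise_add(a: int, b: int, n: int) -> list[int]:
    """Add two n-digit numbers digit by digit.

    Returns n + 1 digits, most significant first.
    Illustrative only: this is ordinary schoolbook addition,
    not a description of what the trained model computes.
    """
    # Split both operands into digits, least significant first.
    a_digits = [(a // 10**i) % 10 for i in range(n)]
    b_digits = [(b // 10**i) % 10 for i in range(n)]

    # Step 1: per-position sums. Each sum is independent of the
    # others, so this step could be done for all positions at once.
    pair_sums = [x + y for x, y in zip(a_digits, b_digits)]

    # Step 2: resolve carries from the least significant position up.
    out = []
    carry = 0
    for s in pair_sums:
        total = s + carry
        out.append(total % 10)
        carry = total // 10
    out.append(carry)  # final carry becomes the leading digit

    return out[::-1]


print(digitwise_add(445, 555, 3))  # [1, 0, 0, 0]
```

The example 445 + 555 is chosen because a carry propagates through every position, so the answer digits depend on more than their own digit pair.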
