Jul 28, 2025
A Geometric Analysis of Transformer Representations via Optimal Transport
Yadnyesh Chakane, Sunishka Sharma, Vishnu Vardhan Lanka, Janhavi Khindkar
Understanding the internal workings of transformers is a major challenge in deep learning; while these models achieve state-of-the-art performance, their multi-layer architectures operate as "black boxes," obscuring the principles that guide their information processing pathways. To address this, we used Optimal Transport (OT) to analyze the geometric transformations of representations between layers in both trained and untrained transformer models. By treating layer activations as empirical distributions, we computed layer-to-layer OT distances to quantify the extent of geometric rearrangement, complementing this with representation entropy measurements to track information content. Our results show that trained models exhibit a structured, three-phase information processing strategy (encode-refine-decode), characterized by an information bottleneck. This is evident from a U-shaped OT distance profile, where high initial costs give way to a low-cost "refinement" phase before a final, high-cost projection to the output layer. This structure is entirely absent in untrained models, which instead show chaotic, uniformly high-cost transformations. We conclude that OT provides a powerful tool for revealing the learned, efficient information pathways in neural networks, demonstrating that learning is not merely about fitting data, but about creating an organized, information-theoretically efficient pipeline to process representations.
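The submission does not include code here, so the following is a minimal sketch of how the layer-to-layer OT distances and entropy measurements described above could be computed. It assumes GPT-2 via Hugging Face transformers, the POT library's exact solver (ot.emd2) with a squared-Euclidean cost, uniform weights over token activations, and a simple Gaussian entropy estimate; the authors' actual model, cost function, and entropy estimator may differ.

# Sketch: layer-to-layer OT distances and representation entropy
# Assumptions (not from the paper): GPT-2, POT's exact solver with
# squared-Euclidean cost, uniform weights over tokens, Gaussian entropy.
import numpy as np
import ot  # POT: pip install pot
import torch
from transformers import GPT2Model, GPT2Tokenizer

def layer_activations(text, model_name="gpt2"):
    """Return a list of (num_tokens, hidden_dim) arrays, one per layer."""
    tok = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2Model.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    # hidden_states: (num_layers + 1) tensors of shape (1, num_tokens, hidden_dim)
    return [h[0].numpy() for h in out.hidden_states]

def ot_distance(X, Y):
    """Exact OT cost between two sets of token activations, treated as
    uniform empirical distributions with squared-Euclidean ground cost."""
    a = np.full(X.shape[0], 1.0 / X.shape[0])
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])
    M = ot.dist(X, Y, metric="sqeuclidean")
    return ot.emd2(a, b, M)

def gaussian_entropy(X, eps=1e-6):
    """Differential entropy (nats) of a Gaussian fit to the activations."""
    d = X.shape[1]
    cov = np.cov(X, rowvar=False) + eps * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * (1 + np.log(2 * np.pi)) + logdet)

if __name__ == "__main__":
    acts = layer_activations("Optimal transport reveals structure in transformers.")
    for i in range(len(acts) - 1):
        print(f"layer {i} -> {i + 1}: "
              f"OT = {ot_distance(acts[i], acts[i + 1]):.2f}, "
              f"H = {gaussian_entropy(acts[i + 1]):.2f}")

Plotting the per-layer OT costs from a loop like this against layer index is one way to look for the U-shaped profile reported above, and repeating the run with a randomly initialized model gives the untrained baseline for comparison.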
Cite this work
@misc{chakane2025geometric,
  title={A Geometric Analysis of Transformer Representations via Optimal Transport},
  author={Yadnyesh Chakane and Sunishka Sharma and Vishnu Vardhan Lanka and Janhavi Khindkar},
  date={2025-07-28},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}