Jul 28, 2025

A Geometric Analysis of Transformer Representations via Optimal Transport

Yadnyesh Chakane, Sunishka Sharma, Vishnu Vardhan Lanka, Janhavi Khindkar

Understanding the internal workings of transformers remains a major challenge in deep learning: although these models achieve state-of-the-art performance, their multi-layer architectures operate as "black boxes," obscuring the principles that govern how they process information. To address this, we used Optimal Transport (OT) to analyze the geometric transformations of representations between layers in both trained and untrained transformer models. Treating each layer's activations as an empirical distribution, we computed layer-to-layer OT distances to quantify how much the representation is geometrically rearranged, and complemented these with representation entropy measurements to track information content. Our results show that trained models exhibit a structured, three-phase information processing strategy (encode-refine-decode) characterized by an information bottleneck. This is evident from a U-shaped OT distance profile: high costs in the early layers give way to a low-cost "refinement" phase, followed by a final, high-cost projection to the output layer. This structure is entirely absent in untrained models, which instead show chaotic, uniformly high-cost transformations. We conclude that OT is a powerful tool for revealing the learned, efficient information pathways in neural networks, demonstrating that learning is not merely about fitting data but about building an organized, information-theoretically efficient pipeline for processing representations.
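The layer-to-layer OT computation and entropy measurement described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: each layer's activations are taken to be an array of shape (n_tokens, d_model) over the same tokens, with uniform weights and a squared-Euclidean ground cost (the 2-Wasserstein distance), solved exactly as a linear assignment with SciPy. The entropy function (Shannon entropy of the normalized variance spectrum) is one hypothetical choice; the project page does not say which definition was used.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def ot_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Exact 2-Wasserstein distance between two uniform point clouds of equal size.

    Assumption: both layers are sampled on the same number of tokens, so with
    uniform weights 1/n an optimal plan can be chosen as a permutation and the
    OT problem reduces to linear assignment on squared-Euclidean costs.
    """
    if x.shape != y.shape:
        raise ValueError("sketch assumes arrays of identical shape (n_tokens, d_model)")
    cost = cdist(x, y, metric="sqeuclidean")  # pairwise ||x_i - y_j||^2
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(np.sqrt(cost[rows, cols].mean()))


def representation_entropy(h: np.ndarray) -> float:
    """Shannon entropy (nats) of the normalized variance spectrum of h.

    Hypothetical definition: the project does not specify its entropy measure.
    """
    centered = h - h.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    total = np.sum(s**2)
    if total == 0:
        return 0.0  # all activations identical: no spread, zero entropy
    p = s**2 / total  # share of total variance per principal direction
    p = p[p > 0]  # drop exact zeros (0 * log 0 := 0)
    return float(-(p * np.log(p)).sum())


def layer_ot_profile(hidden_states: list[np.ndarray]) -> list[float]:
    """OT distance between each pair of consecutive layers (l, l + 1)."""
    return [ot_distance(a, b) for a, b in zip(hidden_states[:-1], hidden_states[1:])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for real activations: 6 layers of 64 tokens x 16 dims.
    layers = [rng.normal(size=(64, 16)) for _ in range(6)]
    print("OT profile:", np.round(layer_ot_profile(layers), 3))
    print("entropies:", np.round([representation_entropy(h) for h in layers], 3))
```

With a real model, `hidden_states` could come from a forward pass that returns per-layer activations (for example, Hugging Face `transformers` models called with `output_hidden_states=True`); a U-shaped profile would appear as large values at the first and last entries and small values in between. Exact assignment scales cubically in the number of tokens, so for larger samples an entropic (Sinkhorn) solver, such as the one in the POT library, is the usual substitute. The sketch also assumes every layer shares one hidden dimension; comparing spaces of different dimension would need a different formulation (e.g., Gromov-Wasserstein).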

Reviewer's Comments

Ari Brill

This project uses methods based on optimal transport to analyze the layer-wise evolution of representations in transformers. The project is interesting and well-executed, and the main ideas are explained well. The project could be improved by more explicitly articulating the connection to AI safety.

Esben Kran

Love this project and the simple conclusion of an encoding and a refinement phase. I would imagine this is a generalizable statement across NNs (worth a test). It's interesting that the minimization of work isn't a *necessary* condition for information processing, but that the refinement phase is instead an iterative improvement on the information as it passes through the network, reducing how much noise is output in later layers. It generally looks like a linear-ish increase in work between layers as the model improves the compressed representation of the concepts it runs over. It would be very interesting to see whether we can use this to study suddenly-changed models and see a non-linear change in work across trained networks, especially across their checkpoints (models saved at intermediate intervals of the training process). Great work: very simple, good formalization, and a good introduction. Extensions of this work will show whether it'll be useful for AI safety, but it has some interesting implications for how models work to create their output in general.

Jesse Hoogland

Cool! Interesting and sensible idea (I enjoyed reading Tishby's information bottleneck treatments of NNs back in the day, but am not aware of anyone trying to use OT to study information flow within a model). Solid execution for a quick hackathon project. Also very clearly written, I appreciated how easy it was to read.

My main concern is just that it's underpowered, and the results are still very sparse. I think it'd be very valuable to look at how this changes over the course of training and to look at some smaller models where we have some ground truth.

Cite this work

@misc{
  title={(HckPrj) A Geometric Analysis of Transformer Representations via Optimal Transport},
  author={Yadnyesh Chakane and Sunishka Sharma and Vishnu Vardhan Lanka and Janhavi Khindkar},
  date={2025-07-28},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Jul 28, 2025

Local Learning Coefficients Predict Developmental Milestones During Group Relative Policy Optimization

In this work, we investigate the emergence of capabilities in reinforcement learning (RL) by framing them as developmental phase transitions. We propose that the individual components of the reward function can serve as direct observables for these transitions, avoiding the need for complex, derived metrics. To test this, we trained a language model on an arithmetic task using Group Relative Policy Optimization (GRPO) and analyzed its learning trajectory with the Local Learning Coefficient (LLC) from Singular Learning Theory. Our findings show a strong qualitative correlation between spikes in the LLC, which indicate a phase transition, and significant shifts in the model's behavior, as reflected by changes in specific reward components for correctness and conciseness. This demonstrates a more direct and scalable method for monitoring capability acquisition, offering a valuable proof of concept for developmental interpretability and AI safety. To facilitate reproducibility, we make our code available at github.com/ilijalichkovski/apart-physics.

Jul 28, 2025

AI agentic system epidemiology

As AI systems scale into decentralized, multi-agent deployments, emergent vulnerabilities challenge our ability to evaluate and manage systemic risks.

In this work, we adapt classical epidemiological modeling (specifically SEIR compartment models) to model adversarial behavior propagation in AI agents.

By solving the governing systems of ODEs with physics-informed neural networks (PINNs), we analyze stable and unstable equilibria, bifurcation points, and the effectiveness of interventions.

We estimate parameters from real-world data (e.g., adversarial success rates, detection latency, patching delays) and simulate attack propagation scenarios across 8 sectors (enterprise, retail, trading, development, customer service, academia, medical, and critical infrastructure AI tools).

Our results demonstrate how agent population dynamics interact with architectural and policy design interventions to stabilize the system.

This framework bridges concepts from dynamical systems and cybersecurity to offer a proactive, quantitative toolbox for AI safety.

We argue that epidemic-style monitoring and tools grounded in interpretable, physics-aligned dynamics can serve as early warning systems for cascading AI agentic failures.
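The SEIR compartment model referenced above can be sketched as a small ODE system. This is an illustrative sketch under stated assumptions, not the authors' implementation: it integrates the classical SEIR equations with plain forward-Euler steps instead of the physics-informed neural networks (PINNs) the project uses, works with population fractions (S + E + I + R = 1), and the mapping of compartments to agents (susceptible, exposed/compromised, actively propagating, patched) and all parameter values are hypothetical.

```python
import numpy as np


def seir_rhs(state: np.ndarray, beta: float, sigma: float, gamma: float) -> np.ndarray:
    """Classical SEIR right-hand side on population fractions.

    dS/dt = -beta*S*I, dE/dt = beta*S*I - sigma*E,
    dI/dt = sigma*E - gamma*I, dR/dt = gamma*I.
    """
    s, e, i, r = state
    infection = beta * s * i  # S -> E: hypothetical "agent exposed to adversarial input"
    onset = sigma * e  # E -> I: hypothetical "compromise becomes active and spreads"
    recovery = gamma * i  # I -> R: hypothetical "agent detected and patched"
    return np.array([-infection, infection - onset, onset - recovery, recovery])


def simulate_seir(state0, beta, sigma, gamma, t_max, dt=0.01):
    """Forward-Euler integration; returns an array of shape (steps + 1, 4)."""
    steps = int(round(t_max / dt))
    traj = np.empty((steps + 1, 4))
    traj[0] = state0
    for k in range(steps):
        traj[k + 1] = traj[k] + dt * seir_rhs(traj[k], beta, sigma, gamma)
    return traj


if __name__ == "__main__":
    # Hypothetical parameters: 1% of agents initially propagating, R0 = beta / gamma = 5.
    traj = simulate_seir(np.array([0.99, 0.0, 0.01, 0.0]), beta=0.5, sigma=0.2, gamma=0.1, t_max=100)
    print("peak propagating fraction:", round(float(traj[:, 2].max()), 3))
```

Forward Euler keeps the sketch dependency-free; for stiff parameter regimes or bifurcation analysis, an adaptive solver such as SciPy's `solve_ivp` would be the more robust choice.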

Jul 28, 2025

Momentum–Point-Perplexity Mechanics in Large Language Models

This work analyzes the hidden states of twenty different open-source transformer language models, ranging from small to medium size and covering five major architectures. The key discovery is that these models show signs of "energy conservation" during inference: a certain measure combining changes in hidden states and token unpredictability stays almost constant as the model processes text.

The authors developed a new framework inspired by physics to jointly analyze how hidden states and prediction confidence evolve over time. They propose that transformers' behavior can be understood as following certain mechanical principles, much like how physical systems follow rules like conservation of energy.

Their experiments show that this conserved quantity varies very little between tokens, especially in untrained (random-weight) models, where it is extremely stable. In pre-trained models, training lowers the average energy, but the relative fluctuations from token to token are larger.

They also introduce a new method, based on this framework, for controlling transformer outputs by "steering" the hidden states. This method achieves good results, producing completions rated higher in semantic quality while maintaining the same kind of energy stability.

Overall, the findings suggest that viewing transformer models through the lens of physical mechanics gives new, principled ways to interpret and control their behavior. It also highlights a key difference: random models behave more like balanced systems, while trained models make quicker, more decisive state changes at the cost of less precise energy conservation.

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
