Jul 28, 2025

A Geometric Analysis of Transformer Representations via Optimal Transport

Yadnyesh Chakane, Sunishka Sharma, Vishnu Vardhan Lanka, Janhavi Khindkar

Understanding the internal workings of transformers is a major challenge in deep learning; while these models achieve state-of-the-art performance, their multi-layer architectures operate as "black boxes," obscuring the principles that guide their information processing pathways. To address this, we used Optimal Transport (OT) to analyze the geometric transformations of representations between layers in both trained and untrained transformer models. By treating layer activations as empirical distributions, we computed layer-to-layer OT distances to quantify the extent of geometric rearrangement, complementing this with representation entropy measurements to track information content. Our results show that trained models exhibit a structured, three-phase information processing strategy (encode-refine-decode), characterized by an information bottleneck. This is evident from a U-shaped OT distance profile, where high initial costs give way to a low-cost "refinement" phase before a final, high-cost projection to the output layer. This structure is entirely absent in untrained models, which instead show chaotic, uniformly high-cost transformations. We conclude that OT provides a powerful tool for revealing the learned, efficient information pathways in neural networks, demonstrating that learning is not merely about fitting data, but about creating an organized, information-theoretically efficient pipeline to process representations.
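To make the method concrete, here is a minimal sketch of how layer-to-layer OT distances and a representation-entropy profile might be computed from a stack of layer activations. This is an illustration, not the authors' released code: the POT library calls (ot.unif, ot.dist, ot.emd2) are real, but the function names, the squared-Euclidean ground cost, and the Gaussian log-determinant entropy proxy are assumptions made for the example.

# Sketch: layer-to-layer OT cost and an entropy proxy for transformer activations.
# Assumes the POT library (pip install pot) and NumPy; names are illustrative.
import numpy as np
import ot


def layer_ot_distance(X, Y):
    """OT cost between two layers' token activations.

    X, Y: (n_tokens, d_model) arrays, treated as empirical distributions
    with uniform weights over tokens.
    """
    a = ot.unif(X.shape[0])                   # uniform weights, layer l
    b = ot.unif(Y.shape[0])                   # uniform weights, layer l+1
    M = ot.dist(X, Y, metric="sqeuclidean")   # pairwise ground-cost matrix
    return np.sqrt(ot.emd2(a, b, M))          # exact OT cost; sqrt gives a W2-style distance


def gaussian_entropy_proxy(X, eps=1e-6):
    """Differential entropy of a Gaussian fit to the activations (a simple proxy)."""
    d = X.shape[1]
    cov = np.cov(X, rowvar=False) + eps * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


# Usage: given layer_acts, a list of (n_tokens, d_model) arrays (e.g. collected
# from a HuggingFace model with output_hidden_states=True), the layer-wise
# profiles discussed above would be:
# ot_profile = [layer_ot_distance(layer_acts[i], layer_acts[i + 1])
#               for i in range(len(layer_acts) - 1)]
# entropy_profile = [gaussian_entropy_proxy(h) for h in layer_acts]

Under this setup, the U-shaped profile described above would appear as large values of ot_profile at the first and last layer transitions, with a plateau of small values in the middle "refinement" layers of a trained model.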

Reviewer's Comments


This project uses methods based on optimal transport to analyze the layer-wise evolution of representations in transformers. The project is interesting and well-executed, and the main ideas are explained well. The project could be improved by more explicitly articulating the connection to AI safety.

Love this project and the simple conclusion of an encoding phase followed by a refinement phase. I would imagine this is a generalizable statement across neural networks (worth a test), and it's interesting that minimizing work isn't a *necessary* condition for information processing; rather, the refinement phase iteratively improves the information as it passes through the network, reducing how much noise is output in the later layers. The work between layers generally looks to increase roughly linearly as the model improves its compressed representation of the concepts it runs over, though it would be very interesting to see whether this can be used to study suddenly-changed models and to look for non-linear changes in work across trained networks, especially across their checkpoints (models saved at intermediate intervals of the training process). Great work: very simple, good formalization, and a good introduction. Extensions of this work will show whether it'll be useful for AI safety, but it has some interesting implications for how models work to create their output in general.

Cool! Interesting and sensible idea (I enjoyed reading Tishby's information bottleneck treatments of NNs back in the day, but I'm not aware of anyone trying to use OT to study information flow within a model). Solid execution for a quick hackathon project. Also very clearly written; I appreciated how easy it was to read.

My main concern is just that it's underpowered and the results are still very sparse. I think it'd be very valuable to look at how this changes over the course of training, and to look at some smaller models where we have some ground truth.

Cite this work

@misc{
  title={(HckPrj) A Geometric Analysis of Transformer Representations via Optimal Transport},
  author={Yadnyesh Chakane, Sunishka Sharma, Vishnu Vardhan Lanka, Janhavi Khindkar},
  date={2025-07-28},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects


Feb 2, 2026

Markov Chain Lock Watermarking: Provably Secure Authentication for LLM Outputs

We present Markov Chain Lock (MCL) watermarking, a cryptographically secure framework for authenticating LLM outputs. MCL constrains token generation to follow a secret Markov chain over SHA-256 vocabulary partitions. Using doubly stochastic transition matrices, we prove four theoretical guarantees: (1) exponentially decaying false positive rates via Hoeffding bounds, (2) graceful degradation under adversarial modification with closed-form expected scores, (3) information-theoretic security without key access, and (4) bounded quality loss via KL divergence. Experiments on 173 Wikipedia prompts using Llama-3.2-3B demonstrate that the optimal 7-state soft cycle configuration achieves 100\% detection, 0\% FPR, and perplexity 4.20. Robustness testing confirms detection above 96\% even with 30\% word replacement. The framework enables $O(n)$ model-free detection, addressing EU AI Act Article 50 requirements. Code available at \url{https://github.com/ChenghengLi/MCLW}


Feb 2, 2026

Prototyping an Embedded Off-Switch for AI Compute

This project prototypes an embedded off-switch for AI accelerators. The security block requires periodic cryptographic authorization to operate: the chip generates a nonce, an external authority signs it, and the chip verifies the signature before granting time-limited permission. Without valid authorization, outputs are gated to zero. The design was implemented in HardCaml and validated in simulation.


Feb 2, 2026

Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors

We design and simulate a "border patrol" device for generating cryptographic evidence of data traffic entering and leaving an AI cluster, while eliminating the specific analog and steganographic side-channels that post-hoc verification can not close. The device eliminates the need for any mutually trusted logic, while still meeting the security needs of the prover and verifier.


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.