Jul 28, 2025
Layerwise Development of Compositional Functional Representations Across Architectures
Jeyashree Krishnan, Ajay Mandyam Rangarajan
Understanding where and how neural networks represent internal computations is central to the goals of mechanistic interpretability and AI safety. We present a comprehensive empirical framework for studying circuit emergence, the process by which neural networks progressively encode and modularize functionally meaningful information across their depth and architecture. Using synthetic datasets and a hierarchy of function complexities, we investigate both MLPs and Transformers across six axes of interpretability: complexity scaling, modular composition, symmetry, phase transitions, layer-wise decoding, and grokking. Linear probes trained on hidden activations reveal the internal structure of concept learning, including decomposed representations of composite functions, invariance to symmetric inputs, and early indicators of generalization. We observe consistent evidence of phase transitions in concept decodability, modular emergence of inner subfunctions in deeper layers, and divergence in how MLPs and Transformers encode complexity. Position embeddings in Transformers emerge gradually, whereas other layers saturate almost immediately. Notably, while MLPs often exhibit modular, interpretable representations of composite functions, Transformers fail to decompose composite functions into their constituent parts, highlighting architectural limits to spontaneous circuit emergence.
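The central method described here is linear probing of hidden activations to test whether a composite function's inner subfunctions become decodable at particular depths. The following minimal sketch (PyTorch and scikit-learn) illustrates that general setup; the toy XOR-AND task, network sizes, and all names are assumptions for illustration, not the authors' actual code. A small MLP is trained on the composite target only, and a frozen linear probe is then fit per hidden layer for the inner subfunction.

# Minimal sketch (not the authors' code): train linear probes on the hidden
# activations of a small MLP to test whether an inner subfunction of a
# composite target becomes linearly decodable at some depth. Task, names,
# and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

torch.manual_seed(0)

# Composite target: y = XOR(x1, x2) AND x3; the probed concept is the
# inner subfunction f(x) = XOR(x1, x2).
X = torch.randint(0, 2, (4096, 3)).float()
inner = X[:, 0].long() ^ X[:, 1].long()   # inner subfunction labels
y = inner & X[:, 2].long()                # composite target labels

class MLP(nn.Module):
    def __init__(self, d_in=3, d_hidden=64, depth=3, n_out=2):
        super().__init__()
        dims = [d_in] + [d_hidden] * depth
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
            for i in range(depth)
        )
        self.head = nn.Linear(d_hidden, n_out)

    def forward(self, x, return_hidden=False):
        hidden = []
        for block in self.blocks:
            x = block(x)
            hidden.append(x)
        out = self.head(x)
        return (out, hidden) if return_hidden else out

model = MLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train the network on the composite target only.
for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Freeze the network and probe each hidden layer for the inner subfunction.
with torch.no_grad():
    _, hidden = model(X, return_hidden=True)

for i, h in enumerate(hidden):
    h_tr, h_te, f_tr, f_te = train_test_split(
        h.numpy(), inner.numpy(), test_size=0.25, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(h_tr, f_tr)
    print(f"layer {i}: XOR-subfunction probe accuracy = {probe.score(h_te, f_te):.3f}")

In a setup of this kind, rising probe accuracy for the inner XOR at intermediate layers would be the sort of evidence the abstract calls "modular emergence of inner subfunctions," while chance-level accuracy at every layer would indicate no linearly decodable decomposition.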
Cite this work
@misc{krishnan2025layerwise,
  title={Layerwise Development of Compositional Functional Representations Across Architectures},
  author={Krishnan, Jeyashree and Rangarajan, Ajay Mandyam},
  date={2025-07-28},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}