Jul 27, 2025
Toy model of superposition control
Bartosz Rzepkowski
One of the most important open questions posed by the authors of the well-known "Toy Models of Superposition" paper is "how can we control whether superposition and polysemanticity occur?" In this research proposal, we present a novel approach, which we hope will be a step forward in answering this question. We operate under the assumption that each feature corresponds to a single neuron within a neural network layer. We chose this restriction because it is the most desirable setting from the perspective of neural network interpretability. Under this framework, we demonstrate how to "handpick" the features we want to use in the next layer, and how to impose precise restrictions on how those features should be superposed in the hidden space. This also implies that superposition can be fully prevented if desired.
We hope that the techniques shown in this proposal will allow developers to design blueprints that clearly define which features should be kept at any network depth and how those features should be superposed (if at all), and eventually to adjust this specification dynamically. This effectively lets developers decide a priori which circuits are allowed to form in the network, rather than discovering them after training has finished. The proposed supervised learning algorithm then forces the neural network to follow this blueprint. We show how the whole procedure might work on a simple multilayer feedforward toy model.
The proposed method incorporates the tensor network formalism along with Riemannian optimization techniques, both of which are widely used in computational physics.
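To make the Riemannian optimization ingredient concrete, here is a minimal, hypothetical sketch (not the proposal's actual algorithm): one gradient step constrained to the Stiefel manifold of matrices with orthonormal columns. Keeping a layer's weight columns orthonormal is one natural way to forbid superposition, since each feature then occupies its own orthogonal direction. The function name `stiefel_step` and the learning rate are illustrative assumptions.

```python
import numpy as np

def stiefel_step(W, euclid_grad, lr=0.1):
    """Hypothetical illustration: one Riemannian gradient step on the
    Stiefel manifold St(n, k) = {W : W^T W = I}."""
    # Project the Euclidean gradient onto the tangent space at W:
    # G - W * sym(W^T G)
    WtG = W.T @ euclid_grad
    riem_grad = euclid_grad - W @ (WtG + WtG.T) / 2
    # Take a step, then retract back onto the manifold via QR
    Q, R = np.linalg.qr(W - lr * riem_grad)
    # Fix column signs so the retraction is well defined
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
W = np.linalg.qr(rng.normal(size=(6, 3)))[0]  # start on the manifold
G = rng.normal(size=(6, 3))                   # stand-in for a loss gradient
W_next = stiefel_step(W, G)
# The columns remain orthonormal after the update:
print(np.allclose(W_next.T @ W_next, np.eye(3)))  # True
```

Optimizing all layer weights under such a constraint would rule out superposed features entirely; relaxing the constraint on selected directions is one way one could permit superposition only where the blueprint allows it.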
Cite this work
@misc{rzepkowski2025toymodel,
  title={Toy model of superposition control},
  author={Bartosz Rzepkowski},
  date={2025-07-27},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}