Jun 3, 2024
Investigating the Effect of Model Capacity Constraints on Belief State Representations
Ari Brill, Chu Chen
Summary
Computational mechanics provides a formal framework for understanding the concepts needed to perform optimal prediction. Abstraction and generalization seem core to the function of intelligent systems, but are not yet well understood. Computational mechanics may present a promising approach to studying these capabilities. As a preliminary exploration, we examine the effect of weight decay on the fractal structure of belief state representations in a transformer’s residual stream. We find that models trained with increasing weight decay coefficients learn increasingly coarse-grained belief state representations.
Cite this work:
@misc{brill2024investigating,
title={Investigating the Effect of Model Capacity Constraints on Belief State Representations},
author={Ari Brill and Chu Chen},
date={2024-06-03},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}