This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Computational Mechanics Hackathon!
June 3, 2024
Accepted at the Computational Mechanics Hackathon! research sprint.

Unsupervised Recovery of Hidden Markov Models from Transformers with Evolutionary Algorithms

Prior work finds that transformer neural networks trained to mimic the output of a Hidden Markov Model (HMM) embed the optimal Bayesian beliefs over the HMM's current hidden state in their residual stream, and that these beliefs can be recovered via linear regression. In this work, we address the problem of extracting information about the underlying HMM from the residual stream without knowing its mixed-state presentation (MSP) in advance. To do so, we use the $R^2$ of the linear regression as a reward signal for evolutionary algorithms, which search for the parameters that generated the source HMM. We find that in toy scenarios where the HMM is generated by a small set of latent variables, the $R^2$ reward signal is remarkably smooth and the evolutionary algorithms approximately recover the original HMM. We believe this work is a promising first step towards the ultimate goal of extracting information about the underlying predictive and generative structure of sequences by analyzing transformers in the wild.
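The search loop described in the abstract can be sketched in NumPy. This is a minimal, hypothetical illustration, not the authors' code: the abstract does not specify the HMM family, the evolutionary algorithm, or the model, so the sketch assumes (i) a made-up 3-state, 3-symbol HMM family controlled by two latent parameters, x (switching probability) and a (emission accuracy); (ii) a simulated "residual stream" that is a random linear embedding of the true belief states plus Gaussian noise, standing in for a trained transformer; and (iii) a simple elitist (1 + λ) evolution strategy. For each candidate (x, a), the sketch computes the Bayesian belief states that candidate implies for the observed sequence, regresses them on the activations, and uses the resulting $R^2$ as the fitness. All function names are illustrative.

```python
import numpy as np

# Hypothetical bounds on (x, a): x <= 0.33 keeps the stay probability non-negative,
# a >= 0.34 keeps each state's own symbol the most likely emission.
LOW, HIGH = np.array([0.01, 0.34]), np.array([0.33, 0.99])


def make_hmm(x, a):
    """Hypothetical 3-state, 3-symbol HMM family with two latent parameters.

    x: probability of switching to each other state (stay probability 1 - 2x).
    a: probability that state i emits symbol i (the other symbols share 1 - a).
    Returns T with T[s, i, j] = P(emit s, next state j | current state i).
    """
    A = np.full((3, 3), x)
    np.fill_diagonal(A, 1.0 - 2.0 * x)
    B = np.full((3, 3), (1.0 - a) / 2.0)
    np.fill_diagonal(B, a)  # B[i, s] = P(emit s | state i)
    # Emit from the current state, then transition.
    return np.stack([B[:, s][:, None] * A for s in range(3)])


def sample(T, length, rng):
    """Draw a symbol sequence from the HMM, starting in a uniformly random state."""
    n_states = T.shape[1]
    flat = T.transpose(1, 0, 2).reshape(n_states, -1)  # row i: joint over (s, j)
    state = rng.integers(n_states)
    symbols = np.empty(length, dtype=int)
    for t in range(length):
        k = rng.choice(flat.shape[1], p=flat[state])
        symbols[t], state = divmod(k, n_states)  # k = s * n_states + j
    return symbols


def beliefs(T, symbols):
    """Bayesian belief over the hidden state after each observed symbol."""
    n_states = T.shape[1]
    eta = np.full(n_states, 1.0 / n_states)  # uniform prior (stationary for this family)
    out = np.empty((len(symbols), n_states))
    for t, s in enumerate(symbols):
        eta = eta @ T[s]  # joint update: emit s, then transition
        eta /= eta.sum()  # normalise to a probability vector
        out[t] = eta
    return out


def r2_score(acts, targets):
    """R^2 of an ordinary least-squares fit (with intercept) of targets on activations."""
    X = np.hstack([acts, np.ones((len(acts), 1))])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    resid = targets - X @ coef
    ss_tot = ((targets - targets.mean(axis=0)) ** 2).sum()
    if ss_tot < 1e-12:  # constant beliefs carry no signal
        return 0.0
    return 1.0 - (resid ** 2).sum() / ss_tot


def evolve(acts, symbols, rng, generations=20, pop=12, sigma=0.1):
    """Elitist (1 + lambda) evolution strategy maximising the R^2 reward."""
    def reward(p):
        return r2_score(acts, beliefs(make_hmm(*p), symbols))

    best = (LOW + HIGH) / 2  # start from the centre of the search box
    best_r2 = reward(best)
    for _ in range(generations):
        kids = np.clip(best + sigma * rng.standard_normal((pop, 2)), LOW, HIGH)
        scores = [reward(k) for k in kids]
        i = int(np.argmax(scores))
        if scores[i] > best_r2:  # the parent survives unless a child beats it
            best, best_r2 = kids[i], scores[i]
        sigma *= 0.85  # shrink the mutation step over time
    return best, best_r2


rng = np.random.default_rng(0)
true_params = np.array([0.05, 0.8])
symbols = sample(make_hmm(*true_params), 1000, rng)
# Stand-in for a transformer's residual stream (assumption): an unknown linear
# embedding of the true beliefs plus noise. No transformer is trained here.
W = rng.standard_normal((3, 16))
acts = beliefs(make_hmm(*true_params), symbols) @ W + 0.05 * rng.standard_normal((1000, 16))

start_r2 = r2_score(acts, beliefs(make_hmm(*((LOW + HIGH) / 2)), symbols))
best, best_r2 = evolve(acts, symbols, rng)
print("recovered (x, a):", best.round(3), "R^2:", round(best_r2, 4))
```

Because the parent survives each generation, the best reward never decreases. With low noise, the reward at the true parameters should be close to 1, while distant candidates such as (x, a) = (0.3, 0.4) score lower. How close the search lands to the true (0.05, 0.8) depends on the seed, noise level, and budget; this toy setup is not a reproduction of the reported experiments.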

By Dylan Bowman, Colin Lu