Mar 10, 2025
Searching for Universality and Equivariance in LLMs using Sparse Autoencoder Found Features
Meruyert Alimaganbetova, Jason Zeng
Summary
The project investigates how neuron features with properties of universality and equivariance affect the controllability and safety of large language models, finding that behaviors supported by redundant features are more resistant to manipulation than those governed by singular features.
Cite this work:
@misc{alimaganbetova2025searching,
title={Searching for Universality and Equivariance in LLMs using Sparse Autoencoder Found Features},
author={Meruyert Alimaganbetova and Jason Zeng},
date={2025-03-10},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}