Nov 25, 2024
Sparse Autoencoders and Gemma 2-2B: Pioneering Demographic-Sensitive Language Modeling for Opinion QA
Qianmian Guo
This project investigates the integration of Sparse Autoencoders (SAEs) with the Gemma 2-2B language model to address challenges in opinion-based question answering (QA). Existing language models often produce answers reflecting narrow viewpoints, aligning disproportionately with specific demographics. By leveraging the Opinion QA dataset and introducing group-specific adjustments in the SAE's latent space, this study aims to steer model outputs toward more diverse perspectives. The proposed framework minimizes reconstruction, sparsity, and KL divergence losses while maintaining interpretability and computational efficiency. Results demonstrate the feasibility of this approach for demographic-sensitive language modeling.
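The abstract's combined objective (reconstruction, sparsity, and KL divergence losses, with a group-specific adjustment in the SAE latent space) could be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the class and function names, the dimensions, the ReLU/L1 choices, and the per-group steering parameter `delta` are all assumptions.

```python
import torch

# Hypothetical sketch: an SAE reconstructs a model activation, with a
# learned group-specific offset (delta) added in the latent space.
# All names and dimensions here are illustrative assumptions.
class SteeredSAE(torch.nn.Module):
    def __init__(self, d_model=64, d_latent=256, n_groups=4):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_latent)
        self.dec = torch.nn.Linear(d_latent, d_model)
        # One latent-space steering vector per demographic group.
        self.delta = torch.nn.Parameter(torch.zeros(n_groups, d_latent))

    def forward(self, x, group_id):
        z = torch.relu(self.enc(x))           # sparse latent code
        z_steered = z + self.delta[group_id]  # group-specific adjustment
        return self.dec(z_steered), z


def loss_fn(model, x, group_id, base_logits, steered_logits,
            l1_coef=1e-3, kl_coef=1.0):
    """Combined objective: reconstruction + sparsity + KL divergence."""
    x_hat, z = model(x, group_id)
    recon = torch.nn.functional.mse_loss(x_hat, x)
    sparsity = z.abs().mean()  # L1 penalty encouraging sparse latents
    # KL between the steered model's output distribution and the base
    # model's, keeping the steered outputs close to the original.
    kl = torch.nn.functional.kl_div(
        torch.log_softmax(steered_logits, dim=-1),
        torch.softmax(base_logits, dim=-1),
        reduction="batchmean",
    )
    return recon + l1_coef * sparsity + kl_coef * kl
```

The KL term here compares output distributions before and after steering; how the paper actually ties the three losses together is not stated in this summary, so the weighting and the choice of distributions are placeholders.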
Mateusz Dziemian
This seems to be in a similar direction to recent Anthropic work on using SAEs to improve bias, fairness, etc. It would be worthwhile to check that work before taking next steps.
Simon Lermen
On a quick look, this paper is a little strange. It introduces very complex notation and ideas on top of a seemingly simple premise. The authors apparently wanted to make a model more representative of diverse viewpoints, but I don't find evidence of this in the paper itself. The paper contains out-of-context sentences, and it's unclear what they actually did. There also appear to be screenshots of other papers.
Take this sentence for example:
Highly informative representations are produced by the language models before the SAE process, which helps improve the performance of the SAE. Introducing ∆ at this stage enables precise control over the model’s final output....
Liv Gorton
This project introduces a framework for steering the model toward representing more diverse perspectives. Focusing more on their contribution, rather than describing existing methodology in detail (e.g., the Gemma architecture), would make the paper easier to follow.
The authors note that they ran out of time and weren't able to implement their proposal. It'd be great to see them continue this work in the future.
Jaime Raldua
Very original idea and promising results!
Cite this work
@misc{guo2024sparse,
  title={Sparse Autoencoders and Gemma 2-2B: Pioneering Demographic-Sensitive Language Modeling for Opinion QA},
  author={Qianmian Guo},
  date={2024-11-25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}