This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
ApartSprints: Reprogramming AI Models Hackathon
Accepted at the research sprint on November 25, 2024
Sparse Autoencoders and Gemma 2-2B: Pioneering Demographic-Sensitive Language Modeling for Opinion QA

This project investigates integrating Sparse Autoencoders (SAEs) with the Gemma 2-2B language model to address challenges in opinion-based question answering (QA). Existing language models often produce answers that reflect narrow viewpoints and align disproportionately with specific demographics. By leveraging the OpinionQA dataset and introducing group-specific adjustments in the SAE's latent space, this study aims to steer model outputs toward more diverse perspectives. The proposed framework jointly minimizes reconstruction, sparsity, and KL-divergence losses while maintaining interpretability and computational efficiency. Results demonstrate the feasibility of this approach for demographic-sensitive language modeling.
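The combined objective described above (reconstruction loss, a sparsity penalty on the latent activations, and a KL-divergence term toward a target opinion distribution) could be sketched roughly as follows. This is a minimal NumPy illustration of that kind of objective, not the project's actual implementation; all function names, shapes, and coefficients (`l1_coef`, `kl_coef`, `alpha`) are assumptions made for the example.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    # ReLU encoder: maps activations to sparse latent features.
    return np.maximum(0.0, x @ W_enc + b_enc)

def sae_decode(z, W_dec, b_dec):
    # Linear decoder: reconstructs activations from latents.
    return z @ W_dec + b_dec

def steer(z, direction, alpha=1.0):
    # Hypothetical group-specific adjustment: shift latents along a
    # demographic feature direction before decoding.
    return z + alpha * direction

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q) between two discrete answer distributions.
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def combined_loss(x, z, x_hat, p_target, p_model,
                  l1_coef=1e-3, kl_coef=1.0):
    # Reconstruction (MSE) + L1 sparsity + KL toward the target
    # demographic's answer distribution (coefficients illustrative).
    recon = float(np.mean((x - x_hat) ** 2))
    sparsity = float(np.mean(np.abs(z)))
    kl = kl_divergence(p_target, p_model)
    return recon + l1_coef * sparsity + kl_coef * kl
```

In practice the latents would come from a pretrained SAE over a Gemma 2-2B residual stream, and `p_model` would be read off the model's answer probabilities for an OpinionQA item; here all tensors are random placeholders.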

By Qianmian Guo
