This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Women in AI Safety Hackathon research sprint on March 10, 2025

An Interpretable Classifier Based on Large-Scale Social Network Analysis

Mechanistic interpretability is essential for understanding how AI models make decisions, which in turn supports safety, alignment with human values, model reliability, and further research. By revealing a model's internal processes, it promotes transparency, mitigates risk, and fosters trust, ultimately leading to more effective and ethical AI systems in critical areas. In this study, we explore social network posts from BlueSky and use them to build a financial classifier on top of sparse autoencoder (SAE) features. The classifier is deliberately simple, so it is easy to train and its decisions are easy to understand. Finally, we visualize and explain its most important characteristics.
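
To make the idea of a simple classifier over sparse features concrete, the following is a minimal sketch, not the author's implementation. It assumes each post has already been reduced to a short vector of non-negative SAE feature activations. The feature names, the synthetic data, and the choice of plain logistic regression are all illustrative assumptions; the source does not specify them.

```python
import math
import random

# Hypothetical labels for four SAE features (assumed, not from the project).
FEATURE_NAMES = ["stock_ticker", "currency_amount", "sports_talk", "weather_talk"]


def make_toy_data(rng, n_per_class=50):
    """Synthetic stand-in for SAE activations of financial (1) and other (0) posts."""
    features, labels = [], []
    for label in (1, 0):
        for _ in range(n_per_class):
            if label == 1:
                x = [rng.uniform(0.5, 1.0), rng.uniform(0.3, 1.0),
                     rng.uniform(0.0, 0.2), rng.uniform(0.0, 0.2)]
            else:
                x = [rng.uniform(0.0, 0.2), rng.uniform(0.0, 0.2),
                     rng.uniform(0.3, 1.0), rng.uniform(0.3, 1.0)]
            features.append(x)
            labels.append(label)
    return features, labels


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression with batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(x, weights, bias):
    """Return 1 (financial) when the linear score is positive, else 0."""
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


rng = random.Random(0)
X, y = make_toy_data(rng)
weights, bias = train_logistic(X, y)
accuracy = sum(predict(x, weights, bias) == t for x, t in zip(X, y)) / len(y)

# Each weight belongs to one named feature, so the ranking doubles as an explanation.
ranked = sorted(zip(FEATURE_NAMES, weights), key=lambda p: abs(p[1]), reverse=True)
for name, weight in ranked:
    print(f"{name:>16s}  {weight:+.3f}")
print(f"training accuracy: {accuracy:.2f}")
```

Because the model is linear, each learned weight maps to exactly one feature, so sorting by absolute weight gives a direct reading of which features push a post toward or away from the financial class.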

By Monojit Banerjee
