Nov 25, 2024
BBLLM
Joey SKAF, Mickaël Boillaud, Thaïs Distinguin
This project focuses on enhancing feature interpretability in large language models (LLMs) by visualizing relationships between latent features. Using an interactive graph-based representation, the tool connects co-activated features for specific prompts, enabling intuitive exploration of feature clusters. Deployed as a web application for Llama-3-70B and Llama-3-8B, it provides insights into the organization of latent features and their roles in decision-making processes.
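The co-activation graph described above can be sketched minimally as follows. This is not the authors' implementation; the function name, input format, and `threshold` parameter are illustrative assumptions. Each prompt is represented as the set of SAE latent indices that fired on it, and two latents are linked when they co-activate on enough prompts.

```python
from itertools import combinations
from collections import Counter

def coactivation_graph(activations, threshold=2):
    """Link SAE latents that frequently activate together.

    activations: list of sets, each holding the latent indices active
        for one prompt (hypothetical input format).
    threshold: minimum number of shared prompts before two latents
        are connected by an edge (hypothetical parameter).
    Returns the edge set of the co-activation graph.
    """
    counts = Counter()
    for active in activations:
        # Count every pair of latents that fire on the same prompt.
        for a, b in combinations(sorted(active), 2):
            counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= threshold}

# Latents 1 and 2 co-activate on two prompts, so only they are linked.
edges = coactivation_graph([{1, 2, 3}, {1, 2}, {3, 4}], threshold=2)
# edges == {(1, 2)}
```

The resulting edge set could then be handed to any graph layout or rendering library for the interactive exploration the tool provides.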
Jaime Raldua
Very interesting visualisation tool! It would have been great to see a bit more of a literature review showing where existing techniques fall short and what specific value this tool adds. The fact that it is ready for local deployment definitely deserves extra points.
Liv Gorton
This work presents a way to visualise SAE latents that frequently activate together. With some additional time, it'd be cool to see some insights gained from this kind of technique! It'd be especially cool if there was some insight that wasn't easily uncovered via something like UMAP or PCA over the dictionary vectors.
Tom McGrath
This project develops a visualisation tool for language model SAE latents. Visualisation is an important and underexplored area in interpretability, so it's cool to see this. The visualisation is a graph, where features are connected to one another if they co-occur sufficiently frequently.
The tool is interesting, but I'd really like to see an example of the kind of application it might be used in, or an interesting insight (even something very minor) that the authors obtained from using the tool.
Cite this work
@misc{
  title={BBLLM},
  author={Joey SKAF, Mickaël Boillaud, Thaïs Distinguin},
  date={11/25/24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}