May 26, 2024
Manifold Recovery as a Benchmark for Text Embedding Models
Lennart Finke
Drawing on recent developments in the interpretability of deep learning models on the one hand, and on dimensionality reduction on the other, we derive a framework to quantify the interpretability of text embedding models. Our empirical results reveal surprising phenomena in state-of-the-art embedding models and can be used to compare them, illustrated here by recovering the world map from place names. We hope this can serve as a benchmark for the interpretability of generative language models via their internal embeddings. A look at the meta-benchmark MTEB suggests that our approach is original.
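To make the benchmark idea concrete, the following is a minimal sketch, not the author's exact pipeline: the embedding model, the tiny place list, PCA as the dimensionality reduction, and the Procrustes-based score are all illustrative assumptions. It embeds place names, projects the embeddings to 2D, and scores how well the projection aligns with the true geographic coordinates.

```python
# Minimal sketch of manifold recovery: embed place names, project to 2D,
# and measure alignment with the true coordinates.
# Assumes the sentence-transformers, scikit-learn, and scipy packages;
# the model name and place list are illustrative placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from scipy.spatial import procrustes

# (place name, latitude, longitude) -- a stand-in for a real gazetteer
places = [
    ("Rome, Italy", 41.9, 12.5),
    ("Milan, Italy", 45.5, 9.2),
    ("Naples, Italy", 40.9, 14.3),
    ("Palermo, Italy", 38.1, 13.4),
    ("Turin, Italy", 45.1, 7.7),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model can be swapped in
embeddings = model.encode([name for name, _, _ in places])

# Reduce the high-dimensional embeddings to a 2D "recovered map".
recovered = PCA(n_components=2).fit_transform(embeddings)
true_coords = np.array([[lat, lon] for _, lat, lon in places])

# Procrustes alignment removes rotation, scale, and translation; the residual
# disparity (0 = perfect recovery) is one possible benchmark score.
_, _, disparity = procrustes(true_coords, recovered)
print(f"Manifold recovery disparity: {disparity:.3f}")
```

Because the score depends only on the embeddings and the known ground-truth manifold, the same procedure can be rerun with different embedding models to compare them, as in the world-map example above.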
Nora Petrova
Good idea and simple approach which generalises easily across embedding models and concepts. May be worth exploring more abstract concepts, as concrete concepts like places have been explored before!
Minh Nguyen
Novel approach!
Mateusz Jurewicz
Really interesting project! It investigates an important question regarding the human-interpretability of an AI model's learned representations and does so via an ML-interpretability framing. The paper provides multiple visualizations that make it very easy to grasp the core concept and the discussion shows a lot of ideas on how to extend the presented work. The author also endeavours to refer to existing scientific sources to ground their exploration. Additionally, a code repository is provided for reproducibility (although a readme or some jupyter notebooks would be lovely if you'd like to make things easier for the reader). I would be very interested to see the work extended by using other vector similarity metrics, dimensionality reduction techniques and perhaps adding an intervention, e.g. changing the country or zip code of a given address and seeing how that influences the manifold or where it moves a given entry.
Jason Hoelscher-Obermaier
Very interesting project idea and great execution! The manifold perspective could have great potential and should be explored further. I am not entirely sure whether, at the current state of exploration, the proposed approach goes far beyond what's done in "Language Models Represent Space and Time" by Wes Gurnee & Max Tegmark (https://arxiv.org/pdf/2310.02207). Further work should make the differences clearer.
Jacob Haimes
I have to say, I love this idea! It seems incredibly valuable to examine an intermediary output of the typical language model system flow using the many techniques that have been established in multivariate analysis and visualization. In the future work, which I most definitely want to see, I would be particularly interested in finding applications of these techniques that are motivated by AI ethics and safety; perhaps this stream can be used to analyze biases in a novel manner, or can address a specific problem relevant to regulation. As a final note, I think that it would be valuable to provide a more explicit interpretation of what various results mean throughout the paper. For example, in the map of Italy, what does clustering of color imply? What about continuity? What are the practical implications of the differences between the text-embedding-3 and Jina AI V2 plots? In general, I think that you do a great job suggesting the answers to these questions, but in some cases, laying out your takeaways might be really beneficial for the reader.
Esben Kran
Utilizing real world interpretable manifolds as the baseline for the benchmark is a stroke of genius and seems to be a highly generalizable technique. I'm very very curious about interpretability benchmarks and this seems like a clear milestone. By interpreting forward passes from the residual stream as text embeddings and expanding the dataset to include non-geographic concepts, next steps can potentially make this into a useful model-agnostic benchmark. I'm highly interested in more work happening on this and hooking it up to existing work within the field of mechanistic interpretability attempting to quantify the interpretability of a model (e.g. https://arxiv.org/abs/1806.10758 on first search).
Cite this work
@misc{finke2024manifold,
  title={Manifold Recovery as a Benchmark for Text Embedding Models},
  author={Lennart Finke},
  date={2024-05-26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}