Manifold Recovery as a Benchmark for Text Embedding Models

Lennart Finke

Inspired by recent developments in the interpretability of deep learning models on the one hand, and by dimensionality reduction on the other, we derive a framework to quantify the interpretability of text embedding models. Our empirical results reveal surprising phenomena in state-of-the-art embedding models and allow us to compare them, through the example of recovering the world map from place names. We hope this can serve as a benchmark for the interpretability of generative language models via their internal embeddings. A look at the meta-benchmark MTEB suggests that our approach is original.
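
To make the recovery procedure concrete, here is a minimal sketch of the idea, not the author's exact pipeline: embed a handful of place names, project the embeddings to 2D, and score how well pairwise distances in the recovered layout match pairwise distances on the true map. The embedding model, the use of PCA, and the Pearson-correlation score are all illustrative assumptions; the paper's actual models and metric may differ.

```python
# Minimal sketch of manifold recovery from place-name embeddings.
# Assumptions (not from the paper): sentence-transformers for embeddings,
# PCA for dimensionality reduction, Pearson correlation of pairwise
# distances as the recovery score.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

# Place names with ground-truth (latitude, longitude) coordinates.
places = {
    "Rome, Italy": (41.9, 12.5),
    "Milan, Italy": (45.5, 9.2),
    "Naples, Italy": (40.9, 14.3),
    "Paris, France": (48.9, 2.4),
    "Berlin, Germany": (52.5, 13.4),
    "Madrid, Spain": (40.4, -3.7),
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model fits here
embeddings = model.encode(list(places.keys()))   # shape: (n_places, dim)

# Project the high-dimensional embeddings down to a 2D "recovered map".
recovered = PCA(n_components=2).fit_transform(embeddings)

# Score the recovery by correlating pairwise distances in the recovered
# layout with pairwise distances between the true coordinates. This score
# is invariant to rotation and reflection of the recovered map.
true_xy = np.array(list(places.values()))
r, _ = pearsonr(pdist(recovered), pdist(true_xy))
print(f"Distance correlation between recovered and true map: {r:.2f}")
```

A score near 1 would mean the 2D projection of the embeddings preserves the geography of the place names almost perfectly; a score near 0 would mean the recovered layout is unrelated to the true map.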

Reviewer's Comments


Nora Petrova

Good idea and simple approach which generalises easily across embedding models and concepts. May be worth exploring more abstract concepts, as concrete concepts like places have been explored before!

Minh Nguyen

Novel approach!

Mateusz Jurewicz

Really interesting project! It investigates an important question regarding the human-interpretability of an AI model's learned representations, and does so via an ML-interpretability framing. The paper provides multiple visualizations that make it very easy to grasp the core concept, and the discussion offers many ideas on how to extend the presented work. The author also endeavours to refer to existing scientific sources to ground their exploration. Additionally, a code repository is provided for reproducibility (although a README or some Jupyter notebooks would be lovely if you'd like to make things easier for the reader). I would be very interested to see the work extended by using other vector similarity metrics and dimensionality reduction techniques, and perhaps by adding an intervention, e.g. changing the country or zip code of a given address and seeing how that influences the manifold or where it moves a given entry.
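
The intervention the reviewer proposes could look something like the following toy sketch, in which one field of an address is edited and the displacement of its embedding is measured. The model name and the Euclidean displacement metric are illustrative assumptions, not part of the original work.

```python
# Toy intervention: swap the country in an address and measure how far
# the embedding moves. Model and metric are assumptions for illustration.
from numpy.linalg import norm
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
before = model.encode("Via Roma 1, 00184 Rome, Italy")
after = model.encode("Via Roma 1, 00184 Rome, France")  # country swapped
print(f"Embedding displacement: {norm(before - after):.3f}")
```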

Jason Hoelscher-Obermaier

Very interesting project idea and great execution! The manifold perspective could have great potential and should be explored further. I am not entirely sure whether, at the current state of exploration, the proposed approach goes far beyond what's done in "Language Models Represent Space and Time" by Wes Gurnee & Max Tegmark (https://arxiv.org/pdf/2310.02207). Further work should make the differences clearer.

Jacob Haimes

I have to say, I love this idea! It seems incredibly valuable to examine an intermediary output of the typical language model system flow using the many techniques that have been established in multivariate analysis and visualization. In the future work, which I most definitely want to see, I would be particularly interested in finding applications of these techniques that are motivated by AI ethics and safety; perhaps this stream can be used to analyze biases in a novel manner, or can address a specific problem relevant to regulation. As a final note, I think it would be valuable to provide a more explicit interpretation of what the various results mean throughout the paper. For example, in the map of Italy, what does the clustering of color imply? What about continuity? What are the practical implications of the differences between the text-embedding-3 and Jina AI V2 plots? In general, I think you do a great job suggesting the answers to these questions, but in some cases, laying out your takeaways explicitly might be really beneficial for the reader.

Esben Kran

Utilizing real world interpretable manifolds as the baseline for the benchmark is a stroke of genius and seems to be a highly generalizable technique. I'm very very curious about interpretability benchmarks and this seems like a clear milestone. By interpreting forward passes from the residual stream as text embeddings and expanding the dataset to include non-geographic concepts, next steps can potentially make this into a useful model-agnostic benchmark. I'm highly interested in more work happening on this and hooking it up to existing work within the field of mechanistic interpretability attempting to quantify the interpretability of a model (e.g. https://arxiv.org/abs/1806.10758 on first search).
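
For concreteness, treating forward passes as text embeddings, as the reviewer suggests, could start from something like the sketch below: mean-pooling the final-layer hidden states (the residual stream at the last layer) of a small open model. The choice of gpt2, the layer, and mean pooling are all assumptions for illustration, not the benchmark's prescribed setup.

```python
# Hedged sketch: use a language model's residual stream as a text embedding.
# gpt2, the final layer, and mean pooling over tokens are illustrative choices.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[-1] is the final residual-stream activation,
    # shape (1, n_tokens, d_model); mean-pool over token positions.
    return out.hidden_states[-1].mean(dim=1).squeeze(0)

print(embed("Rome, Italy").shape)  # torch.Size([768]) for gpt2
```

Such embeddings could then be fed into the same recovery-and-scoring pipeline sketched above, which is what would make the benchmark model-agnostic.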

Cite this work

@misc{finke2024manifold,
  title={Manifold Recovery as a Benchmark for Text Embedding Models},
  author={Lennart Finke},
  date={5/26/24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.