Jun 2, 2025

LLM Fingerprinting Through Semantic Variability

Luiza Corpaci, Chris Forrester, Siddhesh Pawar

This project develops an LLM fingerprinting and analysis toolkit to increase transparency in AI routing systems, addressing Track 2: Intelligent Router Systems through two key investigations. We adapted semantic variability analysis to create unique behavioral fingerprints that can identify which specific models are operating behind opaque routing services, and conducted tool detection experiments under semantic noise to assess model robustness. Our findings show that models maintain high semantic robustness, while our fingerprinting technique successfully distinguishes between models based on their response patterns. These contributions support the Expert Orchestration Architecture vision by providing practical tools for auditing multi-model AI systems: organizations can see which models their routers actually use and verify their reliability under real-world conditions, making router systems more transparent and trustworthy for production deployment.
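
For readers who want to try the idea, here is a minimal sketch (not the authors' exact pipeline) of how a semantic-variability fingerprint can be computed: sample a routed endpoint several times with the same prompt, embed the responses, and summarize the spread of pairwise similarities. The `query_model` callable and the choice of embedding model are assumptions for illustration.

```python
# Minimal sketch of a semantic-variability fingerprint (illustrative, not the
# authors' exact method): query a model repeatedly, embed its responses, and
# summarise the spread of the embeddings as a small fingerprint vector.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def fingerprint(query_model, prompt: str, n_samples: int = 20) -> np.ndarray:
    # `query_model` is a hypothetical wrapper around the routed API being probed.
    responses = [query_model(prompt) for _ in range(n_samples)]
    emb = embedder.encode(responses, normalize_embeddings=True)
    sims = emb @ emb.T                               # pairwise cosine similarities
    upper = sims[np.triu_indices(n_samples, k=1)]    # unique pairs only
    # Summary statistics of response similarity act as a behavioural signature.
    return np.array([upper.mean(), upper.std(), upper.min(), upper.max()])
```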

Reviewer's Comments

Thinking about the potential transparency challenge in router-based ecosystems is valuable, and the research questions are well-motivated. The technical execution seems good, and the visualizations are really well done.

Here are some key areas to strengthen this work:

- Get even clearer on the impact / motivation: wouldn't routing providers surface info about the model selection as a feature to sell? Are there specific scenarios where opacity matters? If so, zoom in on those.

- Make it clearer how we could reliably assign a single response to a specific model. The fingerprint distributions seem to overlap heavily, such that attribution does not seem easy at all.

- The robustness testing part is not integrated well enough with the fingerprinting. If the connection is not strong, the project would be better served by splitting it off and keeping each contribution focused on one core topic.

- Add simple baselines or methods from the literature. How does your fingerprinting approach compare to those?

Thank you for your submission. This is an interesting paper and the results seem good.

As I understand what is proposed, I’m not clear on the benefits of the approach. I assume that the EO implementers will be transparent about which model(s) they are routing to. Certainly this is Martian’s intent / implementation. Can you foresee a use case where this is not so? If so, then it would be good to document this in the paper.

Being able to decode which model is actually behind the router is an interesting and good choice of problem. While this approach of semantic variability analysis is interesting, I worry that in its current state it is brittle to (for example) the hyperparameters of the decoding algorithms being applied from "behind the router" (such as temperature, decoding strategy, etc.).

It would be very interesting to see if a classifier could correctly map the signature to one of the specified models.
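
As a rough illustration of that suggestion, a simple classifier could be trained on fingerprint vectors collected from known models and evaluated with cross-validation; the arrays below are synthetic placeholders standing in for real fingerprints, so the accuracy printed here is meaningless.

```python
# Sketch of the reviewer's suggestion: map fingerprint vectors to model
# identities with a simple classifier. X/y are synthetic placeholders; in
# practice each row of X would be a fingerprint collected from a known model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(110, 4))        # placeholder: 110 fingerprints, 4 stats each
y = np.repeat(np.arange(11), 10)     # placeholder: 11 candidate models

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"Attribution accuracy: {scores.mean():.2%} (chance ≈ {1/11:.2%})")
```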

Investigation 2 on how distraction affects tool use is interesting, but perhaps slightly off topic for this hackathon.

I like the approach of combining embedding metrics and prompt engineering. It would be good to explore further what dimensions can be added, and how well judges handle multiple attributes.

Great idea to avoid semantic drift.

This is phenomenal! My team and I will be using it moving forward.

This is a well-executed research project that tackles critical AI transparency challenges with impressive methods. The cognitive load experiments reveal fascinating insights about AI attention management, particularly that technical jargon causes 96% tool reduction despite 100% acknowledgment, mirroring human selective attention failures. The most important finding, in my opinion, is that meta-analysis of the task constitutes cognitive load that isn’t recognized. Identifying this is crucial for AI safety improvements and defending against potential prompt injections.

Furthermore, this project addresses the model-routing opacity problem for AI governance and compliance with an impressive and novel method: the "hyperstrings" concept and the approach of probing model consistency patterns, which successfully distinguishes between 11 different models with statistical significance, demonstrate real technical achievement. The experimental design shows scientific rigor, with systematic similarity measurements and clear statistical testing, while the preferential tool-dropping behavior (abandoning enhancements while preserving core functionality) provides actionable guidance for building robust AI systems.

The research successfully bridges theoretical AI interpretability with practical system transparency. While sample sizes could be larger for broader validation, the work creates exactly the kind of transparency tools needed as AI systems become more complex - understanding both "which model is running" and "how reliably it performs under real-world conditions." This is a strong contribution with immediate applicability to production environments.

Very cool

Interesting problem and interesting approach; I'm curious how the compression method could be used for other applications too, e.g. AI detectors?

The paper's focus on a single task domain (restaurant reservations) limits generality; diversifying the tool-calling tasks or adding a qualitative error analysis might reveal richer and different failure modes.

The project is reproducible and shows a systematic evaluation methodology. The metrics are clearly shown, and the code is available in a GitHub repository. 10/10

Very innovative solution!

Cite this work

@misc{
  title={(HckPrj) LLM Fingerprinting Through Semantic Variability},
  author={Luiza Corpaci, Chris Forrester, Siddhesh Pawar},
  date={6/2/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.