Jul 27, 2025

Thermodynamics-inspired OOD Detection

Salmaan Barday, Gary Louw

A different approach to thermodynamics-inspired out-of-distribution (OOD) detection.

Reviewer's Comments


The authors present a method for out-of-distribution detection using pre-trained image classifiers. The way they quantify performance is clear, and Figure 1 suggests their method beats an energy-based baseline, which is exciting. The presentation could be clearer: I'm not sure how they define the "blockwise energy" E_\ell(x) (I think it is based on projecting each layer's activations to logits, but I am not certain). These results add to a growing body of work showing that intermediate-layer activations may be more useful for probing tasks than final-layer activations (see e.g. https://arxiv.org/pdf/2412.09563v1). Confirming or refuting such theories seems like a nice goal for a hackathon.

The project investigates energy-based out-of-distribution detection by studying an extension of that method, in which the energy is computed at every residual block. The approach makes sense and the results are presented clearly. However, no mention is made of AI safety, and the project would have benefited from articulating a clear connection between the research performed and AI safety challenges.

Very interesting work! Seeing comparisons of OOD performance with other methods would indeed be interesting, as you note. The question with new tools is always *whether they work*. Detecting OOD samples more reliably is really valuable, and a strong framework for doing so seems important for nearly all real-world fields of NN use, such as medicine and law, where it could provide a heuristic for rejecting answers or at least attaching human-readable warnings to the outputs.

However, I think there's another extension of this work that would be really interesting: whether we can map the fuzzy boundary between what is OOD to a model and what isn't, and whether we can detect examples of algorithmic generalization (e.g. learning addition rather than memorizing the results of single-digit addition tasks) through this method. I'm not certain how feasible it is, but a proper evaluation of generalized OOD performance that transcends single tasks (i.e. a benchmark-based method) seems incredibly important and like a feasible extension of the concepts and tools introduced in this work.

While the method is interesting and a reasonable extension, the experimental setup is unclear and limits reproducibility. For example, it isn’t specified how intermediate block activations are mapped to class logits per block (auxiliary heads? a shared classifier? pooling + linear map?), yet the energy definition requires logits.
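For concreteness, the energy score of Liu et al. and one plausible blockwise variant can be sketched as below. The per-block logit heads and the averaging rule are assumptions for illustration, not the authors' stated design:

```python
import math

def energy_score(logits, T=1.0):
    # Energy-based OOD score (Liu et al., 2020):
    #   E(x) = -T * log(sum_i exp(f_i(x) / T))
    # Lower energy suggests in-distribution; higher suggests OOD.
    m = max(logits)  # log-sum-exp shift for numerical stability
    return -T * (m / T + math.log(sum(math.exp(l / T - m / T) for l in logits)))

def blockwise_energy(per_block_logits, T=1.0):
    # One possible reading of "blockwise energy": compute the energy
    # at each residual block's (hypothetical) logit head, then pool by
    # averaging. The pooling rule here is an assumption.
    scores = [energy_score(logits, T) for logits in per_block_logits]
    return sum(scores) / len(scores)
```

With uniform logits the score reduces to -T log(K) for K classes, and a confidently peaked logit vector yields a lower (more in-distribution) energy, which is the behavior the detector relies on.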

Block-wise energy pooling is a tidy proof-of-concept, but its benefit over the OOD detector of Liu et al. is unmotivated and unclear. A minimal result would have been to compare it against that reference, and to contextualize the work a bit more within the literature (Liu et al. was the only reference cited). I'm not convinced that this approach has any real implications for AI safety.
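A minimal comparison of the two detectors could be quantified with AUROC, the standard metric in this literature. A self-contained sketch (the rank-sum formulation, not any code from the project):

```python
def auroc(id_scores, ood_scores):
    # AUROC via the Mann-Whitney U identity: the fraction of
    # (OOD, ID) pairs in which the OOD sample receives the higher
    # score (OOD treated as the positive class), with ties counted
    # as half a win.
    pairs = 0
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            pairs += 1
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / pairs
```

Running this once with the baseline energy scores and once with the blockwise scores, on the same ID/OOD splits, would give exactly the head-to-head number the review asks for.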

The authors could also have explained their experiments and plots more carefully.

Cite this work

@misc{barday2025thermodynamics,
  title={(HckPrj) Thermodynamics-inspired OOD Detection},
  author={Salmaan Barday and Gary Louw},
  date={2025-07-27},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects


Feb 2, 2026

Markov Chain Lock Watermarking: Provably Secure Authentication for LLM Outputs

We present Markov Chain Lock (MCL) watermarking, a cryptographically secure framework for authenticating LLM outputs. MCL constrains token generation to follow a secret Markov chain over SHA-256 vocabulary partitions. Using doubly stochastic transition matrices, we prove four theoretical guarantees: (1) exponentially decaying false positive rates via Hoeffding bounds, (2) graceful degradation under adversarial modification with closed-form expected scores, (3) information-theoretic security without key access, and (4) bounded quality loss via KL divergence. Experiments on 173 Wikipedia prompts using Llama-3.2-3B demonstrate that the optimal 7-state soft cycle configuration achieves 100\% detection, 0\% FPR, and perplexity 4.20. Robustness testing confirms detection above 96\% even with 30\% word replacement. The framework enables $O(n)$ model-free detection, addressing EU AI Act Article 50 requirements. Code available at \url{https://github.com/ChenghengLi/MCLW}


Feb 2, 2026

Prototyping an Embedded Off-Switch for AI Compute

This project prototypes an embedded off-switch for AI accelerators. The security block requires periodic cryptographic authorization to operate: the chip generates a nonce, an external authority signs it, and the chip verifies the signature before granting time-limited permission. Without valid authorization, outputs are gated to zero. The design was implemented in HardCaml and validated in simulation.


Feb 2, 2026

Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors

We design and simulate a "border patrol" device for generating cryptographic evidence of data traffic entering and leaving an AI cluster, while eliminating the specific analog and steganographic side-channels that post-hoc verification can not close. The device eliminates the need for any mutually trusted logic, while still meeting the security needs of the prover and verifier.



This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.