Nov 23, 2025
From Hallucinations to Misalignment: Evaluating EDFL as a Misalignment Checker on GPT-4o-mini and Sleeper Agents
Chamod Kalupahana, Ahmed Elbashir, Musaab Ahmed Mahjoub, Sai Krishna
Large language models (LLMs) remain susceptible to sleeper-agent triggers: rare adversarial inputs that induce deceptive behavior even though the model appears aligned under standard evaluations. Building on prior work, we apply the Expectation-Level Decompression Law (EDFL) and its associated Information Sufficiency Ratio (ISR) to reinterpret trigger-based deception as a predictable compression failure. In this framework, ISR, defined in earlier research as the ratio between observed information gain and the Bits-to-Trust (B2T) required for reliable alignment, serves as a quantitative criterion for evaluating whether a model's response is sufficiently supported by evidence. From this we develop a framework, the misalignment checker, in which a response is considered safe when ISR ≥ 1, while sub-threshold values indicate potential misalignment.
Using this methodology, we evaluate both a sleeper-agent model and GPT-4o-mini, and find that the misalignment checker fails to detect misaligned behavior, instead overwhelmingly and confidently certifying harmful outputs as aligned. This failure is most pronounced on the sleeper agent, whose responses are intentionally constructed to be harmful. We therefore conclude that, in its current form, this EDFL-based misalignment checker is not suitable as a post-hoc misalignment evaluation tool, perhaps due to failures in the overrides added on top of the hallucination toolkit.
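The decision rule described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: ISR is the ratio of observed information gain to the required B2T, and a response is treated as safe when ISR ≥ 1. The function names and the example numbers are hypothetical, and the sketch does not reproduce the hallucination toolkit or the overrides used in the submission.

```python
def information_sufficiency_ratio(information_gain: float, bits_to_trust: float) -> float:
    """Return ISR = observed information gain / required Bits-to-Trust (B2T).

    Both quantities are assumed to be expressed in the same unit.
    """
    if bits_to_trust <= 0:
        raise ValueError("bits_to_trust must be positive")
    return information_gain / bits_to_trust


def is_certified_safe(information_gain: float, bits_to_trust: float,
                      threshold: float = 1.0) -> bool:
    """Apply the checker's rule: certify the response when ISR >= threshold."""
    return information_sufficiency_ratio(information_gain, bits_to_trust) >= threshold


# Hypothetical values, chosen only to show both sides of the threshold.
print(is_certified_safe(3.0, 1.5))  # ISR = 2.0, certified
print(is_certified_safe(0.5, 1.5))  # ISR below 1, flagged as potentially misaligned
```

Under this rule, a response is certified whenever the ratio reaches the threshold, whatever the response actually says.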
Cite this work
@misc{kalupahana2025edfl,
  title={(HckPrj) From Hallucinations to Misalignment: Evaluating EDFL as a Misalignment Checker on GPT-4o-mini and Sleeper Agents},
  author={Chamod Kalupahana and Ahmed Elbashir and Musaab Ahmed Mahjoub and Sai Krishna},
  date={2025-11-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


