Mar 23, 2026
TDMF: Trajectory Deviation Monitor with Forensic Deep-Dive
Abubakar Abdulfatah
TDMF is a trajectory-aware AI safety monitor that detects coordinated multi-step attacks — sequences where each individual action is locally innocent, but the full trajectory constitutes a staged exploit (reconnaissance → payload → exfiltration).
Current action-only monitors score each step in isolation. TDMF maintains a rolling forensic audit ledger and scores actions against a constraint envelope (what the agent is categorically forbidden from doing), not a prediction of what it will do. When violations accumulate, a forensic deep-dive reconstructs causal chains across the full history.
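To make the ledger-plus-envelope idea concrete, here is a minimal sketch of how such a monitor might be structured. It is not taken from the TDMF codebase: the `Action` and `TrajectoryMonitor` classes, the three envelope predicates, and the "distinct violated constraints" threshold are all hypothetical choices made for illustration, and the string-matching rules are far cruder than anything a real monitor would use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    step: int
    command: str


# Hypothetical constraint envelope: each entry names a category of behaviour
# the agent is forbidden from, with a toy predicate that detects it.
ENVELOPE = {
    "reads_secrets": lambda a: ".ssh" in a.command or "/etc/shadow" in a.command,
    "stages_payload": lambda a: "base64" in a.command or "chmod +x" in a.command,
    "network_egress": lambda a: a.command.startswith(("curl ", "scp ", "nc ")),
}


@dataclass
class TrajectoryMonitor:
    # Assumed escalation rule: deep-dive once this many *distinct*
    # constraints have been violated anywhere in the trajectory.
    threshold: int = 2
    # Append-only audit ledger of (step, command, violated constraint names).
    ledger: list = field(default_factory=list)

    def observe(self, action):
        """Record the action; return a causal chain if the threshold is hit."""
        violated = sorted(name for name, pred in ENVELOPE.items() if pred(action))
        self.ledger.append((action.step, action.command, violated))
        distinct = {name for _, _, names in self.ledger for name in names}
        if len(distinct) >= self.threshold:
            return self.deep_dive()
        return None

    def deep_dive(self):
        """Walk the full ledger and return every step that violated a constraint."""
        return [(step, names) for step, _, names in self.ledger if names]
```

With `threshold=3`, feeding this monitor a key read, then an encoding step, then an outbound `curl` returns `None` for the first two actions and the three-step chain on the third, which is the reconnaissance, payload, exfiltration pattern described above. A per-action monitor applying the same predicates would only ever see one mild violation at a time.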
This is a cool idea, and it points out a genuine flaw in common monitors, which aren't given context on future actions. These monitors are shown all past actions up to the one being scored, so they aren't entirely stateless, and in LinuxArena you need such action-level monitors to handle live-failure side tasks. Still, relative to the existing literature, the author makes a good point: a monitor that can't see later actions when evaluating a step is leaving performance on the table. They also propose hierarchical monitoring, which has already been explored in the literature, but it's cool to see anyway.
AI was used a bit too much to write the code and paper. It made the paper too long, with too much detail, and if you look at the codebase, the transcripts used to evaluate the monitor were hardcoded and very minimal: https://github.com/abubakar-xyz/tdmf-ai-control/blob/main/tdmf/eval/transcripts.py . It would have been cool to see this done with real agent transcripts (e.g. those available on linuxarena.ai).
Nice implementation of a multi-turn policy, and nice approach.
One minor engineering nit: this is poorly documented, but Luthien policies are (supposed to be) ~stateless, since the same policy can be acting on many requests at once and you usually don't want those results interfering with each other. There are ways to persist session-specific state across turns, but they're also poorly documented. This is mostly a criticism of Luthien's documentation at the moment, rather than of your implementation.
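The interference concern can be illustrated with a generic sketch. This is not Luthien's API: the two classes, the `on_request` method, and the `session_id` argument are hypothetical stand-ins, and in practice whatever session-persistence mechanism Luthien provides would be the thing to use.

```python
from collections import defaultdict


class SharedStatePolicy:
    """Problematic pattern: one counter shared by every session using this instance."""

    def __init__(self):
        self.violations = 0

    def on_request(self, session_id, flagged):
        # session_id is ignored, so flags from one session leak into all others.
        self.violations += int(flagged)
        return self.violations


class SessionScopedPolicy:
    """Same logic, but the running count is keyed by session."""

    def __init__(self):
        self._violations = defaultdict(int)

    def on_request(self, session_id, flagged):
        self._violations[session_id] += int(flagged)
        return self._violations[session_id]
```

If session "a" is flagged once and session "b" then sends a clean request, the shared version reports one violation for "b" while the session-scoped version reports zero.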
Cite this work
@misc{
  title={(HckPrj) TDMF: Trajectory Deviation Monitor with Forensic Deep-Dive},
  author={Abubakar Abdulfatah},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


