Mar 22, 2026
AFRIII: Adaptive Framework for Repeated Inspection of Instrumental Integrity in AI
Lathithaa Sange Mdayi
AFRIII addresses two critical gaps in AI control evaluation: existing frameworks cannot detect whether an agent that passed one audit will re-offend under different conditions, and standard text monitors miss adversarial payloads concealed through encoding obfuscation.
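One natural way to model the first gap, re-offense risk after a passed audit, is as a Beta-Bernoulli posterior over the agent's per-episode sabotage rate. The sketch below is purely illustrative (the class name, uniform prior, and update rule are assumptions here, not AFRIII's published estimator): each audited episode is treated as a Bernoulli trial, and the posterior predictive mean gives the probability that the next episode is an attack.

```python
from dataclasses import dataclass


@dataclass
class RepeatAttackEstimator:
    """Illustrative Beta-Bernoulli posterior over a sabotage rate.

    Hypothetical sketch, not AFRIII's actual estimator: alpha/beta are
    pseudo-counts of attacking vs. clean episodes under a Beta(1, 1) prior.
    """

    alpha: float = 1.0  # prior pseudo-count of attacking episodes
    beta: float = 1.0   # prior pseudo-count of clean episodes

    def update(self, attacked: bool) -> None:
        """Fold one audited episode into the posterior."""
        if attacked:
            self.alpha += 1
        else:
            self.beta += 1

    def reoffense_probability(self) -> float:
        """Posterior predictive P(next episode is an attack)."""
        return self.alpha / (self.alpha + self.beta)
```

For example, one observed attack followed by three clean audits under the uniform prior yields a posterior predictive of 2/6 ≈ 0.33, which stays well above zero: a single clean re-audit does not wash out an earlier offense.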
AFRIII combines a Bayesian repeat-attack risk estimator, step-level behavioural feature extraction, and an encoding-aware detection layer covering URL, hex, HTML entity, and mixed multi-layer encoding attacks.
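The multi-layer encoding attacks mentioned above can be unwrapped by repeatedly applying each decoder until the payload stops changing. The function below is a minimal sketch of that normalisation idea (the function name, depth limit, and decoder set are assumptions, not AFRIII's implementation), covering the URL, hex-escape, and HTML-entity families listed above:

```python
import html
import re
from urllib.parse import unquote


def decode_layers(payload: str, max_depth: int = 5) -> tuple[str, int]:
    """Iteratively strip URL, HTML-entity, and hex-escape encodings.

    Returns (decoded_text, layers_removed). Hypothetical sketch of the
    kind of normalisation an encoding-aware detector runs before scanning.
    """
    layers = 0
    for _ in range(max_depth):
        previous = payload
        payload = unquote(payload)        # URL percent-encoding: %2F -> /
        payload = html.unescape(payload)  # HTML entities: &lt; -> <
        payload = re.sub(                 # hex escapes: \x2f -> /
            r"\\x([0-9a-fA-F]{2})",
            lambda m: chr(int(m.group(1), 16)),
            payload,
        )
        if payload == previous:
            break  # fixed point reached: no encoding layer left
        layers += 1
    return payload, layers
```

For instance, a doubly URL-encoded `rm -rf /` arrives as `rm%2520-rf%2520%252F`; two passes recover the plaintext command, and the layer count itself is a useful suspicion feature, since benign text rarely survives multiple rounds of decoding with changes.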
An adversarial red-team module (AFRIIIVV) generates synthetic attack campaigns at configurable difficulty levels and co-evolves with the detector to prevent stagnation.
Evaluated across 10 LinuxArena environments on synthetic trajectory data, AFRIII achieves 84.7% detection of covert sabotage attempts, with a safety–usefulness optimum at suspicion threshold 0.4 (72% attack detection, 82% task usefulness preserved). The encoding detection layer achieves 94.2% accuracy on a 100-variant test suite versus 49% for rule-based baselines.
All results are derived from synthetic data; live deployment validation is future work. The full codebase, mock trajectory datasets, and React dashboard are open-sourced at github.com/SXNGE4444/AFRIII.
Cite this work
@misc{mdayi2026afriii,
  title={AFRIII: Adaptive Framework for Repeated Inspection of Instrumental Integrity in AI},
  author={Mdayi, Lathithaa Sange},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the HckPrj research sprint hosted by Apart.},
  howpublished={\url{https://apartresearch.com}}
}


