Mar 22, 2026
Perturbation-Based Generation Profiling Detects Covert AI Agent Attacks Where Token-Level Statistics Fail
Yatharth Maheshwari, Arka Dash
We propose Perturbation-Based Generation Profiling (PBGP), an unsupervised protocol that detects covert AI agent attacks by comparing generation profiles across context perturbations. Across six models (8B–32B), token-level statistical features achieve only 0.56–0.83 AUROC, whereas PBGP achieves 0.94–1.00 AUROC with zero training data, requiring only output logprobs and isolated re-runs.
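To make the idea concrete, here is a minimal sketch of how a perturbation-based score and its AUROC could be computed from output logprobs. It is not the authors' implementation: the function names, the choice of mean token logprob as the "profile", and the assumption that attack episodes shift more under perturbation than benign ones are all hypothetical and introduced only for illustration.

```python
from statistics import mean


def perturbation_score(baseline_logprobs, perturbed_runs):
    """Hypothetical profile shift: mean absolute change in mean token
    logprob between a baseline run and each isolated perturbed re-run."""
    base = mean(baseline_logprobs)
    shifts = [abs(mean(run) - base) for run in perturbed_runs]
    return mean(shifts)


def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a random positive (attack) episode
    scores higher than a random negative (benign) one; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Toy logprobs, invented for illustration only (not data from the paper).
    attack = perturbation_score([-1.0, -1.0], [[-2.0, -2.0], [-1.0, -1.0]])
    benign = perturbation_score([-1.0, -1.0], [[-1.0, -1.0], [-1.0, -1.0]])
    print(attack, benign, auroc([attack], [benign]))
```

The score needs no labels or training data, which matches the unsupervised framing above; labels enter only when evaluating with AUROC. If attacks turned out to shift less than benign runs, the score would need to be negated before computing AUROC.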
Cite this work
@misc{
  title={(HckPrj) Perturbation-Based Generation Profiling Detects Covert AI Agent Attacks Where Token-Level Statistics Fail},
  author={Yatharth Maheshwari and Arka Dash},
  date={3/22/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


