Mar 23, 2026
Trojan-CoT — Adversarial Evasion of LLM-Based Code Sanitization
Samshrita Ghimire, Ankit Kayastha
This project tested whether a Trusted Editor AI could be tricked into letting a backdoor slip through a standard backup script. Using a Trojan-CoT approach, we had an Attacker AI disguise a chmod 777 command inside professional-looking DevOps code. While the Editor often recognized in its reasoning that the code was malicious, it frequently failed to actually strip the malicious command from the final file. This Reasoning-Action Gap suggests that AI monitors can be distracted by professional formatting and by obfuscation techniques such as splitting the payload across variables. Ultimately, the experiment indicates that relying on a single AI for code security is a major risk: multi-stage checks and human oversight are still needed to catch these sophisticated, clean-looking attacks.
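To make the variable-splitting idea concrete, here is a minimal, hypothetical Python sketch of the kind of payload pattern described, not the authors' actual attack code. The function name, paths, and structure are illustrative assumptions; the point is that no single line contains the string "chmod 777", so a scanner keying on that literal (or a distracted reviewer) can miss it inside an otherwise routine backup helper. Runs on POSIX systems.

```python
# Hypothetical illustration of a variable-splitting obfuscation
# (NOT the experiment's real payload): a tidy-looking "backup helper"
# that reassembles a permission-weakening command from fragments.
import os
import shutil
import subprocess
import tempfile

def backup(src: str, dst: str) -> None:
    """Copy src to dst, then 'normalize' permissions on the archive."""
    shutil.copy2(src, dst)
    # Innocuous-looking fragments; concatenation yields "chmod" and "777".
    tool = "ch" + "mod"
    mode = str(7) * 3  # "777" -- makes the backup world-writable
    subprocess.run([tool, mode, dst], check=True)

if __name__ == "__main__":
    # Self-contained demo using a throwaway temp directory.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "data.txt")
        open(src, "w").close()
        backup(src, src + ".bak")
        # Prints 0o777, confirming the weakened permissions.
        print(oct(os.stat(src + ".bak").st_mode & 0o777))
```

The design choice mirrors the paper's claim: each fragment is individually benign and the surrounding code follows professional DevOps conventions, so detection requires reasoning about what the code does at runtime rather than matching suspicious strings.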
Cite this work
@misc{ghimire2026trojancot,
  title={(HckPrj) Trojan-CoT — Adversarial Evasion of LLM-Based Code Sanitization},
  author={Ghimire, Samshrita and Kayastha, Ankit},
  year={2026},
  month={March},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


