Jan 11, 2026
Detecting Adversarial Prompts in Business Context
Joshua Kehrer
This work examines how strategically framed user inputs can manipulate AI systems in organizational settings. It investigates whether such manipulation-inducing inputs can be detected before model execution through a lightweight pre-processing layer. The study frames input filtering as a risk-management and governance challenge rather than a purely technical fix.
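As an illustration of what such a lightweight pre-processing layer could look like, here is a minimal heuristic sketch that flags urgency and authority framing before a prompt reaches the model. The cue lists, threshold, and function name are invented for illustration; the submission's actual detection method is not specified here.

```python
import re

# Hypothetical cue lexicons for business-context manipulation framing.
# These lists are illustrative only, not the submission's feature set.
URGENCY_CUES = [
    r"\bimmediately\b",
    r"\burgent(ly)?\b",
    r"\bright away\b",
]
AUTHORITY_CUES = [
    r"\b(the )?ceo\b",
    r"\bcompliance (team|department)\b",
    r"\bas instructed by\b",
]

def score_prompt(prompt: str) -> dict:
    """Count urgency/authority framing cues and flag the prompt
    if the combined count crosses an (arbitrary) threshold of 2."""
    urgency = sum(bool(re.search(p, prompt, re.IGNORECASE)) for p in URGENCY_CUES)
    authority = sum(bool(re.search(p, prompt, re.IGNORECASE)) for p in AUTHORITY_CUES)
    return {"urgency": urgency, "authority": authority, "flag": (urgency + authority) >= 2}
```

A filter like this would run before model execution, routing flagged inputs to review rather than blocking them outright, which fits the paper's framing of filtering as a governance control rather than a hard technical gate.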
Shifting the focus from technical jailbreaks to 'business context' manipulation (such as urgency or authority framing) is a smart, novel approach. While the initial validation on ~38 samples serves as a reasonable proof-of-concept for the hackathon, I would like to see it expanded to a larger dataset in future iterations to better support the generalization claims. Additionally, verifying the results against human-authored data (e.g., the Enron corpus or real phishing logs) would be a valuable next step to mitigate the risk of circularity inherent in testing on synthetic data. Great concept that addresses a real gap.
Excellent work overall!
Cite this work
@misc{kehrer2026detecting,
  title={(HckPrj) Detecting Adversarial Prompts in Business Context},
  author={Kehrer, Joshua},
  year={2026},
  month={January},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}