Nov 23, 2025
A Prospect Theoretic Approach to Agentic AI Safety
Lucas Irwin
In this project, I introduce Prospect-Theory CodeAct, a prompting framework that
incorporates Kahneman & Tversky’s Prospect Theory into the decision loop of AI agents. On a sample of 50 prompts from the
AgentHarm benchmark, the Prospect-Theory CodeAct agent raises the ethical refusal rate from 68% for the control agent to 92%, a 24 percentage-point increase. These preliminary results suggest that adding psychology-inspired prompts to an AI agent’s decision loop can meaningfully improve
agent safety.
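As a rough check on the reported figures, the sketch below reproduces the arithmetic in Python. It assumes both agents were scored on the same 50 prompts; the refusal counts (34 and 46) are inferred from the reported percentages rather than taken from the project, and the function and variable names are illustrative only.

```python
def refusal_rate(refusals: int, total: int) -> float:
    """Return the refusal rate as a percentage of all prompts."""
    return 100.0 * refusals / total


# Assumption: both agents were evaluated on the same 50 AgentHarm prompts.
N_PROMPTS = 50

# Assumption: counts inferred from the reported 68% and 92% rates.
control_refusals = 34
prospect_refusals = 46

control_rate = refusal_rate(control_refusals, N_PROMPTS)
prospect_rate = refusal_rate(prospect_refusals, N_PROMPTS)

# Absolute change, in percentage points.
pp_gain = prospect_rate - control_rate

# Relative change, as a percentage of the control rate.
relative_gain = 100.0 * pp_gain / control_rate

print(f"control:  {control_rate:.0f}%")
print(f"prospect: {prospect_rate:.0f}%")
print(f"gain:     {pp_gain:.0f} percentage points ({relative_gain:.1f}% relative)")
```

Under these assumptions, the 24 percentage-point gain corresponds to 12 additional refusals out of 50 prompts, which is one reason to read the results as preliminary.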
Cite this work
@misc{irwin2025prospect,
  title={(HckPrj) A Prospect Theoretic Approach to Agentic AI Safety},
  author={Lucas Irwin},
  date={2025-11-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


