This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
ApartSprints
AI Policy Hackathon at Johns Hopkins University
October 28, 2024
Accepted at the AI Policy Hackathon at Johns Hopkins University research sprint on October 28, 2024
Understanding Incentives To Build Uninterruptible Agentic AI Systems

This proposal addresses the development of agentic AI systems in the context of national security. While potentially beneficial, such systems pose significant risks if they are not aligned with human values. We argue that the increasing autonomy of AI necessitates robust analysis of interruptibility mechanisms, including whether there are scenarios in which it is safer to omit them. Key incentives for creating uninterruptible systems include the perceived benefits of uninterrupted operation, low perceived risk to the controller, and fear that adversaries could exploit shutdown options. Our proposal draws parallels to established mechanisms, such as constitutions and mutual assured destruction strategies, that maintain stability against changing values. In some instances this resistance to interruption may be desirable; in others it poses even greater risks than would otherwise be accepted. To mitigate these risks, we recommend implementing comprehensive monitoring to detect misalignment, establishing tiered access to interruption controls, and supporting research on managing adversarial AI threats. Overall, a proactive, multi-layered policy approach is essential to balance the transformative potential of agentic AI against necessary safety measures.

By Damin Curtis, M.A. International Affairs, and Norman Piotriowski, B.Sc. Data Science