Oct 27, 2024
Understanding Incentives To Build Uninterruptible Agentic AI Systems
Damin Curtis, M.A. International Affairs; Norman Piotriowski, B.Sc. Data Science
Summary
This proposal addresses the development of agentic AI systems in the context of national security. Although such systems are potentially beneficial, they pose significant risks if they are not aligned with human values. We argue that the increasing autonomy of AI necessitates robust analysis of interruptibility mechanisms, including whether there are scenarios in which it is safer to omit them.
Key incentives for creating uninterruptible systems include the perceived benefits of uninterrupted operation, low perceived risk to the controller, and fear that adversaries could exploit shutdown options. Our proposal draws parallels to established mechanisms, such as constitutions and mutual assured destruction strategies, that maintain stability even as values change. In some instances this kind of uninterruptibility may be desirable; in others it poses even greater risks than would otherwise be accepted.
To mitigate those risks, our proposal recommends implementing comprehensive monitoring to detect misalignment, establishing tiered access to interruption controls, and supporting research on managing adversarial AI threats. Overall, a proactive and multi-layered policy approach is essential to balance the transformative potential of agentic AI with necessary safety measures.
Cite this work:
@misc{curtis2024uninterruptible,
title={Understanding Incentives To Build Uninterruptible Agentic AI Systems},
author={Damin Curtis and Norman Piotriowski},
date={2024-10-27},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}