Summary
One current focus of AI control research is finding new settings in which to study the control problem. In this hackathon, we defined a new AI control setting in which an AI summarizes documents in a way that misleads an overseer. We expect this problem to become more important as AI systems are used more often to synthesize and summarize information, especially in AI safety research. We therefore worked on adding a new setting to the UK AISI Control Arena repository. The current state is a functional Inspect eval in which red teams and blue teams can iterate on different strategies. However, the setting is not integrated into the Control Arena framework, and only the most basic red-team and blue-team strategies have been implemented. Future work would move the protocol and policy code into the existing Control Arena baseline, enable native PDF support, iterate further on adversarial strategies, and streamline AI calls by making them asynchronously where possible.
Cite this work:
@misc {
  title={Adding Document Summaries to Control Arena},
  author={Keshav Shenoy},
  date={3/31/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}