Mar 30, 2025
Kernel of Trust: Evaluating AI Control Protocols Using Open-Source Data
AJ Weeks
Summary
We propose a new setting for evaluating AI Control policies: the change history of the Linux kernel. This real-world data source complements the APPS backdooring setting used in the original AI Control paper.
Cite this work:
@misc{
  title={Kernel of Trust: Evaluating AI Control Protocols Using Open-Source Data},
  author={AJ Weeks},
  date={2025-03-30},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}