Aug 25, 2024
Sleeper Agents Detector
Michaël Trazzi and Saahir Vazirani
We present "Sleeper Agent Detector," an interactive web application designed to educate software engineers. Inspired by recent research demonstrating that large language models can exhibit behaviors analogous to deceptive alignment
Worthwhile topic. Yet after the demo I was confused: was the model simply prompted to write vulnerable code? How did its behavior depend on the year? Surely I could just ask it to write vulnerable code directly.
Very nice presentation. Overall, though, I think the demonstration fails to convey why sleeper agents are important or what it would look like to live in a world where we rely on an LLM that can suddenly switch its behavior.
Cite this work
@misc{
  title={Sleeper Agents Detector},
  author={Michaël Trazzi and Saahir Vazirani},
  date={2024-08-25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


