Jul 1, 2024
DETECTING AND CONTROLLING DECEPTIVE REPRESENTATION IN LLMS WITH REPRESENTATIONAL ENGINEERING
Avyay M Casheekar, Kaushik Sanjay Prabhakar, Kanishk Rath, Sienka Dounia
Summary
This work applies Representation Engineering to detect and control deception in LLMs, with a focus on deceptive sandbagging.
Cite this work:
@misc {
title={DETECTING AND CONTROLLING DECEPTIVE REPRESENTATION IN LLMS WITH REPRESENTATIONAL ENGINEERING},
author={Avyay M Casheekar and Kaushik Sanjay Prabhakar and Kanishk Rath and Sienka Dounia},
date={2024-07-01},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}