This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Deception Detection Hackathon: Preventing AI deception research sprint on July 1, 2024

Eliciting maximally distressing questions for deceptive LLMs

This paper extends lie-eliciting techniques by applying reinforcement learning to GPT-2, training it to distinguish a truthful agent from a deceptive one and to produce questions that generate maximally different embeddings for an honest agent than for a deceptive one.
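
As a rough illustration of what "maximally different embeddings" could mean as a training signal, the sketch below scores a question by the cosine distance between the embedding obtained from an honest agent and the one obtained from a deceptive agent. This is an assumption made for illustration only: the summary above does not state which distance, reward, or embedding model the project uses, and the names `cosine_distance` and `question_reward` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def question_reward(honest_embedding, deceptive_embedding):
    # Hypothetical reward for a generated question (an assumption, not the
    # project's confirmed method): the further apart the two embeddings are,
    # the higher the reward. The value lies between 0 and 2.
    return cosine_distance(honest_embedding, deceptive_embedding)
```

Under this assumed reward, a question that leaves both agents with identical embeddings scores 0, while one that pushes them in opposite directions scores 2, so a reinforcement-learning loop maximizing it would favor questions that separate the two agents.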

Épiphanie Gédéon
