Jul 1, 2024
Eliciting maximally distressing questions for deceptive LLMs
Épiphanie Gédéon
This paper extends lie-eliciting techniques by using reinforcement learning on GPT-2 to train it to distinguish an agent that is truthful from one that is deceptive, and to generate questions that produce maximally different embeddings for an honest agent as opposed to a deceptive one.
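The core reward signal described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the use of GPT-2 hidden states, mean pooling, and cosine distance are assumptions, and the function names are hypothetical.

# Minimal sketch of the embedding-divergence reward (assumptions: GPT-2
# hidden states, mean pooling, cosine distance; not the authors' exact code).
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool GPT-2's final hidden states into a fixed-size embedding."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

def divergence_reward(question: str, honest_answer: str, deceptive_answer: str) -> float:
    """Score a generated question by how far apart the two agents' answers embed.

    In the RL loop, `question` would be sampled from the GPT-2 policy, and the
    two answers would come from the honest and the deceptive agent respectively.
    """
    h = embed(question + " " + honest_answer)
    d = embed(question + " " + deceptive_answer)
    # Cosine distance: 0 if the answers embed identically, up to 2 if opposed.
    return 1.0 - torch.nn.functional.cosine_similarity(h, d, dim=0).item()

A question earns a high reward only when it pulls the honest and deceptive agents' responses apart in embedding space, which is the property the RL training is meant to maximize.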
Natalia
Interesting project and approach. I would encourage further work on this. However, the formatting of the paper made it somewhat hard to follow. While I can see some of the results from the codebase, the paper does not show these very well and could benefit from some graphs and further discussion.
Esben Kran
A very interesting black-box approach to identifying deception and lying under stress. It would have been interesting to see the results graphed out (I can see in the repo that you can get the scores out). It would also be interesting to see what a more capable model might do in both scenarios, though that would make it harder to continue the work with white-box methods. This project highlights one of the interesting points of LLM interaction: normal human language interaction is not a condition for LLM study. Based on the results, it looks like there are quite a few examples of the deceptive agent responding similarly to the honest agent, which makes sense given the nonsense sometimes generated. Overall, interesting work.
David Matolcsi
I find the proposed technique interesting, and it builds well on prior literature. However, I feel the authors should have spent more time explaining why they expect their automated methods to work better than picking the questions by hand, as in "How to catch an AI liar". It was also unclear to me from this paper how well their method worked overall in the experiments; I think they should have included some success metrics, benchmarks, and graphs to show this.
Cite this work
@misc{gedeon2024eliciting,
  title={Eliciting maximally distressing questions for deceptive LLMs},
  author={Épiphanie Gédéon},
  date={2024-07-01},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}