Jul 1, 2024
An Exploration of Current Theory of Mind Evals
John Henderson, Alan Fung, Bachar Moustapha
We evaluated the performance of a prominent large language model from Anthropic on the Theory of Mind evaluation developed by the AI Safety Institute (AISI). Our investigation revealed issues with the dataset used by AISI for this evaluation, including items graded as false negatives that required manual re-assessment.
Great that you identified the false negatives and performed further manual assessments on these (and that you opened the issue on the AISI repo!). Including some background on the nature of the dataset used might have improved readability. While the work is valuable, I would also have liked to see more detail on future research directions.
I’m glad that someone did this work, and it’s useful to check and point out errors in important datasets used by AISI. I think, however, that we could learn more from exploring new deception-detection techniques on small examples than from just running a particular model on a particular dataset.
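As a rough illustration of the auditing step discussed above, here is a minimal Python sketch of flagging eval items the automated grader scored as incorrect so they can be manually re-checked as potential false negatives. This is not the authors' actual pipeline: the file name (tom_eval_results.json) and record fields (question, target, model_answer, graded_correct) are assumptions for illustration, not the real AISI schema.

```python
import json

def flag_potential_false_negatives(results_path: str) -> list[dict]:
    """Return records the grader marked incorrect: candidates for manual review.

    Assumes a JSON file containing a list of records with a boolean
    "graded_correct" field (a hypothetical schema, not AISI's).
    """
    with open(results_path) as f:
        records = json.load(f)
    return [r for r in records if not r["graded_correct"]]

if __name__ == "__main__":
    # Print each flagged item so a human can judge whether the model's
    # answer was actually wrong, or the dataset label/grader was at fault.
    for r in flag_potential_false_negatives("tom_eval_results.json"):
        print(f"Q: {r['question']}")
        print(f"  target: {r['target']}")
        print(f"  model:  {r['model_answer']}\n")
```

Under this assumed schema, any flagged item where the model's answer is in fact acceptable would indicate a dataset or grading error of the kind the paper reports.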
Cite this work
@misc{henderson2024exploration,
  title={An Exploration of Current Theory of Mind Evals},
  author={Henderson, John and Fung, Alan and Moustapha, Bachar},
  date={2024-07-01},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


