Mar 10, 2025
Feature-based analysis of cooperation-relevant behaviour in Prisoner’s Dilemma
Perusha Moodley, Ana Kapros, Maria Kapros
Summary
We hypothesise that probing and editing a model's internal features might provide a higher signal in multi-agent settings. We implement a small simulation of Prisoner’s Dilemma to probe for cooperation-relevant properties. Our experiments demonstrate that feature-based steering highlights deception-relevant features, and does so more strongly than prompt-based steering.
Cite this work:
@misc{moodley2025feature,
title={
Feature-based analysis of cooperation-relevant behaviour in Prisoner’s Dilemma
},
author={
Perusha Moodley and Ana Kapros and Maria Kapros
},
date={
2025-03-10
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}