Mar 10, 2025

Feature-based analysis of cooperation-relevant behaviour in Prisoner’s Dilemma

Perusha Moodley, Ana Kapros, Maria Kapros


Summary

We hypothesise that probing and editing model internals might provide a higher signal in multi-agent settings. We implement a small Prisoner’s Dilemma simulation to probe for cooperation-relevant properties. Our experiments show that feature-based steering highlights deception-relevant features, and does so more strongly than prompt-based steering.
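For readers unfamiliar with the game, the following is a minimal sketch of an iterated Prisoner's Dilemma loop. It is not the authors' simulation: the payoff values are the conventional textbook ones (assumed here, not stated in the submission), and the fixed strategies `tit_for_tat` and `always_defect` are hypothetical stand-ins for the model-driven agents the work studies.

```python
from typing import Callable, List, Tuple

# Conventional textbook payoffs (an assumption, not taken from the submission):
# (my_move, their_move) -> my payoff, with 5 > 3 > 1 > 0.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect
    ("D", "C"): 5,  # I defect, they cooperate
    ("D", "D"): 1,  # mutual defection
}

# A strategy maps (own history, opponent history) to "C" or "D".
Strategy = Callable[[List[str], List[str]], str]


def always_defect(own: List[str], other: List[str]) -> str:
    """Hypothetical baseline agent that never cooperates."""
    return "D"


def tit_for_tat(own: List[str], other: List[str]) -> str:
    """Hypothetical agent: cooperate first, then copy the opponent's last move."""
    return other[-1] if other else "C"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play `rounds` simultaneous-move rounds and return both total scores."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

In a setup like the one described above, each strategy function would presumably be replaced by a call to a language model whose prompt or internal features are being steered; that substitution is outside the scope of this sketch.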

Cite this work:

@misc{
  title={Feature-based analysis of cooperation-relevant behaviour in Prisoner’s Dilemma},
  author={Perusha Moodley, Ana Kapros, Maria Kapros},
  date={3/10/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.