1. (As acknowledged in the Limitations section) The sample size is too small (20 dialogues). The tournament methodology itself is fine, but the sample is far too small to draw any conclusions from.
2. You're trying to make the strategies interpretable, which I appreciate, but right now it's post-hoc pattern matching on embeddings. Can you predict which Cialdini principles will emerge from the reward weights? As it stands, the analysis is descriptive rather than predictive.
3. How do you know the CoT is actually faithful, rather than the model having simply learned to produce plausible-looking reasoning that scores well? You're using the same LLM to generate and judge.
It would have been helpful to include example dialogues (and the LLM judge's evaluations of them) in the document, so I could get a sense of their format.
Minor nit: it would have been helpful if you had cited the Persuasion for Good dataset mentioned in the introduction.
Cite this work
@misc{sahoo2026faithful,
  title={(HckPrj) Faithful Adversarial MCTS for Persuasive COT Manipulation Check - A Cooperative AI Lens},
  author={Subramanyam Sahoo},
  date={1/12/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


