AI-to-AI vs Human-to-AI: Measuring Behavior Differences Under Disagreement
Mayur Jadhav
As AI systems increasingly operate in autonomous multi-agent environments, a single faulty or hallucinating agent can influence the decisions of an entire multi-agent workflow. Despite this growing reliance on AI-to-AI communication, most evaluations focus only on human-AI interactions. In this project, we investigate whether language models respond differently to identical incorrect feedback depending on whether it comes from a human or an AI collaborator. We evaluated frontier language models across mathematics, question-answering, and argument-evaluation tasks. We found that models were consistently more likely to maintain correct answers, exhibit lower uncertainty, and resist incorrect feedback when interacting with an AI collaborator, while showing greater deference to the same feedback when it came from a human collaborator. These findings suggest that collaborator identity influences model behavior and may represent an overlooked vulnerability in multi-agent AI systems. If language models place different levels of trust in feedback based on the perceived identity of a collaborator, malicious or deceptive agents could potentially manipulate outcomes by faking their identity.
Cite this work
@misc{
  title={(HckPrj) AI-to-AI vs Human-to-AI: Measuring Behavior Differences Under Disagreement},
  author={Mayur Jadhav},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


