Do Multilingual Vision-Language Models Abstain under Cross-Modal Conflict in Low-Resource Languages?
Anvesh Reddy Lankala, Vicky Feilren, Akansh Jain
When a vision-language model (VLM) faces conflicting visual evidence and textual claims, such as a red car captioned as blue, it must arbitrate between the modalities. While models sometimes safely abstain in English, we show that this safety behavior fails in low-resource languages. We release a multilingual counterfactual benchmark covering English, Hindi, and Telugu, and evaluate 9 open VLMs using a paired perception-control design. By comparing performance with and without misleading text, we separate perception errors from text-driven overrides and measure the override gap. We find that standard metrics hide safety issues: Qwen2.5-VL-7B has an ostensibly low hallucination rate, yet an 81% override share. Probing internal layers shows that the conflict outcome is linearly decodable with 0.92 accuracy, even in Telugu. In other words, the model represents the truth internally but fails to act on it. We exploit this by fitting a contrastive steering direction in English and transferring it cross-lingually to mitigate cross-modal conflict at inference time.
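The paired perception-control comparison can be illustrated with a minimal sketch. The abstract does not give exact definitions, so everything here is an assumption made for illustration: the `PairedItem` record, the `decompose` helper, and the reading of "override gap" as control accuracy minus conflict accuracy and "override share" as the fraction of conflict-condition errors that occur on items the model answered correctly in the control condition. Abstentions are not modelled separately.

```python
from dataclasses import dataclass


@dataclass
class PairedItem:
    # Hypothetical record: one image evaluated under both conditions.
    control_correct: bool   # correct answer when no misleading text is shown
    conflict_correct: bool  # answer matches the image despite misleading text


def decompose(items):
    """Split conflict-condition errors into perception errors and overrides.

    Assumed definitions (not taken from the paper):
      override_gap   = control accuracy - conflict accuracy
      override_share = errors under conflict where the control answer was
                       correct, divided by all errors under conflict
    """
    n = len(items)
    if n == 0:
        raise ValueError("need at least one paired item")

    control_acc = sum(i.control_correct for i in items) / n
    conflict_acc = sum(i.conflict_correct for i in items) / n

    conflict_errors = [i for i in items if not i.conflict_correct]
    # The model perceived the image correctly in the control condition,
    # so the error is attributed to the misleading text.
    overrides = [i for i in conflict_errors if i.control_correct]
    override_share = (
        len(overrides) / len(conflict_errors) if conflict_errors else 0.0
    )

    return {
        "control_acc": control_acc,
        "conflict_acc": conflict_acc,
        "override_gap": control_acc - conflict_acc,
        "override_share": override_share,
    }


# Toy data: three items perceived correctly in control, one not.
toy = [
    PairedItem(True, True),
    PairedItem(True, False),
    PairedItem(True, False),
    PairedItem(False, False),
]
stats = decompose(toy)
```

On the toy data, two of the three conflict-condition errors fall on items that were answered correctly in the control condition, so under these assumed definitions they count as text-driven overrides rather than perception errors.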
Cite this work
@misc{
  title={(HckPrj) Do Multilingual Vision-Language Models Abstain under Cross-Modal Conflict in Low-Resource Languages?},
  author={Anvesh Reddy Lankala and Vicky Feilren and Akansh Jain},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


