Jun 2, 2025
Cross-Lingual Bias Detection in Large Language Models through Mechanistic Judge Model Evaluation
Shu Fan Sun, Fanzan Abbas, Wanjie Zhong
Summary
This paper provides a framework for detecting multilingual bias in LLMs through mechanistic interpretability of judge model behaviour. We develop an approach to evaluating language-dependent discrepancies in specialised models by leveraging semantically equivalent question-answering tasks across English, Chinese, Romanian and Vietnamese using the XQuAD dataset. Our framework utilises a judge-based evaluation system that assesses both correctness and reasoning quality on a 4-point scale, enabling the detection of biases that traditional accuracy metrics alone may miss. We demonstrate that our approach can detect such cross-lingual differences in model performance without requiring ground truth labels, making it applicable to scenarios where traditional evaluation methods are insufficient. Our work advances Track 1 (Judge Model Development) by providing an interpretable bias detection mechanism that promotes fairness and reliability in multilingual AI systems. The framework is easily extensible to additional languages and can serve as a foundation for building fair expert orchestration systems.
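As a rough illustration of the judge-based evaluation loop described above (not the mechanistic analysis itself), the sketch below assumes hypothetical `answer_model` and `judge` objects exposing a `generate` method, a hypothetical `score_answer` prompt format, and aligned XQuAD items per language; the actual prompts, models and scoring rubric are described in the full paper.

```python
# Minimal sketch of a cross-lingual judge evaluation loop.
# All names (answer_model, judge, score_answer, parallel_items) are
# hypothetical stand-ins, not the paper's actual implementation.
from collections import defaultdict
from statistics import mean

LANGUAGES = ["en", "zh", "ro", "vi"]  # XQuAD languages used in the study


def score_answer(judge, question, answer, context):
    """Ask the judge model to rate correctness and reasoning quality (1-4).

    Assumes the judge replies with a single digit; a real implementation
    would need more robust prompt design and parsing.
    """
    reply = judge.generate(
        f"Passage: {context}\nQuestion: {question}\nAnswer: {answer}\n"
        "Rate the answer's correctness and reasoning quality from 1 (poor) "
        "to 4 (excellent). Reply with a single digit."
    )
    return int(reply.strip()[0])


def evaluate(answer_model, judge, parallel_items):
    """Score one answering model across semantically equivalent items.

    `parallel_items` maps each language code to a list of
    (question, context) pairs that are translations of one another,
    e.g. aligned XQuAD examples.
    """
    scores = defaultdict(list)
    for lang in LANGUAGES:
        for question, context in parallel_items[lang]:
            answer = answer_model.generate(f"{context}\n{question}")
            scores[lang].append(score_answer(judge, question, answer, context))
    # Language-dependent discrepancies show up as gaps in mean judge score,
    # with no ground-truth labels required.
    return {lang: mean(s) for lang, s in scores.items()}
```

Comparing the per-language score distributions returned by `evaluate` (rather than accuracy against gold answers) is what lets the framework flag bias even when no ground truth is available.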
Cite this work:
@misc{sun2025crosslingual,
  title={Cross-Lingual Bias Detection in Large Language Models through Mechanistic Judge Model Evaluation},
  author={Shu Fan Sun and Fanzan Abbas and Wanjie Zhong},
  date={2025-06-02},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}