Confidently Wrong: Measuring and Mitigating Calibration Risks in LLMs for African Languages
Similoluwa Okunowo, Eliud Koto, Dagmawi Misker
Large language models are increasingly used in African languages, but little is known about whether their confidence remains trustworthy. We evaluate four open-weight LLMs across African-language truthfulness and knowledge benchmarks and find a consistent safety risk: models become less accurate while remaining highly confident, leading to systematic overconfidence and substantially worse calibration than in English. This increases the likelihood that users will trust incorrect information in high-stakes settings such as health, education, and civic information. We show that simple per-language temperature scaling dramatically reduces calibration error without retraining, restoring confidence estimates to near-English levels. Our results highlight calibration as an overlooked African AI safety challenge and demonstrate a practical deployment-time intervention for reducing confidently wrong model outputs.
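The per-language temperature scaling mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each benchmark item yields a vector of logits over answer options, and that a small labeled validation set exists for each language. All function names are hypothetical, and a simple grid search stands in for the gradient-based fit that is often used.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Divide logits by the temperature, then normalize row-wise (numerically stable).
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    # Bin predictions by confidence and average |accuracy - confidence| per bin,
    # weighted by the fraction of samples falling in each bin.
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)

def nll(logits, labels, temperature):
    # Mean negative log-likelihood of the true labels at a given temperature.
    p = softmax(logits, temperature)
    return float(-np.log(p[np.arange(len(labels)), labels] + 1e-12).mean())

def fit_temperature(logits, labels, grid=None):
    # Pick the temperature that minimizes validation NLL (grid search; an assumption).
    if grid is None:
        grid = np.linspace(0.5, 10.0, 96)
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

def fit_per_language(val_sets):
    # val_sets maps a language code to (logits, labels); one scalar temperature per language.
    return {lang: fit_temperature(lg, lb) for lang, (lg, lb) in val_sets.items()}
```

At deployment time, the logits for a query in a given language would be divided by that language's fitted temperature before the softmax. Because a single positive scalar does not change the argmax, accuracy is unaffected and only the confidence estimates move.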
Cite this work
@misc{
  title={(HckPrj) Confidently Wrong: Measuring and Mitigating Calibration Risks in LLMs for African Languages},
  author={Similoluwa Okunowo and Eliud Koto and Dagmawi Misker},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


