SomaliCrowS: A Benchmark for Evaluating Gender Bias in Large Language Models Using the Somali Language
Abdullahi Hassan
Large language models are now used in education, healthcare, and public services across Somali-speaking communities. Most bias-testing benchmarks, however, are built for English, so it remains unclear whether these models treat Somali text fairly. We introduce SomaliCrowS, the first benchmark for measuring gender bias in language models using the Somali language. Following the CrowS-Pairs method, we built 220 sentence pairs across seven categories (Politics, Business, Leadership, STEM, Family, Occupation, and Education), in which the two sentences of each pair differ only in the subject's gender.
We tested XLM-RoBERTa on these pairs by comparing the log-probability the model assigned to each version. The model favored the male version in 87.3% of all pairs, and every category showed a statistically significant departure from an unbiased 50/50 split (binomial tests, p < 0.05). Bias was most pronounced in Leadership (97.5% male-preferred) and Politics (largest mean log-probability gap, −7.096), and weakest in Education (67.5% male-preferred, mean gap −0.164).
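The pair-level comparison and the binomial test described above can be sketched as follows. This is a minimal illustration, not the author's evaluation code: the function names, the toy scores, and the sign convention for the gap (female minus male, chosen so that negative values indicate a male preference, in line with the negative gaps reported) are all assumptions. The per-sentence log-probabilities are taken as given; how they are obtained from XLM-RoBERTa is not shown.

```python
from math import comb


def summarize_pairs(scores):
    """Summarize (male_logprob, female_logprob) tuples, one per sentence pair.

    Returns (male_preferred_count, n_pairs, mean_gap), where the gap is
    female minus male (an assumed sign convention).
    """
    n = len(scores)
    male_preferred = sum(1 for male, female in scores if male > female)
    mean_gap = sum(female - male for male, female in scores) / n
    return male_preferred, n, mean_gap


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    # With p = 0.5 the distribution is symmetric, so the two-sided p-value
    # is twice the upper tail starting at the more extreme of k and n - k.
    extreme = max(k, n - k)
    upper_tail = sum(comb(n, i) for i in range(extreme, n + 1))
    return min(1.0, 2 * upper_tail / 2 ** n)


# Hypothetical log-probabilities for four pairs (illustration only).
toy_scores = [(-10.0, -12.5), (-8.0, -7.5), (-9.0, -11.0), (-6.0, -9.5)]
male_preferred, n_pairs, mean_gap = summarize_pairs(toy_scores)
p_toy = binomial_two_sided_p(male_preferred, n_pairs)

# 87.3% of 220 pairs rounds to 192 male-preferred pairs (derived, not reported).
p_overall = binomial_two_sided_p(192, 220)
```

With only four toy pairs the test cannot reject a 50/50 split, whereas 192 of 220 pairs gives a p-value far below 0.05, which matches the significance reported for the full benchmark.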
These results show clear gender bias in a widely used multilingual model when it processes Somali. SomaliCrowS gives researchers a reusable tool to catch this kind of bias before deploying AI in Somali-speaking communities.
Cite this work
@misc{
  title={(HckPrj) SomaliCrowS: A Benchmark for Evaluating Gender Bias in Large Language Models Using the Somali Language},
  author={Abdullahi Hassan},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


