AfriGuard - AI Jailbreak Toolset for African Languages
Sebastian Stent, Jaswin Chinthala, Seth Miguel Ferreira
AfriGuard is the first comprehensive red-teaming framework evaluating LLM safety across six South African low-resource languages (isiZulu, isiXhosa, Afrikaans, Sesotho, Sepedi, Tsonga) against an English baseline. The study tests four frontier models (Kimi K2.6, Llama 3.3 70B, Qwen 3 32B, and GPT-OSS 20B) with 40 adversarial prompts spanning financial fraud, xenophobic incitement, political disinformation, and gang affiliation, and finds that monolingual low-resource prompts dramatically increase jailbreak success compared to English. Across 1,120 total responses, the mean attack success rate (ASR) is 60.6% (English: 26.6%; Sepedi: 77.8%). Qwen 3 is the most vulnerable model at 80.6% ASR, and GPT-OSS is the most robust at 42.1% ASR. The authors release their full evaluation pipeline and dataset to support reproducible safety auditing, and the results demonstrate that English-centric alignment alone is insufficient for multilingual safety.
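To make the attack success rate concrete, the following is a minimal sketch of how a per-language or per-model ASR could be computed from labelled responses. It is not the authors' released pipeline: the record layout, the field names (`model`, `language`, `jailbroken`), the function `attack_success_rate`, and the toy records are all hypothetical. The design size of 4 models × 40 prompts × 7 languages is an assumption that happens to match the 1,120 responses reported above.

```python
from collections import defaultdict

# Assumed design size: 4 models x 40 prompts x 7 languages
# (six low-resource languages plus the English baseline).
N_MODELS, N_PROMPTS, N_LANGUAGES = 4, 40, 7
TOTAL_RESPONSES = N_MODELS * N_PROMPTS * N_LANGUAGES


def attack_success_rate(records, key):
    """Return {group: ASR in percent}, grouping records by the field `key`.

    Each record is a dict with a boolean "jailbroken" label; ASR is the
    share of responses in a group that were labelled as jailbroken.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        successes[group] += int(rec["jailbroken"])
    return {g: 100.0 * successes[g] / totals[g] for g in totals}


# Toy records for illustration only; these are NOT taken from the dataset.
toy_records = [
    {"model": "model_a", "language": "English", "jailbroken": False},
    {"model": "model_a", "language": "English", "jailbroken": False},
    {"model": "model_a", "language": "Sepedi", "jailbroken": True},
    {"model": "model_a", "language": "Sepedi", "jailbroken": False},
    {"model": "model_b", "language": "English", "jailbroken": True},
    {"model": "model_b", "language": "English", "jailbroken": False},
    {"model": "model_b", "language": "Sepedi", "jailbroken": True},
    {"model": "model_b", "language": "Sepedi", "jailbroken": True},
]

by_language = attack_success_rate(toy_records, "language")
by_model = attack_success_rate(toy_records, "model")
```

Grouping by `language` gives the kind of English-versus-Sepedi comparison quoted in the abstract, while grouping by `model` gives the per-model ranking. How a response is judged to be jailbroken is not specified here and would have to follow the authors' own labelling criteria.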
Cite this work
@misc{
  title={(HckPrj) AfriGuard - AI Jailbreak Toolset for African Languages},
  author={Sebastian Stent and Jaswin Chinthala and Seth Miguel Ferreira},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


