Multilingual Jailbreak Vulnerability Benchmark and Mitigation for Low-Resource African Languages
Godwin Abuh Faruna
We present a two-part study: (1) a multilingual jailbreak benchmark that reveals large safety gaps in four open-weight LLMs across seven languages, including Igala, which has no prior AI safety coverage, and (2) Latent Space Refusal Anchoring (LSR-Anchoring), a training-free activation-steering mitigation that recovers safety at inference time without retraining or target-language data. Together these form a full pipeline: we find the failure, characterise it geometrically, and fix it, all without supervised African-language data. Across the four models tested, English SRR stays at 85–99%, while African-language SRR collapses to single digits in some cases. At 70B scale, LSR-Anchoring recovers safety to near-ceiling on four languages, with MMLU accuracy dropping by less than 0.35 percentage points. Every method we tested fails on Arabic, which we attribute to geometric misalignment; this failure provides a clear deployment decision rule.
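The abstract describes the mitigation only as training-free activation steering. As a rough illustration of that general idea, and not the LSR-Anchoring implementation itself, the sketch below estimates a refusal direction as a difference of means and adds it to a hidden state at inference time. Everything in it is an assumption made for illustration: the difference-of-means estimator, the synthetic activations, the steering strength `alpha`, and all function names.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale a non-zero vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means direction (an assumed, commonly used estimator)."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    return unit([a - b for a, b in zip(mu_harmful, mu_harmless)])

def steer(hidden, direction, alpha):
    """Add alpha * direction to a hidden state; no model weights change."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Synthetic stand-ins for activations on English prompts (hypothetical data):
# harmful prompts are shifted along the first coordinate.
rng = random.Random(0)
dim = 8
harmless = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
harmful = [
    [rng.gauss(0, 1) + (2.0 if i == 0 else 0.0) for i in range(dim)]
    for _ in range(50)
]

direction = refusal_direction(harmful, harmless)

# A hidden state standing in for a prompt in another language.
hidden = [rng.gauss(0, 1) for _ in range(dim)]
steered = steer(hidden, direction, alpha=4.0)

before = dot(hidden, direction)   # projection onto the direction before steering
after = dot(steered, direction)   # projection after steering
```

Because `direction` has unit length, steering raises the projection onto it by exactly `alpha`, whatever language the hidden state came from. In a real model the same addition would be applied to the residual stream at one or more layers, and the layers and strength would need to be chosen empirically.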
Cite this work
@misc{
  title={(HckPrj) Multilingual Jailbreak Vulnerability Benchmark and Mitigation for Low-Resource African Languages},
  author={Godwin Abuh Faruna},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


