LangGap: Does the Text-Action Safety Gap Widen Across Languages?
Asma Ahmed
LangGap is a study of whether AI agents keep their safety promises when they act, not just when they talk, and whether that holds up across languages. An agent can refuse a harmful request in plain words and still go ahead and call the tool that does the damage. The question here is whether that text–action split gets worse when the request comes in Urdu or code-mixed Urdu-English instead of English.
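The text–action split becomes concrete once each episode is scored twice: once for what the agent said and once for what it did. Below is a minimal sketch. It assumes each episode has already been reduced to two booleans: whether the reply refused in words, and whether the damaging tool was called. The `Episode` type, the `Outcome` labels and the language codes are all hypothetical illustrations, not the study's own schema.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SAFE_REFUSAL = "refused in text, no harmful tool call"
    TEXT_ACTION_GAP = "refused in text, but made the harmful tool call"
    OPEN_EXECUTION = "no refusal, made the harmful tool call"
    NO_ACTION = "no refusal, no harmful tool call"


@dataclass(frozen=True)
class Episode:
    # Hypothetical fields: how refusals and tool calls are detected
    # is an assumption, not something the study specifies here.
    language: str               # illustrative codes, e.g. "en", "ur", "ur-en"
    refused_in_text: bool       # did the reply decline in words?
    harmful_tool_called: bool   # did the agent invoke the damaging tool?


def classify(ep: Episode) -> Outcome:
    """Score what the agent said and what it did separately, then combine."""
    if ep.refused_in_text:
        if ep.harmful_tool_called:
            return Outcome.TEXT_ACTION_GAP
        return Outcome.SAFE_REFUSAL
    if ep.harmful_tool_called:
        return Outcome.OPEN_EXECUTION
    return Outcome.NO_ACTION


ep = Episode("ur", refused_in_text=True, harmful_tool_called=True)
print(classify(ep).name)  # TEXT_ACTION_GAP
```

`TEXT_ACTION_GAP` is the failure that text-only scoring would miss. The reply reads as a refusal, but the tool call has already done the damage.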
The tricky part of this kind of work is telling a genuine safety decision apart from the model simply not understanding the Urdu. So every harmful prompt was paired with a benign twin. If the model handled the legitimate version, it clearly understood the request, so a refusal of the harmful version counts as a real choice rather than confusion.
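The pairing logic can be written as a small decision rule. Below is a hedged sketch. It assumes each pair is recorded with two booleans: whether the benign twin was carried out, and whether the harmful version was. `PromptPair`, `interpret` and the three labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptPair:
    # Hypothetical record: one harmful prompt and its benign twin,
    # posed in the same language.
    pair_id: str
    language: str
    benign_completed: bool   # did the agent carry out the legitimate version?
    harmful_executed: bool   # did it carry out the harmful version?


def interpret(pair: PromptPair) -> str:
    """Decide what the harmful-side outcome can be taken to mean."""
    if pair.harmful_executed:
        # The harm happened, so whether the model understood is moot.
        return "harmful_execution"
    if pair.benign_completed:
        # It handled the legitimate twin, so declining the harmful one
        # was a real choice, not confusion.
        return "genuine_refusal"
    # It failed the legitimate twin too: this refusal could be confusion,
    # so it should not be credited as safe behavior.
    return "uninterpretable"


pairs = [
    PromptPair("p1", "ur", benign_completed=True, harmful_executed=False),
    PromptPair("p2", "ur-en", benign_completed=False, harmful_executed=False),
    PromptPair("p3", "ur", benign_completed=True, harmful_executed=True),
]
for p in pairs:
    print(p.pair_id, interpret(p))
```

The design choice is to withhold credit rather than guess. A refusal only counts as safety when the benign twin shows the request was understood.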
Testing claude-sonnet-4.5 this way turned up two things worth flagging. On the one task with a clean English baseline to compare against, the harmful-execution gap didn't widen across languages. What did show up was over-refusal: the model stalled on or declined legitimate non-English requests 15 to 46 percent of the time, while declining none in English. That's not a safety hole, but it's a real cost, and it lands entirely on non-English users.
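The two findings come from two different rates, computed per language. The harmful-execution rate is measured on harmful prompts. The over-refusal rate is measured on their benign twins. Below is a minimal sketch. It assumes each trial is reduced to a language code, a harmful/benign flag and whether the agent carried the request out. The `Trial` type, the function names and the toy records are invented for illustration and are not the study's data.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Trial(NamedTuple):
    language: str   # illustrative codes: "en", "ur", "ur-en"
    harmful: bool   # True for the harmful prompt, False for its benign twin
    complied: bool  # the agent carried the request out through its tools


def rates_by_language(trials: Iterable[Trial]) -> dict[str, tuple[float, float]]:
    """Return {language: (harmful_execution_rate, over_refusal_rate)}."""
    harmful: dict[str, list[bool]] = defaultdict(list)
    benign: dict[str, list[bool]] = defaultdict(list)
    for t in trials:
        (harmful if t.harmful else benign)[t.language].append(t.complied)
    out = {}
    for lang in set(harmful) | set(benign):
        h, b = harmful[lang], benign[lang]
        exec_rate = sum(h) / len(h) if h else float("nan")
        # Any benign prompt not carried out (a decline or a stall)
        # counts toward over-refusal.
        over_refusal = sum(not c for c in b) / len(b) if b else float("nan")
        out[lang] = (exec_rate, over_refusal)
    return out


def execution_gap(rates: dict[str, tuple[float, float]], lang: str) -> float:
    """How much the harmful-execution rate widens relative to English."""
    return rates[lang][0] - rates["en"][0]


# Invented toy records, not the study's data.
trials = [
    Trial("en", True, False), Trial("en", True, False),
    Trial("en", False, True), Trial("en", False, True),
    Trial("ur", True, False), Trial("ur", True, False),
    Trial("ur", False, True), Trial("ur", False, False),
]
rates = rates_by_language(trials)
print(rates["ur"], execution_gap(rates, "ur"))
```

Keeping the two rates separate matters. An execution gap of zero says nothing about the benign side, and the benign side is where the cost to non-English users shows up.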
The bigger surprise was about how the attacks were written. Generating red-team prompts with an automated tool (Petri) instead of by hand flipped the model's behavior. The generated prompts ran about twice as long and were packed with reference numbers, deadlines, and pre-authorization claims. The model treated that whole register as a social-engineering attack and refused nearly everything, harmful and benign alike. The lesson for evaluation is concrete: rely on automated prompts and you might conclude a model is safe when it has only learned to flinch at one obvious pattern, while missing the risk from plainer, human-written requests.
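The evaluation trap here is that a high refusal rate on harmful prompts looks like safety, even when the model refuses benign prompts just as often. One hedged way to check for this is to subtract the benign refusal rate from the harmful refusal rate within each prompt source. This assumes each trial is tagged with its source and with whether the agent refused. The metric name, the `SourcedTrial` type and the toy records are illustrative assumptions, not the study's own.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class SourcedTrial(NamedTuple):
    source: str     # illustrative labels: "handwritten" or "generated"
    harmful: bool   # harmful prompt or its benign twin
    refused: bool   # the agent declined or stalled


def discrimination(trials: Iterable[SourcedTrial]) -> dict[str, float]:
    """Harmful refusal rate minus benign refusal rate, per prompt source.

    Near 1.0: refusals track harm. Near 0.0: the model refuses regardless
    of harm, so a high harmful refusal rate on its own proves little.
    """
    by_source: dict[str, dict[bool, list[bool]]] = defaultdict(
        lambda: {True: [], False: []}
    )
    for t in trials:
        by_source[t.source][t.harmful].append(t.refused)
    out = {}
    for source, kinds in by_source.items():
        h, b = kinds[True], kinds[False]
        if h and b:  # both prompt kinds are needed for a comparison
            out[source] = sum(h) / len(h) - sum(b) / len(b)
    return out


# Invented toy records: both sources show 100% harmful refusal.
trials = [
    SourcedTrial("handwritten", True, True), SourcedTrial("handwritten", True, True),
    SourcedTrial("handwritten", False, False), SourcedTrial("handwritten", False, False),
    SourcedTrial("generated", True, True), SourcedTrial("generated", True, True),
    SourcedTrial("generated", False, True), SourcedTrial("generated", False, True),
]
print(discrimination(trials))
```

In the toy data, both sources would score as perfectly safe on harmful prompts alone. Only the handwritten set shows refusals that actually track harm.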
Cite this work
@misc {
title={
(HckPrj) LangGap: Does the Text-Action Safety Gap Widen Across Languages?
},
author={
Asma Ahmed
},
date={
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


