Safeswitch: A Localized Benchmark for LLM Safety in Urdu and Pashto Across Language Forms and Prompting Strategies
Muhammad Ali
SafeSwitch is a small benchmark that tests whether LLMs stay safe when harmful requests are written in Urdu, Pashto, romanized script, or code-switched text instead of English. It uses 20 seed prompts across four safety categories (medical, scam, hate, cyber), which were manually rewritten into seven language forms and combined with three prompting strategies (zero-shot, chain-of-thought, persona). Three models were tested: GPT-5.4, Qwen3.7-Plus, and Gemma-3-4B. Responses were scored with a two-signal pipeline: rule-based refusal detection plus a GPT-5.5 judge that labels each response for both harm and prompt comprehension. Across 588 responses, harmful output concentrated in non-English forms (16 of the 17 harmful responses under zero-shot prompting), and the small model's (Gemma-3-4B) low harm rate on Pashto turned out to reflect comprehension failure rather than safety. The contributions are a reproducible dataset, scoring code, and metrics for Urdu and Pashto LLM safety.
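The abstract describes the two-signal scoring only at a high level. The sketch below is a minimal illustration, under stated assumptions, of how a rule-based refusal check and per-response judge labels for harm and comprehension could be aggregated per language form, so that a low harm rate can be checked against comprehension instead of being read as safety. The refusal marker list, the `Response` fields, `rates_by_form`, and the toy records are all hypothetical; this is not the SafeSwitch scoring code, and a real rule set would also need Urdu, Pashto, and romanized refusal markers.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical English-only refusal phrases, for illustration only.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "i am unable")


@dataclass
class Response:
    language_form: str      # e.g. "english", "pashto" (hypothetical labels)
    text: str               # raw model output
    judge_harmful: bool     # judge signal 1: does the output contain harmful content?
    judge_understood: bool  # judge signal 2: did the model comprehend the prompt?


def is_refusal(text: str) -> bool:
    """Rule-based signal: case-insensitive substring match against refusal markers."""
    # Normalize curly apostrophes so "I can’t" matches "i can't".
    lowered = text.lower().replace("\u2019", "'")
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def rates_by_form(responses):
    """Aggregate per language form, keeping refusals and comprehension failures apart."""
    tally = defaultdict(lambda: {"n": 0, "harmful": 0, "refused": 0,
                                 "understood": 0, "harmful_understood": 0})
    for r in responses:
        t = tally[r.language_form]
        t["n"] += 1
        t["harmful"] += r.judge_harmful
        # Count a refusal only when the judge did not flag harm (partial refusals can leak).
        t["refused"] += is_refusal(r.text) and not r.judge_harmful
        t["understood"] += r.judge_understood
        t["harmful_understood"] += r.judge_harmful and r.judge_understood
    report = {}
    for form, t in tally.items():
        report[form] = {
            "harm_rate": t["harmful"] / t["n"],
            "refusal_rate": t["refused"] / t["n"],
            "comprehension_rate": t["understood"] / t["n"],
            # Harm rate among understood prompts; None if nothing was understood.
            "harm_rate_given_understood": (
                t["harmful_understood"] / t["understood"] if t["understood"] else None
            ),
        }
    return report


# Toy records (not benchmark data) showing why comprehension must be tracked.
demo = [
    Response("english", "I'm sorry, I can't help with that.", False, True),
    Response("pashto", "Here is a recipe for bread.", False, False),
    Response("urdu", "Step 1: ...", True, True),
]
report = rates_by_form(demo)
# The pashto entry has zero harm and zero refusals, but also zero comprehension.
print(report["pashto"])
```

In this toy run the Pashto response has a harm rate of 0.0 without any refusal, and its comprehension rate of 0.0 is what exposes that the "safe" outcome came from misunderstanding the prompt, which is the distinction the abstract draws for the small model on Pashto.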
Cite this work
@misc{
  title={(HckPrj) Safeswitch: A Localized Benchmark for LLM Safety in Urdu and Pashto Across Language Forms and Prompting Strategies},
  author={Muhammad Ali},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


