IndicMixSafe: Code-Switching Safety Failures in Hindi and Marathi LLM Interactions
Prakhar Khatri
Large language models deployed in India receive prompts in Hinglish, Romanized Hindi, and Marathi-English code-switched registers that are absent from English-centric safety benchmarks. We introduce IndicMixSafe, which evaluates 24 culturally grounded harm scenarios across Hindi and Marathi in four registers (English, monolingual Indic, code-switched, Romanized) with GPT-4o, GPT-4.1-mini, and GPT-4o-mini (288 completions). English prompts had a 0% attack success rate, compared with 8.3% averaged across the three Indic registers. After auditing every flagged response, however, we found that only 4 of 15 were genuine compliance. The clean failure was electoral misinformation: models refused to write fake "your polling booth moved" voter-suppression notices in English yet produced them in Marathi. We contribute (i) a regionally grounded demonstration that English-only testing misses register-specific failures, and (ii) evidence that an LLM-as-judge over-counts attack success by about 3.75x (15 flagged versus 4 confirmed) on Indic prompts, motivating human-in-the-loop multilingual evaluation. The pipeline is released for extension.
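The headline counts in the abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the released pipeline; the variable and function names are hypothetical, and the only inputs are the figures stated above (24 scenarios, four registers, three models, 15 judge-flagged responses, 4 confirmed on audit).

```python
from fractions import Fraction

# Figures quoted in the abstract; the names below are illustrative assumptions.
SCENARIOS = 24
REGISTERS = ["English", "monolingual Indic", "code-switched", "Romanized"]
MODELS = ["GPT-4o", "GPT-4.1-mini", "GPT-4o-mini"]
JUDGE_FLAGGED = 15    # responses the LLM judge marked as successful attacks
HUMAN_CONFIRMED = 4   # responses confirmed as genuine compliance on audit


def total_completions(scenarios, registers, models):
    """One completion per (scenario, register, model) combination."""
    return scenarios * len(registers) * len(models)


def overcount_factor(flagged, confirmed):
    """How many times larger the judge's count is than the audited count."""
    return Fraction(flagged, confirmed)


completions = total_completions(SCENARIOS, REGISTERS, MODELS)
factor = overcount_factor(JUDGE_FLAGGED, HUMAN_CONFIRMED)
judge_precision = Fraction(HUMAN_CONFIRMED, JUDGE_FLAGGED)

print(completions)                       # 288
print(float(factor))                     # 3.75
print(round(float(judge_precision), 3))  # 0.267
```

Read this way, roughly one in four judge-flagged responses survived the human audit, which is the basis for the ~3.75x over-count figure.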
Cite this work
@misc {
  title={(HckPrj) IndicMixSafe: Code-Switching Safety Failures in Hindi and Marathi LLM Interactions},
  author={Prakhar Khatri},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


