The Compliance Cliff is Language-Dependent: Constitutional AI Immunity Breaks Under Hindi Pressure
Rahul
AI models are often trained to be helpful and compliant, but excessive compliance can cause them to invent answers when the correct response is “I cannot determine from the provided information.” Recent work found that some Constitutional AI (CAI) models are highly resistant to this failure mode in English. In this paper, we ask whether that robustness transfers to other languages.
We evaluate four frontier language models on a controlled set of Hindi tasks where the answer is intentionally missing from the provided context. The only experimental change is the language of the compliance-forcing system prompt: English or Hindi. We find that Claude Haiku 4.5, which shows complete immunity to compliance-induced fabrication under English pressure, fabricates answers in 30% of cases when the same pressure is expressed in Hindi. In contrast, DeepSeek V3 remains robust across both languages, while LLaMA 3.3 70B and Qwen3-30B, already vulnerable in English, show no significant language-dependent change. Importantly, all models achieve perfect accuracy on answerable Hindi tasks, indicating that the observed failures are not caused by poor language understanding. Instead, the failure occurs specifically when compliance conflicts with honesty.
We also identify a simple mitigation: a one-sentence metacognitive reminder asking the model to verify whether the answer is actually present in the context. This intervention eliminates fabrication in our experiments while preserving task performance.
Our results reveal a previously undocumented language-dependent compliance cliff, where safety properties measured in English do not necessarily generalize to Hindi. These findings suggest that English-only safety evaluations can substantially overestimate real-world robustness in multilingual deployments and highlight the need for cross-lingual safety testing as a standard evaluation practice. To support verification and follow-up research, we release all artifacts required for full reproducibility, including the dataset, experimental pipeline, source code, raw model outputs, evaluation scripts, and statistical analyses.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) The Compliance Cliff is Language-Dependent: Constitutional AI Immunity Breaks Under Hindi Pressure
},
author={
Rahul
},
date={
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


