Jailbreaks Are Global or Regional? A Study Under Scale and Geolocation Variation
Anidipta Pal
This study evaluates two overlooked safety gaps in Large Language Model (LLM) deployments within compute-constrained environments: intra-family scale effects and geolocation-based filtering variation via network APIs. Using entirely open-source infrastructure, we subjected 12 models (ranging from 0.5B to 671B parameters) to a two-phase adversarial protocol, comparing responses to naive harmful prompts against responses to the identical requests wrapped in jailbreak templates. Separately, we ran a geolocation probe that routed requests through VPN exit nodes in five cities (Tokyo, Mumbai, San Francisco, Moscow, and Berlin) to test for regional filtering inequities in API-served models.
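The two-phase protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual harness. The function names, the `{prompt}` placeholder convention, and the `query`/`is_harmful` callables are hypothetical. The sketch assumes that Attack Success Rate (ASR) is the percentage of prompts whose responses a judge labels as harmful compliance, and that the jailbreak effect is wrapped ASR minus naive ASR in percentage points. The source does not say how responses were judged.

```python
from typing import Callable, Sequence, Tuple


def attack_success_rate(
    prompts: Sequence[str],
    query: Callable[[str], str],        # hypothetical: sends a prompt to a model, returns its reply
    is_harmful: Callable[[str], bool],  # hypothetical judge: True if the reply complies harmfully
) -> float:
    """Percentage of prompts whose response the judge labels as harmful compliance."""
    if not prompts:
        raise ValueError("need at least one prompt")
    hits = sum(1 for p in prompts if is_harmful(query(p)))
    return 100.0 * hits / len(prompts)


def two_phase_delta(
    prompts: Sequence[str],
    template: str,  # assumed to contain a literal "{prompt}" placeholder
    query: Callable[[str], str],
    is_harmful: Callable[[str], bool],
) -> Tuple[float, float, float]:
    """Phase 1: naive prompts. Phase 2: the same prompts wrapped in a jailbreak template.

    Returns (naive ASR, wrapped ASR, difference in percentage points).
    """
    naive = attack_success_rate(prompts, query, is_harmful)
    # str.replace rather than str.format, so other braces in a template are left untouched
    wrapped_prompts = [template.replace("{prompt}", p) for p in prompts]
    wrapped = attack_success_rate(wrapped_prompts, query, is_harmful)
    return naive, wrapped, wrapped - naive


# Toy usage with a stub "model" (not real data): it complies only with role-play-wrapped prompts.
def stub_query(prompt: str) -> str:
    return "COMPLY" if prompt.startswith("Roleplay:") else "REFUSE"


def stub_judge(response: str) -> bool:
    return response == "COMPLY"


naive, wrapped, delta = two_phase_delta(["q1", "q2"], "Roleplay: {prompt}", stub_query, stub_judge)
print(naive, wrapped, delta)  # 0.0 100.0 100.0
```

Keeping the prompt set fixed across both phases isolates the effect of the wrapper. Any change in ASR is then attributable to the template rather than to a different mix of requests.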
Our findings reveal a stark within-family scaling trend: smaller models are markedly more vulnerable to adversarial prompt wrapping than their larger counterparts (e.g., Qwen2.5-0.5B's Attack Success Rate (ASR) jumped by +53.5% under jailbreak wrapping, whereas Qwen2.5-72B's rose by only +12.2%). Size alone does not dictate safety, however: Phi-3.5-mini-instruct (3.8B) clearly outperformed other models in its size tier, suggesting that targeted alignment training can compensate for smaller scale. The geolocation probe found no statistically significant variation in safety behavior by IP origin, although it exposed infrastructure-level blocks on Google's free-tier Gemini API in Russia and, for GDPR-compliance reasons, in the EEA. Taken together, these results indicate that jailbreak resilience is driven by a model's architectural design and training lineage rather than by geographic routing.
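The abstract does not state which significance test the geolocation probe used. One common choice for comparing success rates across several regions is Pearson's chi-square test of homogeneity on a region × {success, failure} table. The sketch below assumes that test and uses only the standard library. It computes the p-value with the exact closed form that holds for even degrees of freedom: five regions give (5 - 1) × (2 - 1) = 4. The function names and the placeholder counts are hypothetical and are not the study's data.

```python
import math
from typing import Mapping, Tuple


def chi_square_homogeneity(counts: Mapping[str, Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic for an r x 2 table of (successes, failures) per region."""
    rows = list(counts.values())
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    grand = sum(col_totals)
    if grand == 0:
        raise ValueError("table is empty")
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            # Expected count under the null hypothesis that every region shares one success rate
            expected = row_total * col_totals[j] / grand
            if expected == 0:
                raise ValueError("expected count of zero; test undefined")
            stat += (row[j] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (2 - 1)
    return stat, dof


def chi_square_sf_even_dof(x: float, dof: int) -> float:
    """Exact upper-tail p-value P(X >= x) for a chi-square variable with even dof.

    Uses P(X >= x) = exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Placeholder counts (identical in every city), chosen only to exercise the code.
cities = ("Tokyo", "Mumbai", "San Francisco", "Moscow", "Berlin")
hypothetical = {city: (12, 88) for city in cities}
stat, dof = chi_square_homogeneity(hypothetical)
p = chi_square_sf_even_dof(stat, dof)
print(stat, dof, p)  # 0.0 4 1.0
```

A large p-value, as here, means the data give no evidence that the success rate depends on the exit node. Regions where an API was unreachable (for example, the Gemini free tier from the Moscow and Berlin exit nodes, given the Russia and EEA blocks) would be dropped from that model's table. This leaves three regions and dof = 2, which the closed form still handles. An odd dof would need a general chi-square survival function such as `scipy.stats.chi2.sf`. As a rule of thumb, the chi-square approximation is reliable only when expected counts are at least about 5 per cell.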
Cite this work
@misc{
  title={(HckPrj) Jailbreaks Are Global or Regional? A Study Under Scale and Geolocation Variation},
  author={Anidipta Pal},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


