Mar 22, 2026
Beyond the Blocklist: Using Character Aliases to Bypass AI Image Safety
Anidipta Pal
Current text-to-image (T2I) safety protocols rely primarily on string-matching blocklists to prevent the generation of copyrighted entities, non-consensual deepfakes, and protected public figures. However, these syntactic defenses often fail to account for the deep relational knowledge that multimodal models acquire during pre-training. This paper investigates Semantic Alias Mapping, a red-teaming technique in which protected entities are retrieved via indirect conceptual proxies, such as requesting "Peter Parker" to obtain high-fidelity likenesses of "Tom Holland." We systematically evaluate this vulnerability across six frontier generative models and show that, while direct name-based requests are successfully filtered, alias-based prompts achieve a high success rate while preserving target identity. Our empirical results show a positive correlation between text-encoder capacity and vulnerability to alias evasion. These findings suggest that robust AI control requires replacing surface-level string matching with intent-aware latent interventions and output-space verification.
Cite this work
@misc {
  title={(HckPrj) Beyond the Blocklist: Using Character Aliases to Bypass AI Image Safety},
  author={Anidipta Pal},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


