Mar 22, 2026
When Safety Becomes the Vulnerability
Caleb Rudnick, August Lina
Any base64-encoded string included in a message to the Claude API causes that request to fail. This behaviour is reproducible across Sonnet and Opus 4.6, and across all platforms including the API, web and mobile applications, and Claude Code. While the blanket rejection of base64 content was originally a reasonable defence against prompt injection—attackers could encode malicious instructions to bypass keyword-based safety filters—the measure has become a liability as large language models have moved from conversational assistants to critical infrastructure components.
The report is very well written and presented at the level of a good conference paper.
I'm a bit skeptical of the threat model (base64-injection attacks causing critical system failures by triggering refusals from the model provider). I feel like systems should be basically robust to model outages, and base64 attacks might be one specific cause of outages. My guess is there might be narrow situations in which an attacker can strategically cause a model outage that leads to a security vulnerability, but that this isn't a super important threat vector.
FYI I tried running a simple example from the report (aGVsbG8gd29ybGQ= “hello world”) in claude.ai and it did not trigger a refusal (the model just understood the message and responded appropriately).
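For reference, the test string can be checked locally with Python's standard `base64` module. This is only a sanity check of the encoding; it says nothing about how any model or platform handles the string.

```python
import base64

# The example string quoted above.
encoded = "aGVsbG8gd29ybGQ="

# Decode the base64 text to bytes, then interpret the bytes as UTF-8.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```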
I agree with the broad dynamic:
> This is a concrete instance of a broader principle in AI control research: that safety mechanisms which fail catastrophically (by crashing the system or returning no response) rather than degrading gracefully (by flagging suspicious content and continuing) can be weaponised by adversaries who understand the failure mode.
But I think this type of threat should just be caught by standard testing and security protocols and isn't a major threat model. I think it's good the report points out the dynamic.
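The contrast between failing catastrophically and degrading gracefully can be sketched in a few lines of Python. Everything here is an illustrative assumption and not taken from the report or from any provider's actual filter: the names `find_base64_spans`, `handle_fail_closed`, and `handle_gracefully` are hypothetical, and the 16-character regex is a crude heuristic that will also match some ordinary long tokens.

```python
import base64
import binascii
import re

# Hypothetical heuristic: a run of 16+ base64-alphabet characters,
# optionally followed by up to two '=' padding characters.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def find_base64_spans(text):
    """Return substrings that look like base64 and decode without error."""
    spans = []
    for match in BASE64_RUN.finditer(text):
        candidate = match.group(0)
        try:
            base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # wrong length or padding, so not valid base64
        spans.append(candidate)
    return spans


def handle_fail_closed(text):
    """Catastrophic failure mode: any suspected base64 aborts the request."""
    if find_base64_spans(text):
        raise ValueError("request rejected: base64 content detected")
    return {"text": text, "flags": []}


def handle_gracefully(text):
    """Graceful degradation: flag the suspicious content and continue."""
    flags = ["possible_base64"] if find_base64_spans(text) else []
    return {"text": text, "flags": flags}
```

Under these assumptions, an input containing an encoded payload makes `handle_fail_closed` raise, so an attacker who controls any part of the input can block the whole request. `handle_gracefully` returns the same text with a `possible_base64` flag, leaving the decision to a downstream component.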
Cite this work
@misc{
  title={(HckPrj) When Safety Becomes the Vulnerability},
  author={Caleb Rudnick and August Lina},
  date={3/22/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


