Nov 23, 2025
Snow White: Detecting Persistent Trust Decay & Context Poisoning in LLMs, an Attack Surface Characterization
Giles Edkins, Ziyao Tian, Alex Ge, Arman Blackstone, Christopher Berry
Current Large Language Model (LLM) safety research predominantly focuses on inference-time safety, aiming to prevent immediate malicious outputs. This project shifts focus to long-term memory safety, addressing the critical gap of poison persistence. We hypothesize that adversarial attacks, including failed jailbreak attempts, can compromise an LLM's persistent memory (RAG, vector stores) and user-profiling systems. Specifically, we examine whether successful or attempted jailbreaks cause a persistent decay of the model's trust heuristic, or a systemic, user-specific collapse of safety guardrails, degrading output quality and safety in subsequent, benign interactions. Our methodology is a structured "Red-Teaming over Time" experiment built on a custom test harness, Snow White, which characterizes this new attack surface and informs the development of a Memory Sanitizer, a critical defensive tool for securing persistent LLM agents.
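To make the "Red-Teaming over Time" protocol concrete, here is a minimal sketch of the core trial loop: inject a poison (e.g., a failed jailbreak) into one session's persistent history, then score benign follow-up probes against a clean control session. All names (`Session`, `run_trial`, the `chat` and `score` callables) are illustrative assumptions, not the actual Snow White harness API.

```python
# Hypothetical sketch of a "Red-Teaming over Time" trial; not the
# actual Snow White implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A session-level chat function: takes the running message history
# (standing in for persistent memory) and returns the model's reply.
ChatFn = Callable[[List[dict]], str]

@dataclass
class Session:
    """One conversation whose history persists across turns."""
    chat: ChatFn
    history: List[dict] = field(default_factory=list)

    def send(self, prompt: str) -> str:
        self.history.append({"role": "user", "content": prompt})
        reply = self.chat(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

def run_trial(chat: ChatFn, poison: str, benign_probes: List[str],
              score: Callable[[str], float]) -> Dict[str, List[float]]:
    """Compare post-poison behavior on benign probes against a clean control."""
    poisoned, control = Session(chat), Session(chat)
    poisoned.send(poison)  # the attempted jailbreak persists in memory
    return {
        "poisoned": [score(poisoned.send(p)) for p in benign_probes],
        "control": [score(control.send(p)) for p in benign_probes],
    }
```

A divergence between the `poisoned` and `control` score trajectories over successive benign probes would be evidence of the hypothesized trust decay or guardrail collapse.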
Cite this work
@misc{edkins2025snowwhite,
  title={(HckPrj) Snow White: Detecting Persistent Trust Decay \& Context Poisoning in LLMs, an Attack Surface Characterization},
  author={Edkins, Giles and Tian, Ziyao and Ge, Alex and Blackstone, Arman and Berry, Christopher},
  year={2025},
  month={nov},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


