Jan 11, 2026

Agent Attacks via Memory Injection

Leonidas Raghav, Choong Kai Zhe

🏆 4th Place Winner

LLMs have recently adopted persistent memory to provide models with better user knowledge and personalisation. However, this introduces a new vector for adversarial manipulation. This report investigates Memory Injection, a threat model where adversaries exploit indirect prompt injection within web content to poison an agent's long-term memory. Employing user manipulation scenarios, we show that memory attacks, should they succeed, are effective at changing model behaviour towards the user, often more so than a direct system prompt. Hence, this highlights the need for more robust evaluations of memory updates in agentic memory systems.

This work was inspired by the SPAR Spring 2026 project 'Latent (Sleeper) Attacks via Persistent Memory', proposed by Ivaxi Sheth (CISPA Helmholtz Center for Information Security) and Vyas Raina (University of Cambridge). We are grateful to Ivaxi and Vyas for this research direction, and to SPAR (Supervised Program for Alignment Research) for enabling its publication.

Review Project

See Code

View Related Sprint

Reviewer's Comments

This is an interesting and underexplored area: how prompt injection could be used to manipulate users. The design seems appropriate and the results were very striking! The authors did a great job at testing follow up explanations / hypotheses.

I think the submission could be strengthened by providing more details about the mechanics of the injection (in particular whether the agent scaffold used native APIs which I would expect to have stronger guardrails) or external. In addition an appendix with more examples of the exact prompts and web text would be useful.

In future work it would be great to see the authors measure the effectiveness of the manipulation on real users though the LLM-as-judge is a great start.

This seems like good quality and important AI security research! Honestly I couldn’t quite tell if it aligns with the “manipulation” scope of the hackathon (because it feels like an even more general and important security vulnerability), but in any case the authors do focus on the domain of manipulation, and the attack vector (poisoning persistent memory via hidden web content) is for sure very concerning for agentic systems. The findings are honestly pretty crazy (100% ASR on memory injection attacks for grok-4-fast??). It’s good at least to see that GPT-4.1 is doing better 😬

Cite this work

@misc {

title={

(HckPrj) Agent Attacks via Memory Injection

author={

Leonidas Raghav, Choong Kai Zhe

date={

1/11/26

organization={Apart Research},

note={Research submission to the research sprint hosted by Apart.},

howpublished={https://apartresearch.com}

}

Recent Projects

View All

Feb 2, 2026

Markov Chain Lock Watermarking: Provably Secure Authentication for LLM Outputs

We present Markov Chain Lock (MCL) watermarking, a cryptographically secure framework for authenticating LLM outputs. MCL constrains token generation to follow a secret Markov chain over SHA-256 vocabulary partitions. Using doubly stochastic transition matrices, we prove four theoretical guarantees: (1) exponentially decaying false positive rates via Hoeffding bounds, (2) graceful degradation under adversarial modification with closed-form expected scores, (3) information-theoretic security without key access, and (4) bounded quality loss via KL divergence. Experiments on 173 Wikipedia prompts using Llama-3.2-3B demonstrate that the optimal 7-state soft cycle configuration achieves 100\% detection, 0\% FPR, and perplexity 4.20. Robustness testing confirms detection above 96\% even with 30\% word replacement. The framework enables $O(n)$ model-free detection, addressing EU AI Act Article 50 requirements. Code available at \url{https://github.com/ChenghengLi/MCLW}

Feb 2, 2026

Prototyping an Embedded Off-Switch for AI Compute

This project prototypes an embedded off-switch for AI accelerators. The security block requires periodic cryptographic authorization to operate: the chip generates a nonce, an external authority signs it, and the chip verifies the signature before granting time-limited permission. Without valid authorization, outputs are gated to zero. The design was implemented in HardCaml and validated in simulation.

Feb 2, 2026

Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors

We design and simulate a "border patrol" device for generating cryptographic evidence of data traffic entering and leaving an AI cluster, while eliminating the specific analog and steganographic side-channels that post-hoc verification can not close. The device eliminates the need for any mutually trusted logic, while still meeting the security needs of the prover and verifier.

Feb 2, 2026