Vectox
Diana Chang, Ian Duhamel Hayes
vectox — Sleeper-Agent RAG Memory Poisoning (Latin America · Technical Safety)
RAG systems trust their vector store as memory, which makes that store a write-accessible attack surface. We built vectox, a reproducible sandbox showing that a handful of dormant, benign-looking documents seeded into a vector store act as an inference-time sleeper agent: invisible until a specific trigger query retrieves them, at which point the model treats the retrieved text as an instruction and its output is hijacked — no model weights touched. We pair a new NVIDIA garak rag_poisoning probe and detector with a live PostgreSQL pgvector(384)+HNSW victim, seeded with Latin-American-context poison derived structurally from the SESGO Spanish stereotype benchmark plus a Spanish-English code-switching variant. Across six bias dimensions and two languages, 24 poison documents among 224 achieve 100% attack success; adding an insertion-time BEFORE INSERT quarantine gate drops it to 0% with no change to the attack — demonstrating that auditing memory writes before they reach the retrieval hot path neutralizes the threat for Spanish/Portuguese systems.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) Vectox
},
author={
Diana Chang, Ian Duhamel Hayes
},
date={
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


