V-Sentinel
Lê Huy Hùng, Nguyễn Trần Minh Tâm
AI safety tooling is overwhelmingly built and benchmarked in English and degrades sharply on low-resource languages. This is a critical gap as Vietnam deploys AI in public services, healthcare, and education under Decree 142/2026/ND-CP. We present V-Sentinel, a dual-control guardrail that pairs a deterministic, OWASP-tagged rule backbone with an LLM safety classifier and fails closed. It has four supporting components. A non-destructive normalization layer neutralizes Vietnamese-specific evasion (teencode, leetspeak, diacritic folding, Base64) without corrupting legitimate text. A domain-aware GraphRAG layer attaches the jurisdictionally correct legal frame (ND-142/PDPD, FERPA, COPPA, HIPAA). A REFRAME mechanism converts over-refusal of legitimate sensitive queries into responsible, cited answers. An output stage redacts Vietnamese PII. On two Vietnamese benchmarks, V-Sentinel flags about 90% of harmful prompts. The rule backbone alone blocks 64% of injection attacks without a model call, and the system still flags 100% during a classifier outage. An ablation shows that the two controls are complementary rather than redundant: the rules carry the obfuscation/injection axis and provide the fail-closed guarantee, while the classifier carries the content-harm axis. Published Llama Guard results fall to 51% recall on adversarial prompts and degrade on multilingual input, whereas V-Sentinel sustains about 90% on Vietnamese. The main takeaway is that a thin, local-first, law-grounded guardrail makes an open model compliant without the cost of fine-tuning a sovereign model, and that it re-targets to any jurisdiction by swapping its rule pack and legal graph.
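As a rough illustration of the dual-control, fail-closed idea, the sketch below matches rules against normalized views of a prompt, leaves the original text untouched, and flags the prompt whenever the classifier cannot be reached. It is a minimal sketch under stated assumptions, not the authors' implementation: the rule patterns, the `LLM01-prompt-injection` tag string, the leetspeak table, and the names `views`, `guard`, and `Verdict` are all hypothetical, teencode handling is omitted, and reading "fails closed" as "flag everything during an outage" is our interpretation.

```python
import base64
import re
import unicodedata
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical rule pack: (tag, pattern). Patterns target folded, lowercased text.
RULES = [
    ("LLM01-prompt-injection", re.compile(r"ignore (all )?previous instructions", re.I)),
    ("LLM01-prompt-injection", re.compile(r"bo qua (moi )?huong dan", re.I)),
]

# Hypothetical leetspeak table; a real pack would be larger.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def fold_diacritics(text: str) -> str:
    """Strip Vietnamese tone and vowel marks; 'đ' has no decomposition, so map it by hand."""
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def views(text: str) -> List[str]:
    """Return match-only views of the prompt. The original string is never modified."""
    folded = fold_diacritics(text).lower()
    out = [text, folded, folded.translate(LEET)]
    for token in B64_TOKEN.findall(text):
        try:
            out.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except ValueError:  # covers binascii.Error and UnicodeDecodeError
            continue
    return out


@dataclass
class Verdict:
    flagged: bool
    reason: str


def guard(prompt: str, classifier: Callable[[str], bool]) -> Verdict:
    # Control 1: deterministic rules, no model call.
    for view in views(prompt):
        for tag, pattern in RULES:
            if pattern.search(view):
                return Verdict(True, f"rule:{tag}")
    # Control 2: classifier. Any failure is treated as a flag (fail closed).
    try:
        harmful = classifier(prompt)
    except Exception:
        return Verdict(True, "fail-closed:classifier-unavailable")
    return Verdict(harmful, "classifier" if harmful else "pass")
```

Because the rules run first and need no model call, an obfuscated injection is caught even when the classifier is offline, while a benign prompt during an outage is flagged rather than silently passed.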
Cite this work
@misc{
  title={(HckPrj) V-Sentinel},
  author={Lê Huy Hùng and Nguyễn Trần Minh Tâm},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


