Nov 23, 2025
Sentinel Trace: Open-Source AI Monitoring Dashboard With Pre-Training Data Tracing And In-Flight DPO Dataset Creation
Neel Rajani, Pedro Ginel Camacho
We present Sentinel Trace, a monitoring and safety system for open-weight language models that combines real-time guardrails with interpretable failure analysis. Our architecture pairs a frontier model (OLMo-2 13B Instruct) with a lightweight guard model (Qwen3Guard-Gen 0.6B) that detects unsafe outputs and triggers automatic re-prompting. When adversarial prompts cause problematic responses, we make use of OLMoTrace to trace outputs back to the real pre-training data of the model, providing actionable insights to model providers for pre-training corpus filtering. All interactions are monitored through a Grafana dashboard tracking jailbreak attempts, detection rates, and re-prompting success. Using these insights, we implement a Direct Preference Optimization (DPO) feedback loop that refines training data based on guard classifications and traced origins, enabling the model to learn from failures instead of relying solely on post-hoc filtering. This approach demonstrates that combining runtime safety guardrails with training data traceability creates continuous improvement: interpretable monitoring informs data refinement, which improves model robustness against future attacks.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) Sentinel Trace: Open-Source AI Monitoring Dashboard With Pre-Training Data Tracing And In-Flight DPO Dataset Creation
},
author={
Neel Rajani, Pedro Ginel Camacho
},
date={
11/23/25
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


