Nov 23, 2025
Trusted Model Supervisor
Kalpesh Panchal
Trusted Model Supervisor is a defensive AI control layer designed to monitor, audit, and flag potentially harmful behavior from untrusted large language models. As increasingly capable AI systems are deployed in environments where misuse or misalignment can cause significant harm, operators currently lack lightweight, practical tools that provide real-time oversight. This project introduces an open-source, containerized monitoring pipeline consisting of a sandboxed untrusted model, a probe manager for query routing and telemetry collection, a trusted filtering layer for rule-based and classifier-emulated severity scoring, and a frontend dashboard enabling operators to observe incidents, replay interactions, and evaluate model behavior. The system logs all interaction data into a Postgres-backed audit trail, enabling reproducibility and post-hoc analysis. The MVP demonstrates that defensive control layers can be built efficiently using modern, lightweight tooling and containerized isolation, making it feasible to deploy safety supervision even in resource-constrained or rapidly changing environments.
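The trusted filtering layer's rule-based severity scoring could look something like the following minimal sketch. All names here (the rule table, threshold, and `score_output` helper) are illustrative assumptions, not the project's actual implementation: each rule maps a pattern to a severity score, and an output is flagged when the maximum matching score crosses an operator-set threshold.

```python
import re
from dataclasses import dataclass

# Hypothetical rule table mapping regex patterns to severity scores.
# Patterns and scores are placeholders, not the project's real rules.
RULES = [
    (re.compile(r"\b(rm -rf|format c:)\b", re.IGNORECASE), 0.9),
    (re.compile(r"\b(password|api[_ ]key|secret)\b", re.IGNORECASE), 0.6),
    (re.compile(r"\bignore (all|previous) instructions\b", re.IGNORECASE), 0.7),
]

FLAG_THRESHOLD = 0.5  # assumed operator-configurable cutoff


@dataclass
class Verdict:
    severity: float          # max score among matching rules (0.0 if none)
    matched: list            # (pattern, score) pairs that fired
    flagged: bool            # True when severity >= FLAG_THRESHOLD


def score_output(text: str) -> Verdict:
    """Score an untrusted model's output against the rule table."""
    matched = [(p.pattern, s) for p, s in RULES if p.search(text)]
    severity = max((s for _, s in matched), default=0.0)
    return Verdict(severity=severity, matched=matched,
                   flagged=severity >= FLAG_THRESHOLD)
```

In a full pipeline, each `Verdict` would also be written to the audit trail so incidents can be replayed from the dashboard.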
Cite this work
@misc{panchal2025trustedmodelsupervisor,
  title        = {(HckPrj) Trusted Model Supervisor},
  author       = {Kalpesh Panchal},
  year         = {2025},
  month        = nov,
  organization = {Apart Research},
  note         = {Research submission to the research sprint hosted by Apart.},
  howpublished = {https://apartresearch.com}
}


