Mar 23, 2026
RAG Faithfulness evaluator
Mouad Benmansour
RAG is everywhere now, and organisations adopt it precisely because it is supposed to be accurate. This tool holds RAG systems accountable to that promise: it monitors every output at the sentence level, visualises how faithfulness drifts across an answer, and produces a named diagnosis explaining what went wrong, where, and what to fix. It is built on a hybrid of semantic similarity and lexical overlap, with an LLM diagnostic layer constrained to a human-authored taxonomy of six known drift patterns. It catches what aggregate scores miss and explains what they cannot.
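A minimal sketch of the per-sentence hybrid scorer the description implies. All names here are illustrative, and the bag-of-words cosine is a stdlib stand-in for whatever embedding-based semantic similarity the actual tool uses; the 50/50 weighting is an assumption, not the author's tuning.

```python
import re
from collections import Counter
from math import sqrt

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

def lexical_overlap(sentence: str, context: str) -> float:
    """Jaccard overlap between sentence tokens and retrieved-context tokens."""
    s, c = set(tokens(sentence)), set(tokens(context))
    return len(s & c) / len(s | c) if s | c else 0.0

def semantic_similarity(sentence: str, context: str) -> float:
    """Cosine over bag-of-words counts; a real system would use
    sentence embeddings here instead."""
    a, b = Counter(tokens(sentence)), Counter(tokens(context))
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def faithfulness_timeline(answer: str, context: str, w: float = 0.5) -> list[dict]:
    """Score each answer sentence against the retrieved context,
    producing the per-sentence drift timeline the tool visualises."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    timeline = []
    for i, sent in enumerate(sentences):
        sem = semantic_similarity(sent, context)
        lex = lexical_overlap(sent, context)
        timeline.append({
            "index": i,
            "sentence": sent,
            "semantic": round(sem, 3),
            "lexical": round(lex, 3),
            "faithfulness": round(w * sem + (1 - w) * lex, 3),
        })
    return timeline
```

Keeping the two signals separate in each timeline entry, rather than collapsing them into the combined score, is what lets a downstream diagnostic layer reason about cases where they disagree.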
For a hackathon build, this is a clean working prototype. Sentence-level faithfulness scoring with drift timelines beats aggregate numbers: operators get something actionable, not just a pass/fail. The six-pattern diagnostic taxonomy is practical, and the ACL-injury catch (high semantic similarity but low lexical overlap, revealing a training-data leak) is a nice demonstration of why two scoring signals beat one.
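The two-signal catch described above reduces to a simple rule: flag any sentence that "sounds grounded" without sharing wording with the retrieved context. The thresholds below are purely illustrative, since the review notes the real ones are hand-tuned.

```python
def flag_score_divergence(semantic: float, lexical: float,
                          sem_hi: float = 0.8, lex_lo: float = 0.3) -> bool:
    """Flag sentences with high semantic similarity to the context but
    low lexical overlap: the divergence pattern that exposed the
    training-data leak, where a claim paraphrases knowledge the model
    already had rather than quoting the retrieved evidence."""
    return semantic >= sem_hi and lexical <= lex_lo
```

An aggregate score averaging the two signals would mask exactly this case, which is why the divergence check needs both values independently.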
The AI control connection could be tighter: this reads more as a RAG reliability tool than as a control protocol against adversarial subversion. Framing it as "what if the model is deliberately trying to smuggle ungrounded claims past the monitor?" would have landed better for this hackathon's theme.
Evaluation covers a single domain with hand-tuned thresholds, which is fine for a prototype, and the roadmap (HaluEval, entailment scoring) shows the author knows what comes next. Good execution for the time constraint.
The problem of sentence-level faithfulness scoring in RAG systems is practically relevant, and the drift timeline idea is an intuitive framing. The diagnostic taxonomy of six failure patterns is a reasonable starting point for categorising how RAG systems fail.
The main concern is fit with the AI control hackathon theme. This addresses ordinary retrieval tooling failures rather than the adversarial monitoring setting of AI control.
Cite this work
@misc{benmansour2026ragfaithfulness,
  title={(HckPrj) RAG Faithfulness evaluator},
  author={Mouad Benmansour},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


