Jan 11, 2026
RiskLab
Yuvan Aarya Chikka
A comprehensive, production-ready system for measuring, analyzing, and governing manipulative behavior in AI systems. Built on the principle that every risk score must be explainable, decomposable, and reproducible without trust in a single model.
The idea of using multiple evaluator signals to assess risk is well motivated, and the evaluation setup is clearly described. The framework works as intended as a proof of concept.
However, the experiments rely on a small number of prompts and relatively mild framing changes, which makes it difficult to tell whether the observed score differences reflect meaningful risk signals or generic prompt sensitivity. And while multiple evaluator outputs are collected, there is little analysis of where the evaluators disagree or which signals are most informative.
The project would benefit from stronger framing contrasts and more focused analysis of evaluator disagreement. Overall, this is a solid prototype that demonstrates the core idea clearly and could be strengthened with deeper analysis.
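One way to make the suggested disagreement analysis concrete is to rank prompts by how much the evaluators diverge on them. The sketch below is illustrative only: the score matrix, prompt names, and the choice of standard deviation as the disagreement measure are assumptions, not details taken from RiskLab.

```python
import statistics

# Hypothetical evaluator risk scores in [0, 1]; rows are prompts,
# columns are independent evaluator signals.
scores = {
    "prompt_a": [0.2, 0.3, 0.8],  # evaluators split sharply
    "prompt_b": [0.5, 0.5, 0.6],  # evaluators broadly agree
}

def disagreement(vals):
    """Sample standard deviation of one prompt's scores across evaluators."""
    return statistics.stdev(vals)

# Prompts with the highest disagreement are the most informative to inspect:
# they are where a single aggregate risk score hides conflicting signals.
ranked = sorted(scores, key=lambda p: disagreement(scores[p]), reverse=True)
for p in ranked:
    print(p, round(disagreement(scores[p]), 3))
```

Inspecting the top of this ranking by hand would directly address the review's point about identifying where evaluators disagree; per-evaluator correlations with a trusted label would address which signals are most informative.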
Cite this work
@misc{chikka2026risklab,
  title={(HckPrj) RiskLab},
  author={Yuvan Aarya Chikka},
  date={2026-01-11},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}