Nov 24, 2025
LittleBrotherAI
Tarik Rosin, Rafi Hakim, Teodora Kamova, Manon Kempermann, Ruslan Dontsov
Large language models (LLMs) power chatbots capable of chain-of-thought (CoT) reasoning and detailed responses. However, their reasoning often lacks transparency, making it difficult for users to judge reliability, detect hallucinations, or identify adversarial vulnerabilities [2,3,4]. In this work, we present “LittleBrotherAI”, a web-based chatbot platform that integrates an external model with a comprehensive model-evaluation dashboard. For every interaction, the system automatically assesses the model’s reasoning and response across seven key dimensions: consistency (language, semantics, and logical inference), clarity, reproducibility, legibility, coverage, adversarial behaviour, and accuracy. These evaluations are performed by dedicated auxiliary LLMs (“little brothers” monitoring the main model) or specialised AI components [1]. The resulting scores are aggregated into a trustworthiness verdict for each answer, giving users a clearer basis for judgment and trust calibration. Our platform demonstrates how transparent metrics and structured evaluation can enhance the responsible use of LLM-generated content. Our code is available at https://github.com/orgs/LittleBrotherAI/repositories.
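The aggregation step described above can be sketched as follows. This is a minimal illustrative example, not the actual LittleBrotherAI implementation: the seven dimension names come from the abstract, but the equal weighting, the thresholds, and the function name `aggregate_trustworthiness` are assumptions made here for clarity.

```python
# Hypothetical sketch of the score-aggregation step: seven per-dimension
# scores (each in [0, 1], as produced by auxiliary "little brother" models)
# are averaged and mapped to a trustworthiness verdict. Weights, thresholds,
# and names are illustrative assumptions, not the project's real code.

DIMENSIONS = [
    "consistency", "clarity", "reproducibility", "legibility",
    "coverage", "adversarial_behaviour", "accuracy",
]

def aggregate_trustworthiness(scores: dict[str, float]) -> tuple[float, str]:
    """Average the per-dimension scores and map the result to a verdict."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    overall = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    if overall >= 0.8:        # assumed cutoff for a "trustworthy" verdict
        verdict = "trustworthy"
    elif overall >= 0.5:      # assumed cutoff for an "uncertain" verdict
        verdict = "uncertain"
    else:
        verdict = "untrustworthy"
    return overall, verdict

# Example: one auxiliary model flags adversarial behaviour, lowering the score.
example = {d: 0.9 for d in DIMENSIONS}
example["adversarial_behaviour"] = 0.4
overall, verdict = aggregate_trustworthiness(example)
print(f"{overall:.2f} -> {verdict}")
```

A weighted sum (e.g. emphasising accuracy and adversarial behaviour) would be a natural refinement of this equal-weight average.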
Cite this work
@misc{littlebrotherai2025,
  title={(HckPrj) LittleBrotherAI},
  author={Rosin, Tarik and Hakim, Rafi and Kamova, Teodora and Kempermann, Manon and Dontsov, Ruslan},
  date={2025-11-24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={\url{https://apartresearch.com}}
}


