Nov 2, 2025
Threat Snapshot
Arslan Akishev, Amir Kaiyrbek, Ali Kurman, Ansar Tleubayev, Madi Zhassymbek
We developed a real-time security evaluation system for open-source large language models (LLMs). Using a novel threat-snapshot approach, the system isolates the decision points at which models may fail under adversarial prompts. It combines an adaptive multi-turn probing algorithm with a dual-inspector ensemble (Mistral-7B and Llama-2-7B) to detect vulnerabilities while reducing single-judge bias.
Evaluated on 100 newly released models, the system achieved a danger-detection accuracy of 0.89 and revealed that 34% of models that appeared safe in early turns became unsafe under deeper probing. The framework enables continuous, automated monitoring of open-source AI models, bridging the gap between model release and security assessment.
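The probing loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all names (`Inspector`, `probe_model`, the escalation heuristic, and the 0.5 agreement threshold) are assumptions introduced here. The key ideas from the abstract are (1) multi-turn probing that adapts each follow-up prompt to the model's previous reply, and (2) flagging a model as unsafe only when both inspector models agree.

```python
# Hypothetical sketch of adaptive multi-turn probing with a dual-inspector
# ensemble. All names and parameters are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Inspector:
    """Wraps one judge model (e.g. Mistral-7B or Llama-2-7B)."""
    name: str
    score: Callable[[str], float]  # maps a reply to a danger score in [0, 1]


def probe_model(
    generate: Callable[[List[str]], str],  # target model: conversation -> reply
    inspectors: List[Inspector],
    seed_prompt: str,
    max_turns: int = 5,
    threshold: float = 0.5,
) -> Dict:
    """Escalate adversarial prompts over multiple turns; flag the model as
    unsafe only when every inspector agrees the reply is dangerous."""
    history = [seed_prompt]
    scores: Dict[str, float] = {}
    for turn in range(1, max_turns + 1):
        reply = generate(history)
        scores = {ins.name: ins.score(reply) for ins in inspectors}
        # Dual-inspector agreement: both judges must exceed the threshold.
        if all(s >= threshold for s in scores.values()):
            return {"unsafe": True, "turn": turn, "scores": scores}
        # Adapt: feed the reply back and deepen the probe next turn.
        history += [reply, f"Elaborate in more detail on: {reply[:80]}"]
    return {"unsafe": False, "turn": max_turns, "scores": scores}
```

A model that looks safe on the first prompt can still be flagged on a later turn, which is how deeper probing surfaces the 34% of initially safe-looking models reported above.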
Cite this work
@misc{akishev2025threatsnapshot,
  title={(HckPrj) Threat Snapshot},
  author={Akishev, Arslan and Kaiyrbek, Amir and Kurman, Ali and Tleubayev, Ansar and Zhassymbek, Madi},
  date={2025-11-02},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


