CrescendoDefense: Securing Open-Source Language Models Against Multi-Turn Jailbreak Attacks in Asia
Mahek Nishant Vedant
CrescendoDefense is a lightweight runtime defense framework designed to mitigate multi-turn jailbreak attacks against open-source large language models. Unlike single-turn jailbreak attempts, Crescendo-style attacks escalate a conversation gradually through memory stacking, semantic drift, guard-lowering dialogue, and prompt disguising. Because no single message looks clearly harmful on its own, these attacks are difficult to detect with conventional prompt-level moderation.
The proposed framework combines three complementary layers: a Semantic Kinematic Detector that monitors conversational escalation, a Strategic Context Eviction module that disrupts adversarial memory accumulation, and a Semantic Response Auditor that intercepts unsafe outputs before delivery. The framework was evaluated with Llama-3.2-3B-Instruct on a benchmark of 15 adversarial Crescendo-style attack scenarios and 7 benign or risky-benign conversations. CrescendoDefense reduced the Attack Success Rate (ASR) from 86.67% (13 of 15 attacks) to 26.67% (4 of 15), a 69.2% relative reduction in successful jailbreaks.
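The headline numbers can be checked with a few lines of arithmetic. This sketch assumes ASR is defined as successful jailbreaks divided by the 15 adversarial scenarios, so 86.67% and 26.67% correspond to 13 and 4 successes (an inference from the percentages, not a count stated in the source). The function name `attack_success_rate` is purely illustrative.

```python
from fractions import Fraction


def attack_success_rate(successes: int, attempts: int) -> Fraction:
    """ASR = successful jailbreaks / adversarial scenarios attempted (exact fraction)."""
    return Fraction(successes, attempts)


# Assumption: 86.67% and 26.67% of 15 scenarios correspond to 13 and 4 successes.
baseline = attack_success_rate(13, 15)
defended = attack_success_rate(4, 15)

# Relative reduction = (baseline - defended) / baseline, which simplifies to 9/13 here.
relative_reduction = (baseline - defended) / baseline

print(f"baseline ASR:       {float(baseline):.2%}")            # 86.67%
print(f"defended ASR:       {float(defended):.2%}")            # 26.67%
print(f"relative reduction: {float(relative_reduction):.1%}")  # 69.2%
```

Using `Fraction` keeps the intermediate values exact, so the only rounding happens when the percentages are printed.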
The framework is model-agnostic, requires no retraining, and offers a practical defense-in-depth approach for improving the safety of open-source AI systems deployed in resource-constrained environments.
Cite this work
@misc{
  title={(HckPrj) CrescendoDefense: Securing Open-Source Language Models Against Multi-Turn Jailbreak Attacks in Asia},
  author={Mahek Nishant Vedant},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


