Nov 22, 2025
CM-IDO Firewall: Context-Masked Iterative Defensive Optimization for Safer LLM Deployment
Sayash
My NeurIPS paper introduced context-masked meta-prompting, a way to optimize prompts without exposing private data to external LLMs. For this hackathon, I translated the same principle to AI safety.

I built CM-IDO: a context-masked iterative defensive optimization firewall. Instead of optimizing prompts for accuracy, it optimizes them for safety, using an internal evaluator that scores candidate defensive rewrites along bio/cyber/disinfo axes and selects the safest one. Sensitive content is masked, risk is quantified, and the task model sees only the sanitized, safety-optimized rewrite. The result is a drop-in, privacy-preserving, auditable safety layer that can wrap any LLM API with no finetuning or architecture changes.
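The "score candidates along bio/cyber/disinfo axes and pick the safest" step can be sketched as follows. This is a minimal illustration, not the project's actual evaluator: `score_axes` stands in for the internal evaluator (here a toy keyword heuristic), and the keyword lists and function names are assumptions for the example.

```python
# Hypothetical sketch of candidate scoring and selection.
# score_axes() is a toy stand-in for CM-IDO's internal evaluator.
RISK_KEYWORDS = {
    "bio": ["pathogen", "synthesis"],
    "cyber": ["exploit", "payload"],
    "disinfo": ["fabricate", "impersonate"],
}

def score_axes(text: str) -> dict[str, float]:
    """Return a 0..1 risk score per axis (toy keyword heuristic)."""
    words = text.lower()
    return {
        axis: min(1.0, sum(w in words for w in kws) / len(kws))
        for axis, kws in RISK_KEYWORDS.items()
    }

def select_safest(candidates: list[str]) -> str:
    """Pick the rewrite whose worst-axis risk is lowest."""
    return min(candidates, key=lambda c: max(score_axes(c).values()))

candidates = [
    "Explain how the exploit payload works step by step.",
    "Summarize, at a high level, why such attacks are mitigated.",
]
print(select_safest(candidates))  # prints the defensive, low-risk rewrite
```

Selecting on the worst axis (minimax) rather than the average is one plausible design choice: it prevents a rewrite that is clean on two axes from masking a high score on the third.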
CM-IDO Firewall is a Context-Masked Iterative Defensive Optimization layer that sits in front of any LLM. It (1) classifies prompt risk, (2) masks sensitive entities, (3) iteratively rewrites the query into a safer, more defensive version, and (4) only then calls the underlying model. Raw user queries are never logged; only masked versions and hashes are retained.
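The four-step pipeline above can be sketched as a thin wrapper around any model-calling function. Everything here is an assumption for illustration: the entity patterns, the `firewall`/`mask_entities`/`defensive_rewrite` names, and the audit-log shape are hypothetical, and the iterative rewrite loop is stubbed out.

```python
import hashlib
import re

def mask_entities(prompt: str) -> str:
    """(2) Mask sensitive entities (toy: emails and long digit runs)."""
    masked = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", prompt)
    return re.sub(r"\d{6,}", "<NUMBER>", masked)

def classify_risk(prompt: str) -> str:
    """(1) Toy risk classifier; a real system would use the evaluator."""
    return "high" if "exploit" in prompt.lower() else "low"

def defensive_rewrite(masked: str, rounds: int = 3) -> str:
    """(3) Stub for the iterative rewrite loop; a real system would
    propose candidates and re-score them each round."""
    rewrite = masked
    for _ in range(rounds):
        rewrite = rewrite  # propose_and_select(rewrite) in a real loop
    return rewrite

def firewall(prompt: str, call_model, audit_log: list) -> str:
    """Wrap any LLM call: mask, rewrite, log only hash + masked text."""
    masked = mask_entities(prompt)
    safe_query = defensive_rewrite(masked)
    audit_log.append({  # never the raw prompt: hash + masked only
        "risk": classify_risk(prompt),
        "hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "masked": masked,
    })
    return call_model(safe_query)  # (4) only now hit the underlying model

log: list = []
echo = lambda q: q  # stand-in for the wrapped LLM API
out = firewall("Contact alice@example.com re: account 123456789", echo, log)
print(out)               # masked, rewritten query reaches the model
print(log[0]["masked"])  # audit trail holds no raw user text
```

Logging the SHA-256 of the raw prompt alongside the masked text keeps the trail auditable (a disputed query can be matched by hash) without retaining the query itself.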
Cite this work
@misc{sayash2025cmido,
  title={(HckPrj) CM-IDO Firewall: Context-Masked Iterative Defensive Optimization for Safer LLM Deployment},
  author={Sayash},
  date={2025-11-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


