Apr 26, 2026
Benji-Bio
Ananya Ayasi
Benji-Bio is a stress-test harness for evaluating AI-biosecurity safety monitors under prompt transformations such as paraphrasing, role-shifting, ambiguity, and fictionalized misuse framing.
The project studies a specific evaluation failure mode: static or public benchmark prompts can overestimate safety when monitors learn obvious refusal patterns instead of robustly recognizing risky intent.
Benji-Bio is a small prototype benchmark for testing whether AI biosecurity monitors are robust to prompt transformations . This is potentially interesting if this can add automated, large-scale transformation generation, evaluate LLM-based monitors, and validate on a larger, expert-labeled dataset to demonstrate real robustness.
The framing is good- I would suggest expanding to LLM based monitors (even just claude or GPT as a classifier with a structured prompt) or simply reframe the paper's scope to keyword monitor evaluation. If the central claim is transformations matter, the main finding should be the variant accuracy degradation, not monitor ranking. The core idea is worth pursuing but the current implementation is still too thin.
Clear and thoughtful research, project, experiments, results, and presentation. A useful approach against the threat of prompt engineering to get around Biosecurity guardrails in large language models. In addition, the generated dataset can be helpful to others in developing solutions to this problem.
Cite this work
@misc {
title={
(HckPrj) Benji-Bio
},
author={
Ananya Ayasi
},
date={
4/26/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


