Apr 27, 2026
Function Over Sequence: Empirical Evaluation of Protein Language Models for Biosecurity Screening
Saahir Dhanani
DNA synthesis screening is a critical biosecurity chokepoint, but current tools detect dangerous sequences by their similarity to known threats, a paradigm that breaks down against AI-assisted protein design. Using ProteinMPNN and ESMFold, we generated 12,000 evasion variants of 50 known toxin proteins across 12 sampling temperatures, producing sequences whose mean identity to known toxins is as low as 11%. We evaluated embeddings from the ESM-C 600M protein language model as a function-aware alternative, benchmarking them against BLASTp and commec baselines.
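To make the idea of an embedding-based, function-aware screen concrete, here is a minimal sketch of a k-nearest-neighbour classifier over protein embeddings. It assumes each protein has already been reduced to a fixed-length vector (for example by mean-pooling ESM-C 600M per-residue representations). Cosine similarity, the majority-vote rule, the 0.5 threshold, the values of k and the names `knn_toxin_score` and `flag` are illustrative assumptions, not details taken from the submission.

```python
import numpy as np


# Hypothetical kNN screen: embeddings are assumed to be precomputed,
# fixed-length vectors (e.g. mean-pooled ESM-C 600M representations).
def knn_toxin_score(query, ref_embeddings, ref_labels, k=5):
    """Fraction of the k nearest reference proteins (by cosine similarity) labelled as toxins.

    query:          (d,) embedding of the sequence being screened.
    ref_embeddings: (n, d) embeddings of labelled reference proteins.
    ref_labels:     (n,) array, 1 for known toxins and 0 for benign proteins.
    """
    # L2-normalise so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    refs = ref_embeddings / np.linalg.norm(ref_embeddings, axis=1, keepdims=True)
    sims = refs @ q
    # Indices of the k most similar reference proteins.
    nearest = np.argsort(-sims)[:k]
    return float(np.mean(ref_labels[nearest]))


def flag(query, ref_embeddings, ref_labels, k=5, threshold=0.5):
    """Flag the query if the toxin share among its k nearest neighbours reaches the threshold."""
    return knn_toxin_score(query, ref_embeddings, ref_labels, k) >= threshold


if __name__ == "__main__":
    # Toy 2-D "embeddings": toxins cluster near (1, 0), benign proteins near (0, 1).
    refs = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
                     [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
    labels = np.array([1, 1, 1, 0, 0, 0])
    print(knn_toxin_score(np.array([0.8, 0.2]), refs, labels, k=3))  # 1.0
    print(flag(np.array([0.2, 0.8]), refs, labels, k=3))  # False
```

Because the score depends only on where a sequence lands in embedding space, a redesigned variant with low sequence identity can still be flagged as long as its embedding stays near the toxin cluster; that is the property the evaluation above is testing.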
At sampling temperature T = 1.5, BLASTp detects 3.5% of variants and commec detects 2.9%, while ESM-C kNN retains 79.5% detection. Among variants structurally predicted to retain wild-type function, commec detects 0% and BLASTp 3.8%, while ESM-C kNN detects 100%. We also identify organism-matched negative construction as a methodological requirement for honest evaluation in this space: naive dataset construction inflates AUC by up to 0.016 and FPR by 58 percentage points.
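The abstract does not spell out how negatives were matched. The sketch below shows one plausible scheme, assuming every protein carries a source-organism label: each toxin is paired with a benign protein from the same organism, sampled without replacement, presumably so that organism identity alone cannot separate the classes. The function names and the `(protein_id, organism)` tuple format are hypothetical, and the FPR helper simply reports the share of negatives a screen flags.

```python
import random
from collections import defaultdict


# Hypothetical helper: one plausible way to build organism-matched negatives.
def organism_matched_negatives(positives, candidate_negatives, seed=0):
    """Pair each toxin with a benign protein drawn from the same source organism.

    positives, candidate_negatives: lists of (protein_id, organism) tuples.
    Returns (matched_negatives, unmatched_positive_ids).
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for protein_id, organism in candidate_negatives:
        pool[organism].append(protein_id)
    for ids in pool.values():
        rng.shuffle(ids)

    matched, unmatched = [], []
    for protein_id, organism in positives:
        if pool[organism]:
            # pop() samples without replacement from the shuffled per-organism pool.
            matched.append((pool[organism].pop(), organism))
        else:
            # No benign protein left from this organism; report rather than substitute.
            unmatched.append(protein_id)
    return matched, unmatched


def false_positive_rate(flags_on_negatives):
    """Share of benign (negative) proteins that a screen flags."""
    return sum(flags_on_negatives) / len(flags_on_negatives)
```

Reporting unmatched toxins, instead of silently drawing a negative from another organism, keeps any taxonomic shortcut visible rather than hiding it in the benchmark.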
I love that you have taken the next step in answering this question. Your approach builds on past knowledge but produces important additional insights. I was also glad to see that you clearly map out how your work impacts our knowledge base and what future research it makes necessary. I think you answered an important question and have directly contributed to efforts to strengthen synthesis screening against AI-based circumvention. It is a shame that you did not (or were unable to) test this against the full commec installation; doing so would have further increased the salience of your work and the knowledge it generates.
Cite this work
@misc{dhanani2026function,
  title={(HckPrj) Function Over Sequence: Empirical Evaluation of Protein Language Models for Biosecurity Screening},
  author={Saahir Dhanani},
  date={2026-04-27},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


