Apr 27, 2026
Beyond sequence similarity: Protein and DNA embeddings for evasion-resilient synthesis screening
Robert Amanfu
Biosecurity screening of synthesized or environmental DNA mostly asks: does this look like something on a known-toxin list? That fails when an attacker (or evolution) changes enough letters to escape the lookup while keeping the protein's harmful function. We test whether biological sequence models trained on proteins and DNA can spot that function directly. On 4,060 toxin and benign coding sequences, evaluated with close homologs (evolutionarily related sequences) kept out of training, an ensemble of a DNA model (Evo2 7B) and a protein model (ESM-2) catches 85.8% of toxins at a 1-in-100 false-alarm rate, versus 72% for Evo2 alone, 71% for ESM-2 alone, and 55% for a simple 5-letter-pattern baseline. By comparison, the open-source COMMEC policy screen (biorisk-only mode) flags only 16.8% of the same toxins, showing that learned models catch toxins the curated lookup databases miss. After mutating 60% of amino acids to disguise toxins, the protein screen still recovers 98%; the baseline drops to 41%. Reliable behavior requires DNA fragments ≥1,500 bp.
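The headline numbers above are detection rates at a fixed false-alarm rate. As a minimal sketch of how such a figure can be computed from classifier scores (not the author's actual evaluation code; the function name `recall_at_fpr` and the strict-threshold convention are assumptions made for illustration), one picks the score threshold that lets through at most the target fraction of benign sequences and then measures the fraction of toxins scoring above it:

```python
import numpy as np

def recall_at_fpr(scores, labels, target_fpr=0.01):
    """Fraction of toxins flagged when at most `target_fpr` of benign items are flagged.

    Hypothetical helper: `scores` are higher-is-more-suspicious, `labels` are
    True for toxin and False for benign. Returns (recall, threshold).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Benign scores sorted from highest to lowest.
    benign_desc = np.sort(scores[~labels])[::-1]

    # Number of benign sequences we are allowed to flag.
    allowed = int(np.floor(target_fpr * benign_desc.size))

    # Flag only scores strictly above the (allowed+1)-th highest benign score,
    # so at most `allowed` benign sequences exceed the threshold.
    threshold = benign_desc[allowed] if allowed < benign_desc.size else -np.inf

    flagged = scores > threshold
    return float(flagged[labels].mean()), float(threshold)
```

With 100 benign scores, a 1-in-100 false-alarm rate allows exactly one benign sequence to be flagged, so the threshold sits at the second-highest benign score. An ensemble would apply the same function to whatever combined score it produces.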
Incredibly interesting and valuable work. Anything that can be done to make screeners more robust - especially to evasion attempts - is a welcome addition to the screening toolset. This work is innovative and has direct implications for improving biosecurity. While the limitations are adequately spelled out in the paper, the clear next steps are to "red team" the approach against purposeful evasion / substitution attempts.
This seems like a reasonable set of results, but I had difficulty
figuring out whether AAs were run or just nucleotides. You do
talk about "BLAST's protein search" but then you keep talking
about DNA, and there are certainly systems that can detect
recoded sequences, e.g., they look at the peptide, not at the
DNA, and it's unfortunate you didn't try this with any of those,
e.g., not just using a best-match system. It is certainly possible
to detect recoded sequences using these systems.
It's also unfortunate that you can't do this for < 1500bp seqs;
there's a lot of opportunity there for assembly-based mischief.
Nonetheless, figuring out these sorts of results is a worthwhile
endeavor.
A nit: "HHS Common Mechanism 2024" feels like AI slop/hallucination.
This seems to be confusing HHS guidelines with the IBBIS Common Mechanism
implementation.
Also, in A.7: Couldn't someone else just train a model as you did?
Overall, this is a promising idea and a good start investigating it, with many thoughtful methodological elements. The presentation is admirably thorough.
The five-nucleotide pattern baseline is unmotivated, and seems like a straw man comparison.
The key weakness of the study is that, as far as I can tell, the benign and toxin portions of the train/test dataset have dramatically different length distributions. Unless I'm missing something, the model could simply be learning the rule "short = toxin". (Unless I misunderstand something, model embeddings could encode sequence length in some way.)
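One way to probe the length concern, sketched here under assumptions rather than taken from the submission: score every sequence by its length alone and measure how well that separates toxins from benign sequences. The helper names `auroc` and `length_only_auroc` are hypothetical. An AUROC near 0.5 would suggest length carries little signal; a value near 1.0 would mean "short = toxin" is enough to explain much of the reported performance.

```python
import numpy as np

def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]

    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

def length_only_auroc(sequences, labels):
    """AUROC of a 'classifier' that uses only sequence length.

    Assumption for illustration: shorter sequences are treated as more
    toxin-like, so the score is the negated length.
    """
    neg_lengths = [-len(seq) for seq in sequences]
    return auroc(neg_lengths, labels)
```

The pairwise comparison is quadratic in dataset size, which is acceptable for a few thousand sequences; a rank-based formula would scale better on larger sets.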
In what other ways might the benign and toxin distributions differ? How were benign examples selected? Relatedly, are these representative of the kinds of sequences sent to synthesis companies (dominated by synthetic and heavily engineered constructs)?
I'd like to see this study redone with two major changes: (1) a stronger method than five-nucleotide word frequency for a comparison baseline, and (2) a more carefully balanced and synthesis-representative dataset.
Cite this work
@misc{
  title={(HckPrj) Beyond sequence similarity: Protein and DNA embeddings for evasion-resilient synthesis screening},
  author={Robert Amanfu},
  date={4/27/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


