Apr 27, 2026
Filtering and Flagging
Keya Aggarwal, Vivien Chen, arya datla, Jaywu Jun
Benchtop DNA synthesizers have democratized sequence generation, bypassing the traditional biosecurity chokepoint of centralized synthesis facilities. However, manufacturing synthesis hardware remains prohibitively difficult for isolated bad actors, so regulating the software on these machines, which only a few companies make, remains a promising avenue toward security. We present SynthShield, pre-screening and logging software that leverages ESM-2 protein language model embeddings to prevent bad actors from synthesizing malicious sequences. Current screening solutions (BLAST, etc.) compare only sequence similarity, so they miss dangerous sequences that are functionally similar but differ at the sequence level, a growing concern in the new AI-assisted research regime, which lets small teams of bad actors iterate faster and across a larger attack surface. Furthermore, current software offers little public or even governmental visibility into which sequences are synthesized, because protecting researchers' IP remains an important challenge. To tackle this, we pair our screening software with a tamper-evident black box and a public blockchain (Ethereum L2) to record sequence-aware hashes of synthesized DNA. This allows law enforcement to identify bad actors post hoc, even for CRISPR-altered bioweapons created indirectly with the help of these synthesizers, without revealing critical information about the precise sequence. In this paper, we evaluate our integrated pipeline and show that the ESM-2 screener achieves AUC 0.977 on a remote homology test set where the BLAST baseline achieves only AUC 0.711, an improvement of 0.266 that directly addresses the AI evasion scenario.
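To make the embedding-based screening and the AUC comparison concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the SynthShield implementation: the short vectors are hypothetical stand-ins for ESM-2 embeddings, the max-cosine scoring rule is one simple way to use such embeddings (the abstract does not say how the screener scores a query), and the function names `cosine`, `screen_score`, and `roc_auc` are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def screen_score(query_vec, hazard_vecs):
    """Score a query by its closest match in a reference set of hazard embeddings.

    Assumption: nearest-neighbour cosine similarity. A real screener might
    instead train a classifier on top of the embeddings.
    """
    return max(cosine(query_vec, h) for h in hazard_vecs)


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 2-D "embeddings"; real ESM-2 embeddings have hundreds of dimensions.
hazard_vecs = [[1.0, 0.0], [0.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.8], [0.7, 0.7], [-0.5, -0.5]]
labels = [1, 1, 0, 0]  # 1 = functionally hazardous, 0 = benign

scores = [screen_score(q, hazard_vecs) for q in queries]
toy_auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported gap between 0.977 and 0.711 reflects how much more often the embedding score ranks a hazardous remote homolog above a benign sequence than the BLAST score does.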
The authors developed an ESM-2 screening protocol to detect AI-generated synthetic homologs and added a logging mechanism to their pipeline, as well as an assembly method for split orders. The end-to-end pipeline is multilayered and well executed, even if some version of each step in the process has been implemented before. However, the study does have limitations, many of which the authors address at the end of the paper. Beyond those mentioned in the paper, further limitations and suggestions are:
1) Generating synthetic homolog test sets using other biodesign tools. Currently, the training and test sets are generated mainly with ESM-2. I would be curious how the metrics would change if the test set were generated with other tools.
2) The small number of sequences was acknowledged; moreover, different functional classes, with more complex mechanisms and proteins, need to be tested. With the authors' method, this would require training on a very large dataset, but it may be possible to find efficiencies in the pipeline that reduce the compute required.
3) Using BLAST as the baseline. I would have used the free screening tool as a basis of comparison rather than BLAST alone.
This felt like three papers stitched together. The ESM-2 vs. BLAST piece is solid and would stand on its own. The blockchain, audit log, and 5-attack-class material is described but not really tested, which made me trust the rest less. It would have been stronger at half the length, focused on ESM.
Cite this work
@misc{
  title={(HckPrj) Filtering and Flagging},
  author={Keya Aggarwal, Vivien Chen, arya datla, Jaywu Jun},
  date={4/27/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


