Apr 26, 2026
Detecting BioRisks via Protein Structures with ESM2 based models for anomaly detection in Protein Sequences for Targeted Bodily Functions
Sanchayan Ghosh
We attempt to use ESM2 to generate potentially harmful protein sequences. We first link them by similarity to protein folding structures, using the ESM2 foundation model and a protein database, and from these links we construct a graph for a graph neural network (GNN).
We then search for potentially harmful sequences and build a detector that flags them by comparing them against a known database of viral hosts and pathogens.
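The comparison step of such a detector can be illustrated as a nearest-neighbour search in embedding space. The following is a minimal sketch, not the project's implementation. It assumes that each sequence has already been reduced to a fixed-length vector (for example by mean-pooling ESM2 per-residue embeddings) and that a reference set of embeddings for known sequences of concern exists. The function names, the toy 3-dimensional vectors, and the 0.9 threshold are all hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def screen(
    query: Sequence[float],
    references: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off; a real screen must calibrate it
) -> Tuple[bool, str, float]:
    """Flag `query` if its most similar reference embedding reaches `threshold`.

    Returns (flagged, nearest_reference_id, similarity).
    """
    if not references:
        raise ValueError("reference set is empty")
    # Score the query against every reference and keep the best match.
    best_id, best_sim = max(
        ((ref_id, cosine_similarity(query, emb)) for ref_id, emb in references.items()),
        key=lambda item: item[1],
    )
    return best_sim >= threshold, best_id, best_sim


if __name__ == "__main__":
    # Toy vectors standing in for pooled ESM2 embeddings (hypothetical data).
    refs = {"ref_A": [1.0, 0.0, 0.0], "ref_B": [0.0, 1.0, 0.0]}
    print(screen([0.95, 0.05, 0.0], refs))  # points almost along ref_A
    print(screen([0.5, 0.5, 0.7], refs))    # far from both references
```

Cosine similarity is a common choice for comparing pooled embeddings because it ignores vector magnitude. An exhaustive scan like this one would need to be replaced by an indexed nearest-neighbour search for a database of realistic size.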
The proposed approach tries to go beyond primary-sequence homology detection for sequences of concern by incorporating structural topology. In particular, the team attempts to use the human immune interactome to detect sequences that bind to immune targets. Several aspects of this paper need work:
1) Methodology: the methodological explanations are currently relatively thin.
2) Validation: which in silico methods were used to validate the function of the 'jailbreak' sequences?
3) Binding plausibility: the short protein shown by the Red Team binds to many different proteins with differing functions. While this is possible, it is unlikely to bind any of them with a dissociation constant low enough to be biologically meaningful.
4) Scope: only a single sequence is presented as a case study.
5) Novelty: other biodesign tools can already map structural binding.
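Point 3 depends on binding affinity. As a hedged illustration, the textbook 1:1 binding model can be applied (an assumption; the submission reports no binding measurements). Here [P] is free target protein, [L] is free peptide, and [PL] is the complex. The dissociation constant and the equilibrium fraction of target occupied, θ, are:

```latex
K_d = \frac{[P]\,[L]}{[PL]}
\qquad\Rightarrow\qquad
\theta = \frac{[PL]}{[P] + [PL]} = \frac{[L]}{[L] + K_d}

% Hypothetical numbers: K_d = 1\,\mu\mathrm{M}, \; [L] = 10\,\mathrm{nM}
\theta = \frac{10\,\mathrm{nM}}{10\,\mathrm{nM} + 1000\,\mathrm{nM}} \approx 0.0099
```

Under these assumed values, a micromolar binder occupies only about 1% of its target at nanomolar peptide concentration. For this reason, predicted binding to many partners says little about functional impact unless the affinity is also high.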
The central premise of this project, that AI-designed proteins targeting key immune components constitute a biosecurity threat, is fundamentally flawed for two reasons. First, a peptide predicted to bind to an immune protein does not inherently disrupt its biological function; such interference occurs only if the peptide targets a specific functional interface. The D-SCRIPT model used in this pipeline lacks the granularity to predict site-specific binding. Second, even where disruption of an interaction is confirmed, the physiological result is more likely to be immunosuppression or anti-inflammatory activity than a catastrophic biosecurity risk.
This is an interesting approach; using protein language model (pLM) embeddings to check for genetically distant targets is a method I always like to see improved. I am a little concerned about including an optimization loop for immune-system binders that evade synthesis screening; bio is offense-favored, after all. I am also not convinced that risk scoring based on immune-target binding is ultimately very promising, or that it would catch a wide variety of potential threats. However, more results on detection efficiency could convince me otherwise.
Cite this work
@misc {
title={
(HckPrj) Detecting BioRisks via Protein Structures with ESM2 based models for anomaly detection in Protein Sequences for Targeted Bodily Functions
},
author={
Sanchayan Ghosh
},
date={
4/26/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


