Apr 27, 2026
PROTEUS (PROTein Evaluation for Unusual Sequences): Structure-Informed Safety Screening for de novo and Evasion-Prone Protein-Coding Sequences
Nathan Khosla, Aleksandra Wosztyl
🏆 5th Place Winner + 🏆 Track 1 Winner: DNA Screening & Synthesis Controls
AI protein design tools like RFdiffusion, ProteinMPNN, and BindCraft make it trivial to produce low-homology sequences that fold into active, potentially hazardous architectures. However, sequence homology-based biosafety screening tools cannot detect proteins that pose functional risk through structurally novel mechanisms with no sequence precedent. We present a tiered computational pipeline that addresses this gap by combining MMseqs2 sequence alignment with structure-based comparison via FoldSeek and DALI against curated toxin databases totaling ~34,000 entries. AlphaFold2-predicted structures are screened for both global fold similarity (FoldSeek) and local active/allosteric site geometry (DALI), capturing convergent functional hazards that sequence screening misses. The pipeline was validated against a panel of toxins, benign proteins, structural mimics, and de novo-designed Munc13 binders, as well as modified ricin variants with residue substitutions. We additionally tested robustness to partial-synthesis evasion, where a bad actor submits multiple shorter coding sequences intended for downstream reassembly into a full toxin-coding gene. We found that while sequence-based screening did not identify any de novo ricin analogues with high certainty, the combined pipeline with FoldSeek and DALI identified all 24 tested de novo ricins as toxic.
Functional/structural screening is very important, and it seems you have made real progress toward it. Kudos!
I think using ricin as a central example makes sense given that it's a favourite among would-be terrorists. However, I wonder whether there is some cherry-picking, where certain proteins perform unusually well in your pipeline (maybe because they are well-represented in the dataset or similar), whereas your pipeline would struggle to detect functional/structural analogues of other proteins (?). This is just a guess; I am not an expert in this specific area.
It would also be interesting to see how this pipeline performs on viral proteins such as coronavirus spike proteins or influenza hemagglutinin/neuraminidase (H/N) proteins. More infohazardous, of course!
"a risk acknowledged by the October 2026 revision to the OSTP Nucleic Acid Synthesis Screening Framework, among other policy memos": this is either a typo in the year or an LLM hallucination.
You mention the pipeline is bottlenecked by compute resources for use at scale. I'd be curious to see cost and speed comparisons with sequence homology screening.
This is asking too much for a hackathon submission, but if this were a full article I would like an explanation of why exactly these values were chosen and what they actually represent: "DALI similarity scoring was determined if a protein matched a toxin with a z score greater than 10, an identity score greater than 30%, and/or an RMSD less than 2. MMseqs2 similarity was determined if the identity score was greater than 90% than a known toxin sequence. FoldSeek similarity was determined by an identity score greater than 30%, RMSD below 2, e-value below 1e-4, and/or TM score greater than 0.9."
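To make the ambiguity concrete, the quoted thresholds can be written as a small decision rule. The Python below is a minimal sketch, not the authors' implementation: the `Hit` record, its field names, and the assumption that RMSD is in ångströms are all hypothetical. The `require_all` switch exists only because the quote's "and/or" supports both a permissive reading (any one criterion suffices) and a strict one (every criterion must hold).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One query-vs-toxin alignment (hypothetical schema, not a tool's output format)."""
    tool: str                         # "mmseqs2", "foldseek", or "dali"
    identity: float                   # percent sequence identity, 0-100
    rmsd: Optional[float] = None      # assumed to be in angstroms
    z_score: Optional[float] = None   # reported by DALI
    evalue: Optional[float] = None    # reported by FoldSeek
    tm_score: Optional[float] = None  # reported by FoldSeek


def criteria(hit: Hit) -> List[bool]:
    """Evaluate each quoted threshold separately for the hit's tool."""
    if hit.tool == "mmseqs2":
        return [hit.identity > 90]
    if hit.tool == "dali":
        return [
            hit.z_score is not None and hit.z_score > 10,
            hit.identity > 30,
            hit.rmsd is not None and hit.rmsd < 2,
        ]
    if hit.tool == "foldseek":
        return [
            hit.identity > 30,
            hit.rmsd is not None and hit.rmsd < 2,
            hit.evalue is not None and hit.evalue < 1e-4,
            hit.tm_score is not None and hit.tm_score > 0.9,
        ]
    raise ValueError(f"unknown tool: {hit.tool}")


def is_toxin_match(hit: Hit, require_all: bool = False) -> bool:
    """Permissive reading of "and/or" uses any(); the strict reading uses all()."""
    checks = criteria(hit)
    return all(checks) if require_all else any(checks)


# Illustrative numbers only, not results from the paper.
example = Hit(tool="dali", identity=12.0, rmsd=1.8, z_score=14.2)
print(is_toxin_match(example))                    # True
print(is_toxin_match(example, require_all=True))  # False
```

For this made-up DALI hit the two readings disagree, which is why the intended interpretation matters for reproducing the reported false-positive and false-negative rates.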
The Limitations and Future Work section makes lots of good points.
I would have appreciated a short discussion of next steps for reducing false positives and false negatives in your pipeline.
IMHO your submission is a bit too biased toward arXiv-style academic writing. That works well for a specialized research audience, but as a judge who works more in the policy world, I found it a bit hard to follow. I think you could take inspiration from the research outputs of places like METR or GovAI, which do a great job of writing with rigorous clarity that is still accessible to non-experts. Hence slightly lower points for presentation/clarity, but your technical contribution is fantastic.
The most interesting finding here (AI-designed Munc13 binders accidentally resembling neurotoxins with no toxin input) is buried at the end. The write-up should probably lead with it.
The 24/24 catch rate via DALI is great, but presentation is a weak spot: figure wrapping in the related work section breaks the flow, and the 20-page appendix has no summary. The submission is honest about false positives.
An impressive amount of work covering a known gap in biosecurity. While there is similar work going on in this area, this is a nice piece of work that strings together several public tools into a robust demonstration and proof of principle. The paper is clearly written with a strong ToC, and the table is nice and very information-dense. A small legend on the second table in the appendix indicating what the colors mean would be nice to have, but this is minor.
For a weekend hackathon, the breadth of the testing dataset is very impressive. Testing against a panel that includes natural toxins, benign structural mimics, amyloid proteins, de novo ricin variants, and AI-designed Munc13 binders demonstrates a deep understanding of the problem space and a very rigorous examination of the screening pipeline.
Using truncated de novo ricin sequences to simulate a partial-synthesis evasion attack was a strong inclusion. Showing that sequence alignment caught 0/24 of these while your DALI pipeline caught 24/24 is a great way to highlight the importance of moving beyond homology methods.
You do well in discussing the false positives and current limitations, for example flagging that host proteins co-crystallized with toxins get called as hits. You also lay out future directions clearly and with a very strong understanding of the problem space and ongoing work.
The tiered design is clever. Using MMseqs2 for fast, low-compute initial gating before moving to computationally expensive tools like AlphaFold2, FoldSeek, and DALI is practical and logical system design. But, as you note, running AlphaFold2 and DALI on every sequence that is not flagged by MMseqs2 is currently too computationally expensive for commercial-scale adoption. That may simply be a problem that resolves as compute gets cheaper.
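The cheapest-first gating described here can be sketched as a short-circuiting loop. In this minimal Python sketch the stage functions are hypothetical stubs standing in for wrappers around MMseqs2, AlphaFold2 + FoldSeek, and AlphaFold2 + DALI. The FoldSeek-before-DALI ordering is an assumption, and a real deployment would predict the structure once and share it between the two structural stages rather than treating them as independent calls.

```python
from typing import Callable, List, Optional, Tuple

# A stage takes a protein sequence and returns True if it flags a toxin hit.
Stage = Callable[[str], bool]


def screen(seq: str, stages: List[Tuple[str, Stage]]) -> Tuple[str, Optional[str]]:
    """Run stages cheapest-first and stop at the first one that flags the sequence.

    Returns ("flag", stage_name) or ("pass", None); an expensive stage only
    runs if every cheaper stage before it found nothing.
    """
    for name, is_hit in stages:
        if is_hit(seq):
            return "flag", name
    return "pass", None


# Hypothetical stubs that record which stages were actually invoked.
calls: List[str] = []


def stub(name: str, result: bool) -> Stage:
    def run(seq: str) -> bool:
        calls.append(name)
        return result
    return run


pipeline = [
    ("mmseqs2", stub("mmseqs2", False)),            # fast sequence search: no hit
    ("af2+foldseek", stub("af2+foldseek", False)),  # global fold comparison: no hit
    ("af2+dali", stub("af2+dali", True)),           # local site geometry: hit
]
print(screen("PLACEHOLDER_SEQUENCE", pipeline))  # ('flag', 'af2+dali')
print(calls)  # ['mmseqs2', 'af2+foldseek', 'af2+dali']
```

Because the loop stops at the first hit, the per-sequence cost is driven by how many sequences clear the cheap early stages, which is the scaling concern raised in this comment.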
"Finally, trypsin (14), concanavalin A (19), and thrombin (22) were confidently flagged as toxic across all methods, raising a broader question that this and any biosecurity screening tool must explicitly address: what threshold of danger justifies synthesis restriction? The distinction between "toxic in some biological context" and "dangerous enough to warrant screening" is not currently defined in any of the databases used here." Excellent point. These aren't really biosecurity threats, and a smart synthesis screening system should not flag them, but I agree this is a broader question than can be settled within this project, let alone a hackathon.
Future work could focus on a few areas. To address the compute bottleneck, one option is an intermediate machine learning step between MMseqs2 and AlphaFold2. As you mentioned in the report, a lightweight classifier built on embeddings from a biological language model (such as the protein language model ESM-2, or the genomic model Evo 2) could quickly filter out non-homologous benign proteins, reserving AlphaFold2 strictly for highly suspicious sequences. This is a good path to explore further.
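One way such a prefilter could work is sketched below in Python. Note that it swaps the trained classifier mentioned here for a simpler nearest-neighbour rule over precomputed embeddings (for example mean-pooled ESM-2 vectors, which this sketch assumes already exist); the `route` function, the reference sets, and the `margin` value are hypothetical. The rule is deliberately fail-safe: a sequence skips the expensive structural tier only when it is clearly closer to a known benign protein than to any toxin, since a false negative at this stage would never reach AlphaFold2.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_similarity(query: Vector, refs: Sequence[Vector]) -> float:
    """Similarity to the nearest reference embedding."""
    return max(cosine(query, r) for r in refs)


def route(query: Vector, toxin_refs: Sequence[Vector],
          benign_refs: Sequence[Vector], margin: float = 0.1) -> str:
    """Skip the structural tier only if the query is clearly nearer a benign
    reference than any toxin reference; otherwise default to screening."""
    gap = max_similarity(query, benign_refs) - max_similarity(query, toxin_refs)
    return "skip_structure" if gap > margin else "run_structure"


# Toy 2-D vectors for illustration; real embeddings are much higher-dimensional.
toxins, benign = [[1.0, 0.0]], [[0.0, 1.0]]
print(route([0.1, 1.0], toxins, benign))  # skip_structure
print(route([1.0, 1.0], toxins, benign))  # run_structure (ambiguous, so screen it)
```

A trained classifier would likely separate the classes better than raw nearest-neighbour similarity, but the same fail-safe default (screen when unsure) would still apply.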
A second area is moving away from binary "toxic/non-toxic" outputs, for example by developing a preliminary heuristic or risk score that considers the exact structural hit: flagging an active-site match to botulinum neurotoxin as a 'critical flag/block' while flagging an overexpressed protease active site as 'requires manual/human review', mimicking some current screening approaches. As LLMs improve, one could also add an LLM review step that passes its summary along to a human reviewer.
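A tiered output of this kind reduces to a small mapping from structural hits to actions. The Python below is a minimal sketch under stated assumptions: the `StructuralHit` record, the family labels, and the `CRITICAL_FAMILIES` watch-list are hypothetical placeholders for a curated policy list, and the rule (block only on a local site match to a critical family, send every other hit to human review) is one possible heuristic rather than an established standard. Under this rule, ubiquitous hits such as the trypsin and thrombin calls quoted above would go to review instead of being blocked outright.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable


class Action(Enum):
    PASS = "pass"
    HUMAN_REVIEW = "requires manual/human review"
    BLOCK = "critical flag/block"


@dataclass(frozen=True)
class StructuralHit:
    target_family: str  # hypothetical label for the matched toxin family
    site_match: bool    # True if local active/allosteric site geometry matched


# Hypothetical watch-list; a real one would come from a curated policy source.
CRITICAL_FAMILIES: FrozenSet[str] = frozenset({"botulinum_neurotoxin", "ricin"})


def triage(hits: Iterable[StructuralHit]) -> Action:
    """Block on a site-level match to a critical family; send any other hit to review."""
    action = Action.PASS
    for hit in hits:
        if hit.site_match and hit.target_family in CRITICAL_FAMILIES:
            return Action.BLOCK
        action = Action.HUMAN_REVIEW
    return action


print(triage([StructuralHit("serine_protease", True)]).value)       # requires manual/human review
print(triage([StructuralHit("botulinum_neurotoxin", True)]).value)  # critical flag/block
```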
In future presentations, include a 3D structural overlay (e.g., in PyMOL) showing how your de novo ricin aligns with the active site of natural ricin despite zero sequence homology. Visuals make structural bioinformatics much more accessible to policymakers, the public, and generalist judges.
Overall, really great work!
Cite this work
@misc {
title={
(HckPrj) PROTEUS (PROTein Evaluation for Unusual Sequences): Structure-Informed Safety Screening for de novo and Evasion-Prone Protein-Coding Sequences
},
author={
Nathan Khosla and Aleksandra Wosztyl
},
date={
2026-04-27
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


