Apr 27, 2026
Toxscreen
Sissi Wang
Function‑Prediction Screening for Protein Hazards:
ToxScreen uses protein language model embeddings to detect hazardous proteins by predicted function rather than sequence similarity, catching AI-designed toxin variants that evade current DNA synthesis screening. Evaluated on 10,021 proteins with homology-controlled splits, it achieves 0.999 AUROC and 96.7% detection at 1% false positive rate on sequences sharing less than 40% identity with training data.
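As an illustration of the headline metric, "detection at 1% false positive rate" is the fraction of hazardous proteins flagged when the score threshold is set so that at most 1% of benign proteins are flagged. The following is a minimal sketch of that calculation, not the submission's evaluation code; the function name `tpr_at_fpr`, the score convention (higher means more hazardous), and the toy data are assumptions made for this example.

```python
import numpy as np

def tpr_at_fpr(labels, scores, max_fpr=0.01):
    """Detection rate (TPR) at a score threshold that keeps FPR <= max_fpr.

    Hypothetical helper for illustration: labels are 1 for hazardous and
    0 for benign, and higher scores mean "more likely hazardous".
    Assumes at least one benign and one hazardous example.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    neg = scores[~labels]  # benign scores
    pos = scores[labels]   # hazardous scores

    # Number of benign sequences we are allowed to flag.
    k = int(np.floor(max_fpr * neg.size))
    neg_sorted = np.sort(neg)[::-1]  # descending
    # Flagging strictly above the (k+1)-th highest benign score
    # yields at most k false positives.
    threshold = neg_sorted[k] if k < neg.size else -np.inf

    tpr = float(np.mean(pos > threshold))
    fpr = float(np.mean(neg > threshold))
    return tpr, fpr, float(threshold)

# Toy data (made up): 100 benign scores and 4 hazardous scores.
toy_labels = [0] * 100 + [1] * 4
toy_scores = list(np.arange(100) / 100) + [0.5, 0.985, 0.995, 1.0]
tpr, fpr, threshold = tpr_at_fpr(toy_labels, toy_scores, max_fpr=0.01)
print(f"TPR={tpr:.2f} at FPR={fpr:.2f} (threshold={threshold:.2f})")
```

Fixing the false positive rate first reflects the screening setting, where the false alarm budget on benign orders is the binding constraint and detection is measured within it.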
Thanks for this submission! The empirical validation work has value. I think your report could benefit from succinctness and from fewer, higher-impact charts and diagrams. I've also docked some points in the innovation department, as I believe you have mostly built on top of existing tools and technology, but the tiny generalization gap finding is exciting.
The pitch whitepaper reads more like a research paper with results than a hackathon proposal. Content and images seem decorative. Very high AUROC.
Solid initial exploration into using embeddings to discriminate harmful from non-harmful protein sequences, with a very visually pleasing report. The descriptions, motivations, and methods are clear and convincing. It would have been nice to see a comparison to a purely sequence-based baseline (BLAST or similar) instead of, or in addition to, the average-property physicochemical baseline. The claims about generalization are compelling but not fully substantiated: the ESM-based model seems not to be just memorizing sequences, but that doesn't mean it has learned the full notion of danger we care about; it may be learning more general characteristics unique to the training data. The report is very nicely arranged and readable, but would benefit from a review pass to remove AI-human collaboration seams.
Cite this work
@misc{
  title={(HckPrj) Toxscreen},
  author={Sissi Wang},
  date={4/27/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


