Apr 27, 2026
Probing Risk Representations in Protein Language Models
Vishesh Gupta
Probing Risk Representations in Protein Language Models tests whether ESM-2 activations can provide a supplementary biosecurity signal for DNA synthesis screening. Current screening relies heavily on homology matching, which may miss AI-designed protein variants that retain dangerous function while diverging in sequence. I trained linear probes on ESM-2 representations from 1,264 labelled proteins across pathogen families and evaluated generalization on three fully withheld families. The main result is negative: global probes fail to generalize well and show no statistically significant advantage over a BLAST keyword baseline. The project also identifies two screening-relevant failure modes: high false positives on scrambled sequences and degraded detection on short fragments. Family-specific probes perform better in-distribution, suggesting a possible routed screening architecture combining homology tools with local representation probes.
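The evaluation design described above (a linear probe trained on some families and tested on fully withheld ones) can be sketched roughly as follows. This is a minimal illustration, not the author's code. The embeddings are synthetic stand-ins for pooled ESM-2 activations, the family names "A" to "D" are hypothetical, and the probe is a plain logistic regression fitted by gradient descent. The synthetic data is built with a label signal shared across families, so the probe generalizes here by construction. That says nothing about the negative result reported for real proteins.

```python
import math
import random


def make_synthetic_records(seed=0, per_family=40, dim=4):
    """Return (family, embedding, label) tuples.

    ASSUMPTION: purely synthetic stand-ins for pooled ESM-2 activations.
    Dimension 0 carries a label signal shared by all families; the other
    dimensions carry a family-specific offset plus noise.
    """
    rng = random.Random(seed)
    records = []
    for family in ["A", "B", "C", "D"]:  # hypothetical family names
        offset = [rng.uniform(-1.0, 1.0) for _ in range(dim - 1)]
        for i in range(per_family):
            label = i % 2  # balanced labels within each family
            signal = (2.0 if label else -2.0) + rng.gauss(0.0, 0.5)
            rest = [o + rng.gauss(0.0, 0.5) for o in offset]
            records.append((family, [signal] + rest, label))
    return records


def leave_family_out_split(records, held_out):
    """Split so that no held-out family contributes any training example."""
    train = [r for r in records if r[0] not in held_out]
    test = [r for r in records if r[0] in held_out]
    return train, test


def score(w, b, x):
    """Linear probe logit for one embedding."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_probe(X, y, epochs=200, lr=0.1):
    """Fit logistic regression with batch gradient descent."""
    dim = len(X[0])
    w, b, n = [0.0] * dim, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            z = max(-30.0, min(30.0, score(w, b, xi)))  # clamp to avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, records):
    """Fraction of records whose thresholded logit matches the label."""
    hits = sum(int(score(w, b, x) > 0.0) == label for _, x, label in records)
    return hits / len(records)


if __name__ == "__main__":
    records = make_synthetic_records()
    train, test = leave_family_out_split(records, held_out={"D"})
    w, b = train_linear_probe([x for _, x, _ in train], [l for _, _, l in train])
    print("held-out family accuracy:", accuracy(w, b, test))
```

Splitting by family rather than by random sample is what makes the held-out score a test of generalization to unseen families. With real embeddings, the in-distribution and held-out scores can diverge sharply, which is the gap the abstract reports.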
Great topic and questions for a sprint. Nice results to continue exploring. I appreciate the use of ESM-2 applied in this way.
I appreciated that the authors proactively took info hazard concerns into account. This is not an info hazard as such, but if the authors carry the work forward and publish, they could consider omitting statistics on certain high-sensitivity families (e.g., pox) and saying so explicitly.
Rigorous negative results are actually valuable for the field!
I really like this work, because having negative results is *useful* and you rarely see them presented.
Please never ever typeset an entire paper in italics. Literally half of the submissions I've reviewed were entirely set in italics, and I suspect it's because everyone was working from a submission template which used italics to give instructions about what to say in various sections, and you just inherited that formatting. But you should be more vigilant; please don't make reviewers' eyeballs bleed. :)
A nit: There is no SecureDNA "consortium." This reads like it might have been AI hallucination, or perhaps just a misunderstanding.
Cite this work
@misc{
  title={(HckPrj) Probing Risk Representations in Protein Language Models},
  author={Vishesh Gupta},
  date={4/27/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


