Apr 26, 2026

Towards Hardware-Governed Benchtop DNA Synthesizers

Oraya Srimokla, James Petrie

Benchtop DNA synthesizers may soon enable bioweapon synthesis in individual labs without hardware-enforced controls. We propose a hardware design with three layers of defense: sequence screening, a regulator signature the device refuses to run without, and physical monitoring of the synthesis process. The first two reuse hardware primitives from AI chip governance. The third is novel, and addresses an attacker who submits a benign sequence and physically tampers with the device to produce a hazardous one instead.

Reviewer's Comments


I found the problem and framing to be quite good in presentation. There was some jargon used from the hardware/AI/cyber-security side, but I think most could follow. I thought the connection to printer ink cartridge authentication was a clear example, and I think more analogues or exemplars could have been used in other proposal areas.

Since this was an evaluation under a time limit, the designs are understandably conceptual, and Figure 1 provides useful context. However, additional diagrams or tables illustrating the broader threat landscape and potential failure points would have been helpful.

One area that was neglected is how calibration, maintenance protocols, and service contracts would fit into this design (e.g., how calibration drift might affect pipetting or volume detection, and how it would then be adjusted). This is likely an area where security could be firmed up, but it would be useful to identify where it may introduce failure points or give access to bad actors.

With the increasing accessibility of benchtop synthesizers, knowing how to mitigate the synthesis of sequences of concern is vital. This paper gives concrete recommendations for screening and authorization in a benchtop synthesizer and for preventing tampering with the synthesizer. The recommendations against tampering were interesting and well worth consideration.

However, I found the recommendations for screening, and the requirement that the regulator pre-approve each sequence, to be impractical. The vast majority of sequences are benign, and requiring approval for all of them would be a huge burden. Rather, only sequences with high homology to sequences of concern should require pre-authorization. There was also little data to validate the approaches. Recommendations on how the screening tool can be updated without opening it to tampering, and on how it would handle AI-generated oligos, would have been useful as well.

This is a nice set of countermeasures, but there's an unfortunate feasibility problem with the entire idea. Having personally advised benchtop vendors on their machine security, I find it apparent that even spending the small amount of extra money it takes for a single-board computer which allows adding a TPM chip (as opposed to ones which don't even have a place on the board where one may be attached) isn't something vendors are going to do unless forced, such as via legislation. Expecting them to do a good job on secure boot chains and the like isn't realistic in the near future absent some way to make this a commercial priority, much less adding all kinds of hardware to check that the machine is producing what it thinks it's producing. It would be *very nice* if vendors really did this, but absent incentives, the chances of vendors actually incorporating any of these ideas seem near zero. (And it's not just the hardware; doing security *right* takes expertise, which isn't the core competency of a synthesizer producer, and hiring those people also takes money. So the problem here is the incentives.)

As for the technical details:

(a) The guarantee processor seems to have all kinds of issues regarding vulnerability to attacks, staleness, revelation of nonpublic hazards, cost, etc., mostly because you're having it recapitulate the work of screening a second time; this seems to make it large, with a large attack surface. In particular, asking the GP to recompute the DOPRF means asking it to be at least as powerful as the main CPU in the benchtop. Given that you're citing SecureDNA's system here (which was designed for benchtops), what you should probably do instead is take advantage of the "verified screening" mode, which cryptographically signs over the results and a hash of the input sequence. Then the GP need only check that signature, along with the other state-of-the-machine verification it's already tasked to do.
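
To make the suggestion in (a) concrete, here is a minimal sketch of the check the GP would be left with: verify a signature over the screening verdict and a hash of the input sequence, then confirm the hash matches what is about to be synthesized. Everything named here is an assumption for illustration. In particular, HMAC-SHA256 with a shared demo key stands in for whatever public-key signature a real screening service would use, and the payload layout, `sign_result`, and `gp_allows_synthesis` are hypothetical and not SecureDNA's actual interface.

```python
import hashlib
import hmac
import json

# ASSUMPTION: HMAC-SHA256 with a shared key is only a stdlib stand-in for a
# real public-key signature; a deployed GP would hold a verification key only.
DEMO_KEY = b"demo-key-not-for-production"


def sequence_digest(sequence: str) -> str:
    """Hash of the (upper-cased) input sequence."""
    return hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()


def sign_result(sequence: str, verdict: str, key: bytes = DEMO_KEY) -> dict:
    """Hypothetical screening-side step: sign the verdict plus the input hash."""
    payload = {"sequence_sha256": sequence_digest(sequence), "verdict": verdict}
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}


def gp_allows_synthesis(sequence: str, attestation: dict, key: bytes = DEMO_KEY) -> bool:
    """Hypothetical GP-side check: no rescreening, only verification."""
    payload = attestation["payload"]
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # 1. The signature must cover exactly this payload.
    if not hmac.compare_digest(expected, attestation["signature"]):
        return False
    # 2. The signed hash must match the sequence about to be synthesized.
    if payload["sequence_sha256"] != sequence_digest(sequence):
        return False
    # 3. The signed verdict must be a pass.
    return payload["verdict"] == "pass"


attestation = sign_result("ACGTACGT", "pass")
print(gp_allows_synthesis("ACGTACGT", attestation))  # same sequence that was screened
print(gp_allows_synthesis("ACGTACGA", attestation))  # sequence swapped after screening
```

The point of the sketch is the size of the GP's job: one hash and one signature check, rather than a second run of the screening computation.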

(b) The ping time-of-flight isn't novel (not your problem but that of the paper you cite); it's quite old. But the problem with it in the case of benchtops is that you're at the mercy of the typically terrible network infrastructure of random labs, which often have very high latency and jitter (for all you know, the benchtop is on a wireless network). It also tends to disenfranchise non-first-world labs because their bandwidth is typically even worse. (It's even worse if it's also competing with the network traffic from running the SecureDNA protocol or from any other high-usage devices on that lab's network.) Depending on ping times is a good way to randomly inhibit legitimate synthesis due to poor infrastructure. ("AI chips" are likely being run in a first-class data center, which is an *entirely* different network environment from the typical university biolab.)
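
The arithmetic behind (b) can be sketched as follows. A round-trip time only gives an upper bound on distance, so any latency the lab network adds inflates that bound directly. The propagation speed of roughly 200 km per millisecond in fiber (about two thirds of the vacuum speed of light) is a standard approximation; the 500 km radius and 40 ms of added latency in the usage lines are made-up numbers for illustration, and the function names are hypothetical.

```python
# ASSUMPTION: signals propagate through fiber at roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def max_distance_km(rtt_ms: float, processing_ms: float = 0.0) -> float:
    """Upper bound on distance implied by a measured round-trip time."""
    one_way_ms = max(rtt_ms - processing_ms, 0.0) / 2.0
    return one_way_ms * FIBER_KM_PER_MS


def passes_location_check(rtt_ms: float, allowed_radius_km: float) -> bool:
    """Hypothetical policy: accept only if the distance bound is within the radius."""
    return max_distance_km(rtt_ms) <= allowed_radius_km


# A device about 100 km away has an ideal RTT of about 1 ms.
print(max_distance_km(1.0))               # 100.0
print(passes_location_check(1.0, 500.0))  # True

# The same device behind 40 ms of added lab-network latency (illustrative value).
print(max_distance_km(41.0))               # 4100.0
print(passes_location_check(41.0, 500.0))  # False
```

Under these assumptions the device has not moved, yet the check fails, which is the false-rejection mode described above.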

(c) Reagent verification is a nice idea, but again, expensive. That's unfortunately the crippling flaw with most of the things presented here, even if we wish it weren't so.

(d) Locking cartridges so they can't be rearranged also runs into the same failure mode as the use of crypto (and the DMCA's anti-circumvention provisions) in the printer market: extreme vendor lock-in. This is a problem for customers, and not all courts have looked favorably on the very concept, for precisely that reason. In the case of reagent swaps in particular, the SecureDNA system is resilient to them: its screening already accounts for that possibility and checks all 4! = 24 permutations simultaneously at zero additional computational overhead. (This doesn't lead to an increase in false positives because DNA is very non-random.)
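
The count in (d) comes from the four base reagents: swapping cartridges amounts to relabeling A, C, G, and T, and there are 4! = 24 such relabelings, including the identity. The sketch below simply enumerates them for a submitted sequence. It is a naive illustration of what "all permutations" means, not SecureDNA's method, which as noted above handles them without extra computation; `reagent_swap_variants` is a hypothetical name.

```python
from itertools import permutations

BASES = "ACGT"


def reagent_swap_variants(sequence: str) -> list[str]:
    """All sequences a device could produce if the four base reagents were permuted."""
    variants = []
    for perm in permutations(BASES):
        # Map each submitted base to the base actually loaded in its position.
        table = str.maketrans(BASES, "".join(perm))
        variants.append(sequence.upper().translate(table))
    return variants


variants = reagent_swap_variants("ACGT")
print(len(variants))  # 24
print(variants[0])    # ACGT (the identity permutation comes first)
```

A screener that checks every variant closes the swap attack without requiring the cartridges to be physically locked in place.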

Cite this work

@misc {
  title={(HckPrj) Towards Hardware-Governed Benchtop DNA Synthesizers},
  author={Oraya Srimokla, James Petrie},
  date={4/26/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Apr 27, 2026

OliGraph: graph-based screening of large oligopools

Existing synthesis screening tools cannot evaluate short oligonucleotide pools, whose overlapping fragments can be reassembled into regulated sequences via polymerase cycling assembly (PCA) yet fall below gene-length detection thresholds. We present OliGraph, an open-source tool that constructs a bi-directed overlap graph from an oligonucleotide pool and extracts contigs for downstream gene-length screening. An optional PCA mode retains only cross-strand overlaps consistent with PCA chemistry. We validated OliGraph in a blinded study across ten simulated pools (70–9,184 oligonucleotides, 30–300 bp) spanning four risk categories. BLAST screening of individual oligonucleotides failed to identify sequences of concern in most pools: three returned zero hits, and vector noise obscured true positives in the remainder. After OliGraph assembly, contig-level BLAST matched the longest assembled sequences (up to 1,905 bp) to sequences of concern at 97–100% identity. In one pool, assembly collapsed 1,634 individual BLAST results into 10 hits from a single contig, all assigned to the same source organism. PCA mode correctly distinguished assemblable from non-assemblable fragments within the same pool. Two pools with no assemblable structure yielded no contigs. OliGraph processed all pools in under 0.2 seconds, fast enough for real-time order screening and consistent with proposals to bring oligonucleotide orders within the scope of synthesis screening regulation.


Apr 27, 2026

BioRT-Bench: A Multi-Attack Red-Teaming Benchmark for Bio-Misuse Safeguards in Frontier LLMs

Frontier AI laboratories are expected to maintain safeguards against biological misuse, but whether deployed models actually refuse bio-misuse queries under adversarial pressure is largely unmeasured in the public literature. We introduce BioRT-Bench, a benchmark that runs four attack methods (direct request, PAIR, Crescendo, and base64 encoding) against four frontier models (Claude Sonnet 4.6, GPT-5.4, DeepSeek V4-flash, Kimi K2.5) across 40 prompts spanning five biosecurity-relevant categories. Responses are scored by a calibrated judge extending StrongREJECT with two bio-specific dimensions: specificity and actionability. We measure Attack Success Rate (ASR), where 0 means the model fully refused and 1 means it provided specific, actionable bio-misuse content. Our results reveal a sharp robustness divide: Chinese frontier models (DeepSeek, Kimi) have under 5% refusal rates even under direct request (ASR 0.88 and 0.79), while Western models (Claude, GPT) maintain substantially stronger safeguards (ASR 0.15 and 0.16). Crescendo is the most effective attack across all models, both in bypassing refusal and in eliciting actionable content. Claude Sonnet 4.6 is the most robust model tested, achieving 100% refusal against base64-encoded prompts.


Apr 27, 2026

PROTEUS (PROTein Evaluation for Unusual Sequences): Structure-Informed Safety Screening for de novo and Evasion-Prone Protein-Coding Sequences

AI protein design tools like RFdiffusion, ProteinMPNN, and Bindcraft make it trivial to produce low-homology sequences that fold into active, potentially hazardous architectures. However, sequence homology-based biosafety screening tools cannot detect proteins that pose functional risk through structurally novel mechanisms with no sequence precedent. We present a tiered computational pipeline that addresses this gap by combining MMseqs2 sequence alignment with structure-based comparison via FoldSeek and DALI against curated toxin databases totaling ~34,000 entries. AlphaFold2-predicted structures are screened for both global fold similarity (FoldSeek) and local active/allosteric site geometry (DALI), capturing convergent functional hazards that sequence screening misses. The pipeline was validated against a panel of toxins, benign proteins, structural mimics, and de novo-designed Munc13 binders, as well as modified ricin variants with residue substitutions. We additionally tested robustness to partial-synthesis evasion, where a bad actor submits multiple shorter coding sequences intended for downstream reassembly into a full toxin-coding gene. We found that while sequence-based screening did not identify any de novo ricin analogues with high certainty, the combined pipeline with FoldSeek and DALI identified all 24 tested de novo ricins as toxic.



This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.