APART RESEARCH

Impactful AI safety research

Explore our projects, publications and pilot experiments

Our Approach

Our research focuses on critical paradigms in AI Safety. We produce foundational research enabling the safe and beneficial development of advanced AI.

Safe AI

Publishing rigorous empirical work for safe AI: evaluations, interpretability and more

Novel Approaches

Our research is underpinned by novel approaches focused on neglected topics

Pilot Experiments

Apart Sprints have kickstarted hundreds of pilot experiments in AI Safety

Apart Sprint Pilot Experiments

May 5, 2026

SAEBER: Sparse Autoencoders for Biological Entity Risk

Open-weight protein design models might generate toxic or virulent proteins. Current classifiers are accurate but not interpretable or explainable. In this work, we train Sparse Autoencoders (SAEs) on RFD3 and RF3, leading open source protein folding and design models. We find SAE features with meaningful correlation to toxicity and virulence, with the top classifier reaching 0.87 AUROC.

Read More

May 12, 2026

BioWatch Brief: Rapid Pathogen Risk Assessment via Staged LLM Triage

Biosecurity analysts face a growing asymmetry: outbreak reporting volume has expanded substantially while the number of trained personnel able to synthesise that information in real time has not. BioWatch Brief compresses the analyst intake stage from hours to minutes via a three-stage LLM pipeline (structured extraction, retrieval against a curated corpus of historical outbreaks and biosecurity policy frameworks, and grounded analysis), producing a structured risk card from arbitrary input reports. The architecture deliberately separates fact extraction from retrieval and synthesis, constraining LLM outputs at each stage and surfacing uncertainty rather than masking it. Built on gpt-4.1-mini with a curated 21-entry open corpus (16 historical outbreaks, 5 policy/framework documents) normalised across pathogen, location, transmission, response history, lessons learned, and source URLs. React frontend, FastAPI backend, single /analyze_report endpoint. Built at the Apart AIxBio Hackathon, April 2026 (University of Pennsylvania).
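The staged separation described above can be illustrated with a minimal sketch. It is not the project's code: the LLM extraction stage is replaced by a plain `key: value` parser, the corpus entries are placeholders, and the names `RiskCard`, `extract_facts`, `retrieve`, and `analyze` are all hypothetical. The point is only that each stage consumes the constrained output of the previous one and that missing fields are reported rather than guessed.

```python
from dataclasses import dataclass

FIELDS = ("pathogen", "location", "transmission")

# Placeholder corpus entries (hypothetical, not the project's curated corpus).
CORPUS = [
    {"id": "entry-1", "pathogen": "pathogen-a", "location": "region-x", "transmission": "airborne"},
    {"id": "entry-2", "pathogen": "pathogen-b", "location": "region-y", "transmission": "contact"},
]


@dataclass
class RiskCard:
    facts: dict            # fields extracted from the report
    similar_entries: list  # ids of matching corpus entries
    unknown_fields: list   # fields the report did not state


def extract_facts(report: str) -> dict:
    """Stage 1 stand-in: parse 'key: value' lines instead of calling an LLM."""
    facts = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in FIELDS and value.strip():
            facts[key] = value.strip().lower()
    return facts


def retrieve(facts: dict, corpus: list, top_k: int = 1) -> list:
    """Stage 2: rank corpus entries by the number of matching extracted fields."""
    def score(entry: dict) -> int:
        return sum(entry.get(f) == facts[f] for f in FIELDS if f in facts)

    ranked = sorted(corpus, key=score, reverse=True)
    return [entry["id"] for entry in ranked[:top_k] if score(entry) > 0]


def analyze(facts: dict, matches: list) -> RiskCard:
    """Stage 3: build the card from earlier outputs only, surfacing gaps explicitly."""
    unknown = [f for f in FIELDS if f not in facts]
    return RiskCard(facts=facts, similar_entries=matches, unknown_fields=unknown)


def analyze_report(report: str) -> RiskCard:
    facts = extract_facts(report)
    return analyze(facts, retrieve(facts, CORPUS))
```

Calling `analyze_report("Pathogen: pathogen-b\nTransmission: contact")` yields a card that lists `location` under `unknown_fields` instead of inventing a value.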

Read More

May 4, 2026

Reagent Supply-Chain Structure for Benchtop DNA Synthesizers: There is Hope for KYC

Benchtop DNA synthesizers are expected to reach viral-genome-length assembly within 2-5 years. This primary-source review of nine vendors and eleven device families finds that every device crossing the >=1.5 kb threshold runs on a proprietary reagent ecosystem, potentially enabling low-marginal-cost KYC regulation. Submitter notes the project may be revised before public publication. Submitted via Discord DM after Framer Form closed.

Read More

May 4, 2026

Synthesis Tamper-evident Attestation and Molecular Provenance (STAMP): Cryptographic Molecular Barcoding for DNA Synthesizers

As AI systems become more capable at biological design and benchtop DNA synthesizers more affordable, the biothreat bottleneck shifts from design to physical synthesis, beyond the reach of centralized customer screening. We introduce STAMP (Synthesis Tamper-evident Attestation and Molecular Provenance), a 120-base barcode an HSM-equipped synthesizer stamps into a non-coding region of every DNA it produces, attesting that a sequence originated from a registered, untampered synthesizer and was not significantly modified post-synthesis. STAMP combines cryptographic anchoring with a novel content-aware landmark map enabling forensic reconstruction of post-synthesis modifications. Empirically, the encoder achieves success across N = 2000 random plasmids and the privacy-preserving barcode landmark signal detects >=95% of kilobase-scale insertions. We do not claim to defeat attackers with jailbroken synthesizers; we prove this is irreducible. Instead, STAMP is a cost imposer and evidence generator: it converts every viable attack into a forensically suspicious artifact or a supply-chain-visible event.
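To make the size of a 120-base barcode concrete: at 2 bits per base it can carry 240 bits, or 30 bytes. The sketch below shows one way a keyed tag of that size could be derived and checked. It is an illustrative assumption, not STAMP's actual construction: the use of HMAC-SHA256, the truncation to 30 bytes, the base mapping, and the function names are all hypothetical, and it ignores the landmark map and any biological constraints on the barcode (GC content, homopolymer runs).

```python
import hashlib
import hmac

BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def make_barcode(device_key: bytes, sequence: str) -> str:
    """Derive a 120-base tag bound to the sequence and a device-held key (hypothetical scheme)."""
    digest = hmac.new(device_key, sequence.upper().encode("ascii"), hashlib.sha256).digest()
    return bytes_to_bases(digest[:30])  # 30 bytes * 4 bases per byte = 120 bases


def verify_barcode(device_key: bytes, sequence: str, barcode: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_barcode(device_key, sequence), barcode)
```

Under this sketch, any change to the tagged sequence changes the expected barcode, so verification fails. Localising the modification, as the landmark map is described as doing, would need additional structure not shown here.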

Read More

May 4, 2026

Quantifying BLAST's Sensitivity Floor for DNA Synthesis Screening

Benchmarks BLAST as a per-fragment screener across seven fragment lengths (20-200 bp) and six mutation rates (0-20%) under pure evasion and dilute evasion threat models. Relevant to the 2024 OSTP Framework's drop of the screening threshold to 50 bp in Oct 2026.
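A benchmark grid of this shape can be sketched as follows. This is an assumption-laden illustration, not the submission's code: the reference sequence is random, mutations are substitutions only, and the specific lengths and rates passed in are up to the caller, since the listing gives only their ranges. Each resulting fragment would then be handed to the screener under test.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions that match (equal-length strings assumed)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_grid(reference: str, lengths: list, rates: list, seed: int = 0) -> dict:
    """Return {(length, rate): mutated fragment} for every combination."""
    rng = random.Random(seed)
    grid = {}
    for length in lengths:
        fragment = reference[:length]
        for rate in rates:
            grid[(length, rate)] = mutate(fragment, rate, rng)
    return grid
```

Recording `identity` alongside each screening result makes it possible to plot detection against realised divergence rather than the nominal mutation rate.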

Read More

May 4, 2026

Bio-Shield: Biorisk Triage Orchestrator (BTO)

Zero-Trust defense-in-depth architecture for DNA synthesis screening. BTO is a modular Managed Access Wrapper for biodesign tools and a Layer-2 inspector for synthesizers. Combines Overlap-Layout-Consensus assembly (fragmented hazards), sliding-window ESM-2 PLM scanning (AI-obfuscated chimeric toxins), and cyber-entropy checks for digital malware.
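The listing does not specify how the sliding-window scan or the entropy check is implemented. As a rough sketch under that caveat, the code below slides a fixed-size window over a sequence and flags windows whose Shannon entropy meets a threshold. The window size, step, threshold, and function names are hypothetical, and in the described system the per-window scorer would be an ESM-2 model rather than an entropy function.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy of the symbol distribution in `window`, in bits."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sliding_windows(seq: str, size: int, step: int):
    """Yield (start, window) pairs; a sequence shorter than `size` yields one window."""
    for start in range(0, max(len(seq) - size, 0) + 1, step):
        yield start, seq[start:start + size]


def flag_windows(seq: str, size: int, step: int, threshold: float) -> list:
    """Return start offsets of windows whose entropy is at or above `threshold`."""
    return [
        start
        for start, window in sliding_windows(seq, size, step)
        if shannon_entropy(window) >= threshold
    ]
```

For a four-letter DNA alphabet the entropy ranges from 0 bits (a single repeated base) to 2 bits (all four bases equally frequent), which bounds any sensible threshold.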

Read More

May 4, 2026

Biosecurity Export Control Navigator

Cross-jurisdictional regulatory comparison tool for dual-use items, surfacing where export control regimes diverge across jurisdictions to help researchers and compliance teams reason about biosecurity-relevant exports. Submitted via email after Framer Form closed (before announced AoE cutoff).

Read More

Apr 27, 2026

Function Over Sequence: Empirical Evaluation of Protein Language Models for Biosecurity Screening

DNA synthesis screening is a critical biosecurity chokepoint, but current tools detect dangerous sequences by similarity to known threats — a paradigm that collapses against AI-assisted protein design. Using ProteinMPNN and ESMFold, we generated 12,000 evasion variants of 50 known toxin proteins across 12 sampling temperatures, producing sequences with as little as 11% mean identity to known toxins. We evaluated ESM-C 600M protein language model embeddings as a function-aware alternative against BLASTp and commec baselines.

At T=1.5, BLASTp detects 3.5% of variants and commec detects 2.9%, while ESM-C kNN maintains 79.5%. Among variants structurally predicted to retain wild-type function, commec detects 0% and BLASTp detects 3.8%, while ESM-C kNN detects 100%. We additionally identify organism-matched negative construction as a necessary methodological requirement for honest evaluation in this space, showing that naive dataset construction inflates AUC by up to 0.016 and FPR by 58 percentage points.
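The idea behind embedding-based kNN detection can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the embeddings are taken as given (in the study they come from ESM-C 600M), and the cosine metric, the value of `k`, and the majority-vote rule are hypothetical choices.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_flag(query: list, reference: list, k: int = 3) -> bool:
    """Flag `query` if most of its k nearest reference embeddings are labelled hazardous.

    `reference` is a list of (embedding, is_hazard) pairs.
    """
    ranked = sorted(reference, key=lambda item: cosine(query, item[0]), reverse=True)
    top = ranked[:k]
    hazard_votes = sum(1 for _, is_hazard in top if is_hazard)
    return hazard_votes * 2 > len(top)
```

Because the decision depends on neighbours in embedding space rather than on sequence alignment, a variant with low sequence identity can still be flagged if its embedding stays close to those of known toxins.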

Read More

Apr 27, 2026

Know Your Researcher Bio: Portable Authorization for AI-Bio Tools and Benchtop DNA Synthesizers

Sequence screening has matured faster than portable requester authorization. Whether a requester has been reviewed and remains authorized to make the request is still answered ad hoc at every provider, AI-bio tool, and equipment vendor. KYR-Bio is a local-first prototype of that missing layer: a reviewed, scoped, holder-bound researcher authorization that AI-bio tools, synthesis checkout flows, and benchtop DNA synthesizers can verify locally without re-sharing the applicant dossier. A single human-reviewed decision is packaged into a scoped, signed credential that a researcher's wallet presents to any participating relying party. Verifiers run schema, signature, issuer governance, Bitstring Status List freshness, holder proof and challenge binding, and scope checks before a policy adapter applies local rules. Audit events carry a hash chain and rolling Merkle root and exclude raw biological prompts, sequences, and reviewer notes. The synthetic evaluation passes 8 persona, 22 verifier, and 18 AI-assistance cases.
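The hash-chained audit trail mentioned above can be illustrated with a minimal sketch. It is not KYR-Bio's implementation: the record layout, the genesis value, and the function names are assumptions, and the rolling Merkle root is omitted. Each record's hash covers the previous record's hash, so editing any earlier event invalidates every later link.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first record


def _link_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous link together with a canonical encoding of the event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event that holds only metadata, never raw prompts or sequences."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"event": event, "prev": prev_hash, "hash": _link_hash(prev_hash, event)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every link; return False on the first mismatch."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash or record["hash"] != _link_hash(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

A verifier that stores only the latest hash can later detect whether the log it is shown has been rewritten, without ever seeing the excluded biological content.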

Read More
