APART RESEARCH
Impactful AI safety research
Explore our projects, publications and pilot experiments
Our Approach

Our research focuses on critical research paradigms in AI Safety. We produce foundational research enabling the safe and beneficial development of advanced AI.

Safe AI
Publishing rigorous empirical work for safe AI: evaluations, interpretability and more
Novel Approaches
Our research is underpinned by novel approaches focused on neglected topics
Pilot Experiments
Apart Sprints have kickstarted hundreds of pilot experiments in AI Safety
Highlights

GPT-4o is capable of complex cyber offense tasks:
We show that state-of-the-art LLMs can complete realistic cyber offense challenges, while open-source models lag behind.
A. Anurin, J. Ng, K. Schaffer, J. Schreiber, E. Kran, Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More

Factual model editing techniques don't edit facts:
Model editing techniques can introduce unwanted side effects in neural networks that existing benchmarks do not detect.
J. Hoelscher-Obermaier, J. Persson, E. Kran, I. Konstas, F. Barez. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark. ACL 2023
Read More
Research Focus Areas
Multi-Agent Systems
Key Papers:
Comprehensive report on multi-agent risks
Research Index
Nov 18, 2024
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Read More
Nov 2, 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More
Oct 18, 2024
Benchmarks
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Read More
Sep 25, 2024
Interpretability
Interpreting Learned Feedback Patterns in Large Language Models
Read More
Feb 23, 2024
Interpretability
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
Read More
Feb 4, 2024
Increasing Trust in Language Models through the Reuse of Verified Circuits
Read More
Jan 14, 2024
Conceptual
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Read More
Jan 3, 2024
Interpretability
Large Language Models Relearn Removed Concepts
Read More
Nov 28, 2023
Interpretability
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Read More
Nov 23, 2023
Interpretability
Understanding Addition in Transformers
Read More
Nov 7, 2023
Interpretability
Locating Cross-Task Sequence Continuation Circuits in Transformers
Read More
Jul 10, 2023
Benchmarks
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
Read More
May 5, 2023
Interpretability
Interpreting Language Model Neurons at Scale
Read More
Apart Sprint Pilot Experiments
May 5, 2026
SAEBER: Sparse Autoencoders for Biological Entity Risk
Open weight protein design models might generate toxic virulent proteins. Current classifiers are accurate but not interpretable or explainable. In this work, we train Sparse Autoencoders (SAEs) on RFD3 and RF3, leading open source protein folding and design models. We find SAE features with meaningful correlation to toxicity and virulence, with the top classifier reaching 0.87 AUROC.
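For readers unfamiliar with the metric quoted above, AUROC is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The following is a minimal sketch of that definition, not the SAEBER evaluation code; the function name `auroc` and the toy activations and labels are hypothetical.

```python
from itertools import product


def auroc(scores, labels):
    """Pairwise AUROC: P(score_pos > score_neg), counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical activations of a single SAE feature on four proteins
# (label 1 = toxic, 0 = benign). These numbers are made up for illustration.
feature_activation = [0.1, 0.4, 0.35, 0.8]
is_toxic = [0, 0, 1, 1]
print(auroc(feature_activation, is_toxic))  # 0.75
```

The pairwise form is quadratic in the number of examples, so it is only suitable for small illustrations; a rank-based computation gives the same value on larger sets.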
Read More
May 12, 2026
BioWatch Brief: Rapid Pathogen Risk Assessment via Staged LLM Triage
Biosecurity analysts face a growing asymmetry: outbreak reporting volume has expanded substantially while the number of trained personnel able to synthesise that information in real time has not. BioWatch Brief compresses the analyst intake stage from hours to minutes via a three-stage LLM pipeline (structured extraction, retrieval against a curated corpus of historical outbreaks and biosecurity policy frameworks, and grounded analysis), producing a structured risk card from arbitrary input reports. The architecture deliberately separates fact extraction from retrieval and synthesis, constraining LLM outputs at each stage and surfacing uncertainty rather than masking it. Built on gpt-4.1-mini with a curated 21-entry open corpus (16 historical outbreaks, 5 policy/framework documents) normalised across pathogen, location, transmission, response history, lessons learned, and source URLs. React frontend, FastAPI backend, single /analyze_report endpoint. Built at the Apart AIxBio Hackathon, April 2026 (University of Pennsylvania).
Read More
May 4, 2026
Reagent Supply-Chain Structure for Benchtop DNA Synthesizers: There is Hope for KYC
Benchtop DNA synthesizers are expected to reach viral-genome-length assembly within 2-5 years. This primary-source review of nine vendors and eleven device families finds that every device crossing the >=1.5 kb threshold runs on a proprietary reagent ecosystem, potentially enabling low-marginal-cost KYC regulation. Submitter notes the project may be revised before public publication.
Read More
May 4, 2026
Synthesis Tamper-evident Attestation and Molecular Provenance (STAMP): Cryptographic Molecular Barcoding for DNA Synthesizers
As AI systems become more capable at biological design and benchtop DNA synthesizers more affordable, the biothreat bottleneck shifts from design to physical synthesis, beyond the reach of centralized customer screening. We introduce STAMP (Synthesis Tamper-evident Attestation and Molecular Provenance), a 120-base barcode an HSM-equipped synthesizer stamps into a non-coding region of every DNA it produces, attesting that a sequence originated from a registered, untampered synthesizer and was not significantly modified post-synthesis. STAMP combines cryptographic anchoring with a novel content-aware landmark map enabling forensic reconstruction of post-synthesis modifications. Empirically, the encoder achieves success across N = 2000 random plasmids and the privacy-preserving barcode landmark signal detects >=95% of kilobase-scale insertions. We do not claim to defeat attackers with jailbroken synthesizers; we prove this is irreducible. Instead, STAMP is a cost imposer and evidence generator: it converts every viable attack into a forensically suspicious artifact or a supply-chain-visible event.
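To give a sense of how much information a 120-base barcode can hold: at 2 bits per base, 120 bases carry 240 bits, or 30 bytes. The sketch below packs a truncated keyed tag into that budget. It is an illustration under stated assumptions, not the STAMP scheme: the HMAC-SHA256 construction, the base mapping, the `device_key` parameter, and all function names are hypothetical, and the landmark map described in the abstract is not modelled.

```python
import hashlib
import hmac

BASES = "ACGT"      # assumed mapping: A=00, C=01, G=10, T=11
BARCODE_LEN = 120   # bases, from the abstract; 120 * 2 bits = 30 bytes


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(dna: str) -> bytes:
    """Inverse of bytes_to_bases; length must be a multiple of four."""
    if len(dna) % 4:
        raise ValueError("base string length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_barcode(device_key: bytes, sequence: str) -> str:
    """Hypothetical barcode: HMAC-SHA256 over the sequence, cut to 30 bytes."""
    tag = hmac.new(device_key, sequence.encode("ascii"), hashlib.sha256).digest()
    return bytes_to_bases(tag[: BARCODE_LEN // 4])


def verify_barcode(device_key: bytes, sequence: str, barcode: str) -> bool:
    """Recompute the barcode and compare in constant time."""
    return hmac.compare_digest(make_barcode(device_key, sequence), barcode)


key = b"example-device-key"  # placeholder; a real key would stay inside the HSM
barcode = make_barcode(key, "ATGGCGTACGTTAGC")
print(len(barcode), verify_barcode(key, "ATGGCGTACGTTAGC", barcode))
```

This toy encoding ignores biochemical constraints (for example long runs of one base) that a synthesizable barcode would likely need to respect, and a plain tag over the whole sequence fails on any edit rather than localizing it.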
Read More
May 4, 2026
Quantifying BLAST's Sensitivity Floor for DNA Synthesis Screening
Benchmarks BLAST as a per-fragment screener across seven fragment lengths (20-200 bp) and six mutation rates (0-20%) under pure evasion and dilute evasion threat models. Relevant to the 2024 OSTP Framework's lowering of the screening threshold to 50 bp in Oct 2026.
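One reason short, mutated fragments are hard to catch is that seed-and-extend aligners need a run of exactly matching bases before they extend an alignment. The sketch below is a rough proxy for that effect, not the project's benchmark and not a BLAST call: it applies random substitutions to a fragment and checks whether any exact run of at least `word_size` bases survives. The word size of 11, the grid values, and all function names are assumptions chosen for illustration; the abstract gives only the ranges (20-200 bp, 0-20%).

```python
import random


def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def longest_shared_run(a, b):
    """Longest run of consecutive positions where two equal-length strings agree."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def seed_survival(length, rate, word_size=11, trials=200, seed=0):
    """Fraction of random fragments that keep an exact run >= word_size."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        frag = "".join(rng.choice("ACGT") for _ in range(length))
        if longest_shared_run(frag, mutate(frag, rate, rng)) >= word_size:
            hits += 1
    return hits / trials


# Placeholder grid: seven lengths and six rates inside the stated ranges.
# These are NOT the values used in the project.
LENGTHS = (20, 50, 80, 110, 140, 170, 200)
RATES = (0.0, 0.04, 0.08, 0.12, 0.16, 0.20)

for length in LENGTHS:
    row = [f"{seed_survival(length, rate):.2f}" for rate in RATES]
    print(f"{length:>4} bp", " ".join(row))
```

Only substitutions are modelled, so the original and mutated fragments stay aligned position by position; insertions and deletions would need a proper alignment.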
Read More
May 4, 2026
Bio-Shield: Biorisk Triage Orchestrator (BTO)
Zero-Trust defense-in-depth architecture for DNA synthesis screening. BTO is a modular Managed Access Wrapper for biodesign tools and a Layer-2 inspector for synthesizers. Combines Overlap-Layout-Consensus assembly (fragmented hazards), sliding-window ESM-2 PLM scanning (AI-obfuscated chimeric toxins), and cyber-entropy checks for digital malware.
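The fragmented-hazard case mentioned above can be illustrated by reassembling overlapping fragments before screening them. The sketch below uses greedy suffix-prefix merging, which is a simplification of Overlap-Layout-Consensus assembly and not the BTO implementation; the function names, the `min_len` parameter, and the toy sequence are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=8):
    """Repeatedly merge the pair with the longest overlap until none remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


# Toy example: one made-up sequence split into three overlapping orders.
target = "ATGGCGTACGTTAGCCGATAGCTAGGCTA"
orders = [target[16:], target[0:14], target[8:22]]
contigs = greedy_assemble(orders, min_len=5)
print(contigs == [target])  # True
```

Greedy merging can misassemble repetitive sequences and does not handle sequencing-style errors or fragments on the reverse strand, so it should be read as a picture of the idea rather than a screening component.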
Read More
May 4, 2026
Biosecurity Export Control Navigator
Cross-jurisdictional regulatory comparison tool for dual-use items, surfacing where export control regimes diverge across jurisdictions to help researchers and compliance teams reason about biosecurity-relevant exports.
Read More
Apr 27, 2026
Function Over Sequence: Empirical Evaluation of Protein Language Models for Biosecurity Screening
DNA synthesis screening is a critical biosecurity chokepoint, but current tools detect dangerous sequences by similarity to known threats — a paradigm that collapses against AI-assisted protein design. Using ProteinMPNN and ESMFold, we generated 12,000 evasion variants of 50 known toxin proteins across 12 sampling temperatures, producing sequences with as little as 11% mean identity to known toxins. We evaluated ESM-C 600M protein language model embeddings as a function-aware alternative against BLASTp and commec baselines.
At T=1.5, BLASTp detects 3.5% of variants and commec detects 2.9%, while ESM-C kNN maintains 79.5%. Among variants structurally predicted to retain wild-type function, commec detects 0% and BLASTp detects 3.8%, while ESM-C kNN detects 100%. We additionally identify organism-matched negative construction as a necessary methodological requirement for honest evaluation in this space, showing that naive dataset construction inflates AUC by up to 0.016 and FPR by 58 percentage points.
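The kNN step named above can be pictured as a nearest-neighbour vote in embedding space. The following is a minimal sketch of that general technique, not the authors' pipeline: the three-dimensional vectors stand in for real protein language model embeddings, and the function names, labels, and `k` value are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, reference, k=3):
    """Majority label among the k reference embeddings most similar to `query`.

    `reference` is a list of (embedding, label) pairs.
    """
    ranked = sorted(reference, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy embeddings; real embeddings would come from a protein language model.
reference = [
    ([1.0, 0.0, 0.0], "toxin"),
    ([0.9, 0.1, 0.0], "toxin"),
    ([0.8, 0.2, 0.1], "toxin"),
    ([0.0, 1.0, 0.0], "benign"),
    ([0.0, 0.9, 0.2], "benign"),
    ([0.1, 1.0, 0.1], "benign"),
]

print(knn_predict([0.95, 0.05, 0.0], reference))  # toxin
print(knn_predict([0.0, 1.0, 0.1], reference))    # benign
```

Because the vote depends on distance in embedding space rather than on residue-by-residue identity, a variant with low sequence identity can still land near its functional relatives, which is the property the abstract evaluates.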
Read More
Apr 27, 2026
Know Your Researcher Bio: Portable Authorization for AI-Bio Tools and Benchtop DNA Synthesizers
Sequence screening has matured faster than portable requester authorization. Whether a requester has been reviewed and remains authorized to make the request is still answered ad hoc at every provider, AI-bio tool, and equipment vendor. KYR-Bio is a local-first prototype of that missing layer: a reviewed, scoped, holder-bound researcher authorization that AI-bio tools, synthesis checkout flows, and benchtop DNA synthesizers can verify locally without re-sharing the applicant dossier. A single human-reviewed decision is packaged into a scoped, signed credential that a researcher's wallet presents to any participating relying party. Verifiers run schema, signature, issuer governance, Bitstring Status List freshness, holder proof and challenge binding, and scope checks before a policy adapter applies local rules. Audit events carry a hash chain and rolling Merkle root and exclude raw biological prompts, sequences, and reviewer notes. The synthetic evaluation passes 8 persona, 22 verifier, and 18 AI-assistance cases.
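The hash-chained audit log mentioned above can be sketched as follows. This is an illustration of a generic hash chain under stated assumptions, not the KYR-Bio code: the event fields, the SHA-256 choice, the genesis value, and the function names are hypothetical, and the rolling Merkle root is omitted.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the hash before the first event


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": _digest(prev, event)})
    return chain


def verify_chain(chain):
    """Return True only if every entry links to and matches its predecessor."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


# Events carry only verification metadata (hypothetical field names),
# with no prompts, sequences, or reviewer notes.
log = []
append_event(log, {"verifier": "synth-01", "check": "signature", "result": "pass"})
append_event(log, {"verifier": "synth-01", "check": "scope", "result": "pass"})
print(verify_chain(log))  # True

log[0]["event"]["result"] = "fail"  # tamper with an earlier entry
print(verify_chain(log))  # False
```

Because each hash covers the previous one, editing any earlier event invalidates every later entry unless the whole tail is recomputed, which is what makes the log tamper-evident to anyone holding a later hash.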
Read More
Our Impact
Community
Dec 3, 2025
Explaining the Apart Research Fellowships
And introducing our brand new Partnered Fellowships
Read More


Research
Jul 25, 2025
Problem Areas in Physics and AI Safety
We outline five key problem areas in AI safety for the AI Safety x Physics hackathon.
Read More


Newsletter
Jul 11, 2025
Apart: Two Days Left of our Fundraiser!
Last call to be part of the community that contributed when it truly counted
Read More



Sign up to stay updated on the
latest news, research, and events