APART RESEARCH

Impactful AI safety research

Explore our projects, publications and pilot experiments

Our Approach

Our research focuses on critical paradigms in AI Safety. We produce foundational research enabling the safe and beneficial development of advanced AI.

Safe AI

Publishing rigorous empirical work for safe AI: evaluations, interpretability and more

Novel Approaches

Our research is underpinned by novel approaches focused on neglected topics

Pilot Experiments

Apart Sprints have kickstarted hundreds of pilot experiments in AI Safety

Apart Sprint Pilot Experiments

Sep 15, 2025

When Guardrails Fail: Dual-Use Misuse of AI in Retrosynthesis Through Iterative Refinement–Induced Self-Jailbreaking

Artificial intelligence (AI) is transforming drug discovery, with large language models (LLMs) enabling rapid retrosynthesis planning. Yet these advances also pose dual-use risks, as adversarial prompting can redirect models toward generating harmful pathways. We evaluate Iterative Refinement Induced Self-Jailbreaking (IRIS), showing that while newer models resist more robustly, systems like GPT-4 can be induced to produce stepwise synthesis guidance. This underscores the fragility of guardrails and the urgency of continuous red-teaming. We argue that AI systems in drug discovery should be classified as high-risk under the EU AI Act and propose a severity-based governance framework to proportionately manage jailbreaks while safeguarding biomedical innovation.

Read More

Sep 15, 2025

Policy Brief: Harnessing Open-Source Intelligence for AI Risk Management

Artificial intelligence is advancing amid high uncertainty and diverse risks, ranging from malicious uses such as AI-driven cyberattacks and CBRN threats to failures like hallucinations and systemic impacts on labour markets and privacy. Yet only a small fraction of global AI research addresses safety, creating an evidence dilemma in which regulators must act with limited data or risk being overtaken by sudden capability leaps. Open-source intelligence (OSINT) platforms on AI risk offer a practical solution by aggregating technical documentation, model benchmarks, incident reports, and safety practices to enable continuous, transparent, and shareable risk assessment. Integrating these tools into policymaking, regulatory oversight (e.g., EU AI Act), national security planning, and public-sector innovation can enhance situational awareness, strengthen compliance, foster public trust, and guide safe AI research and deployment.

Read More

Sep 15, 2025

CBRN-SAFE-Eval: Transparent Escalation Framework

How can we design a transparent, auditable framework for detecting and escalating CBRN-related risks in AI systems that balances real-time threat detection with stakeholder accountability requirements?

Read More

Sep 15, 2025

RobustCBRN Eval: A Practical Benchmark Robustification Toolkit

Current AI safety evaluations for CBRN risks contain systematic vulnerabilities, including statistical pattern exploitation, reproducibility gaps, and transparency trade-offs, which can lead to serious misjudgments about model safety. RobustCBRN Eval addresses these issues with a pipeline that integrates (1) Deep Ignorance consensus detection across diverse models; (2) verified cloze scoring to reduce multiple-choice artifacts; and (3) statistical evaluation with bootstrap confidence intervals for uncertainty quantification. In initial tests on WMDP benchmarks, the system revealed that model accuracy drops from ~66% to ~30% when question stems are removed, confirming that many items can be solved through superficial cues. Cloze-style scoring produced results consistent with full-format questions, and artifact filtering removed 30–40% of exploitable items while reducing the longest-answer heuristic to under 30%. RobustCBRN Eval runs 1,000–3,000 questions in under four hours for less than $300 in compute, with variance under 2% across repeated runs. Key features include a resilient architecture that continues analysis under GPU failure, hash-based anonymization for reproducibility, and confidence-aware evaluation that penalizes overconfident errors. Together, these results demonstrate that RobustCBRN Eval can identify benchmark artifacts, improve robustness checks, and provide reproducible, evidence-based safety evaluations of high-stakes AI models.
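
For illustration, a minimal sketch of the bootstrap confidence-interval step described above is shown below; the data, function name, and parameters are placeholders for exposition, not the toolkit's actual API.

```python
# Minimal sketch of bootstrap confidence intervals for benchmark accuracy.
# Illustrative only: data and names are hypothetical, not the RobustCBRN Eval API.
import numpy as np

def bootstrap_accuracy_ci(correct, n_boot=10_000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) for a vector of per-item 0/1 scores."""
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct, dtype=float)
    point = correct.mean()
    # Resample items with replacement and recompute accuracy for each resample.
    idx = rng.integers(0, len(correct), size=(n_boot, len(correct)))
    boot_accs = correct[idx].mean(axis=1)
    lower, upper = np.quantile(boot_accs, [alpha / 2, 1 - alpha / 2])
    return point, lower, upper

# Example: 1,000 items, roughly 66% answered correctly.
scores = (np.random.default_rng(1).random(1000) < 0.66).astype(int)
acc, lo, hi = bootstrap_accuracy_ci(scores)
print(f"accuracy={acc:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```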

Read More

Sep 15, 2025

Towards Agnostic Viral Engineering Detection

Genetic engineering tools could potentially be misused to create harmful pathogens, making early detection of engineered viruses critical for biosecurity. We developed an agnostic genetic engineering detection system for viruses, simulating three types of modifications (deletions, inversions, and frameshift mutations) across 25 human-infecting viral genomes to create a comprehensive synthetic dataset. We benchmarked three classification approaches—k-mer-based logistic regression, BLAST alignment, and convolutional neural networks—alongside an ensemble method. All models achieved F1-scores below 7%, suggesting that standard bioinformatics and machine learning approaches are insufficient for robust detection of diverse viral engineering signatures.
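
For context, one of the baselines mentioned above, k-mer-based logistic regression, can be sketched roughly as follows; the toy sequences, labels, and k-mer size are illustrative assumptions, and the project's actual featurization may differ.

```python
# Rough sketch of a k-mer logistic regression baseline for flagging engineered sequences.
# Toy placeholder data; not the project's actual pipeline or dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy data: DNA fragments labeled 1 if "engineered" (e.g., simulated deletion), else 0.
sequences = ["ATGCGTACGTTAGC", "ATGCGTAGC", "TTAGCCGATCGAAT", "TTAGCGAAT"] * 50
labels = [0, 1, 0, 1] * 50

seq_train, seq_test, y_train, y_test = train_test_split(
    sequences, labels, test_size=0.25, random_state=0, stratify=labels
)

# Featurize each sequence as overlapping k-mer counts (character n-grams).
k = 4
vectorizer = CountVectorizer(analyzer="char", ngram_range=(k, k), lowercase=False)
X_train = vectorizer.fit_transform(seq_train)
X_test = vectorizer.transform(seq_test)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```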

Read More

Sep 15, 2025

ThoughtTrim

Large language models often generate long chain-of-thought (CoT) traces in which only a small subset of sentences materially influences the final answer. We propose ThoughtTrim: a simple evaluation framework that ranks CoT chunks by counterfactual-importance KL [1], reconstructs prompts using only the top-ranked chunks, and measures the accuracy-retention trade-off as the filtering threshold rises. Using Qwen2.5-1.5B on a 100-question Biology subset of MMLU-Pro, we find that (i) for some questions, KL-guided trimming preserves accuracy at substantial token savings (60-90% on many items), (ii) “first failure” thresholds are heterogeneous: some problems fail immediately while a long tail remains robust up to aggressive pruning, and (iii) a KL-shuffled control that preserves the number of kept chunks but breaks informativeness is consistently worse than the original selection, demonstrating the value of the ranking signal. We release a lightweight pipeline that uses counterfactual-importance KL to map these thresholds, efficiency frontiers, and failure distributions. This opens up future work on fine-tuning approaches that make agentic and LLM-based systems more efficient, robust, and deterministic.
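
A minimal sketch of the chunk-ranking idea follows, assuming a hypothetical answer_distribution(question, chunks, options) helper that returns the model's probability for each answer option; it illustrates counterfactual-importance scoring in general, not the released ThoughtTrim code.

```python
# Illustrative sketch of KL-based ranking of chain-of-thought chunks (not the released pipeline).
# `answer_distribution` is a hypothetical helper returning {option: probability} for a prompt.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def rank_chunks(question, chunks, answer_distribution, options):
    """Score each CoT chunk by how much removing it shifts the answer distribution."""
    full = [answer_distribution(question, chunks, options)[o] for o in options]
    scores = []
    for i in range(len(chunks)):
        ablated_chunks = chunks[:i] + chunks[i + 1:]
        ablated = [answer_distribution(question, ablated_chunks, options)[o] for o in options]
        scores.append(kl_divergence(full, ablated))  # larger shift = more important chunk
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)

def trim(question, chunks, keep_fraction, answer_distribution, options):
    """Rebuild the prompt from only the top-ranked chunks, preserving original order."""
    order = rank_chunks(question, chunks, answer_distribution, options)
    n_keep = max(1, int(round(keep_fraction * len(chunks))))
    kept = sorted(order[:n_keep])
    return question + "\n" + "\n".join(chunks[i] for i in kept)
```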

Read More

Sep 15, 2025

Arbiter: Automated Review of Bio-AI Tools for Emerging Risk

The rapid advancement of AI-enabled biological tools presents significant biosecurity and AI safety challenges, necessitating scalable and balanced assessments. Building on the foundational framework of the Global Risk Index (GRI) for AI-enabled Biological Tools report, this work introduces Arbiter, an automated pipeline designed to overcome the GRI's resource-intensive manual analysis. Arbiter employs a multi-stage LLM-driven process to analyze scientific literature, systematically evaluating AI-Bio tools for misuse risks and, notably, their potential benefits. This includes assessing economic impact, national competitiveness, and crisis response capabilities, providing policymakers with the comprehensive, balanced insights required for informed decision-making. A pilot execution demonstrated Arbiter's ability to efficiently monitor emerging tools, highlight areas for prompt and model refinement, and support the development of future decision-support frameworks. Arbiter's modular and extensible design empowers users to tailor analyses, ensuring continuous, scalable, and adaptable oversight in this dynamic field.
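
To make the multi-stage structure concrete, the sketch below shows one plausible way to chain LLM calls over a tool's description, scoring both misuse risk and benefit dimensions; the stage names, rubric dimensions, and complete(prompt) helper are hypothetical and not Arbiter's actual implementation.

```python
# Hypothetical sketch of a staged LLM review over a tool description (not Arbiter's actual code).
# `complete(prompt)` stands in for whatever LLM client the pipeline uses.
from dataclasses import dataclass

DIMENSIONS = [
    "misuse_risk",            # potential for harmful dual-use applications
    "economic_impact",        # projected benefits to industry and research
    "crisis_response_value",  # usefulness in outbreak or emergency response
]

@dataclass
class ToolReview:
    tool_name: str
    summaries: dict
    scores: dict

def review_tool(abstract: str, tool_name: str, complete) -> ToolReview:
    # Stage 1: extract a structured summary of what the tool does.
    summary = complete(f"Summarize the capabilities of this AI-bio tool in 3 bullets:\n{abstract}")
    # Stage 2: rate each rubric dimension separately on a 1-5 scale.
    scores, summaries = {}, {"capabilities": summary}
    for dim in DIMENSIONS:
        reply = complete(
            f"On a 1-5 scale, rate the '{dim}' of this tool and justify briefly.\n"
            f"Capabilities:\n{summary}"
        )
        summaries[dim] = reply
        digits = [c for c in reply if c.isdigit()]
        scores[dim] = int(digits[0]) if digits else None  # naive parse of the first digit as the rating
    return ToolReview(tool_name, summaries, scores)
```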

Read More

Sep 15, 2025

Navigating Safety Measures for Nuclear Nonproliferation: AI-Enabled Early Warning Systems & Governance System For Detecting Nuclear Enrichment  

The International Atomic Energy Agency (IAEA) has always sought to balance the peaceful use of nuclear energy against the prevention of weapons proliferation. Yet challenges persist in the form of clandestine uranium enrichment, covert plutonium processing, and illicit trade in sensitive technologies. Recent advances in the integration of Artificial Intelligence into nuclear Command, Control and Communication (NC3) systems and procedures could help reduce errors in crisis scenarios, enhance situational awareness, improve surveillance, and increase operational efficiency. However, the introduction of AI systems would also amplify existing challenges and create room for adversarial manipulation, where proliferators intentionally feed misleading signals to evade detection systems and reduce facility footprints. This brief critically examines the vulnerabilities and limitations of existing safeguards, recent case studies such as Iran’s suspension of IAEA cooperation, and a multimodal AI-enabled monitoring approach for early detection of nuclear enrichment activities. It concludes with governance approaches to reduce nuclear-related misuse and policy measures to align AI capabilities with non-proliferation norms.

Read More

Sep 15, 2025

Molecules Under Watch: Multi-Modal AI Driven Threat Emergence Detection for Biosecurity

This study presents a comprehensive multi-modal pipeline for assessing biosecurity risks in chemical compounds, integrating real and synthetic datasets from public repositories such as ChEMBL, PubChem, and USPTO patents. The system leverages molecular descriptors extracted via RDKit, contextual embeddings from the Qwen-2.5 language reasoning model, and unsupervised anomaly detection using Isolation Forest to compute a novel Threat Emergence Detection (TED) score. This score quantifies dual-use potential, synthesis feasibility, and novelty, enabling scalable threat triage. We evaluate the pipeline on hybrid datasets, demonstrating robust differentiation between real pharmaceutical compounds and synthetic benchmarks. Our approach advances AI-driven CBRN (Chemical, Biological, Radiological, Nuclear) safety by providing interpretable risk metrics and constitutional oversight, with implications for regulatory compliance and dual-use research governance.
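
A minimal sketch of the descriptor-plus-anomaly-detection stage is given below, assuming RDKit and scikit-learn are available; the descriptor set, toy molecules, and score scaling are illustrative assumptions rather than the study's exact TED formulation.

```python
# Illustrative sketch of molecular descriptors + Isolation Forest anomaly scoring.
# Descriptor choice and score scaling are assumptions, not the paper's exact TED score.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import IsolationForest

def featurize(smiles: str) -> list[float]:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    return [
        Descriptors.MolWt(mol),
        Descriptors.MolLogP(mol),
        Descriptors.TPSA(mol),
        Descriptors.NumRotatableBonds(mol),
    ]

# Toy reference set of ordinary compounds (ethanol, aspirin, caffeine) and candidates to triage.
reference = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "Cn1cnc2c1c(=O)n(C)c(=O)n2C"]
candidates = ["CCO", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"]

X_ref = np.array([featurize(s) for s in reference])
forest = IsolationForest(random_state=0).fit(X_ref)

# score_samples: higher means more typical; negate so larger values flag more anomalous compounds.
anomaly = -forest.score_samples(np.array([featurize(s) for s in candidates]))
for smi, a in zip(candidates, anomaly):
    print(f"{smi}: anomaly-style score = {a:.3f}")
```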

Read More
