APART RESEARCH
Impactful AI safety research
Explore our projects, publications and pilot experiments
Our Approach

Our research focuses on critical paradigms in AI Safety. We produce foundational research enabling the safe and beneficial development of advanced AI.

Safe AI
Publishing rigorous empirical work for safe AI: evaluations, interpretability and more
Novel Approaches
Our research is underpinned by novel approaches focused on neglected topics
Pilot Experiments
Apart Sprints have kickstarted hundreds of pilot experiments in AI Safety
Highlights

GPT-4o is capable of complex cyber offense tasks:
We show that state-of-the-art LLMs can complete realistic cyber offense challenges, while open-source models lag behind.
A. Anurin, J. Ng, K. Schaffer, J. Schreiber, E. Kran. Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More

Factual model editing techniques don't edit facts:
Model editing techniques can introduce unwanted side effects in neural networks not detected by existing benchmarks.
J. Hoelscher-Obermaier, J. Persson, E. Kran, I. Konstas, F. Barez. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark. ACL 2023
Read More
Research Focus Areas
Multi-Agent Systems
Key Papers:
Comprehensive report on multi-agent risks
Research Index
Nov 18, 2024
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Read More
Nov 2, 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More
Oct 18, 2024
Benchmarks
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Read More
Sep 25, 2024
Interpretability
Interpreting Learned Feedback Patterns in Large Language Models
Read More
Feb 23, 2024
Interpretability
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
Read More
Feb 4, 2024
Increasing Trust in Language Models through the Reuse of Verified Circuits
Read More
Jan 14, 2024
Conceptual
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Read More
Jan 3, 2024
Interpretability
Large Language Models Relearn Removed Concepts
Read More
Nov 28, 2023
Interpretability
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Read More
Nov 23, 2023
Interpretability
Understanding Addition in Transformers
Read More
Nov 7, 2023
Interpretability
Locating Cross-Task Sequence Continuation Circuits in Transformers
Read More
Jul 10, 2023
Benchmarks
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
Read More
May 5, 2023
Interpretability
Interpreting Language Model Neurons at Scale
Read More
Apart Sprint Pilot Experiments
Apr 15, 2025
CalSandbox and the CLEAR AI Act: A State-Led Vision for Responsible AI Governance
The CLEAR AI Act proposes a balanced, forward-looking framework for AI governance in California that protects public safety without stifling innovation. Centered around CalSandbox, a supervised testbed for high-risk AI experimentation, and a Responsible AI Incentive Program, the Act empowers developers to build ethical systems through both regulatory support and positive reinforcement. Drawing from international models like the EU AI Act and sandbox initiatives in Singapore and the UK, CLEAR AI also integrates public transparency and equitable infrastructure access to ensure inclusive, accountable innovation. Together, these provisions position California as a global leader in safe, values-driven AI development.
Read More
Apr 15, 2025
California Law for Ethical and Accountable Regulation of Artificial Intelligence (CLEAR AI) Act
The CLEAR AI Act proposes a balanced, forward-looking framework for AI governance in California that protects public safety without stifling innovation. Centered around CalSandbox, a supervised testbed for high-risk AI experimentation, and a Responsible AI Incentive Program, the Act empowers developers to build ethical systems through both regulatory support and positive reinforcement. Drawing from international models like the EU AI Act and sandbox initiatives in Singapore and the UK, CLEAR AI also integrates public transparency and equitable infrastructure access to ensure inclusive, accountable innovation. Together, these provisions position California as a global leader in safe, values-driven AI development.
Read More
Apr 15, 2025
Recommendation to Establish the California AI Accountability and Redress Act
We recommend that California enact the California AI Accountability and Redress Act (CAARA) to address emerging harms from the widespread deployment of generative AI, particularly large language models (LLMs), in public and commercial systems.
This policy would create the California AI Incident and Risk Registry (CAIRR) under the California Department of Technology to transparently track post-deployment failures. CAIRR would classify model incidents by severity, triggering remedies when a system meets the criteria for an "Unacceptable Failure" (e.g., one Critical incident or three unresolved Moderate incidents).
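As a rough, hypothetical illustration only (CAIRR specifies no implementation; the names Severity, Incident, and is_unacceptable_failure below are invented for this sketch), the proposed threshold could be expressed in Python roughly as follows:

from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    # Simplified two-level scale; the actual registry would define its own taxonomy.
    MODERATE = "moderate"
    CRITICAL = "critical"

@dataclass
class Incident:
    severity: Severity
    resolved: bool = False

def is_unacceptable_failure(incidents: list[Incident]) -> bool:
    # Encodes the stated rule: one Critical incident, or three unresolved Moderate incidents.
    criticals = sum(1 for i in incidents if i.severity is Severity.CRITICAL)
    unresolved_moderates = sum(
        1 for i in incidents if i.severity is Severity.MODERATE and not i.resolved
    )
    return criticals >= 1 or unresolved_moderates >= 3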
Critically, CAARA also protects developers and deployers by:
Setting clear thresholds for liability;
Requiring users to prove demonstrable harm and document reasonable safeguards (e.g., disclosures, context-sensitive filtering); and
Establishing a safe harbor for those who self-report and remediate in good faith.
CAARA reduces unchecked harm, limits arbitrary liability, and affirms California’s leadership in AI accountability.
Read More
Apr 15, 2025
Round 1 Submission
Our round one submission for the UC Berkeley AI Policy Hackathon
Read More
Apr 14, 2025
FlexHEG Devices to Enable Implementation of AI IntelSat
The project proposes using Flexible Hardware Enabled Governance (FlexHEG) devices to support the IntelSat model for international AI governance. This framework aims to balance technological progress with responsible development by implementing tamper-evident hardware systems that enable reliable monitoring and verification between participating members. Two key policy approaches are outlined: 1) Treasury Department tax credits to incentivize companies to adopt FlexHEG-compliant hardware, and 2) NSF technical assistance grants to help smaller organizations implement these systems, preventing barriers to market entry while ensuring broad participation in the governance framework. The proposal builds on the successful Intelsat model (1964-2001), which balanced US leadership with international participation through weighted voting.
Read More
Apr 14, 2025
Smart Governance, Safer Innovation: A California AI Sandbox with Guardrails
California is home to global AI innovation, yet it also leads in regulatory experimentation with landmark bills like SB-1047 and SB-53. These laws establish a rigorous compliance regime for developers of powerful frontier models, including mandates for shutdown mechanisms, third-party audits, and whistle-blower protections. While crucial for public safety, these measures may unintentionally sideline academic researchers and startups who cannot meet such thresholds.
To reconcile innovation with public safety, we propose the California AI Sandbox: a public-private test-bed where vetted developers can trial AI systems under tailored regulatory conditions. The sandbox will be aligned with CalCompute infrastructure and enforce key safeguards including third-party audits and SB-53-inspired whistle-blower protection to ensure that flexibility does not come at the cost of safety or ethics.
Read More
Apr 14, 2025
FlexHEG Devices to Enable Implementation of AI IntelSat
The FlexHEG policy proposal combines Treasury Department tax credits and NSF technical assistance grants to create the hardware-enabled verification infrastructure necessary for implementing an IntelSat-like governance model for AI, ensuring both large corporations and smaller organizations can participate in a framework that balances technological progress with responsible development.
Read More
Apr 14, 2025
A Call for ‘Green AI’ in California: Evaluation of Existing and Alternative Regulations
The rapid growth of artificial intelligence, especially large-scale foundation models, poses critical environmental challenges. Training compute-heavy AI models consumes massive amounts of energy and water, leading to significant carbon emissions. California, a leader in climate and AI policy, does not yet have legislation explicitly addressing the strain AI development places on these resources. To bridge this gap, we propose bipartisan legislative solutions that ensure AI development in California is environmentally sustainable without stifling innovation.
Read More
Apr 14, 2025
Data Trusts in AI Governance
Data trusts are an alternative model for data governance that prioritise data subjects and creators in AI development.
Read More
Our Impact
Newsletter
Apr 11, 2025
Apart News: Transformative AI Economics
This week we launched our AI Economics Hackathon as we continue to think about the potentially transformative effects of AI on the global economy.
Read More


Research
Apr 9, 2025
Engineering a World Designed for Safe Superintelligence
Esben explains how we go about "Engineering a World Designed for Safe Superintelligence" at the Paris AI Action Summit.
Read More


Newsletter
Apr 4, 2025
Apart News: Our Biggest Event Ever
This week we have details of our Control Hackathon and a writeup from our biggest event ever.
Read More



Sign up to stay updated on the latest news, research, and events
