APART RESEARCH
Impactful AI safety research
Explore our projects, publications and pilot experiments
Our Approach

Our research focuses on critical research paradigms in AI Safety. We produce foundational research enabling the safe and beneficial development of advanced AI.

Safe AI
Publishing rigorous empirical work for safe AI: evaluations, interpretability and more
Novel Approaches
Our research is underpinned by novel approaches focused on neglected topics
Pilot Experiments
Apart Sprints have kickstarted hundreds of pilot experiments in AI Safety
Highlights


GPT-4o is capable of complex cyber offense tasks:
We show that state-of-the-art LLMs can complete realistic cyber offense challenges, while open-source models lag behind.
A. Anurin, J. Ng, K. Schaffer, J. Schreiber, E. Kran, Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More

Factual model editing techniques don't edit facts:
Model editing techniques can introduce unwanted side effects in neural networks not detected by existing benchmarks.
J. Hoelscher-Obermaier, J. Persson, E. Kran, I. Konstas, F. Barez. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark. ACL 2023
Read More
Research Focus Areas
Multi-Agent Systems
Key Papers:
Comprehensive report on multi-agent risks
Research Index
Nov 18, 2024
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Read More
Nov 2, 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More
Oct 18, 2024
Benchmarks
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Read More
Sep 25, 2024
Interpretability
Interpreting Learned Feedback Patterns in Large Language Models
Read More
Feb 23, 2024
Interpretability
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
Read More
Feb 4, 2024
Increasing Trust in Language Models through the Reuse of Verified Circuits
Read More
Jan 14, 2024
Conceptual
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Read More
Jan 3, 2024
Interpretability
Large Language Models Relearn Removed Concepts
Read More
Nov 28, 2023
Interpretability
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Read More
Nov 23, 2023
Interpretability
Understanding Addition in Transformers
Read More
Nov 7, 2023
Interpretability
Locating Cross-Task Sequence Continuation Circuits in Transformers
Read More
Jul 10, 2023
Benchmarks
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
Read More
May 5, 2023
Interpretability
Interpreting Language Model Neurons at Scale
Read More
Apart Sprint Pilot Experiments
May 2, 2025
Evaluating the risk of job displacement by transformative AI automation in developing countries: A case study on Brazil
In this paper, we introduce an empirical and reproducible approach to monitoring job displacement by TAI. We first classify occupations based on current prompting behavior from a novel dataset from Anthropic, linking 4 million Claude Sonnet 3.7 prompts to tasks from the O*NET occupation taxonomy. We then develop a seasonally-adjusted autoregressive model based on employment flow data from Brazil (CAGED) between 2021 and 2024 to analyze the effects of diverging prompting behavior on employment trends per occupation. We conclude that there is no statistically significant difference in net-job dynamics between the occupations whose tasks feature the highest frequency in prompts and the ones with the lowest frequency, indicating that current AI technology has not initiated job displacement in Brazil.
Read More
May 2, 2025
The Early Economic Impacts of Transformative AI: A Focus on Temporal Coherence
We investigate the economic potential of Transformative AI, focusing on "temporal coherence"—the ability to maintain goal-directed behavior over time—as a critical, yet underexplored, factor in task automation. We argue that temporal coherence represents a significant bottleneck distinct from computational complexity. Using a Large Language Model to estimate the 'effective time' (a proxy for temporal coherence) needed for humans to complete remote O*NET tasks, the study reveals a non-linear link between AI coherence and automation potential. A key finding is that an 8-hour coherence capability could potentially automate around 80–84% of the analyzed remote tasks.
Read More
Apr 28, 2025
The Rate of AI Adoption and Its Implications for Economic Growth and Disparities
This project examines the economic impacts of AI adoption, focusing on its potential to increase productivity while also widening income inequality and regional disparities. It explores the factors influencing adoption rates across industries and concludes with policy recommendations aimed at mitigating these disparities through targeted AI adoption incentives and workforce upskilling programs.
Read More
Apr 28, 2025
Economics of TAI Sprint Submission: A Recursively-Inspired Framework and Simulation Results for Evaluating a Value-Based UBI Policy
The potential for Transformative AI (TAI) to exacerbate inequality necessitates exploring novel redistributive policies. This document presents a comprehensive framework and reports results from a conceptual simulation study evaluating the economic implications of a Universal Basic Income (UBI) funded by a significant annual tax (e.g., 4%) on company value. Drawing insights from recursive alignment (RA) and functional models of intelligence (FMI), this framework emphasizes policy adaptability and potential failure modes arising from rigid design under TAI's deep uncertainty. Developed for the Economics of TAI Sprint (Tracks 3 & 5), this document details the core research questions, analytical focal points, conceptual model structure, RA-inspired simulation design, and interprets the simulation outputs. It serves as a proof-of-concept and guide for future research into designing resilient economic policies for the TAI era.
Read More
Apr 28, 2025
US-1: Full AI Nationalization can cause Misaligned Economic Incentives
The escalating geostrategic importance of frontier AI development increases the likelihood of nationalization. While no explicit plans have emerged in the United States, such action would likely be swift and comprehensive. A government seizure of critical AI infrastructure would fundamentally transform the sector's economic foundation – shifting funding from traditional private sources to the American tax base, thereby repositioning AI as a public good. The objectives driving development would similarly pivot from user engagement to national security imperatives. Given the history of American adversaries pursuing intellectual property theft, this transition would likely establish a more restrictive diffusion model that prioritizes security over openness. By tightly controlling crucial elements of the AI stack, that approach risks diminishing the broader societal benefits that might otherwise emerge from AI advancement.
Read More
Apr 28, 2025
Economic Impact Analysis: The Impact of AI on the Indian IT Sector
We studied AI’s impact on India’s IT sector. We modelled a 20% labour shock and proposed upskilling and insurance policies to reduce AI-driven job losses.
Read More
Apr 28, 2025
Redistributing the AI Dividend: Modeling Data as Labor in a Transformative Economy
The economic transformations induced by artificial intelligence (AI) raise pressing distributional concerns. This paper examines the allocation of the AI Dividend—the surplus value generated through AI advancements—and proposes mechanisms to ensure equitable redistribution. Anchoring our analysis in the Distribution track, we focus on the Data as Labor (DaL) framework, inspired by Lanier and Weyl, wherein individuals' data contributions are treated as productive labor. We simulate and compare two paradigms: Data as Capital (DaC), in which data is aggregated as corporate capital, and DaL, wherein individuals are compensated for their data. Using comparative economic simulations, we highlight systemic inequalities emerging under DaC and demonstrate the stabilizing potential of DaL structures. We further subdivide the DaL paradigm into two mechanisms: a corporate taxation regime and an individualized data compensation model, proposing a novel formula for micro-level redistribution. The implications of these models for labor markets, inequality, and societal stability are discussed, with a focus on designing incentive-compatible and scalable economic policies. Our findings suggest that recognizing data as labor not only promotes distributive justice but also enhances the long-term sustainability of the AI-driven economy.
Read More
Apr 29, 2025
Economic Feasibility of Universal High Income (UHI) in an Age of Advanced Automation
This paper analyzes five interlinked fiscal measures proposed to fund a Universal High Income (UHI) system in response to large-scale technological automation: a unity wealth tax, an unused land and property tax, progressive income tax reform, and the Artificial Intelligence Dividend Income (AIDI) program. Using dynamic general equilibrium modelling, IS-MP-PC frameworks, and empirical elasticity estimates, we assess the macroeconomic impacts, revenue potential, and distributional consequences of each measure. Results indicate that the combined measures could generate 8–12% of GDP in annual revenue, sufficient to sustainably support a UHI framework even with 80–90% unemployment. The wealth tax and land tax enhance fiscal resilience while reducing inequality; the progressive income tax improves administrative efficiency and boosts aggregate consumption; the AIDI channels the productivity gains of automation directly back to displaced workers and the broader public. Nonetheless, each policy presents limitations, including vulnerability to capital flight, political resistance, behavioural tax avoidance, innovation slowdowns, and enforcement complexity. AIDI, in particular, offers a novel mechanism to maintain consumer demand while moderating excessive automation, but demands careful regulatory oversight. Overall, the findings suggest that, if implemented carefully and globally coordinated, these measures provide a robust fiscal architecture to ensure equitable prosperity in a post-labour economy dominated by artificial intelligence. Strategic design and adaptive governance will be essential to maximize economic stability, technological innovation, and social welfare during this unprecedented economic transition.
Read More
Our Impact
Newsletter
Apr 11, 2025
Apart News: Transformative AI Economics
This week we launched our AI Economics Hackathon as we continue to think about the potentially transformative effects of AI on the global economy.
Read More


Research
Apr 9, 2025
Engineering a World Designed for Safe Superintelligence
Esben explains how we go about "Engineering a World Designed for Safe Superintelligence" at the Paris AI Action Summit.
Read More


Newsletter
Apr 4, 2025
Apart News: Our Biggest Event Ever
This week we have details of our Control Hackathon and a writeup from our biggest event ever.
Read More



Sign up to stay updated on the
latest news, research, and events