Four Paths to Failure: Red Teaming ASI Governance

Luca De Leo, Zoé Roy-Stang, Heramb Podar, Damin Curtis, Vishakha Agrawal, Ben Smyth

We stress‑tested Phase 0 of A Narrow Path, the proposed 20‑year moratorium on training artificial superintelligence (ASI), during a one‑day red‑teaming hackathon. Drawing on rapid literature reviews, historical analogues (nuclear, bioweapon, cryptography, and export‑control regimes), and rough order‑of‑magnitude cost modelling, we examined four cornerstone safeguards: datacentre compute caps, training‑licence thresholds, use‑ban triggers, and 12‑day breakout‑time detection logic.
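As a concrete illustration of that last safeguard, here is a minimal sketch of what 12‑day breakout‑time detection logic might compute: the days a monitored cluster would need, at sustained throughput, to accumulate a treaty‑threshold training run. The compute threshold, utilization figure, and function names are illustrative assumptions, not values specified in A Narrow Path.

```python
# Illustrative sketch of a 12-day breakout-time check (assumed numbers).
# "Breakout time" here: days a cluster would need, at sustained throughput,
# to accumulate enough compute for a treaty-threshold training run.

THRESHOLD_FLOP = 1e25          # assumed licence threshold for a single run
BREAKOUT_LIMIT_DAYS = 12       # the trigger named in the moratorium design
SECONDS_PER_DAY = 86_400

def breakout_days(sustained_flops: float, utilization: float = 0.4) -> float:
    """Days to reach THRESHOLD_FLOP at the given sustained FLOP/s."""
    effective = sustained_flops * utilization
    return THRESHOLD_FLOP / (effective * SECONDS_PER_DAY)

def flag_cluster(name: str, sustained_flops: float) -> None:
    days = breakout_days(sustained_flops)
    status = "ALERT: breakout < 12 days" if days < BREAKOUT_LIMIT_DAYS else "ok"
    print(f"{name}: {days:,.1f} days to threshold ({status})")

# Example: a 10,000-GPU H100-class cluster at ~1e15 dense FLOP/s per card.
flag_cluster("licensed-dc-01", 10_000 * 1e15)
```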

Our analysis surfaced four realistic circumvention routes that an adversary could pursue while remaining nominally compliant:

Jurisdictional arbitrage via non‑signatory “AI‑haven” states.

Distributed consumer‑GPU swarms operating below per‑node licence limits.

Covert runs on classified state supercomputers hidden behind national‑security carve‑outs.

Offshore proxy datacentres protected by host‑state sovereignty and inspection vetoes.

Each path could power GPT‑4‑scale training within 6–18 months, undermining the moratorium’s objectives.
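As a rough plausibility check on that 6–18 month window for the consumer‑GPU swarm path, the sketch below estimates how many cards such a swarm would need. The ~2e25 FLOP figure for GPT‑4‑scale training is a widely cited public estimate; the per‑card throughput and the efficiency penalty for slow consumer interconnects are our assumptions.

```python
# Back-of-envelope: consumer-GPU swarm vs. GPT-4-scale training (~2e25 FLOP,
# a widely cited public estimate). All efficiency numbers are assumptions.

TARGET_FLOP = 2e25
CARD_FLOPS = 1.5e14        # ~RTX 4090-class sustained BF16 throughput (assumed)
SWARM_EFFICIENCY = 0.10    # assumed penalty for consumer-grade interconnects
SECONDS_PER_DAY = 86_400

def cards_needed(months: float) -> int:
    """Consumer cards required to hit TARGET_FLOP in the given time."""
    seconds = months * 30 * SECONDS_PER_DAY
    per_card = CARD_FLOPS * SWARM_EFFICIENCY * seconds
    return int(TARGET_FLOP / per_card) + 1

for months in (6, 12, 18):
    print(f"{months:>2} months: ~{cards_needed(months):,} consumer cards")
```

Under these assumptions the swarm needs on the order of 30,000–90,000 cards, which is why per‑node licence limits alone do little to constrain this path.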

To close these gaps, we propose ten mutually reinforcing amendments, including universal cryptographic “compute passports,” quarterly‑updating compute thresholds, an International AI Safety Commission with secondary‑sanctions powers, HSM‑bound weights, kill‑switch performance bonds, whistle‑blower bounties, whole‑network swarm detection, conditional infrastructure aid, and a 30‑month sunset‑and‑iteration clause. Implemented as a single package, these measures transform Phase 0 from a jurisdiction‑bounded freeze into a verifiable, adaptive global regime that blocks every breach path identified while permitting licensed low‑risk research.
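To make the "compute passport" amendment concrete, here is a minimal sketch of one possible attestation flow, using an Ed25519 key from the widely available `cryptography` package as a software stand‑in for an HSM‑bound chip key. The field names and message format are illustrative inventions, not a specification from the paper.

```python
# Minimal "compute passport" sketch: a chip-bound key signs an attestation of
# what workload ran and how much compute it consumed. In deployment the
# private key would live in an HSM or on-die secure element; a software
# Ed25519 key stands in here. All field names are illustrative assumptions.
import json
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

chip_key = Ed25519PrivateKey.generate()   # stand-in for the HSM-bound key
chip_pub = chip_key.public_key()          # would be published in a chip registry

def issue_passport(chip_id: str, workload_manifest: bytes, flop: float) -> dict:
    """Chip-side: sign a claim binding this chip to a workload digest."""
    claim = {
        "chip_id": chip_id,
        "workload_sha256": hashlib.sha256(workload_manifest).hexdigest(),
        "reported_flop": flop,
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    return {"claim": claim, "signature": chip_key.sign(payload).hex()}

def verify_passport(passport: dict) -> bool:
    """Regulator-side: check the claim against the chip's registered key."""
    payload = json.dumps(passport["claim"], sort_keys=True).encode()
    try:
        chip_pub.verify(bytes.fromhex(passport["signature"]), payload)
        return True
    except InvalidSignature:
        return False

passport = issue_passport("GPU-0xA1B2", b"training-run-manifest-v1", 3.1e21)
print("valid passport:", verify_passport(passport))
```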

Reviewer's Comments

Tolga Bilge

Great paper! I have reservations about adding sunset provisions.

Dave Kasten

First off, I appreciate the explicit choice to say "We deliberately favoured substance critiques—those that treat the policies as flawlessly implemented—and set aside feasibility or implementation questions". This is the heart of the exercise! Thank you.

The discussion of regulatory flight is particularly thoughtful, especially with regard to the specific arguments about timeframes for moving to other countries and the examples (we were already familiar with the Soviet bioweapons and Pakistani nuclear program examples and think they are some of the strongest bits of evidence against A Narrow Path, but thinking of FTX as an example of this is particularly inspired).

I think the datacentre discussion is also good, though it doesn't wrestle with the core question readers are left with: why aren't players in the space doing this already, and how quickly would the optimization pressure of our policies move companies to overcome the (real but not insurmountable) technical hurdles? The compute passports idea in this context is intriguing, however. The discussion of other ways to identify sufficient criminal suspicion for a search (e.g., power usage or heat emissions, in line with how many jurisdictions detect unlawful marijuana grow houses today) is also thoughtful and better articulated than most examples I've seen.

The offshore discussion is good, though it immediately made me think of examples of rendition during the US Global War on Terror that I wish you had brought in as well.

The 10 ideas are quite good as well; some of these mirror our proposals (e.g., the IASC inspections) but the specifics of how to make them work are just really sharp.

Overall, this hits the mark of giving me (1) specific additional things I need to think about to find what to patch, and (2) specific sentences, driven by examples, that I can already use in discussions with policymakers to answer a "how might this work" question.

Cite this work

@misc{deleo2025fourpaths,
  title={Four Paths to Failure: Red Teaming ASI Governance},
  author={Luca De Leo and Zoé Roy-Stang and Heramb Podar and Damin Curtis and Vishakha Agrawal and Ben Smyth},
  date={2025-06-13},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}



This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.