May 24, 2026
Axiom Zero: AlphaZero-Style Reinforcement Learning for Automated Formal Verification of Python Programs
Mufaro Rukuni, Brain Monzora
The rapid growth of AI-generated code has created a verification crisis: Gross World
Lines of Code (LoC) is expanding at unprecedented rates, yet developers have no systematic
guarantee that the code produced by large language models behaves as intended. We present
Axiom Zero, a compiler and reinforcement learning system that translates Python/PyTorch
source code into formal proofs verified by the Lean 4 proof assistant, using an AlphaZero-style
agent to discover those proofs automatically without human-labelled examples. The system
comprises four implemented phases: (1) a parse pipeline that converts Python source into
a normalized intermediate representation with type and tensor-shape analysis; (2) a proof
environment modelling theorem proving as a two-player game with 39 curated tactics across
10 categories; (3) a policy-plus-value network with MCTS tree search trained via self-play;
and (4) a Python-to-Lean 4 compiler with difficulty-stratified proof-hole filling. All 143 unit
tests pass across 3,450 lines of dependency-free Python. The Lean 4 kernel serves as the
binary oracle: a proof either compiles or it does not, providing a clean reward signal that
eliminates the need for human annotation. Axiom Zero demonstrates that the game-theoretic
self-play paradigm, previously applied to chess and Go, can be transferred to the domain
of program verification, offering a scalable path toward trustworthy AI-assisted software
development.
No reviews are available yet
Cite this work
@misc {
title={
(HckPrj) Axiom Zero: AlphaZero-Style Reinforcement Learning for Automated Formal Verification of Python Programs
},
author={
Mufaro Rukuni, Brain Monzora
},
date={
5/24/26
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


