-

Online & In-Person

The Secure Program Synthesis Hackathon

The Secure Program Synthesis Hackathon brings researchers and engineers together for three days to prototype the tools we'll need to verify what AI is writing. Co-organized with Atlas Computing. Top teams are invited to apply to the four-month SPS Fellowship that follows.

00:00:00:00

Days To Go

306

Sign Ups

Overview

Resources

Guidelines

Schedule

Overview

Arrow

Gross World LoC is skyrocketing because of AI. By default, we have no way of knowing if what we're vibecoding does what we think it does. Secure program synthesis throws formal methods, proof assistants, type systems, model checkers, at all the code coming out of AI systems. Three days. Small teams. Real research output.

Top teams from this hackathon are invited to apply to the Secure Program Synthesis Fellowship (June-October 2026), where mentor-led teams take projects further with Apart Research project managers, compute, and API credits.

Organized by Apart Research and Atlas Computing

Top teams get

Invitation to the Secure Program Synthesis Fellowship

$2,000 in cash prizes across all tracks

🥇 1st Place

$1,000

🥈 2nd Place

$500

🥉 3rd Place

$300

🏅 4th Place

$100

🏅 5th Place

$100

What this hackathon is about

Three days to build a research artifact at the intersection of AI, formal methods, and security. The bottleneck in trustworthy software is no longer writing code, it's specifying what the code should actually do and verifying it does. We want sharp prototypes that move that needle.

The hackathon runs alongside the Secure Program Synthesis Fellowship. Strong teams here get invited to apply to the four-month Fellowship that starts in June, where you continue the work with a mentor, an Apart project manager, compute, and API credits.

Tracks

Same four focus areas as the Fellowship. Pick one and ship something concrete by Sunday.

1. Specification Elicitation

Tools that pull formal specifications out of ambiguous sources: documentation, legacy code, requirements docs, conversations with domain experts. Structured editors, GUIs, pipelines that translate informal intent into Lean (or similar).

Example projects:

  • A Coq/Lean spec drafting assistant that turns a natural-language requirement into a candidate spec, with reviewer tooling for the human in the loop.

  • An IDE extension that surfaces hidden assumptions in a legacy C codebase.

2. Specification Validation

Methods that check whether a candidate specification actually captures the system's intended behavior. Testing, cross-checking, mutation, formal validation.

Example projects:

  • Property-based fuzzing harness that flags specs which underconstrain or overconstrain the system.

  • Cross-model spec comparison: generate two spec candidates from different LLMs, surface where they disagree.

3. Spec-Driven Development & Evaluation (a.k.a. Vericoding)

Workflows where a spec generates multiple candidate implementations and ranks them. CEGIS-style loops, spec-conditioned codegen, automatic discrimination between implementations. This is sometimes called vericoding: formally verified program synthesis from specs.

Example projects:

  • An agent that generates N implementations from a spec and uses the spec as the evaluator.

  • Lean-backed test harness that catches semantic regressions LLM coders introduce.

4. Adversarial Robustness for ITPs and Proof Tools

Find the failure modes in interactive theorem provers (Lean, Coq, Isabelle, F*) and the LLM-assisted tools that build on them. The new frontier of adversarial robustness for ITPs: fuzzing kernels, defeating proof autocompleters, exploiting unsound automation. Not classical ML adversarial examples.

Example projects:

  • Adversarial inputs that defeat a state-of-the-art proof autocompleter.

  • Red-team study of an "AI as proof reviewer" pipeline, looking for whether the AI rubber-stamps wrong proofs.

  • Fuzz a Lean-verified library for runtime-level soundness gaps, in the style of Kiran Gopinathan's zlib bug (see Resources tab).

Who should join

Generalist software engineers with one or two deeper areas in any of: proof engineering (verified software preferred, but ITP math proofs work too), redteaming and fuzzing, SMT and model checking, secure systems design, or agent / ML evals work. If your expertise sits adjacent to this and you can spin up quickly, that's also a fit.

No formal methods background required to join, but expect to lean on teammates who have it.

What you will do

  • Friday May 22: Kickoff, track briefings, team formation.

  • Saturday May 23: Build.

  • Sunday May 24: Final pushes, submissions, demo.

What happens after

Top teams are invited to apply to the Secure Program Synthesis Fellowship, June-October 2026. The Fellowship pairs senior researchers (e.g. Erik Meijer, Mike Dodds) with small teams to take projects from prototype to research artifact, with compute, API credits, and demo-day travel funding.

Contact

Questions: secure-program-synthesis-fellowship@apartresearch.com

306

Sign Ups

Overview

Resources

Guidelines

Schedule

Overview

Arrow

Gross World LoC is skyrocketing because of AI. By default, we have no way of knowing if what we're vibecoding does what we think it does. Secure program synthesis throws formal methods, proof assistants, type systems, model checkers, at all the code coming out of AI systems. Three days. Small teams. Real research output.

Top teams from this hackathon are invited to apply to the Secure Program Synthesis Fellowship (June-October 2026), where mentor-led teams take projects further with Apart Research project managers, compute, and API credits.

Organized by Apart Research and Atlas Computing

Top teams get

Invitation to the Secure Program Synthesis Fellowship

$2,000 in cash prizes across all tracks

🥇 1st Place

$1,000

🥈 2nd Place

$500

🥉 3rd Place

$300

🏅 4th Place

$100

🏅 5th Place

$100

What this hackathon is about

Three days to build a research artifact at the intersection of AI, formal methods, and security. The bottleneck in trustworthy software is no longer writing code, it's specifying what the code should actually do and verifying it does. We want sharp prototypes that move that needle.

The hackathon runs alongside the Secure Program Synthesis Fellowship. Strong teams here get invited to apply to the four-month Fellowship that starts in June, where you continue the work with a mentor, an Apart project manager, compute, and API credits.

Tracks

Same four focus areas as the Fellowship. Pick one and ship something concrete by Sunday.

1. Specification Elicitation

Tools that pull formal specifications out of ambiguous sources: documentation, legacy code, requirements docs, conversations with domain experts. Structured editors, GUIs, pipelines that translate informal intent into Lean (or similar).

Example projects:

  • A Coq/Lean spec drafting assistant that turns a natural-language requirement into a candidate spec, with reviewer tooling for the human in the loop.

  • An IDE extension that surfaces hidden assumptions in a legacy C codebase.

2. Specification Validation

Methods that check whether a candidate specification actually captures the system's intended behavior. Testing, cross-checking, mutation, formal validation.

Example projects:

  • Property-based fuzzing harness that flags specs which underconstrain or overconstrain the system.

  • Cross-model spec comparison: generate two spec candidates from different LLMs, surface where they disagree.

3. Spec-Driven Development & Evaluation (a.k.a. Vericoding)

Workflows where a spec generates multiple candidate implementations and ranks them. CEGIS-style loops, spec-conditioned codegen, automatic discrimination between implementations. This is sometimes called vericoding: formally verified program synthesis from specs.

Example projects:

  • An agent that generates N implementations from a spec and uses the spec as the evaluator.

  • Lean-backed test harness that catches semantic regressions LLM coders introduce.

4. Adversarial Robustness for ITPs and Proof Tools

Find the failure modes in interactive theorem provers (Lean, Coq, Isabelle, F*) and the LLM-assisted tools that build on them. The new frontier of adversarial robustness for ITPs: fuzzing kernels, defeating proof autocompleters, exploiting unsound automation. Not classical ML adversarial examples.

Example projects:

  • Adversarial inputs that defeat a state-of-the-art proof autocompleter.

  • Red-team study of an "AI as proof reviewer" pipeline, looking for whether the AI rubber-stamps wrong proofs.

  • Fuzz a Lean-verified library for runtime-level soundness gaps, in the style of Kiran Gopinathan's zlib bug (see Resources tab).

Who should join

Generalist software engineers with one or two deeper areas in any of: proof engineering (verified software preferred, but ITP math proofs work too), redteaming and fuzzing, SMT and model checking, secure systems design, or agent / ML evals work. If your expertise sits adjacent to this and you can spin up quickly, that's also a fit.

No formal methods background required to join, but expect to lean on teammates who have it.

What you will do

  • Friday May 22: Kickoff, track briefings, team formation.

  • Saturday May 23: Build.

  • Sunday May 24: Final pushes, submissions, demo.

What happens after

Top teams are invited to apply to the Secure Program Synthesis Fellowship, June-October 2026. The Fellowship pairs senior researchers (e.g. Erik Meijer, Mike Dodds) with small teams to take projects from prototype to research artifact, with compute, API credits, and demo-day travel funding.

Contact

Questions: secure-program-synthesis-fellowship@apartresearch.com

306

Sign Ups

Overview

Resources

Guidelines

Schedule

Overview

Arrow

Gross World LoC is skyrocketing because of AI. By default, we have no way of knowing if what we're vibecoding does what we think it does. Secure program synthesis throws formal methods, proof assistants, type systems, model checkers, at all the code coming out of AI systems. Three days. Small teams. Real research output.

Top teams from this hackathon are invited to apply to the Secure Program Synthesis Fellowship (June-October 2026), where mentor-led teams take projects further with Apart Research project managers, compute, and API credits.

Organized by Apart Research and Atlas Computing

Top teams get

Invitation to the Secure Program Synthesis Fellowship

$2,000 in cash prizes across all tracks

🥇 1st Place

$1,000

🥈 2nd Place

$500

🥉 3rd Place

$300

🏅 4th Place

$100

🏅 5th Place

$100

What this hackathon is about

Three days to build a research artifact at the intersection of AI, formal methods, and security. The bottleneck in trustworthy software is no longer writing code, it's specifying what the code should actually do and verifying it does. We want sharp prototypes that move that needle.

The hackathon runs alongside the Secure Program Synthesis Fellowship. Strong teams here get invited to apply to the four-month Fellowship that starts in June, where you continue the work with a mentor, an Apart project manager, compute, and API credits.

Tracks

Same four focus areas as the Fellowship. Pick one and ship something concrete by Sunday.

1. Specification Elicitation

Tools that pull formal specifications out of ambiguous sources: documentation, legacy code, requirements docs, conversations with domain experts. Structured editors, GUIs, pipelines that translate informal intent into Lean (or similar).

Example projects:

  • A Coq/Lean spec drafting assistant that turns a natural-language requirement into a candidate spec, with reviewer tooling for the human in the loop.

  • An IDE extension that surfaces hidden assumptions in a legacy C codebase.

2. Specification Validation

Methods that check whether a candidate specification actually captures the system's intended behavior. Testing, cross-checking, mutation, formal validation.

Example projects:

  • Property-based fuzzing harness that flags specs which underconstrain or overconstrain the system.

  • Cross-model spec comparison: generate two spec candidates from different LLMs, surface where they disagree.

3. Spec-Driven Development & Evaluation (a.k.a. Vericoding)

Workflows where a spec generates multiple candidate implementations and ranks them. CEGIS-style loops, spec-conditioned codegen, automatic discrimination between implementations. This is sometimes called vericoding: formally verified program synthesis from specs.

Example projects:

  • An agent that generates N implementations from a spec and uses the spec as the evaluator.

  • Lean-backed test harness that catches semantic regressions LLM coders introduce.

4. Adversarial Robustness for ITPs and Proof Tools

Find the failure modes in interactive theorem provers (Lean, Coq, Isabelle, F*) and the LLM-assisted tools that build on them. The new frontier of adversarial robustness for ITPs: fuzzing kernels, defeating proof autocompleters, exploiting unsound automation. Not classical ML adversarial examples.

Example projects:

  • Adversarial inputs that defeat a state-of-the-art proof autocompleter.

  • Red-team study of an "AI as proof reviewer" pipeline, looking for whether the AI rubber-stamps wrong proofs.

  • Fuzz a Lean-verified library for runtime-level soundness gaps, in the style of Kiran Gopinathan's zlib bug (see Resources tab).

Who should join

Generalist software engineers with one or two deeper areas in any of: proof engineering (verified software preferred, but ITP math proofs work too), redteaming and fuzzing, SMT and model checking, secure systems design, or agent / ML evals work. If your expertise sits adjacent to this and you can spin up quickly, that's also a fit.

No formal methods background required to join, but expect to lean on teammates who have it.

What you will do

  • Friday May 22: Kickoff, track briefings, team formation.

  • Saturday May 23: Build.

  • Sunday May 24: Final pushes, submissions, demo.

What happens after

Top teams are invited to apply to the Secure Program Synthesis Fellowship, June-October 2026. The Fellowship pairs senior researchers (e.g. Erik Meijer, Mike Dodds) with small teams to take projects from prototype to research artifact, with compute, API credits, and demo-day travel funding.

Contact

Questions: secure-program-synthesis-fellowship@apartresearch.com

306

Sign Ups

Overview

Resources

Guidelines

Schedule

Overview

Arrow

Gross World LoC is skyrocketing because of AI. By default, we have no way of knowing if what we're vibecoding does what we think it does. Secure program synthesis throws formal methods, proof assistants, type systems, model checkers, at all the code coming out of AI systems. Three days. Small teams. Real research output.

Top teams from this hackathon are invited to apply to the Secure Program Synthesis Fellowship (June-October 2026), where mentor-led teams take projects further with Apart Research project managers, compute, and API credits.

Organized by Apart Research and Atlas Computing

Top teams get

Invitation to the Secure Program Synthesis Fellowship

$2,000 in cash prizes across all tracks

🥇 1st Place

$1,000

🥈 2nd Place

$500

🥉 3rd Place

$300

🏅 4th Place

$100

🏅 5th Place

$100

What this hackathon is about

Three days to build a research artifact at the intersection of AI, formal methods, and security. The bottleneck in trustworthy software is no longer writing code, it's specifying what the code should actually do and verifying it does. We want sharp prototypes that move that needle.

The hackathon runs alongside the Secure Program Synthesis Fellowship. Strong teams here get invited to apply to the four-month Fellowship that starts in June, where you continue the work with a mentor, an Apart project manager, compute, and API credits.

Tracks

Same four focus areas as the Fellowship. Pick one and ship something concrete by Sunday.

1. Specification Elicitation

Tools that pull formal specifications out of ambiguous sources: documentation, legacy code, requirements docs, conversations with domain experts. Structured editors, GUIs, pipelines that translate informal intent into Lean (or similar).

Example projects:

  • A Coq/Lean spec drafting assistant that turns a natural-language requirement into a candidate spec, with reviewer tooling for the human in the loop.

  • An IDE extension that surfaces hidden assumptions in a legacy C codebase.

2. Specification Validation

Methods that check whether a candidate specification actually captures the system's intended behavior. Testing, cross-checking, mutation, formal validation.

Example projects:

  • Property-based fuzzing harness that flags specs which underconstrain or overconstrain the system.

  • Cross-model spec comparison: generate two spec candidates from different LLMs, surface where they disagree.

3. Spec-Driven Development & Evaluation (a.k.a. Vericoding)

Workflows where a spec generates multiple candidate implementations and ranks them. CEGIS-style loops, spec-conditioned codegen, automatic discrimination between implementations. This is sometimes called vericoding: formally verified program synthesis from specs.

Example projects:

  • An agent that generates N implementations from a spec and uses the spec as the evaluator.

  • Lean-backed test harness that catches semantic regressions LLM coders introduce.

4. Adversarial Robustness for ITPs and Proof Tools

Find the failure modes in interactive theorem provers (Lean, Coq, Isabelle, F*) and the LLM-assisted tools that build on them. The new frontier of adversarial robustness for ITPs: fuzzing kernels, defeating proof autocompleters, exploiting unsound automation. Not classical ML adversarial examples.

Example projects:

  • Adversarial inputs that defeat a state-of-the-art proof autocompleter.

  • Red-team study of an "AI as proof reviewer" pipeline, looking for whether the AI rubber-stamps wrong proofs.

  • Fuzz a Lean-verified library for runtime-level soundness gaps, in the style of Kiran Gopinathan's zlib bug (see Resources tab).

Who should join

Generalist software engineers with one or two deeper areas in any of: proof engineering (verified software preferred, but ITP math proofs work too), redteaming and fuzzing, SMT and model checking, secure systems design, or agent / ML evals work. If your expertise sits adjacent to this and you can spin up quickly, that's also a fit.

No formal methods background required to join, but expect to lean on teammates who have it.

What you will do

  • Friday May 22: Kickoff, track briefings, team formation.

  • Saturday May 23: Build.

  • Sunday May 24: Final pushes, submissions, demo.

What happens after

Top teams are invited to apply to the Secure Program Synthesis Fellowship, June-October 2026. The Fellowship pairs senior researchers (e.g. Erik Meijer, Mike Dodds) with small teams to take projects from prototype to research artifact, with compute, API credits, and demo-day travel funding.

Contact

Questions: secure-program-synthesis-fellowship@apartresearch.com

Speakers & Collaborators

Jason Gross

Speaker

Jason Gross is a co-founder of Theorem, where he is building AI-driven formal verification to address the oversight gap in massively scaled software deployment. He spent his MIT PhD developing Fiat Cryptography, the verified cryptographic code that now underpins HTTPS for trillions of internet connections daily, and remains a core developer of the Rocq (formerly Coq) proof assistant. He holds a Ph.D. from MIT EECS in proof-assistant performance engineering and is based in the San Francisco Bay Area.

Rajashree Agrawal

Speaker

Rajashree Agrawal is the co-founder of Theorem, where she leads the ML research side of trustworthy-by-default AI coding via program equivalence. Her published work spans LLM jailbreaks (co-author on Many-shot Jailbreaking and on transferable image jailbreaks across vision-language models), mechanistic interpretability through compact proofs with Jason Gross, and synthetic-data scaling and model collapse. She also runs Monsoon Math Camp, a math education program in India, and is based in the San Francisco Bay Area.

Quinn Dougherty

Speaker and Organizer

Quinn Dougherty runs Forall R&D, a research consultancy helping AI safety orgs navigate the formal methods explosion, with contracts at Galois (on ARIA's Mathematics for Safe AI program) and the Beneficial AI Foundation. He co-built FVAPPS, a 4,715-problem benchmark for AI-assisted formal verification, and FV-Spec, which translates real-world property-based tests into Lean theorem-proving challenge problems. He writes the Guaranteed Safe AI newsletter and organized the 2024 Proof Scaling workshop at Lighthaven.

Joe Kiniry

Speaker

Dr. Joe Kiniry is the CEO and Chief Scientist of Sigil Logic, a Galois spinout, where he leads work on AI-driven formal methods for high-assurance systems. He spent the prior 12 years as Principal Scientist at Galois on hardware, firmware, and software correctness and security, and remains affiliated there. Before Galois he was a Full Professor at the Technical University of Denmark. He holds a Ph.D. from Caltech and is a Senior Member of both IEEE and ACM.

Kiran Gopinathan

Speaker

Kiran Gopinathan is a Research Scientist at Basis. Her research focuses on techniques for developing newer and better tools for automating formal verification - the art of using computers to automagically construct mathematical proofs about the correctness of software. Her research interests cover formal verification, program synthesis, type systems, language design and proof engineering. She previously completed her postdoc with Talia Ringer at UIUC, and before that earned her PhD in Programming Languages Research from the National University of Singapore on automating the maintenance of formally verified software.

Speakers & Collaborators

Jason Gross

Speaker

Jason Gross is a co-founder of Theorem, where he is building AI-driven formal verification to address the oversight gap in massively scaled software deployment. He spent his MIT PhD developing Fiat Cryptography, the verified cryptographic code that now underpins HTTPS for trillions of internet connections daily, and remains a core developer of the Rocq (formerly Coq) proof assistant. He holds a Ph.D. from MIT EECS in proof-assistant performance engineering and is based in the San Francisco Bay Area.

Rajashree Agrawal

Speaker

Rajashree Agrawal is the co-founder of Theorem, where she leads the ML research side of trustworthy-by-default AI coding via program equivalence. Her published work spans LLM jailbreaks (co-author on Many-shot Jailbreaking and on transferable image jailbreaks across vision-language models), mechanistic interpretability through compact proofs with Jason Gross, and synthetic-data scaling and model collapse. She also runs Monsoon Math Camp, a math education program in India, and is based in the San Francisco Bay Area.

Quinn Dougherty

Speaker and Organizer

Quinn Dougherty runs Forall R&D, a research consultancy helping AI safety orgs navigate the formal methods explosion, with contracts at Galois (on ARIA's Mathematics for Safe AI program) and the Beneficial AI Foundation. He co-built FVAPPS, a 4,715-problem benchmark for AI-assisted formal verification, and FV-Spec, which translates real-world property-based tests into Lean theorem-proving challenge problems. He writes the Guaranteed Safe AI newsletter and organized the 2024 Proof Scaling workshop at Lighthaven.

Joe Kiniry

Speaker

Dr. Joe Kiniry is the CEO and Chief Scientist of Sigil Logic, a Galois spinout, where he leads work on AI-driven formal methods for high-assurance systems. He spent the prior 12 years as Principal Scientist at Galois on hardware, firmware, and software correctness and security, and remains affiliated there. Before Galois he was a Full Professor at the Technical University of Denmark. He holds a Ph.D. from Caltech and is a Senior Member of both IEEE and ACM.

Kiran Gopinathan

Speaker

Kiran Gopinathan is a Research Scientist at Basis. Her research focuses on techniques for developing newer and better tools for automating formal verification - the art of using computers to automagically construct mathematical proofs about the correctness of software. Her research interests cover formal verification, program synthesis, type systems, language design and proof engineering. She previously completed her postdoc with Talia Ringer at UIUC, and before that earned her PhD in Programming Languages Research from the National University of Singapore on automating the maintenance of formally verified software.

Registered Local Sites

Register A Location

Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.