-

Online & In-Person

The Secure Program Synthesis Hackathon

The Secure Program Synthesis Hackathon brings researchers and engineers together for three days to prototype the tools we'll need to verify what AI is writing. Co-organized with Atlas Computing. Top teams are invited to apply to the four-month SPS Fellowship that follows.

00:00:00:00

Days To Go

324

Sign Ups

61

Entries

Overview

Resources

Guidelines

Schedule

Entries

Overview

Arrow

Gross World LoC is skyrocketing because of AI. By default, we have no way of knowing if what we're vibecoding does what we think it does. Secure program synthesis throws formal methods, proof assistants, type systems, model checkers, at all the code coming out of AI systems. Three days. Small teams. Real research output.

Top teams from this hackathon are invited to apply to the Secure Program Synthesis Fellowship (June-October 2026), where mentor-led teams take projects further with Apart Research project managers, compute, and API credits.

Organized by Apart Research and Atlas Computing

Top teams get

Invitation to the Secure Program Synthesis Fellowship

$2,000 in cash prizes across all tracks

🥇 1st Place

$1,000

🥈 2nd Place

$500

🥉 3rd Place

$300

🏅 4th Place

$100

🏅 5th Place

$100

What this hackathon is about

Three days to build a research artifact at the intersection of AI, formal methods, and security. The bottleneck in trustworthy software is no longer writing code, it's specifying what the code should actually do and verifying it does. We want sharp prototypes that move that needle.

The hackathon runs alongside the Secure Program Synthesis Fellowship. Strong teams here get invited to apply to the four-month Fellowship that starts in June, where you continue the work with a mentor, an Apart project manager, compute, and API credits.

Tracks

Same four focus areas as the Fellowship. Pick one and ship something concrete by Sunday.

1. Specification Elicitation

Tools that pull formal specifications out of ambiguous sources: documentation, legacy code, requirements docs, conversations with domain experts. Structured editors, GUIs, pipelines that translate informal intent into Lean (or similar).

Example projects:

  • A Coq/Lean spec drafting assistant that turns a natural-language requirement into a candidate spec, with reviewer tooling for the human in the loop.

  • An IDE extension that surfaces hidden assumptions in a legacy C codebase.

2. Specification Validation

Methods that check whether a candidate specification actually captures the system's intended behavior. Testing, cross-checking, mutation, formal validation.

Example projects:

  • Property-based fuzzing harness that flags specs which underconstrain or overconstrain the system.

  • Cross-model spec comparison: generate two spec candidates from different LLMs, surface where they disagree.

3. Spec-Driven Development & Evaluation (a.k.a. Vericoding)

Workflows where a spec generates multiple candidate implementations and ranks them. CEGIS-style loops, spec-conditioned codegen, automatic discrimination between implementations. This is sometimes called vericoding: formally verified program synthesis from specs.

Example projects:

  • An agent that generates N implementations from a spec and uses the spec as the evaluator.

  • Lean-backed test harness that catches semantic regressions LLM coders introduce.

4. Adversarial Robustness for ITPs and Proof Tools

Find the failure modes in interactive theorem provers (Lean, Coq, Isabelle, F*) and the LLM-assisted tools that build on them. The new frontier of adversarial robustness for ITPs: fuzzing kernels, defeating proof autocompleters, exploiting unsound automation. Not classical ML adversarial examples.

Example projects:

  • Adversarial inputs that defeat a state-of-the-art proof autocompleter.

  • Red-team study of an "AI as proof reviewer" pipeline, looking for whether the AI rubber-stamps wrong proofs.

  • Fuzz a Lean-verified library for runtime-level soundness gaps, in the style of Kiran Gopinathan's zlib bug (see Resources tab).

Who should join

Generalist software engineers with one or two deeper areas in any of: proof engineering (verified software preferred, but ITP math proofs work too), redteaming and fuzzing, SMT and model checking, secure systems design, or agent / ML evals work. If your expertise sits adjacent to this and you can spin up quickly, that's also a fit.

No formal methods background required to join, but expect to lean on teammates who have it.

What you will do

  • Friday May 22: Kickoff, track briefings, team formation.

  • Saturday May 23: Build.

  • Sunday May 24: Final pushes, submissions, demo.

What happens after

Top teams are invited to apply to the Secure Program Synthesis Fellowship, June-October 2026. The Fellowship pairs senior researchers (e.g. Erik Meijer, Mike Dodds) with small teams to take projects from prototype to research artifact, with compute, API credits, and demo-day travel funding.

Contact

Questions: secure-program-synthesis-fellowship@apartresearch.com

324

Sign Ups

61

Entries

Overview

Resources

Guidelines

Schedule

Entries

Overview

Arrow

Gross World LoC is skyrocketing because of AI. By default, we have no way of knowing if what we're vibecoding does what we think it does. Secure program synthesis throws formal methods, proof assistants, type systems, model checkers, at all the code coming out of AI systems. Three days. Small teams. Real research output.

Top teams from this hackathon are invited to apply to the Secure Program Synthesis Fellowship (June-October 2026), where mentor-led teams take projects further with Apart Research project managers, compute, and API credits.

Organized by Apart Research and Atlas Computing

Top teams get

Invitation to the Secure Program Synthesis Fellowship

$2,000 in cash prizes across all tracks

🥇 1st Place

$1,000

🥈 2nd Place

$500

🥉 3rd Place

$300

🏅 4th Place

$100

🏅 5th Place

$100

What this hackathon is about

Three days to build a research artifact at the intersection of AI, formal methods, and security. The bottleneck in trustworthy software is no longer writing code, it's specifying what the code should actually do and verifying it does. We want sharp prototypes that move that needle.

The hackathon runs alongside the Secure Program Synthesis Fellowship. Strong teams here get invited to apply to the four-month Fellowship that starts in June, where you continue the work with a mentor, an Apart project manager, compute, and API credits.

Tracks

Same four focus areas as the Fellowship. Pick one and ship something concrete by Sunday.

1. Specification Elicitation

Tools that pull formal specifications out of ambiguous sources: documentation, legacy code, requirements docs, conversations with domain experts. Structured editors, GUIs, pipelines that translate informal intent into Lean (or similar).

Example projects:

  • A Coq/Lean spec drafting assistant that turns a natural-language requirement into a candidate spec, with reviewer tooling for the human in the loop.

  • An IDE extension that surfaces hidden assumptions in a legacy C codebase.

2. Specification Validation

Methods that check whether a candidate specification actually captures the system's intended behavior. Testing, cross-checking, mutation, formal validation.

Example projects:

  • Property-based fuzzing harness that flags specs which underconstrain or overconstrain the system.

  • Cross-model spec comparison: generate two spec candidates from different LLMs, surface where they disagree.

3. Spec-Driven Development & Evaluation (a.k.a. Vericoding)

Workflows where a spec generates multiple candidate implementations and ranks them. CEGIS-style loops, spec-conditioned codegen, automatic discrimination between implementations. This is sometimes called vericoding: formally verified program synthesis from specs.

Example projects:

  • An agent that generates N implementations from a spec and uses the spec as the evaluator.

  • Lean-backed test harness that catches semantic regressions LLM coders introduce.

4. Adversarial Robustness for ITPs and Proof Tools

Find the failure modes in interactive theorem provers (Lean, Coq, Isabelle, F*) and the LLM-assisted tools that build on them. The new frontier of adversarial robustness for ITPs: fuzzing kernels, defeating proof autocompleters, exploiting unsound automation. Not classical ML adversarial examples.

Example projects:

  • Adversarial inputs that defeat a state-of-the-art proof autocompleter.

  • Red-team study of an "AI as proof reviewer" pipeline, looking for whether the AI rubber-stamps wrong proofs.

  • Fuzz a Lean-verified library for runtime-level soundness gaps, in the style of Kiran Gopinathan's zlib bug (see Resources tab).

Who should join

Generalist software engineers with one or two deeper areas in any of: proof engineering (verified software preferred, but ITP math proofs work too), redteaming and fuzzing, SMT and model checking, secure systems design, or agent / ML evals work. If your expertise sits adjacent to this and you can spin up quickly, that's also a fit.

No formal methods background required to join, but expect to lean on teammates who have it.

What you will do

  • Friday May 22: Kickoff, track briefings, team formation.

  • Saturday May 23: Build.

  • Sunday May 24: Final pushes, submissions, demo.

What happens after

Top teams are invited to apply to the Secure Program Synthesis Fellowship, June-October 2026. The Fellowship pairs senior researchers (e.g. Erik Meijer, Mike Dodds) with small teams to take projects from prototype to research artifact, with compute, API credits, and demo-day travel funding.

Contact

Questions: secure-program-synthesis-fellowship@apartresearch.com

324

Sign Ups

61

Entries

Overview

Resources

Guidelines

Schedule

Entries

Overview

Arrow

Gross World LoC is skyrocketing because of AI. By default, we have no way of knowing if what we're vibecoding does what we think it does. Secure program synthesis throws formal methods, proof assistants, type systems, model checkers, at all the code coming out of AI systems. Three days. Small teams. Real research output.

Top teams from this hackathon are invited to apply to the Secure Program Synthesis Fellowship (June-October 2026), where mentor-led teams take projects further with Apart Research project managers, compute, and API credits.

Organized by Apart Research and Atlas Computing

Top teams get

Invitation to the Secure Program Synthesis Fellowship

$2,000 in cash prizes across all tracks

🥇 1st Place

$1,000

🥈 2nd Place

$500

🥉 3rd Place

$300

🏅 4th Place

$100

🏅 5th Place

$100

What this hackathon is about

Three days to build a research artifact at the intersection of AI, formal methods, and security. The bottleneck in trustworthy software is no longer writing code, it's specifying what the code should actually do and verifying it does. We want sharp prototypes that move that needle.

The hackathon runs alongside the Secure Program Synthesis Fellowship. Strong teams here get invited to apply to the four-month Fellowship that starts in June, where you continue the work with a mentor, an Apart project manager, compute, and API credits.

Tracks

Same four focus areas as the Fellowship. Pick one and ship something concrete by Sunday.

1. Specification Elicitation

Tools that pull formal specifications out of ambiguous sources: documentation, legacy code, requirements docs, conversations with domain experts. Structured editors, GUIs, pipelines that translate informal intent into Lean (or similar).

Example projects:

  • A Coq/Lean spec drafting assistant that turns a natural-language requirement into a candidate spec, with reviewer tooling for the human in the loop.

  • An IDE extension that surfaces hidden assumptions in a legacy C codebase.

2. Specification Validation

Methods that check whether a candidate specification actually captures the system's intended behavior. Testing, cross-checking, mutation, formal validation.

Example projects:

  • Property-based fuzzing harness that flags specs which underconstrain or overconstrain the system.

  • Cross-model spec comparison: generate two spec candidates from different LLMs, surface where they disagree.

3. Spec-Driven Development & Evaluation (a.k.a. Vericoding)

Workflows where a spec generates multiple candidate implementations and ranks them. CEGIS-style loops, spec-conditioned codegen, automatic discrimination between implementations. This is sometimes called vericoding: formally verified program synthesis from specs.

Example projects:

  • An agent that generates N implementations from a spec and uses the spec as the evaluator.

  • Lean-backed test harness that catches semantic regressions LLM coders introduce.

4. Adversarial Robustness for ITPs and Proof Tools

Find the failure modes in interactive theorem provers (Lean, Coq, Isabelle, F*) and the LLM-assisted tools that build on them. The new frontier of adversarial robustness for ITPs: fuzzing kernels, defeating proof autocompleters, exploiting unsound automation. Not classical ML adversarial examples.

Example projects:

  • Adversarial inputs that defeat a state-of-the-art proof autocompleter.

  • Red-team study of an "AI as proof reviewer" pipeline, looking for whether the AI rubber-stamps wrong proofs.

  • Fuzz a Lean-verified library for runtime-level soundness gaps, in the style of Kiran Gopinathan's zlib bug (see Resources tab).

Who should join

Generalist software engineers with one or two deeper areas in any of: proof engineering (verified software preferred, but ITP math proofs work too), redteaming and fuzzing, SMT and model checking, secure systems design, or agent / ML evals work. If your expertise sits adjacent to this and you can spin up quickly, that's also a fit.

No formal methods background required to join, but expect to lean on teammates who have it.

What you will do

  • Friday May 22: Kickoff, track briefings, team formation.

  • Saturday May 23: Build.

  • Sunday May 24: Final pushes, submissions, demo.

What happens after

Top teams are invited to apply to the Secure Program Synthesis Fellowship, June-October 2026. The Fellowship pairs senior researchers (e.g. Erik Meijer, Mike Dodds) with small teams to take projects from prototype to research artifact, with compute, API credits, and demo-day travel funding.

Contact

Questions: secure-program-synthesis-fellowship@apartresearch.com

324

Sign Ups

61

Entries

Overview

Resources

Guidelines

Schedule

Entries

Overview

Arrow

Gross World LoC is skyrocketing because of AI. By default, we have no way of knowing if what we're vibecoding does what we think it does. Secure program synthesis throws formal methods, proof assistants, type systems, model checkers, at all the code coming out of AI systems. Three days. Small teams. Real research output.

Top teams from this hackathon are invited to apply to the Secure Program Synthesis Fellowship (June-October 2026), where mentor-led teams take projects further with Apart Research project managers, compute, and API credits.

Organized by Apart Research and Atlas Computing

Top teams get

Invitation to the Secure Program Synthesis Fellowship

$2,000 in cash prizes across all tracks

🥇 1st Place

$1,000

🥈 2nd Place

$500

🥉 3rd Place

$300

🏅 4th Place

$100

🏅 5th Place

$100

What this hackathon is about

Three days to build a research artifact at the intersection of AI, formal methods, and security. The bottleneck in trustworthy software is no longer writing code, it's specifying what the code should actually do and verifying it does. We want sharp prototypes that move that needle.

The hackathon runs alongside the Secure Program Synthesis Fellowship. Strong teams here get invited to apply to the four-month Fellowship that starts in June, where you continue the work with a mentor, an Apart project manager, compute, and API credits.

Tracks

Same four focus areas as the Fellowship. Pick one and ship something concrete by Sunday.

1. Specification Elicitation

Tools that pull formal specifications out of ambiguous sources: documentation, legacy code, requirements docs, conversations with domain experts. Structured editors, GUIs, pipelines that translate informal intent into Lean (or similar).

Example projects:

  • A Coq/Lean spec drafting assistant that turns a natural-language requirement into a candidate spec, with reviewer tooling for the human in the loop.

  • An IDE extension that surfaces hidden assumptions in a legacy C codebase.

2. Specification Validation

Methods that check whether a candidate specification actually captures the system's intended behavior. Testing, cross-checking, mutation, formal validation.

Example projects:

  • Property-based fuzzing harness that flags specs which underconstrain or overconstrain the system.

  • Cross-model spec comparison: generate two spec candidates from different LLMs, surface where they disagree.

3. Spec-Driven Development & Evaluation (a.k.a. Vericoding)

Workflows where a spec generates multiple candidate implementations and ranks them. CEGIS-style loops, spec-conditioned codegen, automatic discrimination between implementations. This is sometimes called vericoding: formally verified program synthesis from specs.

Example projects:

  • An agent that generates N implementations from a spec and uses the spec as the evaluator.

  • Lean-backed test harness that catches semantic regressions LLM coders introduce.

4. Adversarial Robustness for ITPs and Proof Tools

Find the failure modes in interactive theorem provers (Lean, Coq, Isabelle, F*) and the LLM-assisted tools that build on them. The new frontier of adversarial robustness for ITPs: fuzzing kernels, defeating proof autocompleters, exploiting unsound automation. Not classical ML adversarial examples.

Example projects:

  • Adversarial inputs that defeat a state-of-the-art proof autocompleter.

  • Red-team study of an "AI as proof reviewer" pipeline, looking for whether the AI rubber-stamps wrong proofs.

  • Fuzz a Lean-verified library for runtime-level soundness gaps, in the style of Kiran Gopinathan's zlib bug (see Resources tab).

Who should join

Generalist software engineers with one or two deeper areas in any of: proof engineering (verified software preferred, but ITP math proofs work too), redteaming and fuzzing, SMT and model checking, secure systems design, or agent / ML evals work. If your expertise sits adjacent to this and you can spin up quickly, that's also a fit.

No formal methods background required to join, but expect to lean on teammates who have it.

What you will do

  • Friday May 22: Kickoff, track briefings, team formation.

  • Saturday May 23: Build.

  • Sunday May 24: Final pushes, submissions, demo.

What happens after

Top teams are invited to apply to the Secure Program Synthesis Fellowship, June-October 2026. The Fellowship pairs senior researchers (e.g. Erik Meijer, Mike Dodds) with small teams to take projects from prototype to research artifact, with compute, API credits, and demo-day travel funding.

Contact

Questions: secure-program-synthesis-fellowship@apartresearch.com

Speakers & Collaborators

Quinn Dougherty

Speaker and Organizer

Quinn Dougherty runs Forall R&D, a research consultancy helping AI safety orgs navigate the formal methods explosion, with contracts at Galois (on ARIA's Mathematics for Safe AI program) and the Beneficial AI Foundation. He co-built FVAPPS, a 4,715-problem benchmark for AI-assisted formal verification, and FV-Spec, which translates real-world property-based tests into Lean theorem-proving challenge problems. He writes the Guaranteed Safe AI newsletter and organized the 2024 Proof Scaling workshop at Lighthaven.

Kamil Alaa

Organizer

Operations at Apart Research, managing research sprints and hackathons.

Jason Gross

Speaker

Jason Gross is a co-founder of Theorem, where he is building AI-driven formal verification to address the oversight gap in massively scaled software deployment. He spent his MIT PhD developing Fiat Cryptography, the verified cryptographic code that now underpins HTTPS for trillions of internet connections daily, and remains a core developer of the Rocq (formerly Coq) proof assistant. He holds a Ph.D. from MIT EECS in proof-assistant performance engineering and is based in the San Francisco Bay Area.

Rajashree Agrawal

Speaker

Rajashree Agrawal is the co-founder of Theorem, where she leads the ML research side of trustworthy-by-default AI coding via program equivalence. Her published work spans LLM jailbreaks (co-author on Many-shot Jailbreaking and on transferable image jailbreaks across vision-language models), mechanistic interpretability through compact proofs with Jason Gross, and synthetic-data scaling and model collapse. She also runs Monsoon Math Camp, a math education program in India, and is based in the San Francisco Bay Area.

Joe Kiniry

Speaker

Dr. Joe Kiniry is the CEO and Chief Scientist of Sigil Logic, a Galois spinout, where he leads work on AI-driven formal methods for high-assurance systems. He spent the prior 12 years as Principal Scientist at Galois on hardware, firmware, and software correctness and security, and remains affiliated there. Before Galois he was a Full Professor at the Technical University of Denmark. He holds a Ph.D. from Caltech and is a Senior Member of both IEEE and ACM.

Kiran Gopinathan

Speaker and Judge

Kiran Gopinathan is a Research Scientist at Basis. Her research focuses on techniques for developing newer and better tools for automating formal verification - the art of using computers to automagically construct mathematical proofs about the correctness of software. Her research interests cover formal verification, program synthesis, type systems, language design and proof engineering. She previously completed her postdoc with Talia Ringer at UIUC, and before that earned her PhD in Programming Languages Research from the National University of Singapore on automating the maintenance of formally verified software.

Adam Chlipala

Judge

Adam is faculty in MIT computer science, where he works on implementation and formal verification of full digital stacks and the theorem-proving and language/compiler tools to enable it. He is also cofounder of Nectry, a startup that applies LLM code generation to a tasteful domain-specific language where static analysis is very effective.

Sam Staton

Judge

Sam Staton is Professor of Computer Science at the University of Oxford, working on the semantics of programming languages and probabilistic programming. He currently leads ARIA-funded projects including SynthStats (probabilistic program synthesis for safer AI), and the ERC project Better Languages for Statistics.

Mark Santolucito

Judge

Mark Santolucito is an Assistant Professor of CS at Barnard College, Columbia University, building neurosymbolic program synthesis tools that help people specify, write, and reason about code. His work is funded by public and private organizations such as the NSF and AWS, and was an early employee at Ndea, François Chollet's frontier AGI program synthesis lab.

Santiago Cuellar

Judge

Santiago Cuellar is a Principal Scientist at Galois, where he applies formal methods, separation logic, and machine-checked proofs to software safety and security. His recent work focuses on modular verification of realistic compilers for concurrent programs. He holds a PhD in Computer Science from Princeton, where his dissertation explored modular proofs of optimizing compilers with shared memory concurrency, and a BS in Mathematics from MIT.

Max von Hippel

Judge

Max von Hippel is the co-founder and CTO of Benchify (YC S24), where he leads AI-driven formal verification of real-world code at scale. He completed his Ph.D. at Northeastern's Khoury College on protocol verification and automated attack synthesis under Cristina Nita-Rotaru, work that won Best Paper at NETYS 2023, Best Student Paper at ACL2 2023, and the 2024 USENIX Applied Networking Research Prize. He holds a B.S. in Pure Mathematics from the University of Arizona and co-organizes the Boston Computation Club and the FM × AI hub.

Peter McIntyre

Judge

Peter is the CEO and Co-founder of Trajectory Labs, which builds AI safety evals and RL environments for frontier labs. Previously, he was a Research Scholar at GovAI, early employee at 80,000 Hours (YC S15), and founded Non-Trivial, one of the most selective research programs for high schoolers.

Kaushik "KJ" Jangiti

Judge

Kaushik "KJ" Jangiti is a Staff Security Engineer at Snowflake, focused on AI and agentic AI security, enterprise security architecture, DevSecOps, governance, and incident response. He is a published AI security researcher, industry and academic speaker, and judge, with work spanning agentic AI security, AI-BOM, secure AI adoption, and audit-ready enterprise security governance.

Ashwin Pai

Judge

Ashwin Pai is an engineering leader with over a decade of experience building distributed systems and security products, most recently shipping AI governance and continuous control validation platforms to Fortune 500 customers at RelyanceAI. He was previously CTO of Interfold, a VC-backed fintech startup he built from zero to one.

Twm Stone

Judge

With a background in software engineering, Twm has experience in the threat modelling, formal verification and red teaming of AI systems. He is currently a MATS 9.1 extension fellow as part of the security stream.

Saurabh Yergattikar

Judge

Saurabh Yergattikar is a Lead Engineer / Member of Technical Staff-2 at eBay Inc. and a contributor to the open-source SAFE-MCP project (Linux Foundation / OpenSSF), as well as the architect and developer of the open-source ShieldMCP system.

Dippu Kumar Singh

Judge

Data Leader for Emerging Technologies, transforming enterprises by building scalable, trustworthy, and commercially viable AI systems.

Jess Bergs

Judge

Jess is a member of technical staff at UK AISI where she leads engineering on human-in-the-loop research tools. Her career centres on public-sector innovation with prior work at BBC R&D and on EU Horizon R&D projects.

Herbie Bradley

Judge

Herbie Bradley is a final-year PhD student in Engineering at the University of Cambridge, researching diverse and high-quality synthetic data generation for LLMs. He previously helped build the UK AI Security Institute as a researcher on the Frontier AI Taskforce, organized the Bletchley Summit, and authored The Great Refactor for the Institute for Progress. He now writes AI Pathways and advises governments and early-stage startups on AI policy.

Simon Henniger

Judge

Simon Henniger is a PhD student at Harvard University, advised by Prof. Nada Amin, working on verified program synthesis with LLMs. His prior work at TUM extended Monte-Carlo Tree Search techniques for LLM-driven verification (VerMCTS), and he co-authored "The Dolorem Pattern" (ECOOP 2023) on growing languages through compile-time function execution.

Kanghee Park

Judge

Kanghee Park is a 5th-year CS PhD candidate at UC San Diego, advised by Loris D'Antoni, and a Research Fellow at Theorem. His research scales formal reasoning through modularization and abstraction across program analysis, synthesis, and proof search, with recent papers at NeurIPS, ICML, and OOPSLA on grammar-aligned LLM decoding and specification synthesis.

Nikhil R. Pallepati

Judge

Nikhil R. Pallepati is a Machine Learning Engineer at Microsoft, where he leads the design of production-scale AI systems including graph neural network-based detection systems and agentic LLM pipelines protecting Azure cloud infrastructure, with prior experience building enterprise ML systems at various Fortune 500 companies.

Harshikaa Agrawal

Judge

Harshikaa Agrawal is a Computer Science senior at IIIT Hyderabad. Previously, she researched at Theorem, where she built formally verified parsers for GPU acceleration and AI-driven theorem-proving translations between Rocq and Lean. In high school, she reached the national level at the Linguistics and Informatics Olympiads.

Luiza Corpaci

Judge

Luiza Corpaci is an AI Engineer at AMD working on AI for verification, and a Data Science master's researcher at TU Wien. Her work spans formal methods, combinatorial testing, and AI interpretability

Speakers & Collaborators

Quinn Dougherty

Speaker and Organizer

Quinn Dougherty runs Forall R&D, a research consultancy helping AI safety orgs navigate the formal methods explosion, with contracts at Galois (on ARIA's Mathematics for Safe AI program) and the Beneficial AI Foundation. He co-built FVAPPS, a 4,715-problem benchmark for AI-assisted formal verification, and FV-Spec, which translates real-world property-based tests into Lean theorem-proving challenge problems. He writes the Guaranteed Safe AI newsletter and organized the 2024 Proof Scaling workshop at Lighthaven.

Kamil Alaa

Organizer

Operations at Apart Research, managing research sprints and hackathons.

Jason Gross

Speaker

Jason Gross is a co-founder of Theorem, where he is building AI-driven formal verification to address the oversight gap in massively scaled software deployment. He spent his MIT PhD developing Fiat Cryptography, the verified cryptographic code that now underpins HTTPS for trillions of internet connections daily, and remains a core developer of the Rocq (formerly Coq) proof assistant. He holds a Ph.D. from MIT EECS in proof-assistant performance engineering and is based in the San Francisco Bay Area.

Rajashree Agrawal

Speaker

Rajashree Agrawal is the co-founder of Theorem, where she leads the ML research side of trustworthy-by-default AI coding via program equivalence. Her published work spans LLM jailbreaks (co-author on Many-shot Jailbreaking and on transferable image jailbreaks across vision-language models), mechanistic interpretability through compact proofs with Jason Gross, and synthetic-data scaling and model collapse. She also runs Monsoon Math Camp, a math education program in India, and is based in the San Francisco Bay Area.

Joe Kiniry

Speaker

Dr. Joe Kiniry is the CEO and Chief Scientist of Sigil Logic, a Galois spinout, where he leads work on AI-driven formal methods for high-assurance systems. He spent the prior 12 years as Principal Scientist at Galois on hardware, firmware, and software correctness and security, and remains affiliated there. Before Galois he was a Full Professor at the Technical University of Denmark. He holds a Ph.D. from Caltech and is a Senior Member of both IEEE and ACM.

Kiran Gopinathan

Speaker and Judge

Kiran Gopinathan is a Research Scientist at Basis. Her research focuses on techniques for developing newer and better tools for automating formal verification - the art of using computers to automagically construct mathematical proofs about the correctness of software. Her research interests cover formal verification, program synthesis, type systems, language design and proof engineering. She previously completed her postdoc with Talia Ringer at UIUC, and before that earned her PhD in Programming Languages Research from the National University of Singapore on automating the maintenance of formally verified software.

Adam Chlipala

Judge

Adam is faculty in MIT computer science, where he works on implementation and formal verification of full digital stacks and the theorem-proving and language/compiler tools to enable it. He is also cofounder of Nectry, a startup that applies LLM code generation to a tasteful domain-specific language where static analysis is very effective.

Sam Staton

Judge

Sam Staton is Professor of Computer Science at the University of Oxford, working on the semantics of programming languages and probabilistic programming. He currently leads ARIA-funded projects including SynthStats (probabilistic program synthesis for safer AI), and the ERC project Better Languages for Statistics.

Mark Santolucito

Judge

Mark Santolucito is an Assistant Professor of CS at Barnard College, Columbia University, building neurosymbolic program synthesis tools that help people specify, write, and reason about code. His work is funded by public and private organizations such as the NSF and AWS, and was an early employee at Ndea, François Chollet's frontier AGI program synthesis lab.

Santiago Cuellar

Judge

Santiago Cuellar is a Principal Scientist at Galois, where he applies formal methods, separation logic, and machine-checked proofs to software safety and security. His recent work focuses on modular verification of realistic compilers for concurrent programs. He holds a PhD in Computer Science from Princeton, where his dissertation explored modular proofs of optimizing compilers with shared memory concurrency, and a BS in Mathematics from MIT.

Max von Hippel

Judge

Max von Hippel is the co-founder and CTO of Benchify (YC S24), where he leads AI-driven formal verification of real-world code at scale. He completed his Ph.D. at Northeastern's Khoury College on protocol verification and automated attack synthesis under Cristina Nita-Rotaru, work that won Best Paper at NETYS 2023, Best Student Paper at ACL2 2023, and the 2024 USENIX Applied Networking Research Prize. He holds a B.S. in Pure Mathematics from the University of Arizona and co-organizes the Boston Computation Club and the FM × AI hub.

Peter McIntyre

Judge

Peter is the CEO and Co-founder of Trajectory Labs, which builds AI safety evals and RL environments for frontier labs. Previously, he was a Research Scholar at GovAI, early employee at 80,000 Hours (YC S15), and founded Non-Trivial, one of the most selective research programs for high schoolers.

Kaushik "KJ" Jangiti

Judge

Kaushik "KJ" Jangiti is a Staff Security Engineer at Snowflake, focused on AI and agentic AI security, enterprise security architecture, DevSecOps, governance, and incident response. He is a published AI security researcher, industry and academic speaker, and judge, with work spanning agentic AI security, AI-BOM, secure AI adoption, and audit-ready enterprise security governance.

Ashwin Pai

Judge

Ashwin Pai is an engineering leader with over a decade of experience building distributed systems and security products, most recently shipping AI governance and continuous control validation platforms to Fortune 500 customers at RelyanceAI. He was previously CTO of Interfold, a VC-backed fintech startup he built from zero to one.

Twm Stone

Judge

With a background in software engineering, Twm has experience in the threat modelling, formal verification and red teaming of AI systems. He is currently a MATS 9.1 extension fellow as part of the security stream.

Saurabh Yergattikar

Judge

Saurabh Yergattikar is a Lead Engineer / Member of Technical Staff-2 at eBay Inc. and a contributor to the open-source SAFE-MCP project (Linux Foundation / OpenSSF), as well as the architect and developer of the open-source ShieldMCP system.

Dippu Kumar Singh

Judge

Data Leader for Emerging Technologies, transforming enterprises by building scalable, trustworthy, and commercially viable AI systems.

Jess Bergs

Judge

Jess is a member of technical staff at UK AISI where she leads engineering on human-in-the-loop research tools. Her career centres on public-sector innovation with prior work at BBC R&D and on EU Horizon R&D projects.

Herbie Bradley

Judge

Herbie Bradley is a final-year PhD student in Engineering at the University of Cambridge, researching diverse and high-quality synthetic data generation for LLMs. He previously helped build the UK AI Security Institute as a researcher on the Frontier AI Taskforce, organized the Bletchley Summit, and authored The Great Refactor for the Institute for Progress. He now writes AI Pathways and advises governments and early-stage startups on AI policy.

Simon Henniger

Judge

Simon Henniger is a PhD student at Harvard University, advised by Prof. Nada Amin, working on verified program synthesis with LLMs. His prior work at TUM extended Monte-Carlo Tree Search techniques for LLM-driven verification (VerMCTS), and he co-authored "The Dolorem Pattern" (ECOOP 2023) on growing languages through compile-time function execution.

Kanghee Park

Judge

Kanghee Park is a 5th-year CS PhD candidate at UC San Diego, advised by Loris D'Antoni, and a Research Fellow at Theorem. His research scales formal reasoning through modularization and abstraction across program analysis, synthesis, and proof search, with recent papers at NeurIPS, ICML, and OOPSLA on grammar-aligned LLM decoding and specification synthesis.

Nikhil R. Pallepati

Judge

Nikhil R. Pallepati is a Machine Learning Engineer at Microsoft, where he leads the design of production-scale AI systems including graph neural network-based detection systems and agentic LLM pipelines protecting Azure cloud infrastructure, with prior experience building enterprise ML systems at various Fortune 500 companies.

Harshikaa Agrawal

Judge

Harshikaa Agrawal is a Computer Science senior at IIIT Hyderabad. Previously, she researched at Theorem, where she built formally verified parsers for GPU acceleration and AI-driven theorem-proving translations between Rocq and Lean. In high school, she reached the national level at the Linguistics and Informatics Olympiads.

Luiza Corpaci

Judge

Luiza Corpaci is an AI Engineer at AMD working on AI for verification, and a Data Science master's researcher at TU Wien. Her work spans formal methods, combinatorial testing, and AI interpretability

Registered Local Sites

Register A Location

Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.