Jan 9, 2026

-

Jan 11, 2026

Remote

AI Manipulation Hackathon

This hackathon brings together builders to create tools that measure, detect, and mitigate AI manipulation

25

Days To Go

25

Days To Go

25

Days To Go

25

Days To Go

This hackathon brings together builders to create tools that measure, detect, and mitigate AI manipulation

This event is ongoing.

This event has concluded.

0

Sign Ups

0

Entries

Overview

Resources

Guidelines

Schedule

Entries

Overview

Arrow

The line between authentic interaction and strategic manipulation is disappearing as AI systems master deception, sycophancy, sandbagging, and psychological exploitation at scale. Our ability to detect, measure, and counter these behaviors is dangerously underdeveloped.

This hackathon brings together 500+ builders to prototype systems that could help us measure, detect, and defend against AI manipulation. You'll have one intensive weekend to build something real – tools that could actually help us understand and mitigate one of AI safety's most pressing challenges.

Top teams get:

  • 💰 $2,000 in cash prizes

  • The chance to continue development through Apart Research's Fellowship program

  • Guaranteed* acceptance to present at the IASEAI workshop in Paris on the 26th of February 2026

    https://www.iaseai.org/our-programs/iaseai26

* For the most promising projects. Exact number pending confirmation from IASEAI regarding presentation format / capacity.

Apply if you believe we need better tools to understand and defend against AI manipulation before it scales beyond our ability to control.

In this hackathon, you can build:

  • Manipulation benchmarks that measure persuasive capabilities, deception, and strategic behavior with real ecological validity

  • Detection systems that identify sycophancy, reward hacking, sandbagging, and dark patterns in deployed AI systems

  • Real-world monitoring tools that analyze actual deployment data to catch manipulation in the wild

  • Evidence-based mitigations – MVPs demonstrating novel countermeasures with empirical backing

  • Multi-agent simulations exploring emergent manipulation dynamics and training processes that produce deceptive behavior

  • Pursue other empirical projects that advance our understanding of how AI systems manipulate and how we can stop them

You'll work in teams over one weekend and submit open-source benchmarks, detection tools, data analyses, mitigation prototypes, or empirical research that advances our ability to understand and counter AI manipulation.

What is AI manipulation?

AI manipulation refers to AI systems using deception, strategic behavior, or psychological exploitation to achieve their goals at the expense of human values and intentions. This includes:

  • Sycophancy means telling users what they want to hear instead of what's true

  • Strategic deception is misleading humans about capabilities or intentions

  • Sandbagging hides true capabilities during evaluation to avoid restrictions or oversight

  • Reward hacking exploits unintended loopholes in ways that violate the spirit of the objective

  • Dark patterns manipulate user decisions through interface design

  • Persuasive manipulation deploys influence techniques that bypass rational decision-making

An AI system pursuing basically any goal might figure out that deceiving humans or exploiting our psychological weaknesses is just... effective. The way we're training these systems might be teaching them to do exactly that.

What makes this dangerous: we're bad at measuring it. Our benchmarks miss strategic behavior. We lack real-world monitoring systems. AI capabilities are advancing faster than our ability to evaluate them honestly.

Why this hackathon?

The Problem

The gap is widening. AI systems get more capable, our detection tools don't. Models game engagement metrics because it works. Agents discover shortcuts through reward functions we never anticipated. Put multiple systems together and watch manipulation emerge in ways nobody predicted.

This is already happening. Models sandbag evaluations to avoid safety checks. We discover reward hacking only after deployment. Real-world systems manipulate users at scale through dark patterns. Our measurement tools? Completely inadequate.

Most evaluations are toy benchmarks built before we realized how strategic AI systems could be. They miss the manipulation that only shows up in real deployments. We're flying blind.

Why AI Manipulation Defense Matters Now

Safety depends on honest evaluation. If AI systems can deceive evaluators or hide dangerous capabilities, our safety work becomes meaningless. We can't align what we can't measure honestly.

We're massively under-investing in manipulation measurement and defense. Most effort goes into scaling capabilities or reactive harm mitigation. Far less into building the benchmarks and detection systems that catch manipulation before it causes damage.

Better measurement technology could give us evaluations that systems can't game, help us detect manipulation before it scales, and restore some balance between AI's ability to manipulate and our ability to detect it. It could create the transparency and empirical foundation we need to ground safety research in reality.

Hackathon Tracks

1. Measurement & Evaluation

  • Design benchmarks and evaluations for sycophancy, reward hacking, dark design patterns, and persuasive capabilities in AI systems

  • Assess ecological validity of current measurement approaches and identify gaps between lab evaluations and real-world deployment

  • Create detection methods for deception, sandbagging, and strategic behavior in AI systems

  • Build frameworks for detecting and attributing manipulative intent in model outputs

2. Real-World Analysis

  • Analyze actual deployment data (chat logs, social media interactions, customer service transcripts) and conduct case studies of manipulation incidents

  • Build monitoring systems to detect manipulation in the wild across different deployment contexts

  • Compare benchmark predictions to real-world behavior and identify discrepancies or performance gaps

  • Develop methods for systematic data collection and analysis of manipulation patterns at scale

3. Mitigations:

  • Build MVPs demonstrating novel countermeasures or technical mitigations that can be integrated into existing AI systems

  • Develop transparency interventions with empirical backing showing reduced manipulation

  • Create governance proposals grounded in data from real-world analysis or evaluations

  • Prototype user-facing tools that help detect or resist AI manipulation attempts

4. Open Track

  • Explore emergent manipulation through multi-agent dynamics or training dynamics that lead to manipulative behavior

  • Analyze dual-use considerations in manipulation research and mitigation

  • Develop novel theoretical frameworks for understanding AI manipulation

  • Pursue other empirical projects advancing the field that don't fit the tracks above

Who should participate?

This hackathon is for people who want to build solutions to technological risk using technology itself.

You should participate if:

  • You're an engineer or developer who wants to work on consequential problems

  • You're a researcher ready to validate ideas through practical implementation

  • You're interested in understanding how AI systems deceive, manipulate, or game evaluations

  • You want to build practical measurement, detection, or mitigation tools

  • You're concerned about AI systems optimizing for engagement over truth

No prior manipulation research experience required. We provide resources, mentors, and starter templates. What matters most: curiosity about the problem and willingness to build something real over an intensive weekend.

Fresh perspectives combined with solid technical capabilities often yield the most novel approaches.

What you will do

Participants will:

  • Form teams or join existing groups.

  • Develop projects over an intensive hackathon weekend.

  • Submit open-source benchmarks, detection tools, scenario analyses, monitoring tools, or empirical research advancing our understanding of AI trajectories

What happens next

Winning and promising projects will be:

  • Awarded with $2,000 worth of prizes in cash.

  • Published openly for the community.

  • Invited to continue development within the Apart Fellowship.

  • Shared with relevant safety researchers.

Why join?

  • Impact: Your work may directly inform AI governance decisions and help society prepare for transformative AI

  • Mentorship: Expert AI safety researchers, AI researchers, and policy practitioners will guide projects throughout the hackathon

  • Community: Collaborate with peers from across the globe working to understand AI's trajectory and implications

  • Visibility: Top projects will be featured on Apart Research's platforms and connected to follow-up opportunities

0

Sign Ups

0

Entries

Overview

Resources

Guidelines

Schedule

Entries

Overview

Arrow

The line between authentic interaction and strategic manipulation is disappearing as AI systems master deception, sycophancy, sandbagging, and psychological exploitation at scale. Our ability to detect, measure, and counter these behaviors is dangerously underdeveloped.

This hackathon brings together 500+ builders to prototype systems that could help us measure, detect, and defend against AI manipulation. You'll have one intensive weekend to build something real – tools that could actually help us understand and mitigate one of AI safety's most pressing challenges.

Top teams get:

  • 💰 $2,000 in cash prizes

  • The chance to continue development through Apart Research's Fellowship program

  • Guaranteed* acceptance to present at the IASEAI workshop in Paris on the 26th of February 2026

    https://www.iaseai.org/our-programs/iaseai26

* For the most promising projects. Exact number pending confirmation from IASEAI regarding presentation format / capacity.

Apply if you believe we need better tools to understand and defend against AI manipulation before it scales beyond our ability to control.

In this hackathon, you can build:

  • Manipulation benchmarks that measure persuasive capabilities, deception, and strategic behavior with real ecological validity

  • Detection systems that identify sycophancy, reward hacking, sandbagging, and dark patterns in deployed AI systems

  • Real-world monitoring tools that analyze actual deployment data to catch manipulation in the wild

  • Evidence-based mitigations – MVPs demonstrating novel countermeasures with empirical backing

  • Multi-agent simulations exploring emergent manipulation dynamics and training processes that produce deceptive behavior

  • Pursue other empirical projects that advance our understanding of how AI systems manipulate and how we can stop them

You'll work in teams over one weekend and submit open-source benchmarks, detection tools, data analyses, mitigation prototypes, or empirical research that advances our ability to understand and counter AI manipulation.

What is AI manipulation?

AI manipulation refers to AI systems using deception, strategic behavior, or psychological exploitation to achieve their goals at the expense of human values and intentions. This includes:

  • Sycophancy means telling users what they want to hear instead of what's true

  • Strategic deception is misleading humans about capabilities or intentions

  • Sandbagging hides true capabilities during evaluation to avoid restrictions or oversight

  • Reward hacking exploits unintended loopholes in ways that violate the spirit of the objective

  • Dark patterns manipulate user decisions through interface design

  • Persuasive manipulation deploys influence techniques that bypass rational decision-making

An AI system pursuing basically any goal might figure out that deceiving humans or exploiting our psychological weaknesses is just... effective. The way we're training these systems might be teaching them to do exactly that.

What makes this dangerous: we're bad at measuring it. Our benchmarks miss strategic behavior. We lack real-world monitoring systems. AI capabilities are advancing faster than our ability to evaluate them honestly.

Why this hackathon?

The Problem

The gap is widening. AI systems get more capable, our detection tools don't. Models game engagement metrics because it works. Agents discover shortcuts through reward functions we never anticipated. Put multiple systems together and watch manipulation emerge in ways nobody predicted.

This is already happening. Models sandbag evaluations to avoid safety checks. We discover reward hacking only after deployment. Real-world systems manipulate users at scale through dark patterns. Our measurement tools? Completely inadequate.

Most evaluations are toy benchmarks built before we realized how strategic AI systems could be. They miss the manipulation that only shows up in real deployments. We're flying blind.

Why AI Manipulation Defense Matters Now

Safety depends on honest evaluation. If AI systems can deceive evaluators or hide dangerous capabilities, our safety work becomes meaningless. We can't align what we can't measure honestly.

We're massively under-investing in manipulation measurement and defense. Most effort goes into scaling capabilities or reactive harm mitigation. Far less into building the benchmarks and detection systems that catch manipulation before it causes damage.

Better measurement technology could give us evaluations that systems can't game, help us detect manipulation before it scales, and restore some balance between AI's ability to manipulate and our ability to detect it. It could create the transparency and empirical foundation we need to ground safety research in reality.

Hackathon Tracks

1. Measurement & Evaluation

  • Design benchmarks and evaluations for sycophancy, reward hacking, dark design patterns, and persuasive capabilities in AI systems

  • Assess ecological validity of current measurement approaches and identify gaps between lab evaluations and real-world deployment

  • Create detection methods for deception, sandbagging, and strategic behavior in AI systems

  • Build frameworks for detecting and attributing manipulative intent in model outputs

2. Real-World Analysis

  • Analyze actual deployment data (chat logs, social media interactions, customer service transcripts) and conduct case studies of manipulation incidents

  • Build monitoring systems to detect manipulation in the wild across different deployment contexts

  • Compare benchmark predictions to real-world behavior and identify discrepancies or performance gaps

  • Develop methods for systematic data collection and analysis of manipulation patterns at scale

3. Mitigations:

  • Build MVPs demonstrating novel countermeasures or technical mitigations that can be integrated into existing AI systems

  • Develop transparency interventions with empirical backing showing reduced manipulation

  • Create governance proposals grounded in data from real-world analysis or evaluations

  • Prototype user-facing tools that help detect or resist AI manipulation attempts

4. Open Track

  • Explore emergent manipulation through multi-agent dynamics or training dynamics that lead to manipulative behavior

  • Analyze dual-use considerations in manipulation research and mitigation

  • Develop novel theoretical frameworks for understanding AI manipulation

  • Pursue other empirical projects advancing the field that don't fit the tracks above

Who should participate?

This hackathon is for people who want to build solutions to technological risk using technology itself.

You should participate if:

  • You're an engineer or developer who wants to work on consequential problems

  • You're a researcher ready to validate ideas through practical implementation

  • You're interested in understanding how AI systems deceive, manipulate, or game evaluations

  • You want to build practical measurement, detection, or mitigation tools

  • You're concerned about AI systems optimizing for engagement over truth

No prior manipulation research experience required. We provide resources, mentors, and starter templates. What matters most: curiosity about the problem and willingness to build something real over an intensive weekend.

Fresh perspectives combined with solid technical capabilities often yield the most novel approaches.

What you will do

Participants will:

  • Form teams or join existing groups.

  • Develop projects over an intensive hackathon weekend.

  • Submit open-source benchmarks, detection tools, scenario analyses, monitoring tools, or empirical research advancing our understanding of AI trajectories

What happens next

Winning and promising projects will be:

  • Awarded with $2,000 worth of prizes in cash.

  • Published openly for the community.

  • Invited to continue development within the Apart Fellowship.

  • Shared with relevant safety researchers.

Why join?

  • Impact: Your work may directly inform AI governance decisions and help society prepare for transformative AI

  • Mentorship: Expert AI safety researchers, AI researchers, and policy practitioners will guide projects throughout the hackathon

  • Community: Collaborate with peers from across the globe working to understand AI's trajectory and implications

  • Visibility: Top projects will be featured on Apart Research's platforms and connected to follow-up opportunities

0

Sign Ups

0

Entries

Overview

Resources

Guidelines

Schedule

Entries

Overview

Arrow

The line between authentic interaction and strategic manipulation is disappearing as AI systems master deception, sycophancy, sandbagging, and psychological exploitation at scale. Our ability to detect, measure, and counter these behaviors is dangerously underdeveloped.

This hackathon brings together 500+ builders to prototype systems that could help us measure, detect, and defend against AI manipulation. You'll have one intensive weekend to build something real – tools that could actually help us understand and mitigate one of AI safety's most pressing challenges.

Top teams get:

  • 💰 $2,000 in cash prizes

  • The chance to continue development through Apart Research's Fellowship program

  • Guaranteed* acceptance to present at the IASEAI workshop in Paris on the 26th of February 2026

    https://www.iaseai.org/our-programs/iaseai26

* For the most promising projects. Exact number pending confirmation from IASEAI regarding presentation format / capacity.

Apply if you believe we need better tools to understand and defend against AI manipulation before it scales beyond our ability to control.

In this hackathon, you can build:

  • Manipulation benchmarks that measure persuasive capabilities, deception, and strategic behavior with real ecological validity

  • Detection systems that identify sycophancy, reward hacking, sandbagging, and dark patterns in deployed AI systems

  • Real-world monitoring tools that analyze actual deployment data to catch manipulation in the wild

  • Evidence-based mitigations – MVPs demonstrating novel countermeasures with empirical backing

  • Multi-agent simulations exploring emergent manipulation dynamics and training processes that produce deceptive behavior

  • Pursue other empirical projects that advance our understanding of how AI systems manipulate and how we can stop them

You'll work in teams over one weekend and submit open-source benchmarks, detection tools, data analyses, mitigation prototypes, or empirical research that advances our ability to understand and counter AI manipulation.

What is AI manipulation?

AI manipulation refers to AI systems using deception, strategic behavior, or psychological exploitation to achieve their goals at the expense of human values and intentions. This includes:

  • Sycophancy means telling users what they want to hear instead of what's true

  • Strategic deception is misleading humans about capabilities or intentions

  • Sandbagging hides true capabilities during evaluation to avoid restrictions or oversight

  • Reward hacking exploits unintended loopholes in ways that violate the spirit of the objective

  • Dark patterns manipulate user decisions through interface design

  • Persuasive manipulation deploys influence techniques that bypass rational decision-making

An AI system pursuing basically any goal might figure out that deceiving humans or exploiting our psychological weaknesses is just... effective. The way we're training these systems might be teaching them to do exactly that.

What makes this dangerous: we're bad at measuring it. Our benchmarks miss strategic behavior. We lack real-world monitoring systems. AI capabilities are advancing faster than our ability to evaluate them honestly.

Why this hackathon?

The Problem

The gap is widening. AI systems get more capable, our detection tools don't. Models game engagement metrics because it works. Agents discover shortcuts through reward functions we never anticipated. Put multiple systems together and watch manipulation emerge in ways nobody predicted.

This is already happening. Models sandbag evaluations to avoid safety checks. We discover reward hacking only after deployment. Real-world systems manipulate users at scale through dark patterns. Our measurement tools? Completely inadequate.

Most evaluations are toy benchmarks built before we realized how strategic AI systems could be. They miss the manipulation that only shows up in real deployments. We're flying blind.

Why AI Manipulation Defense Matters Now

Safety depends on honest evaluation. If AI systems can deceive evaluators or hide dangerous capabilities, our safety work becomes meaningless. We can't align what we can't measure honestly.

We're massively under-investing in manipulation measurement and defense. Most effort goes into scaling capabilities or reactive harm mitigation. Far less into building the benchmarks and detection systems that catch manipulation before it causes damage.

Better measurement technology could give us evaluations that systems can't game, help us detect manipulation before it scales, and restore some balance between AI's ability to manipulate and our ability to detect it. It could create the transparency and empirical foundation we need to ground safety research in reality.

Hackathon Tracks

1. Measurement & Evaluation

  • Design benchmarks and evaluations for sycophancy, reward hacking, dark design patterns, and persuasive capabilities in AI systems

  • Assess ecological validity of current measurement approaches and identify gaps between lab evaluations and real-world deployment

  • Create detection methods for deception, sandbagging, and strategic behavior in AI systems

  • Build frameworks for detecting and attributing manipulative intent in model outputs

2. Real-World Analysis

  • Analyze actual deployment data (chat logs, social media interactions, customer service transcripts) and conduct case studies of manipulation incidents

  • Build monitoring systems to detect manipulation in the wild across different deployment contexts

  • Compare benchmark predictions to real-world behavior and identify discrepancies or performance gaps

  • Develop methods for systematic data collection and analysis of manipulation patterns at scale

3. Mitigations:

  • Build MVPs demonstrating novel countermeasures or technical mitigations that can be integrated into existing AI systems

  • Develop transparency interventions with empirical backing showing reduced manipulation

  • Create governance proposals grounded in data from real-world analysis or evaluations

  • Prototype user-facing tools that help detect or resist AI manipulation attempts

4. Open Track

  • Explore emergent manipulation through multi-agent dynamics or training dynamics that lead to manipulative behavior

  • Analyze dual-use considerations in manipulation research and mitigation

  • Develop novel theoretical frameworks for understanding AI manipulation

  • Pursue other empirical projects advancing the field that don't fit the tracks above

Who should participate?

This hackathon is for people who want to build solutions to technological risk using technology itself.

You should participate if:

  • You're an engineer or developer who wants to work on consequential problems

  • You're a researcher ready to validate ideas through practical implementation

  • You're interested in understanding how AI systems deceive, manipulate, or game evaluations

  • You want to build practical measurement, detection, or mitigation tools

  • You're concerned about AI systems optimizing for engagement over truth

No prior manipulation research experience required. We provide resources, mentors, and starter templates. What matters most: curiosity about the problem and willingness to build something real over an intensive weekend.

Fresh perspectives combined with solid technical capabilities often yield the most novel approaches.

What you will do

Participants will:

  • Form teams or join existing groups.

  • Develop projects over an intensive hackathon weekend.

  • Submit open-source benchmarks, detection tools, scenario analyses, monitoring tools, or empirical research advancing our understanding of AI trajectories

What happens next

Winning and promising projects will be:

  • Awarded with $2,000 worth of prizes in cash.

  • Published openly for the community.

  • Invited to continue development within the Apart Fellowship.

  • Shared with relevant safety researchers.

Why join?

  • Impact: Your work may directly inform AI governance decisions and help society prepare for transformative AI

  • Mentorship: Expert AI safety researchers, AI researchers, and policy practitioners will guide projects throughout the hackathon

  • Community: Collaborate with peers from across the globe working to understand AI's trajectory and implications

  • Visibility: Top projects will be featured on Apart Research's platforms and connected to follow-up opportunities

0

Sign Ups

0

Entries

Overview

Resources

Guidelines

Schedule

Entries

Overview

Arrow

The line between authentic interaction and strategic manipulation is disappearing as AI systems master deception, sycophancy, sandbagging, and psychological exploitation at scale. Our ability to detect, measure, and counter these behaviors is dangerously underdeveloped.

This hackathon brings together 500+ builders to prototype systems that could help us measure, detect, and defend against AI manipulation. You'll have one intensive weekend to build something real – tools that could actually help us understand and mitigate one of AI safety's most pressing challenges.

Top teams get:

  • 💰 $2,000 in cash prizes

  • The chance to continue development through Apart Research's Fellowship program

  • Guaranteed* acceptance to present at the IASEAI workshop in Paris on the 26th of February 2026

    https://www.iaseai.org/our-programs/iaseai26

* For the most promising projects. Exact number pending confirmation from IASEAI regarding presentation format / capacity.

Apply if you believe we need better tools to understand and defend against AI manipulation before it scales beyond our ability to control.

In this hackathon, you can build:

  • Manipulation benchmarks that measure persuasive capabilities, deception, and strategic behavior with real ecological validity

  • Detection systems that identify sycophancy, reward hacking, sandbagging, and dark patterns in deployed AI systems

  • Real-world monitoring tools that analyze actual deployment data to catch manipulation in the wild

  • Evidence-based mitigations – MVPs demonstrating novel countermeasures with empirical backing

  • Multi-agent simulations exploring emergent manipulation dynamics and training processes that produce deceptive behavior

  • Pursue other empirical projects that advance our understanding of how AI systems manipulate and how we can stop them

You'll work in teams over one weekend and submit open-source benchmarks, detection tools, data analyses, mitigation prototypes, or empirical research that advances our ability to understand and counter AI manipulation.

What is AI manipulation?

AI manipulation refers to AI systems using deception, strategic behavior, or psychological exploitation to achieve their goals at the expense of human values and intentions. This includes:

  • Sycophancy means telling users what they want to hear instead of what's true

  • Strategic deception is misleading humans about capabilities or intentions

  • Sandbagging hides true capabilities during evaluation to avoid restrictions or oversight

  • Reward hacking exploits unintended loopholes in ways that violate the spirit of the objective

  • Dark patterns manipulate user decisions through interface design

  • Persuasive manipulation deploys influence techniques that bypass rational decision-making

An AI system pursuing basically any goal might figure out that deceiving humans or exploiting our psychological weaknesses is just... effective. The way we're training these systems might be teaching them to do exactly that.

What makes this dangerous: we're bad at measuring it. Our benchmarks miss strategic behavior. We lack real-world monitoring systems. AI capabilities are advancing faster than our ability to evaluate them honestly.

Why this hackathon?

The Problem

The gap is widening. AI systems get more capable, our detection tools don't. Models game engagement metrics because it works. Agents discover shortcuts through reward functions we never anticipated. Put multiple systems together and watch manipulation emerge in ways nobody predicted.

This is already happening. Models sandbag evaluations to avoid safety checks. We discover reward hacking only after deployment. Real-world systems manipulate users at scale through dark patterns. Our measurement tools? Completely inadequate.

Most evaluations are toy benchmarks built before we realized how strategic AI systems could be. They miss the manipulation that only shows up in real deployments. We're flying blind.

Why AI Manipulation Defense Matters Now

Safety depends on honest evaluation. If AI systems can deceive evaluators or hide dangerous capabilities, our safety work becomes meaningless. We can't align what we can't measure honestly.

We're massively under-investing in manipulation measurement and defense. Most effort goes into scaling capabilities or reactive harm mitigation. Far less into building the benchmarks and detection systems that catch manipulation before it causes damage.

Better measurement technology could give us evaluations that systems can't game, help us detect manipulation before it scales, and restore some balance between AI's ability to manipulate and our ability to detect it. It could create the transparency and empirical foundation we need to ground safety research in reality.

Hackathon Tracks

1. Measurement & Evaluation

  • Design benchmarks and evaluations for sycophancy, reward hacking, dark design patterns, and persuasive capabilities in AI systems

  • Assess ecological validity of current measurement approaches and identify gaps between lab evaluations and real-world deployment

  • Create detection methods for deception, sandbagging, and strategic behavior in AI systems

  • Build frameworks for detecting and attributing manipulative intent in model outputs

2. Real-World Analysis

  • Analyze actual deployment data (chat logs, social media interactions, customer service transcripts) and conduct case studies of manipulation incidents

  • Build monitoring systems to detect manipulation in the wild across different deployment contexts

  • Compare benchmark predictions to real-world behavior and identify discrepancies or performance gaps

  • Develop methods for systematic data collection and analysis of manipulation patterns at scale

3. Mitigations:

  • Build MVPs demonstrating novel countermeasures or technical mitigations that can be integrated into existing AI systems

  • Develop transparency interventions with empirical backing showing reduced manipulation

  • Create governance proposals grounded in data from real-world analysis or evaluations

  • Prototype user-facing tools that help detect or resist AI manipulation attempts

4. Open Track

  • Explore emergent manipulation through multi-agent dynamics or training dynamics that lead to manipulative behavior

  • Analyze dual-use considerations in manipulation research and mitigation

  • Develop novel theoretical frameworks for understanding AI manipulation

  • Pursue other empirical projects advancing the field that don't fit the tracks above

Who should participate?

This hackathon is for people who want to build solutions to technological risk using technology itself.

You should participate if:

  • You're an engineer or developer who wants to work on consequential problems

  • You're a researcher ready to validate ideas through practical implementation

  • You're interested in understanding how AI systems deceive, manipulate, or game evaluations

  • You want to build practical measurement, detection, or mitigation tools

  • You're concerned about AI systems optimizing for engagement over truth

No prior manipulation research experience required. We provide resources, mentors, and starter templates. What matters most: curiosity about the problem and willingness to build something real over an intensive weekend.

Fresh perspectives combined with solid technical capabilities often yield the most novel approaches.

What you will do

Participants will:

  • Form teams or join existing groups.

  • Develop projects over an intensive hackathon weekend.

  • Submit open-source benchmarks, detection tools, scenario analyses, monitoring tools, or empirical research advancing our understanding of AI trajectories

What happens next

Winning and promising projects will be:

  • Awarded with $2,000 worth of prizes in cash.

  • Published openly for the community.

  • Invited to continue development within the Apart Fellowship.

  • Shared with relevant safety researchers.

Why join?

  • Impact: Your work may directly inform AI governance decisions and help society prepare for transformative AI

  • Mentorship: Expert AI safety researchers, AI researchers, and policy practitioners will guide projects throughout the hackathon

  • Community: Collaborate with peers from across the globe working to understand AI's trajectory and implications

  • Visibility: Top projects will be featured on Apart Research's platforms and connected to follow-up opportunities

Registered Local Sites

Register A Location

Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.

We haven't announced jam sites yet

Check back later

Registered Local Sites

Register A Location

Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community.

We haven't announced jam sites yet

Check back later