Are you fascinated by the incredible advancements in AI? As AI becomes smarter and more powerful, it's essential that we make sure it always tells the truth and doesn't trick people. Imagine if an AI could manipulate narratives or try to scheme—that could lead to serious problems!
That's where you come in. We're inviting you to join the Deception Detection Hackathon, an exciting event where you'll team up with researchers, programmers, and AI safety experts to create pilot experiments that spot when an AI is being deceptive (and potentially reduce its deceptive tendencies!). Over one thrilling weekend, you'll put your skills to the test and develop cutting-edge techniques to keep AI honest and trustworthy.
Deception in AI, a severely under-explored concept, occurs when an AI system is capable of deceiving a user, either because a malicious actor designed it to or because its goals are misaligned. Such systems may appear to be aligned with users' and humans' values during training and evaluation but pursue malign objectives when deployed, potentially causing harm or undermining trust in AI.
Examples of such work can be found in:
To mitigate these risks, we must develop robust deception detection methods that can identify instances of strategic deception, make headway on understanding AI capabilities for deception, and prevent AI systems from misleading humans. By participating in this hackathon, you'll contribute to the critical task of ensuring that AI remains transparent, accountable, and aligned with human values.
During the hackathon, you'll have the opportunity to:
Whether you're an AI researcher, developer, or enthusiast, this hackathon provides a unique platform to apply your skills and knowledge to address one of the most pressing challenges in AI safety.
Join us in late June for a weekend of collaboration, innovation, and problem-solving as we work together to prevent AI from deceiving humans. Stay tuned for more details on the exact dates, format, and registration process.
Don't miss this opportunity to contribute to the development of trustworthy AI systems and help shape a future where AI and humans can work together safely and transparently. Let's hack for a deception-free AI future!
You will join in teams to submit a PDF about your research, following the submission template shared in the submission tab! Depending on the judges' reviews, you'll have the chance to win a share of the $2,000 prize pool! Find the review criteria on the submission tab.
The AGI Deception Detection Hackathon is a weekend-long event where you participate in teams (1-5) to create interesting, fun, and impactful research. You submit a PDF report that summarizes and discusses your findings in the context of AI safety. These reports will be judged by our panel and you can win up to $1,000!
It runs from 28th June to 1st July and we're excited to welcome you for a weekend of engaging research. You will hear fascinating talks about real-world projects tackling these types of questions, get the opportunity to discuss your ideas with experienced mentors, and you will get reviews from top-tier researchers in the field of AI safety to further your exploration.
Everyone can participate and we especially encourage you to join if you’re considering moving into AI safety from another career. We give you code templates and ideas to kickstart your projects and you’ll be surprised what you can accomplish in just a weekend – especially with your new-found community!
Read more about what you can expect, the schedule, and what previous participants have said about being part of the hackathon below.
There are loads of reasons to join! Here are just a few:
Please join! This can be your first foray into AI and ML safety and maybe you’ll realize that there is exciting low-hanging fruit that your specific skillset is suited to. Even if you normally don’t find it particularly interesting, this time you might see it in a new light!
There’s a lot of pressure from AI safety to perform at a top level and this seems to drive some people out of the field. We’d love it if you consider joining with a mindset of fun exploration and get a positive experience out of the weekend.
Yoann Poupart, BlockLoads CTO: "This Hackathon was a perfect blend of learning, testing, and collaboration on cutting-edge AI Safety research. I really feel that I gained practical knowledge that cannot be learned only by reading articles.”
Lucie Philippon, France Pacific Territories Economic Committee: "It was great meeting such cool people to work with over the weekend! I did not know any of the other people in my group at first, and now I'm looking forward to working with them again on research projects! The organizers were also super helpful and contributed a lot to the success of our project.”
Akash Kundu, now an Apart Lab fellow: "It was an amazing experience working with people I didn't even know before the hackathon. All three of my teammates were extremely spread out, while I am from India, my teammates were from New York and Taiwan. It was amazing how we pulled this off in 48 hours in spite of the time difference. Moreover, the mentors were extremely encouraging and supportive which helped us gain clarity whenever we got stuck and helped us create an interesting project in the end.”
Nora Petrova, ML Engineer at Prolific: “The hackathon really helped me to be embedded in a community where everyone was working on the same topic. There was a lot of curiosity and interest in the community. Getting feedback from others was interesting as well and I could see how other researchers perceived my project. It was also really interesting to see all the other projects and it was positive to see other's work on it.”
Chris Mathwin, MATS Scholar: "The Interpretability Hackathon exceeded my expectations, it was incredibly well organized with an intelligently curated list of very helpful resources. I had a lot of fun participating and genuinely feel I was able to learn significantly more than I would have, had I spent my time elsewhere. I highly recommend these events to anyone who is interested in this sort of work!”
Besides emphasizing the introduction of concrete mitigation ideas for the risks presented, we are aware that projects emerging from this hackathon might pose a risk if disseminated irresponsibly.
For all of Apart's research events and dissemination, we follow our Responsible Disclosure Policy.
We have collected a few exciting resources about deception, both to get you started with research in this area and to help you dive even deeper. Required reading:
Optional research articles on concepts related to deception detection:
Optional reading about the potential for deception in superintelligent systems:
Additionally, we have previously hosted hackathons where related work was submitted. You can find a few examples on the Sprints page for inspiration on where you might take your project during the weekend.
The schedule runs from 4 PM UTC Friday to 3 AM Monday. We start with an introductory talk and end the event during the following week with an awards ceremony. Join the public ICal here.
You will also find Explorer events such as collaborative brainstorming and team match-making before the hackathon begins on Discord and in the calendar.
WhiteBox is hosting the Manila node of the hackathon at Openspace Katipunan in 50 Esteban Abada St. on June 29-July 1. Join us!
We'll be hosting a jam site at the LISA offices in Shoreditch (25 Holywell Row, London EC2A 4XE). Hang out and collaborate with others interested in, and working on, AI safety!
We will be hosting the hackathon at Hereplein 4, 9711GA, Groningen. Join us!
For your submission, you are required to put together a PDF report. In the submission form below, you can see which fields are optional and which are not.
The judging criteria for your submission are:
👀 Join us for the keynote in an hour!
Now we're just one hour away from the kickoff where you'll hear an inspiring talk from Marius Hobbhahn in addition to Esben's introduction to the weekend's schedule, judging criteria, submission template, and more.
It will also be livestreamed so that everyone across the world has the keynote available during the weekend. After the hackathon, the recording of Marius’ talk will be published on our YouTube channel so you can revisit it.
We've had very inspiring sessions during this past week and we're excited to welcome everyone else to get exciting projects set up! Even before we begin, several interesting ideas have emerged on the #projects | teams forum in our community server where you can read others' projects and team up.
We look forward to seeing you in an hour! Here are the details:
Besides our online event where many of you will join, we also thank the local sites across the globe joining us this time: London, Manila, and Groningen 🥳 The Apart team and multiple active AI safety researchers will be available on the server to answer all your questions in the #help-desk channel.
See you soon!
🤗 The Organizing Team
🥳 We're excited to welcome you for the Deception Detection Hackathon this coming weekend! Here's your short overview of the latest resources and topics to get you ready for the weekend.
For the keynote, we're delighted to present Marius Hobbhahn from Apollo Research who will be inspiring you with a talk on AI deception detection and their research.
It will all be happening in the Talks & Keynotes channel on Discord and on our YouTube livestream.
Check out this 5-minute video with the most relevant hackathon information:
To make your weekend productive and exciting, we recommend that you take these two steps after we begin:
For more information about the prizes and judging criteria for this weekend, jump to the hackathon website. TL;DR: your projects will be judged by our brilliant panel and receive feedback. Next Thursday, we will have the Grand Finale where the winning teams will present their projects and everyone is welcome to join. These are the prizes:
You will undoubtedly have questions that you need answered. Remember that the #❓help-desk channel is always available and that the organizers and mentors will be available there.
We really look forward to the weekend and we're excited to welcome you with Marius and Apollo on Friday on Discord and YouTube!
Remember that this is an exciting opportunity to connect with others and develop meaningful ideas in AI safety. We're all here to help each other succeed on this remarkable journey for AI safety.
We'll see you there, research hackers!
The Organizing Team