The hackathon is happening right now! Join by signing up below and be a part of our community server.

Daedalus AI Safety Hacks

No items found.

Signups

Entries

March 7, 2025 2:00 PM

March 10, 2025 2:00 AM

(UTC)

Organized by the AI safety initiatives at Georgia Tech and and the University of Michigan.

Hackathon starts in

Days

Hours

Minutes

Seconds

This event is finished. It occurred between

March 7, 2025

and

March 10, 2025

Shaping the Future of AI Safety

The development of artificial general intelligence (AGI) could be one of humanity's greatest achievements – or one of its greatest challenges. With only one AI safety researcher for every 250 working on advancing AI capabilities, we urgently need more talented minds focused on ensuring AI systems remain safe and aligned with human values.

The Deadus AI Safety Hackathon invites high school and undergraduate students to tackle crucial challenges in AI safety alongside leading researchers and practitioners. Whether you're interested in technical alignment or AI governance, this is your opportunity to make a real impact on one of the most pressing issues of our time.

Why Participate?

Learn directly from experts at organizations like Anthropic through keynote talks and panel discussions
Receive personalized guidance from active AI safety researchers through dedicated office hours
Explore concrete career paths in AI safety and build valuable connections
Compete for prizes across multiple tracks
Gain foundational knowledge and skills to continue contributing to AI safety

Tracks & Categories

Technical Track

Beginner (High School)

Model Evaluation & Behavior Analysis
Basic Interpretability Methods
Introduction to Robustness Testing

Advanced (Undergraduate)

Advanced Model Evaluations & Behavioral Studies
Mechanistic Interpretability
Adversarial Robustness & Red Teaming

Governance Track

Beginner (High School)

Basics of International AI Policy
Understanding Deployment Regulations
Data Privacy Fundamentals

Advanced (Undergraduate)

International AI Diplomacy & Coordination
Advanced Deployment & Application Regulation
Training Data Privacy & Copyright Frameworks

Event Details

Dates: March 7th 5PM - March 10th 5AM
Format: Fully Virtual
Team Size: 1-4 members
Eligibility: High school and undergraduate students
Note: Teams with any undergraduate members must participate in advanced tracks

How It Works

Form Your Team: Connect with like-minded participants and form teams of 1-4 members
Choose Your Track: Select either technical or governance focus (cannot submit to both)
Develop Your Project: Work with mentors and leverage provided resources
Submit & Present: Share your solution and compete for prizes

Resources & Support

Live workshops and training sessions
One-on-one mentorship opportunities
Access to relevant datasets and tools
Comprehensive documentation and guides
Active community support

Prize Distribution

Detailed prize breakdown will be announced at the event
Separate prize categories for beginner and advanced tracks
Special recognition for innovative solutions

Ready to help shape the future of AI safety? Sign up now and join a community of forward-thinking individuals working to ensure artificial intelligence remains beneficial for humanity.

Speakers & Collaborators

No items found.

ML foundations

Core reading

The goal of core readings is to explain what AI safety & alignment is, and why we care about it. This should cover fundamental arguments for and against AI safety and the current AI landscape, and not very technical. These readings will also help participants decide whether or not they’re interested in the topic.

AI safety in the news / expert opinions, optional

Technical tracks

Model evaluations

This field develops rigorous methods to assess AI models across multiple dimensions - from testing their reasoning abilities and checking for subtle biases, to evaluating their honesty and probing for potentially deceptive behaviors. A major focus is detecting signs of misalignment: Is the model optimizing for what we truly want, or has it learned to optimize for something else? Researchers create sophisticated test scenarios and adversarial challenges to expose potential weaknesses, failure modes, or optimization goals that might not be apparent in standard testing.
A key emphasis is evaluating models' alignment properties: Do they maintain goal-directed behavior that matches human intent? Can they reliably recognize and acknowledge their own limitations rather than making overconfident mistakes? How do they handle ambiguous or ethically challenging scenarios where simple reward functions might lead to unintended consequences? By systematically investigating these questions, model evaluation helps ensure AI systems are not just powerful, but also reliably aligned with human values and objectives. This includes testing for subtle forms of deception, reward hacking, and goal preservation under distribution shift - all critical challenges for building trustworthy AI.
Readings
-
- this article goes over a bunch of common problems encountered when thinking about alignment and how to judge whether a model is aligned— very digestible and beginner-level reading for introduction
-
by Apollo research
- Don’t need to read “current work in….” and “field building” sections, though the former might give ideas!
-

Resources
-
- From Apollo. This is a great intro resource.
-
- (more advanced) Apollo’s list of evals papers to be familiar with. This isn’t necessarily “core reading”, and is heavily opinionated based on Apollos research agenda, but most of the “Our Favorite Papers” items are worth being familiar with.
- More advanced technical guide with code: ARENA week 3:

Ideas/example projects
(A good rule-of-thumb is that most questions about how language models behave are probably under-explored)
-

- Evals for
(Owain Evans)
-

Mechanistic Interpretability

Mechanistic interpretability researchers aim to reverse engineer AI models, examining their internal components and operations - much like neuroscientists studying the human brain. Rather than treating AI systems as black boxes, they develop methods to map out how these systems process information and reach conclusions.
This work is crucial for AI safety - understanding how models work enables us to make them more reliable and controllable. When we know exactly how an AI system makes decisions, we can better predict failures, prevent unwanted behaviors, and ensure alignment with human values.
The field blends computer science, neuroscience, mathematics, and philosophy to understand how neural networks recognize patterns, what internal representations they build, and how distinct computational circuits within them work together.

Readings
-
- I think this article does the best job explaining some key terms (polysemantic neurons, SAEs, superposition) naturally
-
- this article establishes why interpretability is important and then goes into some examples and ways of thinking about how to interpret specific kinds of models. (also has a glossary w/ important definitions beginners might not know at the end of the article)
-
This article introduces the problem of interpretability and current work in an easy to understand way. It then explains new research interpreting the activation of features when specific words are prompted to an LLM.
Resources
- Pop-sci intro:

-

-

- Neel Nanda:

- Neel Nanda:

- Neel Nanda:

- More advanced:

-
- good in depth video about why interpretability is important and difficulties in making models interpretable as they get smarter (scalable oversight)
Ideas/example projects
- Making a CNN, training it, and demonstrating the application of different interpretability libraries like GRADCam.

Red-teaming / adversarial robustness

Adversarial robustness explores a fascinating vulnerability at the heart of modern AI systems: their susceptibility to carefully crafted inputs designed to fool them. While an AI might correctly identify thousands of normal images of cats, a few carefully placed pixels can trick it into seeing a toaster instead - a weakness that humans typically don't share.
This field studies how to make AI systems resilient against these "adversarial attacks" - minimal changes to inputs that cause maximal confusion in model outputs. The stakes extend far beyond academic curiosity. As AI systems are deployed in critical applications like medical diagnosis or autonomous vehicles, ensuring they remain reliable even when faced with manipulated inputs becomes crucial for safety.

Readings
-
- explains adversarial attacks and why it is a core problem in AI security. Gives high-level examples of adversarial attacks as well as defenses.
-
- less general, more advanced, though more current. Probably not worth reading unless interested. This got some buzz a few months ago.
-
Short reading about what red-teaming is about in general tech fields. Can give context to what it is in AI
Resources
-

Ideas/example projects

Governance tracks

International diplomacy

Readings
-
- From JHU Policy Hackathon, good overview on the space so far.
-
- Goes into more specifics on what could be regulated
-
– Delineates the UN’s first resolution on AI which was signed by 120 states including China
-
– overview of governance risks
Resources

Ideas/example projects

Deployment & application regulation

Readings
-
(only pages 272-279) Introduces theoretical frameworks to the “rules” that AI should follow.
-
quick introduction to AI regulation strategies
-
Covers basic needs for AI regulation and then does a basic review of several possible regulation strategies
-
introduction to the idea of compute governance and why it’s important
-
Good place to get some ideas
-
– overview of the EU AI act; useful to read source text.
-

Resources

Ideas/example projects

Training data privacy & copyright

Readings
-
(fairness reading, not data privacy, but idk where else to put) (pages 273-289)
-
Explains a lawsuit that got filed against GitHub after they allegedly trained CoPilot on user code without permission.
-
An app that protects artists against having their art used to train GenAI without their consent.
-
- its sibling program Nightshade actively poisons models that are trained on it, can be a really interesting conversation starter
Resources
-

-

Ideas/example projects

See the updated calendar and subscribe

The schedule runs from 4 PM UTC Friday to 3 AM Monday. We start with an introductory talk and end the event during the following week with an awards ceremony. Join the public ICal here.You will also find Explorer events, such as collaborative brainstorming and team match-making before the hackathon begins on Discord and in the calendar.

‍

📍 Registered jam sites

Beside the remote and virtual participation, our amazing organizers also host local hackathon locations where you can meet up in-person and connect with others in your area.

Thank you! Your submission has been received! Your event will show up on this page.

Oops! Something went wrong while submitting the form.

📣 Social media images and text snippets

No media added yet

No text snippets added yet

You have successfully submitted! You should receive an email and your project should appear here. If not, contact operations@apartresearch.com.

Oops! Something went wrong while submitting the form.

No projects submitted yet! Add your project information in the form. We usually see projects submitted quite close to the deadline.

Your info

Your info

Daedalus AI Safety Hacks

Shaping the Future of AI Safety

Why Participate?

Tracks & Categories

Technical Track

Governance Track

Event Details

How It Works

Resources & Support

Prize Distribution

Speakers & Collaborators

ML foundations

Core reading

Technical tracks

Model evaluations

Mechanistic Interpretability

Red-teaming / adversarial robustness

Ideas/example projects

Governance tracks

International diplomacy

Deployment & application regulation

Training data privacy & copyright

📍 Registered jam sites

🏠 Register a location

📣 Social media images and text snippets