Women in AI Safety Hackathon

March 7, 2025, 5:00 PM to March 10, 2025, 3:00 AM (UTC)
This event is finished. It occurred between March 7, 2025 and March 10, 2025.

Shape the future of safe and ethical AI development! Whether you're a researcher, developer, policy enthusiast, or new to AI safety - join us for an empowering weekend of innovation and collaboration. No prior AI safety experience required. Together, we can build the foundational tools and frameworks needed for responsible AI development.

We're thrilled to partner with Women Who Do Data (W2D2), a member-led community dedicated to supporting diverse talent in AI technology development, to bring you this unique hackathon during International Women's Day weekend. Together, we're working to increase the presence of underrepresented groups in AI safety and technical AI development.

We're excited to announce that Lambda Labs will be providing $400 in computing credits to each participating team! This generous support will give you access to powerful cloud instances (including A100s) to help bring your ideas to life. Each track will have a $600 prize for the winning team, with a total of $1,800 in prizes across all three tracks!

The Women in AI Safety Hackathon brings together talented individuals to tackle crucial challenges in AI development and deployment. This event particularly encourages women and underrepresented groups to contribute their unique perspectives to critical areas of AI safety, including alignment, governance, security, and evaluation.

As AI systems become increasingly powerful and pervasive, diverse perspectives in their development and safety mechanisms are more crucial than ever. This hackathon provides a platform for participants to:

  • Collaborate with leading women researchers and practitioners in AI safety
  • Develop practical solutions to pressing AI safety challenges
  • Build lasting connections in the AI safety community
  • Receive mentorship from experienced professionals
  • Present ideas to industry experts

Challenge Tracks

Mechanistic Interpretability Track

We're excited to announce that Goodfire, a pioneering research lab in AI interpretability, will be sponsoring our Mechanistic Interpretability track! We're also thrilled to have Myra Deng, Founding PM at Goodfire, join us for a HackTalk on understanding and steering AI models.

1. Understanding AI Model Internals: Dive deep into the inner workings of large language models using state-of-the-art sparse autoencoder techniques. Learn to map and understand model behavior at a granular level, while developing tools to interpret and visualize neural features. This track focuses on creating innovative approaches to understand what's happening inside these complex systems.

2. Model Steering and Editing: Explore practical applications of interpretability by developing methods to modify and control model behavior through targeted feature interventions. Work on creating interpretable control mechanisms and safety-focused editing tools that can help shape model outputs while maintaining performance. This component bridges the gap between theoretical understanding and practical applications.

3. Feature Analysis and Visualization: Create tools and interfaces that make AI systems more transparent and understandable. Focus on building interactive visualization dashboards that help analyze feature activations and interactions, making complex neural networks more accessible to researchers and practitioners. Develop novel ways to present and interact with model internals.
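To make the sparse-autoencoder ideas in this track more concrete, here is a minimal conceptual sketch of extracting and ranking feature activations. The dimensions, the stand-in activations, and the SparseAutoencoder class are illustrative assumptions for this sketch, not Goodfire's pipeline or any specific published architecture.

```python
# Conceptual sketch only: a toy sparse autoencoder (SAE) applied to stand-in
# residual-stream activations, plus a simple "top features" readout.
# Dimensions, layer choice, and data here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Maps d_model activations to a wider, sparse feature space and back."""

    def __init__(self, d_model: int = 768, d_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU keeps feature activations non-negative; training would add an
        # L1 penalty on `features` to encourage sparsity.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return features, reconstruction

def top_features(features: torch.Tensor, k: int = 5):
    """Return (feature_index, activation) pairs for the k most active features."""
    values, indices = features.topk(k)
    return list(zip(indices.tolist(), values.tolist()))

if __name__ == "__main__":
    sae = SparseAutoencoder()
    # Stand-in for one token's residual-stream activation vector (d_model = 768).
    activations = torch.randn(768)
    features, reconstruction = sae(activations)
    print("reconstruction MSE:", F.mse_loss(reconstruction, activations).item())
    print("top features:", top_features(features))
```

In a real project, the autoencoder would be trained on activations collected from the model under study (reconstruction loss plus a sparsity penalty), and a dashboard would show which inputs most strongly activate each learned feature.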

Participants will have access to:

- Goodfire's Ember API for model interpretation
- Compute credits for experiments

The winning team in this track will receive a $600 prize!

Public Education Track


We're excited to announce that BlueDot Impact, the world’s leading AI Safety education platform with a community of over 4,500 professionals, will sponsor our Public Education track!

Build an education platform feature to help newcomers to the field understand the risks of AI.

Ideas to get you started include:

  1. Games: Get creative and build an exciting new feature to demonstrate or solidify understanding of core AI Safety concepts for non-technical learners
  2. Matching platform: We are a global community with participants from over 100 different countries. Introduce a mentorship matching feature to connect mentors and mentees with similar interests from across our community (see the sketch after this list).
  3. Self-assessments: How can our course graduates self-test understanding of critical AI Safety concepts and demonstrate mastery?
  4. Journey mapping: It can be difficult for newcomers to see how they can fit in the larger world of AI Safety. Create a matching tool to navigate through AI safety domains, job recommendations, or BlueDot course offerings based on a user's background, skills, and interests
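As a starting point for the matching-platform idea above, here is a minimal sketch that pairs mentors and mentees by overlapping interest tags. The Profile fields, the Jaccard scoring, and the example data are illustrative assumptions, not a specification of BlueDot's platform.

```python
# Minimal sketch of a mentorship-matching feature: rank mentors for each mentee
# by overlap of declared interest tags (Jaccard similarity). Profile fields,
# scoring, and example data are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Profile:
    name: str
    interests: set = field(default_factory=set)

def jaccard(a: set, b: set) -> float:
    """Similarity in [0, 1]: shared tags divided by all tags mentioned."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_matches(mentees, mentors, top_n: int = 1):
    """For each mentee, return the top_n mentors ranked by interest overlap."""
    matches = {}
    for mentee in mentees:
        ranked = sorted(mentors,
                        key=lambda m: jaccard(mentee.interests, m.interests),
                        reverse=True)
        matches[mentee.name] = [(m.name, round(jaccard(mentee.interests, m.interests), 2))
                                for m in ranked[:top_n]]
    return matches

if __name__ == "__main__":
    mentors = [Profile("Ada", {"interpretability", "evaluations"}),
               Profile("Grace", {"governance", "policy"})]
    mentees = [Profile("Mira", {"policy", "governance", "education"})]
    print(best_matches(mentees, mentors))  # {'Mira': [('Grace', 0.67)]}
```

A production feature would add constraints such as time zones, languages, and mentor capacity, but a similarity score like this is a reasonable core to build on.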

The winning team in this track will receive a $600 prize!

Social Sciences Track

This track invites participants to explore the intersection of Artificial Intelligence and the Social Sciences through two broad themes: (1) how AI "thinks" and (2) how it shapes human society.

The first theme, Machine Psychology, examines the systematic study of AI behaviour and cognition, drawing insights from psychology, behavioural science, and cognitive science. This theme investigates how AI models develop internal representations, exhibit emergent capabilities, display decision-making patterns, and adapt to different inputs. It explores both the interpretability of these systems and their behavioural tendencies, helping us understand how AI "thinks".

The second theme, AI in Society, focuses on the human-AI dynamic, investigating AI’s broader social, political, and economic implications. This includes its impact on human interactions, institutional transformation across sectors like education and healthcare, and evolving social norms. The theme also addresses critical ethical considerations in AI development and deployment, examining questions of fairness, accountability, and governance as AI systems become more integrated into society.

Participants will contribute through short reports or research-based papers that can range from highly technical to completely non-technical, as we want to make this track accessible to all social scientists and foster interdisciplinary discussions on AI’s evolving role in our world.

The winning team in this track will receive a $600 prize!

Speakers & Collaborators

Natalia Pérez-Campanero Antolín

A research manager at Apart, Natalia has a PhD in Interdisciplinary Biosciences from Oxford and has run the Royal Society's Entrepreneur-in-Residence program.
Judge

Jasmine Wang

Jasmine is the control empirics team lead at UK AISI. Previously, she built and exited one of the first startups to use GPT-3, and worked at Partnership on AI and OpenAI.
Judge

Bessie O'Dell

Strategy & Delivery Advisor at UK AI Safety Institute, DPhil Oxford in AI & Psychiatry. Previously Visiting Fellow at GovAI, working on AI governance and openness
Social Science Track Judge

Myra Deng

Founding PM at Goodfire, Stanford MBA and MS CS graduate previously building modeling platforms at Two Sigma
Speaker

Lindsey Robertson

Data Science & Strategy Consultant with 15 years of experience in analytics and operations. Advocates for AI adoption in SMEs and diversity in tech, guiding businesses through data
Co-Organiser

Astha Puri

Senior Data Scientist at Fortune 6 healthcare company, leading AI initiatives and search portfolio. Former Oracle & Twilio engineer, now bridging AI, health, and healing technology
Co-Organiser

Grecia Castaldi

Tech professional with 10+ years in Digital Design and Information Systems Engineering. Leads tech initiatives supporting women in LATAM, passionate about UX Design.
Co-Organiser

ChengCheng Tan

Owner of Cheng2 Design and Senior Communications Specialist at FAR AI. Stanford CS alum and Google Women Techmakers Ambassador, focused on AI Safety.
Co-Organiser

Ziba Atak

AI Engineer with psychology and business background. Specializes in NLP and generative AI.
Co-Organiser

Archana Vaidheeswaran

Archana is responsible for organizing the Apart Sprints, research hackathons to solve the most important questions in AI safety.
Organizer

Angela Cao

Data Scientist at Memorial Hermann Health, Rice University MDS grad. Top 3% Kaggle expert focusing on predictive analytics, ML, and NLP. Active in disability advocacy.
Co-Organiser

Hannah Betts

Special Projects Lead at FAR.AI focusing on AI safety and education. Former Lead Advisor at NZ Ministry of Education, bringing expertise in curriculum design and science education.
Education Track Judge

Zainab Majid

Zainab works at the intersection of AI safety and cybersecurity, leveraging her expertise in incident response investigations to tackle AI security challenges.
Judge

Tarin Rickett

Product & Engineering Lead at BlueDot Impact, former LinkedIn Staff Engineer. CS and Brain Science grad from Rochester, passionate about educational tech and women in computing.
Speaker

Andreea Damien

Interdisciplinary Scientist with a socio-technical background and Visiting Fellow at the LSE, working at the intersection of natural and artificial systems.
Social Science Track Organiser and Judge

Anna Leshinskaya

Cognitive Scientist at UC Irvine studying human cognition and AI alignment. Harvard PhD, affiliated with AI Objectives Institute, researching cognitive and moral alignment in LLMs
Social Science Track Judge

Cecilia Elena Till

Associate Director at Cooperative AI Foundation supporting research to improve AI cooperation for collective benefit. Former program manager focused on theory of change.
Social Science Track Judge

Li-Lian Ang

Li-Lian is a product manager at BlueDot Impact. She leads their 5-day intro to AI safety courses and built their custom learning platform.
Education Track Judge

AI Safety Essential Reading

Technical Resources

Resources with project suggestions

Resources for the Mechanistic Interpretability track

  1. Tutorial: Visualizing AI Model Internals: Watch this video to understand how to use Goodfire's tools to map and visualize AI model behavior.
  2. Check out the Jupyter Notebook Quickstart. In this quickstart, you'll learn how to:
    1. Sample from a language model (in this case, Llama 3 8B)
    2. Search for interesting features and intervene on them to steer the model
    3. Find features by contrastive search
    4. Save and load Llama models with steering applied
  3. Feature Steering Blog Post: Explore how Goodfire's feature steering technology provides granular control over AI models, moving beyond traditional prompting and fine-tuning. Learn about practical applications in model customization, jailbreak prevention, and persona management.
  4. Goodfire Ember - Scaling Interpretability for Frontier Model Alignment: Technical deep-dive into Ember, Goodfire's interpretability API supporting Llama 3.3 70B. Covers sparse autoencoders for feature extraction, model steering capabilities, and real-world applications in improving model safety and reliability (a rough workflow sketch follows the note below).
⚠️ If you are going to follow the Mechanistic Interpretability track please fill in this form before Friday 16:00 PST ⚠️
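For orientation, the quickstart workflow above looks roughly like the sketch below. The client and method names are recalled from Goodfire's public examples and are assumptions that may not match the current Ember API exactly; treat the official quickstart notebook as the source of truth.

```python
# Rough sketch of the quickstart workflow (sample -> find features -> steer).
# NOTE: the client and method names below are assumptions recalled from
# Goodfire's public examples and may differ from the current Ember API;
# follow the official quickstart notebook as the source of truth.
import goodfire

client = goodfire.Client(api_key="YOUR_GOODFIRE_API_KEY")            # placeholder key
variant = goodfire.Variant("meta-llama/Meta-Llama-3.1-8B-Instruct")  # assumed model id

prompt = [{"role": "user", "content": "Explain AI safety in one sentence."}]

# 1. Sample from the unmodified model.
baseline = client.chat.completions.create(messages=prompt, model=variant)

# 2. Search for interpretable features related to a concept of interest.
features = client.features.search("caution and risk awareness", model=variant, top_k=5)

# 3. Intervene on a feature to steer the model, then sample again and compare.
variant.set(features[0], 0.5)   # positive values strengthen the feature
steered = client.chat.completions.create(messages=prompt, model=variant)
```

Contrastive feature search and saving or loading steered variants (items 3 and 4 in the quickstart) follow the same pattern: search for features, then apply interventions to a model variant.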

Longer in-depth reading

Resources for Public Education Track

1. Intro to Transformative AI curriculum: The risks and opportunities of advanced AI are evolving at unprecedented speed — and so is the need for capable individuals to shape its trajectory. This course is for those who want to rapidly develop their understanding of transformative AI and its impact on humanity. Through expert-facilitated discussions and carefully curated materials, you’ll explore the technical foundations of AI, examine potential futures, and debate key ideas alongside others passionate about ensuring AI benefits humanity. By the end, you’ll have both the knowledge and network to take meaningful steps toward contributing to AI safety.


2. BlueDot's science of learning by Li-Lian Ang: This article gives an overview of how the BlueDot course is structured to empower participants to:

  1. Understand the problem and solutions in positively developing future AI systems and building a pandemic-proof world.
  2. Build frameworks to evaluate interventions critically.
  3. Create and implement impactful solutions using their knowledge and skills.

To achieve these goals, courses come in two phases:

  • Learning Phase: participants build a foundational understanding of the field.
  • Project Phase: participants apply what they've learned to take meaningful action.

See the updated calendar and subscribe

The schedule runs from 4 PM UTC Friday to 3 AM Monday. We start with an introductory talk and end with an awards ceremony the following week. Join the public iCal here. You will also find Explorer events, such as collaborative brainstorming and team match-making, on Discord and in the calendar before the hackathon begins.

📍 Registered jam sites

Besides remote and virtual participation, our amazing organizers also host local jam sites where you can meet up in person and connect with others in your area.

Women in AI Safety Hackathon - Dubai

If you're coming by taxi or metro, enter the Boulevard area of Emirates Towers. You'll find CodersHQ right opposite Creators Hub on the ground floor. If you're driving, there is free parking here: https://maps.app.goo.gl/pp1nqVbrN2LiQbk48

AISIG - Women in AI Safety Research Hackathon

Join us for the Women in AI Safety Research Hackathon at Hereplein 4, 9711 GA Groningen!

Women in AI Safety Research Hackathon

Join us at the EA Hotel for the Women in AI Safety Research Hackathon. Free accommodation, food, and co-working stations provided! We're located at 36 York Street, Blackpool, FY1 5AQ. Please register through our Luma event page so we know you're coming!

Women in AI Safety Hackathon - 42AI PARIS jam site

42AI is a student association dedicated to fostering learning and discussion in the field of AI.

Women in AI Safety Hackathon

25 Holywell Row, London EC2A 4XE

LISA London

AI Safety Hackathon (by WAI and EA Warwick)

Join Warwick AI and Effective Altruism Warwick for a joint hackathon weekend. We will be there on the 8th and 9th of March in room FAB 2.48 on the main campus of the University of Warwick. Anyone can join! Hope to see you there!

🏠 Register a location

The in-person events for the Apart Sprints are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research, student, and engineering community. Read more about organizing.


Overview

Each team should submit a research paper that documents their project and contributions to AI safety. Your submission should demonstrate both technical competence and thoughtful consideration of safety implications within your chosen track.

Use this template for your submission [Required]

Template in Overleaf/LaTeX

Submission Requirements

Your submission package must include the following:

  1. Research paper (PDF format, max 6 pages) following the provided template
  2. Project code repository (if applicable)
  3. Demo materials or visualizations
  4. Brief project presentation (optional, max 5 slides)
  5. Short video demonstration (optional, max 3 minutes)

Evaluation Criteria

Projects will be evaluated across three main dimensions, with specific considerations for each track:

Innovation & Literature Foundation

Mechanistic Interpretability Track

  • Novel approaches to understanding model internals using sparse autoencoders
  • Integration with existing interpretability research
  • Creative applications of Goodfire's Ember API
  • Innovative visualization or analysis techniques
  • Contribution to understanding model behavior

Public Education Track

  • Novel approaches to teaching AI safety concepts
  • Integration with existing educational frameworks
  • Creative use of interactive learning tools
  • Innovation in assessment methods
  • Connection to established learning theories

Social Sciences Track

  • Novel insights into AI cognition and societal impact
  • Integration of multiple social science disciplines
  • Innovative research methodologies
  • Creative approaches to studying human-AI interaction
  • Connection to established social science theories

AI Safety Impact

Mechanistic Interpretability Track

  • Clear threat model for model behavior
  • Potential for detecting or preventing harmful capabilities
  • Scalability to larger models
  • Robustness of interpretation methods
  • Practical applicability for safety research

Public Education Track

  • Impact on understanding AI risks
  • Effectiveness in conveying safety concepts
  • Scalability of educational approach
  • Potential for behavior change
  • Long-term learning outcomes

Social Sciences Track

  • Understanding of AI safety implications
  • Impact on policy and governance
  • Societal risk assessment
  • Ethical framework development
  • Cross-cultural considerations

Technical Quality & Documentation

Mechanistic Interpretability Track

  • Code quality and reproducibility
  • Technical depth of analysis
  • Documentation of methods
  • Visualization clarity
  • Experimental rigor

Public Education Track

  • Platform/tool implementation quality
  • User experience design
  • Documentation clarity
  • Assessment methodology
  • Resource accessibility

Social Sciences Track

  • Research methodology rigor
  • Data collection methods
  • Analysis framework clarity
  • Documentation of findings
  • Presentation of results

Frequently Asked Questions

General Questions

Q: What is the submission deadline? A: All submissions must be received by March 10, 2025, 3:00 AM UTC.

Q: How many team members are allowed? A: Teams can have 4-5 members. Individual submissions are possible but team collaboration is encouraged.

Q: Can we submit to multiple tracks? A: Teams should focus on one primary track but can incorporate elements from other tracks if relevant.

Track-Specific Questions

Mechanistic Interpretability Track

Q: Do we need to use Goodfire's Ember API? A: Yes, projects in this track should utilize the Ember API for model analysis.

Q: What compute resources are available? A: Each team receives $400 in Lambda Labs credits for compute resources.

Public Education Track

Q: Can we build on existing educational platforms? A: Yes, you can integrate with existing platforms while clearly documenting your novel contributions.

Q: How should we measure educational impact? A: Include both quantitative metrics and qualitative assessments of learning outcomes.

Social Sciences Track

Q: What research methodologies are acceptable? A: Both qualitative and quantitative methods are welcome, with clear documentation of methodology.

Q: How should we handle data collection? A: Follow standard social science research ethics and data protection guidelines.

Submission Process

  1. Prepare all required materials following the templates provided
  2. Submit through the hackathon platform
  3. Include all team member information
  4. Ensure all links are accessible
  5. Complete the submission form with project details

Support

For questions or technical support:

Remember to review all evaluation criteria carefully and ensure your submission addresses the key aspects of your chosen track.

BlueDot Impact Connect: A Comprehensive AI Safety Community Platform
Track: Public Education
The AI safety field faces a critical challenge: while formal education resources are growing, personalized guidance and community connections remain scarce, especially for newcomers from diverse backgrounds. We propose BlueDot Impact Connect, a comprehensive AI Safety Community Platform designed to address this gap by creating a structured environment for knowledge transfer between experienced AI safety professionals and aspiring contributors, while fostering a vibrant community ecosystem. The platform will employ a sophisticated matching algorithm for mentorship that considers domain-specific expertise areas, career trajectories, and mentorship styles to create meaningful connections. Our solution features detailed AI safety-specific profiles, showcasing research publications, technical skills, specialized course completions, and research trajectories to facilitate optimal mentor-mentee pairings. The integrated community hub enables members to join specialized groups, participate in discussions, attend events, share resources, and connect with active members across the field. By implementing this platform with BlueDot Impact's community of 4,500+ professionals across 100+ countries, we anticipate significant improvements in mentee career trajectory clarity, research direction refinement, and community integration. We propose that by formalizing the mentorship process and creating robust community spaces, all accessible globally, this platform will help democratize access to AI safety expertise while creating a pipeline for expanding the field's talent pool—a crucial factor in addressing the complex challenge of catastrophic AI risk mitigation.
Elise Racine, Aliya Koishina, Aishwarya Gurung, Tatenda Mawema, Haihao Liu
March 10, 2025
Beyond Statistical Parrots: Unveiling Cognitive Similarities and Exploring AI Psychology through Human-AI Interaction
Recent critiques labeling large language models as mere "statistical parrots" overlook essential parallels between machine computation and human cognition. This work revisits the notion by contrasting human decision-making—rooted in both rapid, intuitive judgments and deliberate, probabilistic reasoning (System 1 and 2) —with the token-based operations of contemporary AI. Another important consideration is that both human and machine systems operate under constraints of bounded rationality. The paper also emphasizes that understanding AI behavior isn’t solely about its internal mechanisms but also requires an examination of the evolving dynamics of Human-AI interaction. Personalization is a key factor in this evolution, as it actively shapes the interaction landscape by tailoring responses and experiences to individual users, which functions as a double-edged sword. On one hand, it introduces risks, such as over-trust and inadvertent bias amplification, especially when users begin to ascribe human-like qualities to AI systems. On the other hand, it drives improvements in system responsiveness and perceived relevance by adapting to unique user profiles, which is highly important in AI alignment, as there is no common ground truth and alignment should be culturally situated. Ultimately, this interdisciplinary approach challenges simplistic narratives about AI cognition and offers a more nuanced understanding of its capabilities.
Aisulu Zhussupbayeva
March 10, 2025
Detecting Malicious AI Agents Through Simulated Interactions
This research investigates malicious AI Assistants’ manipulative traits and whether the behaviours of malicious AI Assistants can be detected when interacting with human-like simulated users in various decision-making contexts. We also examine how interaction depth and ability of planning influence malicious AI Assistants’ manipulative strategies and effectiveness. Using a controlled experimental design, we simulate interactions between AI Assistants (both benign and deliberately malicious) and users across eight decision-making scenarios of varying complexity and stakes. Our methodology employs two state-of-the-art language models to generate interaction data and implements Intent-Aware Prompting (IAP) to detect malicious AI Assistants. The findings reveal that malicious AI Assistants employ domain-specific persona-tailored manipulation strategies, exploiting simulated users’ vulnerabilities and emotional triggers. In particular, simulated users demonstrate resistance to manipulation initially, but become increasingly vulnerable to malicious AI Assistants as the depth of the interaction increases, highlighting the significant risks associated with extended engagement with potentially manipulative systems. IAP detection methods achieve high precision with zero false positives but struggle to detect many malicious AI Assistants, resulting in high false negative rates. These findings underscore critical risks in human-AI interactions and highlight the need for robust, context-sensitive safeguards against manipulative AI behaviour in increasingly autonomous decision-support systems.
Yulu Pi, Anna Becker, Ella Bettison
March 10, 2025
BUGgy: Supporting AI Safety Education through Gamified Learning
As Artificial Intelligence (AI) development continues to proliferate, educating the wider public on AI Safety and the risks and limitations of AI increasingly gains importance. AI Safety Initiatives are being established across the world with the aim of facilitating discussion-based courses on AI Safety. However, these initiatives are located rather sparsely around the world, and not everyone has access to a group to join for the course. Online versions of such courses are selective and have limited spots, which may be an obstacle for some to join. Moreover, efforts to improve engagement and memory consolidation would be a notable addition to the course through Game-Based Learning (GBL), which has research supporting its potential in improving learning outcomes for users. Therefore, we propose a supplementary tool for BlueDot's AI Safety courses, that implements GBL to practice course content, as well as open-ended reflection questions. It was designed with principles from cognitive psychology and interface design, as well as theories for question formulation, addressing different levels of comprehension. To evaluate our prototype, we conducted user testing with cognitive walk-throughs and a questionnaire addressing different aspects of our design choices. Overall, results show that the tool is a promising way to supplement discussion-based courses in a creative and accessible way, and can be extended to other courses of similar structure. It shows potential for AI Safety courses to reach a wider audience with the effect of more informed and safe usage of AI, as well as inspiring further research into educational tools for AI Safety education.
Sophie Sananikone, Xenia Demetriou, Mariam Ibrahim, Nienke Posthumus
March 10, 2025