November 18, 2024
–
Announcements
How impactful is donating to Apart Research?
Co-Director Esben gives us his thoughts on how impactful donating to Apart Research is.
November 14, 2024
–
Research
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Apart Research's newest paper looks at LLM-assisted benchmark analysis.
November 11, 2024
–
Research
Announcing Apart Lab Studio
Our new Apart Lab Studio is designed to bridge the gap between weekend hackathon projects and a fully-fledged AI Safety research career.
October 31, 2024
–
Community
Esben on AGI, 'Sentware', and Confident optimism
Esben Kran gives us some of his thoughts on ideas relevant to AI safety, decision-making, and more.
October 30, 2024
–
Research
‘3CB’: The Catastrophic Cyber Capabilities Benchmark
Apart Research's newest paper, Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities, creates a novel cyber offense capability benchmark.
October 18, 2024
–
Research
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Apart Research's newest paper finds that many public benchmarks may no longer provide accurate evaluations due to the inclusion of test data in training datasets.
October 9, 2024
–
Research
Esben on agent safety research
Agent safety research is difficult because it involves many different types of entities and a wide range of vulnerabilities and failure modes. As a result, it’s hard to develop research that generalizes to all agents. However, we need to give it a shot!
September 27, 2024
–
Research
Do models really internalize our preferences?
Apart Research's newest paper (alongside academics from the University of Oxford, Cambridge, and Cynch.ai) looks at whether models actually internalize human preferences. But why does this matter? Because if an LLM’s behavior diverges from human feedback, unintended consequences may arise.
September 13, 2024
–
Events
Can startups be impactful in AI safety?
This post details the top projects from our technical AI safety startups hackathon where researchers and entrepreneurs joined from across the world.
August 24, 2024
–
AI Security
Where we are on for-profit AI safety
Read about how Big Tech's AI race leaves safety in the dust, non-profits struggle to keep up, and the challenges for-profit AI safety ventures must overcome to leverage resources and make a real impact.
Finding Deception in Language Models
This June, Apart Research and Apollo Research joined forces to host the Deception Detection Hackathon, bringing together students, researchers and engineers from around the world to tackle one of the most pressing challenges in AI safety: Preventing AI from deceiving humans.
Code Red LLM Evaluations Hackathon Wrap Up (METR and Apart)
A few months ago, Apart, in collaboration with METR, ran the Code Red Hackathon to engage talent across the world in impactful AI safety research. Our 128 participants submitted more than 200 project ideas, 100 detailed task specifications, and more than 20 complete implementations! In this post, we also get an exclusive interview with one of the winners.
Results from the AI x Democracy Research Sprint
We ran a 3-day research sprint on AI governance, motivated by the need for demonstrations of the risks AI poses to democracy in support of AI governance work. Here we share the 4 winning projects, but many of the other 21 entries were also incredibly interesting, and we suggest you take a look.
The ultimate guide to AI safety research hackathons
Research hackathons are an amazing way to dive into a new field, collaborate with passionate people, and create impactful projects in just a short weekend. Having organized and participated in several AI safety hackathons with Apart Research, here are some key tips to help you get the most out of your hackathon experience:
Join us at the AI x Democracy research hackathon
Participate online or in person on the weekend of May 3rd to 5th in an exciting and intense AI safety research hackathon focused on demonstrating and extrapolating risks to democracy from real-life threat models. We invite researchers, cybersecurity professionals, and governance experts to join, but it is open to everyone, and we will introduce starter code templates to help you kickstart your team's projects. Join at apartresearch.com/event/ai-democracy.
Join the AI Evaluation Tasks Bounty Hackathon with METR
In this collaboration between METR and Apart, you get the chance to contribute directly to model evaluations research. Take part in the Code Red Hackathon, where you can earn money, connect with experts, and help create tasks to evaluate frontier AI systems.
How to organize a research hackathon
Organizing a hackathon can bring a unique and exciting energy to people interested in AI safety research! This post summarizes how you can organize a successful hackathon.
February 1, 2024
–
AI Security
For-profit AI Safety
AI development attracts more than $67 billion in yearly investments, contrasting sharply with the $250 million allocated to AI safety. This gap suggests there's a large opportunity for AI safety to tap into the commercial market. The big question then is, how do you close that gap?
January 23, 2024
–
Guides
Taking your next steps after a research hackathon
With the research hackathon, your journey into the world of AI safety is definitely not over! Besides the chance to join the Apart Lab Fellowship, we have collected a bunch of resources here for you to dive even deeper into the field!
December 12, 2023
–
Community
Why organize a research hackathon?
There are many reasons to run a hackathon, but the main ones are that hackathons are an amazing way to engage local groups in AI security research and create a sense of community. Participants get practical research experience and can show their finished projects to potential employers and colleagues. It's also a really fun way to spend a weekend.
Updated quickstart guide for mechanistic interpretability
Written by Neel Nanda, who previously worked on mech interp under Chris Olah at Anthropic and is currently a researcher on the DeepMind mechanistic interpretability team.
February 22, 2023
–
Events
Results from the Scale Oversight hackathon
Check out the top projects from the "Scale Oversight" hackathon hosted in February 2023: Playing games with LLMs, scaling of prompt specificity, and more.
Results from the AI testing hackathon
See the winning projects from the AI testing hackathon held in December 2022: Trojan networks, unsupervised latent knowledge representation, and token loss trajectories to target interpretability methods.
November 21, 2022
–
Events
Results from the language model hackathon
See the winning projects from the language model hackathon hosted in November 2022: GPT-3 shows sycophancy, OpenAI's flagging is biased, and truthfulness is sensitive to prompt design.
November 17, 2022
–
Events
Results from the interpretability hackathon
Read the winning projects from the interpretability hackathon hosted in November 2022: Automatic interpretability, backup backup name mover heads, and "loud facts" in memory editing.