Apart news
The Latest News, Research & Events
Community
Mar 18, 2025
Mapping AI Safety Research: An Open-Source Knowledge Graph
A tool to map the sprawling landscape of AI alignment research
Read More
Community
Mar 14, 2025
Apart News: San Francisco Edition
This week we have been in San Francisco for our Apart Retreat, where we attended conferences, saw old friends, and visited other AI labs to talk about frontier AI.
Read More
Community
Feb 21, 2025
Apart News: ICLR Awards & Women in AI Safety
This week, we celebrate ICLR conference oral awards for two of our papers, launch our Women in AI Safety hackathon, and more.
Read More
Research
Feb 18, 2025
Uncovering Model Manipulation with DarkBench
Apart Research developed DarkBench to uncover dark patterns - application design practices that manipulate a user’s behavior against their intention - in some of the world's most popular LLMs.
Read More
Research
Feb 13, 2025
Studio Progress Report
We are happy to share the significant progress made by the first batch of Apart Research's Studio projects.
Read More
Newsletter
Feb 7, 2025
Apart News: Esben at IASEAI & Studio Progress Report
This week Esben gave a talk in Paris, and our inaugural Studio Progress Report will be released soon.
Read More
Newsletter
Jan 31, 2025
Apart News: Paris AI Summit & Catching Hackers
This week some of the team are in Paris & we have just published an Apart Lab Studio research blog about catching AI hackers.
Read More
Community
Jan 28, 2025
AI Safety Entrepreneurship Hackathon Round-Up
In this Hackathon Round-Up we check out the winners of our AI Entrepreneurship Hackathon.
Read More
Newsletter
Jan 24, 2025
Apart News: AI Entrepreneurship & New Research
This week we reveal our AI Startup Hackathon winners and have a look at the Apart Lab paper just accepted to ICLR's 2025 conference.
Read More
Newsletter
Jan 17, 2025
Apart News: Exclusive Interview with Interpretability Insider
Myra reveals how Goodfire's groundbreaking API enabled 200+ researchers at Apart's global hackathon to advance AI interpretability, demonstrating new ways to make AI systems more transparent and controllable.
Read More
Research
Jan 17, 2025
Behind the Features: Goodfire's Interpretability Tools in Action
Goodfire's Myra reveals how their groundbreaking API enabled 200+ researchers at Apart's global hackathon to advance AI interpretability, demonstrating new ways to make AI systems more transparent and controllable.
Read More
Research
Jan 16, 2025
Promising results from Latent Adversarial Training
Apart Research's newest research achieves promising results from Latent Adversarial Training.
Read More
Newsletter
Jan 10, 2025
Apart News: new LAT research just dropped
In this week's Apart News we look over promising new LAT research and get a Hackathon insider's account from Archana.
Read More
Community
Jan 9, 2025
Inside the first AI Policy Hackathon at Johns Hopkins
Johns Hopkins University hosted its first AI Policy Hackathon in partnership with us at Apart Research. Here's what participants and organizers had to say about bridging the gap between technology and policy.
Read More
Community
Jan 1, 2025
Apart in 2025
2024 was the biggest and most impactful year for Apart Research so far.
Read More
Research
Dec 31, 2024
AI Hackers in the Wild: LLM Agent Honeypot
This Apart Lab Studio research blog attempts to ascertain the current state of AI-powered hacking in the wild through an innovative 'honeypot' system designed to detect LLM-based attackers.
Read More
Newsletter
Dec 20, 2024
Apart News: Hackathons in 2025 PREVIEW
In this week's Apart News we preview some of the Hackathons we are most excited for in 2025.
Read More
Newsletter
Dec 17, 2024
Sparse Autoencoder Hackathon
Our Hackathon round-up showcases our global sprints community.
Read More
Research
Dec 14, 2024
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Apart Research's newest paper looks at LLM-assisted benchmark analysis.
Read More
Newsletter
Dec 13, 2024
Apart News: our research at NeurIPS
In this week's Apart News we are at NeurIPS in Canada.
Read More
Newsletter
Dec 6, 2024
Apart News: *NEW VIDEO* Jacob Haimes on working at Apart
In this week's Apart News we have a *brand new* video edition of our Researcher Spotlight series.
Read More
Newsletter
Dec 3, 2024
Apart News: 2024 was our biggest year yet
In this week's Apart News we invite you to revisit Apart Research's incredible 2024 with us.
Read More
Newsletter
Nov 29, 2024
Apart News: how impactful are we?
In this week's edition of Apart News we ask just how impactful a donation is to Apart Research and take a look at the ability of LLMs to predict neuroscience results.
Read More
Newsletter
Nov 22, 2024
Apart News: NEW Papers, Elections & Goodfire
Apart News is our newsletter to keep you up-to-date.
Read More
Research
Nov 22, 2024
Testing LLMs' ability to find security flaws in Cryptographic Protocols
Apart Research's newest paper offers a systematic way to evaluate how well Large Language Models (LLMs) can identify vulnerabilities in cryptographic protocols.
Read More
Community
Nov 18, 2024
How impactful is donating to Apart Research?
Co-Director Esben gives us his thoughts on how impactful donating to Apart Research is.
Read More
Newsletter
Nov 15, 2024
Apart News: Announcing Apart Lab Studio
Apart News is our newsletter to keep you up-to-date.
Read More
Community
Nov 11, 2024
Announcing Apart Lab Studio
Our new Apart Lab Studio is designed to bridge the gap between weekend hackathon projects and a fully-fledged AI Safety research career.
Read More
Newsletter
Nov 8, 2024
Apart News: Ale, Cash Prizes & the UK’s AISI
Apart News is our newsletter to keep you up-to-date.
Read More
Spotlights
Nov 5, 2024
Researcher Spotlight: Alexandra Abbas
Our Researcher Spotlight series highlights the global community at the heart of Apart Research.
Read More
Newsletter
Nov 1, 2024
Apart News: Esben, Winning Sprints & ‘3cb’
Apart News is our newsletter to keep you up-to-date.
Read More
Research
Oct 31, 2024
Esben on AGI, 'Sentware', and Confident optimism
Esben Kran gives us some of his thoughts on ideas relevant to AI safety, decision-making, and more.
Read More
Research
Oct 30, 2024
‘3cb’: The Catastrophic Cyber Capabilities Benchmark
Apart Research's newest paper, Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities, creates a novel cyber offense capability benchmark.
Read More
Newsletter
Oct 28, 2024
AI Policy Hackathon in Washington D.C.
Our Hackathon round-up showcases our global 'sprints' community.
Read More
Newsletter
Oct 25, 2024
Apart News: Finn, Cyber Offense & Johns Hopkins
Apart News is our newsletter to keep you up-to-date.
Read More
Newsletter
Oct 18, 2024
Apart News: Clement, Benchmarks & D.C.
Apart News is our newsletter to keep you up-to-date.
Read More
Research
Oct 18, 2024
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Apart Research's newest paper finds that many public benchmarks may no longer provide accurate evaluations due to the inclusion of test data in training datasets.
Read More
Spotlights
Oct 18, 2024
Researcher Spotlight: Clement Neo
Our Researcher Spotlight series highlights the global community at the heart of Apart Research.
Read More
Spotlights
Oct 15, 2024
Researcher Spotlight: Akash Kundu
Our Researcher Spotlight series highlights the global community at the heart of Apart Research.
Read More
Newsletter
Oct 11, 2024
Apart News: Researcher Spotlight, New Team Member & Bangalore
Apart News is our newsletter to keep you up-to-date.
Read More
Research
Oct 9, 2024
Esben on agent safety research
Agent safety research is difficult because it involves many different types of entities and a wide range of vulnerabilities and failure modes. As a result, it’s hard to develop research that generalizes to all agents. However, we need to give it a shot!
Read More
Newsletter
Oct 4, 2024
Apart News: Agents, Submissions & Spain
Apart News is our newsletter to keep you up-to-date.
Read More
Newsletter
Sep 27, 2024
Apart News: New Research, NeurIPS Papers & Team Offsite
Apart News is our newsletter to keep you up-to-date.
Read More
Research
Sep 27, 2024
Do models really internalize our preferences?
Apart Research's newest paper (alongside academics from the University of Oxford, Cambridge, and Cynch.ai) looks at whether models actually internalize human preferences or not. But why does this matter?
Read More
Newsletter
Sep 20, 2024
Apart News: o1, Awards & Singapore
Apart News is our newsletter to keep you up-to-date.
Read More
Newsletter
Sep 13, 2024
Apart News: AI Startups, India & Concordia
Apart News is our newsletter to keep you up-to-date.
Read More
Community
Sep 13, 2024
Can startups be impactful in AI safety?
This post details the top projects from our technical AI safety startups hackathon where researchers and entrepreneurs joined from across the world.
Read More
Community
Aug 24, 2024
Where we are on for-profit AI safety
Read about how Big Tech's AI race leaves safety in the dust, non-profits struggle to keep up, and the challenges for-profit AI safety ventures must overcome to leverage resources and make a real impact.
Read More
Community
Jul 23, 2024
Finding Deception in Language Models
This June, Apart Research and Apollo Research joined forces to host the Deception Detection Hackathon, bringing together students, researchers and engineers from around the world to tackle one of the most pressing challenges in AI safety: Preventing AI from deceiving humans.
Read More
Community
Jun 20, 2024
Code Red LLM Evaluations Hackathon Wrap Up (METR and Apart)
Our 128 participants submitted more than 200 project ideas, 100 detailed task specifications, and more than 20 complete implementations! In this post, we also get an exclusive interview with one of the winners.
Read More
Community
May 17, 2024
The ultimate guide to AI safety research hackathons
Research hackathons are an amazing way to dive into a new field, collaborate with passionate people, and create impactful projects in just a short weekend.
Read More
Community
Apr 19, 2024
Join us at the AI x Democracy research hackathon
Participate online or in person on the weekend of 3rd to 5th May in an exciting and intense AI safety research hackathon focused on demonstrating and extrapolating risks to democracy from real-life threat models.
Read More
Community
Mar 18, 2024
Join the AI Evaluation Tasks Bounty Hackathon with METR
In this collaboration between METR and Apart, you get the chance to contribute directly to model evaluations research.
Read More
Community
Mar 1, 2024
How to organize a research hackathon
Organizing a hackathon can bring a unique and exciting energy to people interested in AI safety research! This post summarizes how you can organize a successful hackathon.
Read More
Spotlights
Feb 12, 2024
Researcher Spotlight: Jacob Haimes
Our Researcher Spotlight series highlights the global community at the heart of Apart Research.
Read More
Community
Feb 1, 2024
For-profit AI Safety
AI development attracts more than $67 billion in yearly investments, contrasting sharply with the $250 million allocated to AI safety. This gap suggests there's a large opportunity for AI safety to tap into the commercial market. The big question then is, how do you close that gap?
Read More
Community
Jan 23, 2024
Taking your next steps after a research hackathon
With the research hackathon, your journey into the world of AI safety is definitely not over! Besides the chance to join the Apart Lab Fellowship, we have collected a bunch of resources here for you to dive even deeper into the field!
Read More
Community
Dec 12, 2023
Why organize a research hackathon?
There are many reasons to run a hackathon but some of the main ones are that hackathons are an amazing way to engage the local groups in AI security research and create a sense of community.
Read More
Community
Jul 13, 2023
Updated quickstart guide for mechanistic interpretability
Written by Neel Nanda, who previously worked on mech interp under Chris Olah at Anthropic and is currently a researcher on the DeepMind mechanistic interpretability team.
Read More
Research
Feb 22, 2023
Results from the Scale Oversight hackathon
Check out the top projects from the "Scale Oversight" hackathon hosted in February 2023: Playing games with LLMs, scaling of prompt specificity, and more.
Read More
Research
Jan 2, 2023
Results from the AI testing hackathon
See the winning projects from the AI testing hackathon held in December 2022: Trojan networks, unsupervised latent knowledge representation, and token loss trajectories to target interpretability methods.
Read More
Research
Nov 21, 2022
Results from the language model hackathon
See winning projects from the language model hackathon hosted November 2022: GPT-3 shows sycophancy, OpenAI's flagging is biased, and truthfulness is sensitive to prompt design.
Read More
Research
Nov 17, 2022
Results from the interpretability hackathon
Read the winning projects from the interpretability hackathon hosted in November 2022: Automatic interpretability, backup backup name mover heads, and "loud facts" in memory editing.
Read More
Read More
Research
Oct 9, 2024
Esben on agent safety research
Agent safety research is difficult because it involves many different types of entities and a wide range of vulnerabilities and failure modes. As a result, it’s hard to develop research that generalizes to all agents. However, we need to give it a shot!
Read More
Newsletter
Oct 4, 2024
Apart News: Agents, Submissions & Spain
Apart News is our newsletter to keep you up-to-date.
Read More
Newsletter
Sep 27, 2024
Apart News: New Research, NeurIPS Papers & Team Offsite
Apart News is our newsletter to keep you up-to-date.
Read More
Research
Sep 27, 2024
Do models really internalize our preferences?
Apart Research's newest paper (alongside academics from the University of Oxford, Cambridge, and Cynch.ai) looks at whether models actually internalize human preferences or not. But why does this matter?
Read More
Newsletter
Sep 20, 2024
Apart News: o1, Awards & Singapore
Apart News is our newsletter to keep you up-to-date.
Read More
Newsletter
Sep 13, 2024
Apart News: AI Startups, India & Concordia
Apart News is our newsletter to keep you up-to-date.
Read More
Community
Sep 13, 2024
Can startups be impactful in AI safety?
This post details the top projects from our technical AI safety startups hackathon where researchers and entrepreneurs joined from across the world.
Read More
Community
Aug 24, 2024
Where we are on for-profit AI safety
Read about how Big Tech's AI race leaves safety in the dust, non-profits struggle to keep up, and the challenges for-profit AI safety ventures must overcome to leverage resources and make a real impact.
Read More
Community
Jul 23, 2024
Finding Deception in Language Models
This June, Apart Research and Apollo Research joined forces to host the Deception Detection Hackathon, bringing together students, researchers and engineers from around the world to tackle one of the most pressing challenges in AI safety: Preventing AI from deceiving humans.
Read More
Community
Jun 20, 2024
Code Red LLM Evaluations Hackathon Wrap Up (METR and Apart)
Our 128 participants submitted more than 200 project ideas, 100 detailed task specifications, and more than 20 complete implementations! In this post, we also get an exclusive interview with one of the winners.
Read More
Community
May 17, 2024
The ultimate guide to AI safety research hackathons
Research hackathons are an amazing way to dive into a new field, collaborate with passionate people, and create impactful projects in just a short weekend.
Read More
Community
Apr 19, 2024
Join us at the AI x Democracy research hackathon
Participate online or in person on the weekend of 3rd to 5th May in an exciting and intense AI safety research hackathon focused on demonstrating and extrapolating risks to democracy from real-life threat models.
Read More
Community
Mar 18, 2024
Join the AI Evaluation Tasks Bounty Hackathon with METR
In this collaboration between METR and Apart, you get the chance to contribute directly to model evaluations research.
Read More
Community
Mar 1, 2024
How to organize a research hackathon
Organizing a hackathon can bring a unique and exciting energy to people interested in AI safety research! This post summarizes how you can organize a successful hackathon.
Read More
Spotlights
Feb 12, 2024
Researcher Spotlight: Jacob Haimes
Our Researcher Spotlight series highlights the global community at the heart of Apart Research.
Read More
Community
Feb 1, 2024
For-profit AI Safety
AI development attracts more than $67 billion in yearly investments, contrasting sharply with the $250 million allocated to AI safety. This gap suggests there's a large opportunity for AI safety to tap into the commercial market. The big question then is, how do you close that gap?
Read More
Community
Jan 23, 2024
Taking your next steps after a research hackathon
With the research hackathon, your journey into the world of AI safety is definitely not over! Besides the chance to join the Apart Lab Fellowship, we have collected a bunch of resources here for you to dive even deeper into the field!
Read More
Community
Dec 12, 2023
Why organize a research hackathon?
There are many reasons to run a hackathon but some of the main ones are that hackathons are an amazing way to engage the local groups in AI security research and create a sense of community.
Read More
Community
Jul 13, 2023
Updated quickstart guide for mechanistic interpretability
Written by Neel Nanda, who previously worked on mech interp under Chris Olah at Anthropic and is currently a researcher on the DeepMind mechanistic interpretability team.
Read More
Research
Feb 22, 2023
Results from the Scale Oversight hackathon
Check out the top projects from the "Scale Oversight" hackathon hosted in February 2023: Playing games with LLMs, scaling of prompt specificity, and more.
Read More
Research
Jan 2, 2023
Results from the AI testing hackathon
See the winning projects from the AI testing hackathon held in December 2022: Trojan networks, unsupervised latent knowledge representation, and token loss trajectories to target interpretability methods.
Read More
Research
Nov 21, 2022
Results from the language model hackathon
See winning projects from the language model hackathon hosted November 2022: GPT-3 shows sycophancy, OpenAI's flagging is biased, and truthfulness is sensitive to prompt design.
Read More
Research
Nov 17, 2022
Results from the interpretability hackathon
Read the winning projects from the interpretability hackathon hosted in November 2022: Automatic interpretability, backup backup name mover heads, and "loud facts" in memory editing.
Read More

Sign up to stay updated on the
latest news, research, and events
