Apart news

The Latest News, Research & Events

Sign up to stay updated on the latest news, research, and events


Community

Mar 18, 2025

Mapping AI Safety Research: An Open-Source Knowledge Graph

A tool to map the sprawling landscape of AI alignment research

Read More

Community

Mar 14, 2025

Apart News: San Francisco Edition

This week we have been in San Francisco for our Apart Retreat, where we attended conferences, saw old friends, and visited other AI labs to talk about frontier AI.

Read More

Community

Feb 21, 2025

Apart News: ICLR Awards & Women in AI Safety

This week, we celebrate ICLR conference oral awards for two of our papers, launch our Women in AI Safety hackathon, and more.

Read More

Research

Feb 18, 2025

Uncovering Model Manipulation with DarkBench

Apart Research developed DarkBench to uncover dark patterns - application design practices that manipulate a user’s behavior against their intention - in some of the world's most popular LLMs.

Read More

Research

Feb 13, 2025

Studio Progress Report

We are happy to share the significant progress made by the first batch of Apart Research's Studio projects.

Read More

Newsletter

Feb 7, 2025

Apart News: Esben at IASEAI & Studio Progress Report

This week Esben gave a talk in Paris, and our inaugural Studio Progress Report will be released soon.

Read More

Newsletter

Jan 31, 2025

Apart News: Paris AI Summit & Catching Hackers

This week some of the team are in Paris & we have just published an Apart Lab Studio research blog about catching AI hackers.

Read More

Community

Jan 28, 2025

AI Safety Entrepreneurship Hackathon Round-Up

In this Hackathon Round-Up we check out the winners of our AI Entrepreneurship Hackathon.

Read More

Newsletter

Jan 24, 2025

Apart News: AI Entrepreneurship & New Research

This week we reveal our AI Startup Hackathon winners and have a look at the Apart Lab paper just accepted to ICLR's 2025 conference.

Read More

Newsletter

Jan 17, 2025

Apart News: Exclusive Interview with Interpretability Insider

Myra reveals how Goodfire's groundbreaking API enabled 200+ researchers at Apart's global hackathon to advance AI interpretability, demonstrating new ways to make AI systems more transparent and controllable.

Read More

Research

Jan 17, 2025

Behind the Features: Goodfire's Interpretability Tools in Action

Goodfire's Myra reveals how their groundbreaking API enabled 200+ researchers at Apart's global hackathon to advance AI interpretability, demonstrating new ways to make AI systems more transparent and controllable.

Read More

Research

Jan 16, 2025

Promising results from Latent Adversarial Training

Apart Research's newest research achieves promising results from Latent Adversarial Training.

Read More

Newsletter

Jan 10, 2025

Apart News: new LAT research just dropped

In this week's Apart News we look over promising new LAT research and get a Hackathon insider's account from Archana.

Read More

Community

Jan 9, 2025

Inside the first AI Policy Hackathon at Johns Hopkins

Johns Hopkins University hosted its first AI Policy Hackathon in partnership with us at Apart Research. Here's what participants and organizers had to say about bridging the gap between technology and policy.

Read More

Community

Jan 1, 2025

Apart in 2025

2024 was the biggest and most impactful year for Apart Research so far.

Read More

Research

Dec 31, 2024

AI Hackers in the Wild: LLM Agent Honeypot

This Apart Lab Studio research blog attempts to ascertain the current state of AI-powered hacking in the wild through an innovative 'honeypot' system designed to detect LLM-based attackers.

Read More

Newsletter

Dec 20, 2024

Apart News: Hackathons in 2025 PREVIEW

In this week's Apart News we preview some of the Hackathons we are most excited for in 2025.

Read More

Newsletter

Dec 17, 2024

Sparse Autoencoder Hackathon

Our Hackathon round-up showcases our global sprints community.

Read More

Research

Dec 14, 2024

Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique

Apart Research's newest paper looks at LLM-assisted benchmark analysis.

Read More

Newsletter

Dec 13, 2024

Apart News: our research at NeurIPS

In this week's Apart News we are at NeurIPS in Canada.

Read More

Newsletter

Dec 6, 2024

Apart News: *NEW VIDEO* Jacob Haimes on working at Apart

In this week's Apart News we have a *brand new* video edition of our Researcher Spotlight series.

Read More

Newsletter

Dec 3, 2024

Apart News: 2024 was our biggest year yet

In this week's Apart News we invite you to revisit Apart Research's incredible 2024 with us.

Read More

Newsletter

Nov 29, 2024

Apart News: how impactful are we?

In this week's edition of Apart News we ask just how impactful a donation is to Apart Research and take a look at the ability of LLMs to predict neuroscience results.

Read More

Newsletter

Nov 22, 2024

Apart News: NEW Papers, Elections & Goodfire

Apart News is our newsletter to keep you up-to-date.

Read More

Research

Nov 22, 2024

Testing LLMs' ability to find security flaws in Cryptographic Protocols

Apart Research's newest paper offers a systematic way to evaluate how well Large Language Models (LLMs) can identify vulnerabilities in cryptographic protocols.

Read More

Community

Nov 18, 2024

How impactful is donating to Apart Research?

Co-Director Esben gives us his thoughts on how impactful donating to Apart Research is.

Read More

Newsletter

Nov 15, 2024

Apart News: Announcing Apart Lab Studio

Apart News is our newsletter to keep you up-to-date.

Read More

Community

Nov 11, 2024

Announcing Apart Lab Studio

Our new Apart Lab Studio is designed to bridge the gap between weekend hackathon projects and a fully-fledged AI Safety research career.

Read More

Newsletter

Nov 8, 2024

Apart News: Ale, Cash Prizes & the UK’s AISI

Apart News is our newsletter to keep you up-to-date.

Read More

Spotlights

Nov 5, 2024

Researcher Spotlight: Alexandra Abbas

Our Researcher Spotlight series highlights the global community at the heart of Apart Research.

Read More

Newsletter

Nov 1, 2024

Apart News: Esben, Winning Sprints & ‘3cb’

Apart News is our newsletter to keep you up-to-date.

Read More

Research

Oct 31, 2024

Esben on AGI, 'Sentware', and Confident optimism

Esben Kran gives us some of his thoughts on ideas relevant to AI safety, decision-making, and more.

Read More

Research

Oct 30, 2024

‘3cb’: The Catastrophic Cyber Capabilities Benchmark

Apart Research's newest paper, Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities, creates a novel cyber offense capability benchmark.

Read More

Newsletter

Oct 28, 2024

AI Policy Hackathon in Washington D.C.

Our Hackathon round-up showcases our global 'sprints' community.

Read More

Newsletter

Oct 25, 2024

Apart News: Finn, Cyber Offense & Johns Hopkins

Apart News is our newsletter to keep you up-to-date.

Read More

Newsletter

Oct 18, 2024

Apart News: Clement, Benchmarks & D.C.

Apart News is our newsletter to keep you up-to-date.

Read More

Research

Oct 18, 2024

Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts

Apart Research's newest paper finds that many public benchmarks may no longer provide accurate evaluations due to the inclusion of test data in training datasets.

Read More

Spotlights

Oct 18, 2024

Researcher Spotlight: Clement Neo

Our Researcher Spotlight series highlights the global community at the heart of Apart Research.

Read More

Spotlights

Oct 15, 2024

Researcher Spotlight: Akash Kundu

Our Researcher Spotlight series highlights the global community at the heart of Apart Research.

Read More

Newsletter

Oct 11, 2024

Apart News: Researcher Spotlight, New Team Member & Bangalore

Apart News is our newsletter to keep you up-to-date.

Read More

Research

Oct 9, 2024

Esben on agent safety research

Agent safety research is difficult because it involves many different types of entities and a wide range of vulnerabilities and failure modes. As a result, it’s hard to develop research that generalizes to all agents. However, we need to give it a shot!

Read More

Newsletter

Oct 4, 2024

Apart News: Agents, Submissions & Spain

Apart News is our newsletter to keep you up-to-date.

Read More

Newsletter

Sep 27, 2024

Apart News: New Research, NeurIPS Papers & Team Offsite

Apart News is our newsletter to keep you up-to-date.

Read More

Research

Sep 27, 2024

Do models really internalize our preferences?

Apart Research's newest paper (alongside academics from the University of Oxford, Cambridge, and Cynch.ai) looks at whether models actually internalize human preferences or not. But why does this matter?

Read More

Newsletter

Sep 20, 2024

Apart News: o1, Awards & Singapore

Apart News is our newsletter to keep you up-to-date.

Read More

Newsletter

Sep 13, 2024

Apart News: AI Startups, India & Concordia

Apart News is our newsletter to keep you up-to-date.

Read More

Community

Sep 13, 2024

Can startups be impactful in AI safety?

This post details the top projects from our technical AI safety startups hackathon where researchers and entrepreneurs joined from across the world.

Read More

Community

Aug 24, 2024

Where we are on for-profit AI safety

Read about how Big Tech's AI race leaves safety in the dust, non-profits struggle to keep up, and the challenges for-profit AI safety ventures must overcome to leverage resources and make a real impact.

Read More

Community

Jul 23, 2024

Finding Deception in Language Models

This June, Apart Research and Apollo Research joined forces to host the Deception Detection Hackathon, bringing together students, researchers and engineers from around the world to tackle one of the most pressing challenges in AI safety: Preventing AI from deceiving humans.

Read More

Community

Jun 20, 2024

Code Red LLM Evaluations Hackathon Wrap Up (METR and Apart)

Our 128 participants submitted more than 200 project ideas, 100 detailed task specifications, and more than 20 complete implementations! In this post, we also get an exclusive interview with one of the winners.

Read More

Community

May 17, 2024

The ultimate guide to AI safety research hackathons

Research hackathons are an amazing way to dive into a new field, collaborate with passionate people, and create impactful projects in just a short weekend.

Read More

Community

Apr 19, 2024

Join us at the AI x Democracy research hackathon

Participate online or in person on the weekend of 3rd to 5th May in an exciting and intense AI safety research hackathon focused on demonstrating and extrapolating risks to democracy from real-life threat models.

Read More

Community

Mar 18, 2024

Join the AI Evaluation Tasks Bounty Hackathon with METR

In this collaboration between METR and Apart, you get the chance to contribute directly to model evaluations research.

Read More

Community

Mar 1, 2024

How to organize a research hackathon

Organizing a hackathon can bring a unique and exciting energy to people interested in AI safety research! This post summarizes how you can organize a successful hackathon.

Read More

Spotlights

Feb 12, 2024

Researcher Spotlight: Jacob Haimes

Our Researcher Spotlight series highlights the global community at the heart of Apart Research.

Read More

Community

Feb 1, 2024

For-profit AI Safety

AI development attracts more than $67 billion in yearly investments, contrasting sharply with the $250 million allocated to AI safety. This gap suggests there's a large opportunity for AI safety to tap into the commercial market. The big question then is, how do you close that gap?

Read More

Community

Jan 23, 2024

Taking your next steps after a research hackathon

With the research hackathon, your journey into the world of AI safety is definitely not over! Besides the chance to join the Apart Lab Fellowship, we have collected a bunch of resources here for you to dive even deeper into the field!

Read More

Community

Dec 12, 2023

Why organize a research hackathon?

There are many reasons to run a hackathon but some of the main ones are that hackathons are an amazing way to engage the local groups in AI security research and create a sense of community.

Read More

Community

Jul 13, 2023

Updated quickstart guide for mechanistic interpretability

Written by Neel Nanda, who previously worked on mech interp under Chris Olah at Anthropic and is currently a researcher on the DeepMind mechanistic interpretability team.

Read More

Research

Feb 22, 2023

Results from the Scale Oversight hackathon

Check out the top projects from the "Scale Oversight" hackathon hosted in February 2023: Playing games with LLMs, scaling of prompt specificity, and more.

Read More

Research

Jan 2, 2023

Results from the AI testing hackathon

See the winning projects from the AI testing hackathon held in December 2022: Trojan networks, unsupervised latent knowledge representation, and token loss trajectories to target interpretability methods.

Read More

Research

Nov 21, 2022

Results from the language model hackathon

See the winning projects from the language model hackathon hosted in November 2022: GPT-3 shows sycophancy, OpenAI's flagging is biased, and truthfulness is sensitive to prompt design.

Read More

Research

Nov 17, 2022

Results from the interpretability hackathon

Read the winning projects from the interpretability hackathon hosted in November 2022: Automatic interpretability, backup backup name mover heads, and "loud facts" in memory editing.

Read More


Apart News is our newsletter to keep you up-to-date.

Read More

Newsletter

Oct 18, 2024

Apart News: Clement, Benchmarks & D.C.

Apart News is our newsletter to keep you up-to-date.

Read More

Research

Oct 18, 2024

Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts

Apart Research's newest paper finds that many public benchmarks may no longer provide accurate evaluations due to the inclusion of test data in training datasets.

Read More

Spotlights

Oct 18, 2024

Researcher Spotlight: Clement Neo

Our Researcher Spotlight series highlights the global community at the heart of Apart Research.

Read More

Spotlights

Oct 15, 2024

Researcher Spotlight: Akash Kundu

Our Researcher Spotlight series highlights the global community at the heart of Apart Research.

Read More

Newsletter

Oct 11, 2024

Apart News: Researcher Spotlight, New Team Member & Bangalore

Apart News is our newsletter to keep you up-to-date.

Read More

Research

Oct 9, 2024

Esben on agent safety research

Agent safety research is difficult because it involves many different types of entities and a wide range of vulnerabilities and failure modes. As a result, it’s hard to develop research that generalizes to all agents. However, we need to give it a shot!

Read More

Newsletter

Oct 4, 2024

Apart News: Agents, Submissions & Spain

Apart News is our newsletter to keep you up-to-date.

Read More

Newsletter

Sep 27, 2024

Apart News: New Research, NeurIPS Papers & Team Offsite

Apart News is our newsletter to keep you up-to-date.

Read More

Research

Sep 27, 2024

Do models really internalize our preferences?

Apart Research's newest paper (alongside academics from the University of Oxford, Cambridge, and Cynch.ai) looks at whether models actually internalize human preferences or not. But why does this matter?

Read More

Newsletter

Sep 20, 2024

Apart News: o1, Awards & Singapore

Apart News is our newsletter to keep you up-to-date.

Read More

Newsletter

Sep 13, 2024

Apart News: AI Startups, India & Concordia

Apart News is our newsletter to keep you up-to-date.

Read More

Community

Sep 13, 2024

Can startups be impactful in AI safety?

This post details the top projects from our technical AI safety startups hackathon where researchers and entrepreneurs joined from across the world.

Read More

Community

Aug 24, 2024

Where we are on for-profit AI safety

Read about how Big Tech's AI race leaves safety in the dust, non-profits struggle to keep up, and the challenges for-profit AI safety ventures must overcome to leverage resources and make a real impact.

Read More

Community

Jul 23, 2024

Finding Deception in Language Models

This June, Apart Research and Apollo Research joined forces to host the Deception Detection Hackathon, bringing together students, researchers and engineers from around the world to tackle one of the most pressing challenges in AI safety: Preventing AI from deceiving humans.

Read More

Community

Jun 20, 2024

Code Red LLM Evaluations Hackathon Wrap Up (METR and Apart)

Our 128 participants submitted more than 200 project ideas, 100 detailed task specifications, and more than 20 complete implementations! In this post, we also get an exclusive interview with one of the winners.

Read More

Community

May 17, 2024

The ultimate guide to AI safety research hackathons

Research hackathons are an amazing way to dive into a new field, collaborate with passionate people, and create impactful projects in just a short weekend.

Read More

Community

Apr 19, 2024

Join us at the AI x Democracy research hackathon

Participate online or in person on the weekend of 3rd to 5th May in an exciting and intense AI safety research hackathon focused on demonstrating and extrapolating risks to democracy from real-life threat models.

Read More

Community

Mar 18, 2024

Join the AI Evaluation Tasks Bounty Hackathon with METR

In this collaboration between METR and Apart, you get the chance to contribute directly to model evaluations research.

Read More

Community

Mar 1, 2024

How to organize a research hackathon

Organizing a hackathon can bring a unique and exciting energy to people interested in AI safety research! This post summarizes how you can organize a successful hackathon.

Read More

Spotlights

Feb 12, 2024

Researcher Spotlight: Jacob Haimes

Our Researcher Spotlight series highlights the global community at the heart of Apart Research.

Read More

Community

Feb 1, 2024

For-profit AI Safety

AI development attracts more than $67 billion in yearly investments, contrasting sharply with the $250 million allocated to AI safety. This gap suggests there's a large opportunity for AI safety to tap into the commercial market. The big question then is, how do you close that gap?

Read More

Community

Jan 23, 2024

Taking your next steps after a research hackathon

With the research hackathon, your journey into the world of AI safety is definitely not over! Besides the chance to join the Apart Lab Fellowship, we have collected a bunch of resources here for you to dive even deeper into the field!

Read More

Community

Dec 12, 2023

Why organize a research hackathon?

There are many reasons to run a hackathon, but some of the main ones are that hackathons are an amazing way to engage local groups in AI security research and create a sense of community.

Read More

Community

Jul 13, 2023

Updated quickstart guide for mechanistic interpretability

Written by Neel Nanda, who previously worked on mech interp under Chris Olah at Anthropic and is currently a researcher on the DeepMind mechanistic interpretability team.

Read More

Research

Feb 22, 2023

Results from the Scale Oversight hackathon

Check out the top projects from the "Scale Oversight" hackathon hosted in February 2023: Playing games with LLMs, scaling of prompt specificity, and more.

Read More

Research

Jan 2, 2023

Results from the AI testing hackathon

See the winning projects from the AI testing hackathon held in December 2022: Trojan networks, unsupervised latent knowledge representation, and token loss trajectories to target interpretability methods.

Read More

Research

Nov 21, 2022

Results from the language model hackathon

See winning projects from the language model hackathon hosted November 2022: GPT-3 shows sycophancy, OpenAI's flagging is biased, and truthfulness is sensitive to prompt design.

Read More

Research

Nov 17, 2022

Results from the interpretability hackathon

Read the winning projects from the interpretability hackathon hosted in November 2022: Automatic interpretability, backup backup name mover heads, and "loud facts" in memory editing.

Read More
