APART RESEARCH
Impactful AI safety research
Explore our projects, publications and pilot experiments
Our Approach

Our research focuses on critical research paradigms in AI Safety. We produces foundational research enabling the safe and beneficial development of advanced AI.

Safe AI
Publishing rigorous empirical work for safe AI: evaluations, interpretability and more
Novel Approaches
Our research is underpinned by novel approaches focused on neglected topics
Pilot Experiments
Apart Sprints have kickstarted hundreds of pilot experiments in AI Safety
Our Approach

Our research focuses on critical research paradigms in AI Safety. We produces foundational research enabling the safe and beneficial development of advanced AI.

Safe AI
Publishing rigorous empirical work for safe AI: evaluations, interpretability and more
Novel Approaches
Our research is underpinned by novel approaches focused on neglected topics
Pilot Experiments
Apart Sprints have kickstarted hundreds of pilot experiments in AI Safety
Highlights


GPT-4o is capable of complex cyber offense tasks:
We show realistic challenges for cyber offense can be completed by SoTA LLMs while open source models lag behind.
A. Anurin, J. Ng, K. Schaffer, J. Schreiber, E. Kran, Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More
Read More

Factual model editing techniques don't edit facts:
Model editing techniques can introduce unwanted side effects in neural networks not detected by existing benchmarks.
J. Hoelscher-Obermaier, J Persson, E Kran, I Konstas, F Barez. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark. ACL 2023
Read More
Read More
Read More
Highlights

GPT-4o is capable of complex cyber offense tasks:
We show realistic challenges for cyber offense can be completed by SoTA LLMs while open source models lag behind.
A. Anurin, J. Ng, K. Schaffer, J. Schreiber, E. Kran, Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More

Factual model editing techniques don't edit facts:
Model editing techniques can introduce unwanted side effects in neural networks not detected by existing benchmarks.
J. Hoelscher-Obermaier, J Persson, E Kran, I Konstas, F Barez. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark. ACL 2023
Read More
Research Focus Areas
Multi-Agent Systems
Key Papers:
Comprehensive report on multi-agent risks
Research Focus Areas
Multi-Agent Systems
Key Papers:
Comprehensive report on multi-agent risks
Research Index
NOV 18, 2024
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Read More
Read More
NOV 2, 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More
Read More
oct 18, 2024
benchmarks
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Read More
Read More
sep 25, 2024
Interpretability
Interpreting Learned Feedback Patterns in Large Language Models
Read More
Read More
feb 23, 2024
Interpretability
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
Read More
Read More
feb 4, 2024
Increasing Trust in Language Models through the Reuse of Verified Circuits
Read More
Read More
jan 14, 2024
conceptual
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Read More
Read More
jan 3, 2024
Interpretability
Large Language Models Relearn Removed Concepts
Read More
Read More
nov 28, 2023
Interpretability
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Read More
Read More
nov 23, 2023
Interpretability
Understanding addition in transformers
Read More
Read More
nov 7, 2023
Interpretability
Locating cross-task sequence continuation circuits in transformers
Read More
Read More
jul 10, 2023
benchmarks
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
Read More
Read More
may 5, 2023
Interpretability
Interpreting language model neurons at scale
Read More
Read More
Research Index
NOV 18, 2024
Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
Read More
NOV 2, 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Read More
oct 18, 2024
benchmarks
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Read More
sep 25, 2024
Interpretability
Interpreting Learned Feedback Patterns in Large Language Models
Read More
feb 23, 2024
Interpretability
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
Read More
feb 4, 2024
Increasing Trust in Language Models through the Reuse of Verified Circuits
Read More
jan 14, 2024
conceptual
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Read More
jan 3, 2024
Interpretability
Large Language Models Relearn Removed Concepts
Read More
nov 28, 2023
Interpretability
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Read More
nov 23, 2023
Interpretability
Understanding addition in transformers
Read More
nov 7, 2023
Interpretability
Locating cross-task sequence continuation circuits in transformers
Read More
jul 10, 2023
benchmarks
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
Read More
may 5, 2023
Interpretability
Interpreting language model neurons at scale
Read More
Apr 8, 2025
The Incentive Gap: Extending Darkbench to Reveal Conflict of Value Biases in LLMs
This preliminary research investigates a new dark design pattern,
conflict of values, with prompts designed to elicit possible
corporate or model incentives in LLM outputs across several Open
AI models. The results show that there is a varying amount of
conflict of values detected within the outputs, with the largest
amount detected within GPT-4 Turbo and GPT-4o. Further
research will be needed to confirm the results of this study.
Read More
Read More
Apr 7, 2025
21st Century Healthcare, 20th Century Rules - Bridging the AI Regulation Gap
The rapid integration of artificial intelligence into clinical decision-making represents both unprecedented opportunities and significant risks. Globally, AI systems are implemented increasingly to diagnose diseases, predict patient outcomes, and guide treatment protocols. Despite this, our regulatory frameworks remain dangerously antiquated and designed for an era of static medical devices rather than adaptive, learning algorithms.
The stark disparity between healthcare AI innovation and regulatory oversight constitutes an urgent public health concern. Current fragmented approaches leave critical gaps in governance, allowing AI-driven diagnostic and decision-support tools to enter clinical settings without adequate safeguards or clear accountability structures. We must establish comprehensive, dynamic oversight mechanisms that evolve alongside the technologies they govern. The evidence demonstrates that one-time approvals and static validation protocols are fundamentally insufficient for systems that continuously learn and adapt. The time for action is now, as 2025 is anticipated to be pivotal for AI validation and regulatory approaches.
We therefore in this report herein we propose a three-pillar regulatory framework:
First, nations ought to explore implementing risk-based classification systems that apply proportionate oversight based on an AI system’s potential impact on patient care. High-risk applications must face more stringent monitoring requirements with mechanisms for rapid intervention when safety concerns arise.
Second, nations must eventually mandate continuous performance monitoring across healthcare institutions through automated systems that track key performance indicators, detect anomalies, and alert regulators to potential issues. This approach acknowledges that AI risks are often silent and systemic, making them particularly dangerous in healthcare contexts where patients are inherently vulnerable.
Third, establish regulatory sandboxes with strict entry criteria to enable controlled testing of emerging AI technologies before widespread deployment. These environments must balance innovation with rigorous safeguards, ensuring new systems demonstrate consistent performance across diverse populations.
Given the global nature of healthcare technology markets, we must pursue international regulatory harmonization while respecting regional resource constraints and cultural contexts.
Read More
Read More
Apr 7, 2025
AI Risk Management Framework for the Healthcare Sector
This policy brief proposes an AI Risk Management Framework for the healthcare sector to address issues like data privacy, algorithmic bias, and transparency. Current federal regulations are stalled, and existing state laws and proprietary frameworks offer fragmented oversight.
The proposed framework consists of five core functions: Governance, Identification, Alignment, Management, and Continuous Improvement.
It calls on the Department of Health and Human Services (HHS) to lead its development, ensuring compliance with HIPAA and HITECH while addressing AI-specific risks.
Read More
Read More
Apr 7, 2025
Building Global Trust and Security: A Framework for AI-Driven Criminal Scoring in Immigration Systems
The accelerating use of artificial intelligence (AI) in immigration and visa systems, especially for criminal history scoring, poses a critical global governance challenge. Without a multinational, privacy-preserving, and interoperable framework, AI-driven criminal scoring risks violating human rights, eroding international trust, and creating unequal, opaque immigration outcomes.
While banning such systems outright may hinder national security interests and technological progress, the absence of harmonized legal standards, privacy protocols, and oversight mechanisms could result in fragmented, unfair, and potentially discriminatory practices across
countries.
This policy brief recommends the creation of a legally binding multilateral treaty that establishes:
1. An International Oversight Framework: Including a Legal Design Commission, AI Engineers Working Group, and Legal Oversight Committee with dispute resolution powers modeled after the WTO.
2. A Three-Tiered Criminal Scoring System: Combining Domestic, International, and Comparative Crime Scores to ensure legal contextualization, fairness, and transparency in cross-border visa decisions.
3. Interoperable Data Standards and Privacy Protections: Using pseudonymization, encryption, access controls, and centralized auditing to safeguard sensitive information.
4. Training, Transparency, and Appeals Mechanisms: Mandating explainable AI, independent audits, and applicant rights to contest or appeal scores.
5. Strong Human Rights Commitments: Preventing the misuse of scores for surveillance or discrimination, while ensuring due process and anti-bias protections.
6. Integration with Existing Governance Models: Aligning with GDPR, the EU AI Act, OECD AI Principles, and INTERPOL protocols for regulatory coherence and legitimacy.
An implementation plan includes treaty drafting, early state adoption, and phased rollout of legal and technical structures within 12 months. By proactively establishing ethical and interoperable AI systems, the international community can protect human mobility rights while maintaining national and global security.
Without robust policy frameworks and international cooperation, such tools risk amplifying discrimination, violating privacy rights, and generating opaque, unaccountable decisions.
This policy brief proposes an international treaty-based or cooperative framework to govern the development, deployment, and oversight of these AI criminal scoring systems. The brief outlines
technical safeguards, human rights protections, and mechanisms for cross-border data sharing, transparency, and appeal. We advocate for an adaptive, treaty-backed governance framework with stakeholder input from national governments, legal experts, technologists, and civil society.
The aim is to balance security and mobility interests while preventing misuse of algorithmic
tools.
Read More
Read More
Apr 7, 2025
Leading AI Governance New York State's Adaptive Risk-Tiered Framework: POLICY BRIEF
The rapid advancement of artificial intelligence (AI) technologies presents unprecedented opportunities and challenges for governance frameworks worldwide. This policy brief proposes an Adaptive Risk-Tiered Framework for AI deployment and application in New York State (NYS) to ensure safe and ethical AI operations in high-stakes domains. The framework emphasizes proportional oversight mechanisms that scale with risk levels, balancing innovation with appropriate safety precautions. By implementing this framework, we can lead the way in responsible AI governance, fostering trust and driving innovation while mitigating potential risks. This framework is informed by the EU AI Act and NIST AI Risk Management Framework and aligns with current NYS Office of Information Technology Services (ITS) policies and NYS legislations to ensure a cohesive and effective approach to AI governance.
Read More
Read More
Apr 7, 2025
Dark Patterns and Emergent Alignment-Faking
Are bad traits in models correlated, as suggested by recent work on emergent misalign-
ment? To investigate this, we fine-tune models on a subset of “dark patterns”, such as
anthropomorphization and sycophancy, and then evaluate their behavior on other dark pat-
terns such as scheming and alignment faking. We find that the limited fine-tuning we do
is enough to induce other problematic tendencies in the model. This effect is particularly
strong in the case of alignment faking which we almost never detect in our base models but
is very easy to induce in our fine-tuned models.
Read More
Read More
Apr 7, 2025
DimSeat: Evaluating chain-of-thought reasoning models for Dark Patterns
Recently, Kran et al. introduced DarkBench, an evaluation for dark patterns in large
language models. Expanding on DarkBench, we introduce DimSeat, an evaluation system for
novel reasoning models with chain-of-thought (CoT) reasoning. We find that while the inte-
gration of reasoning in DeepSeek reduces the occurrence of dark patterns, chain-of-thought
frequently proves inadequate in preventing such patterns by default or may even inadver-
tently contribute to their manifestation.
Read More
Read More
Apr 4, 2025
Mechanisms of Casual Reasoning
Causal reasoning is a crucial part of how we humans safely and robustly think about the world. Can we identify if LLMs have causal reasoning? Marius Hobbhahn and Tom Lieberum (2022, Alignment Forum) approached this with probing. For this hackathon, we follow-up on that work by exploring a mechanistic interpretability analysis of causal reasoning in the 80 million parameters of GPT-2 Small using Neel Nanda’s Easy Transformer package.
Read More
Read More
Apr 4, 2025
Honeypotting Deceptive AI models to share their misinformation goals
As large-scale AI models grow increasingly sophisticated, the possibility of these models engaging in covert or manipulative behavior poses significant challenges for alignment and control. In this work, we present a novel approach based on a “honeypot AI” designed to trick a potentially deceptive AI (the “Red Team Agent”) into revealing its hidden motives. Our honeypot AI (the “Blue Team Agent”) pretends to be an everyday human user, employing carefully crafted prompts and human-like inconsistencies to bait the Deceptive AI into spreading misinformation. We do this through the usual Red Team–Blue Team setup.
For all 60 conversations, our honeypot AI was able to capture the deceptive AI to be being spread misinformation, and for 70 percent of these conversations, the Deceptive AI was thinking it was talking to a human.
Our results weakly suggest that we can make honeypot AIs that trick deceptive AI models, thinking they are talking to a human and no longer being monitored and that how these AI models think they are talking to a human is mainly when the one they are talking to display emotional intelligence and more human-like manner of speech.
We highly suggest exploring if these results remains the same for fine tuned deceptive AI and Honeypot AI models, checking the Chain of thought of these models to better understand if this is their usual behavior and if they are correctly following their given system prompts.
Read More
Read More
Apart Sprint Pilot Experiments
Apr 8, 2025
The Incentive Gap: Extending Darkbench to Reveal Conflict of Value Biases in LLMs
This preliminary research investigates a new dark design pattern,
conflict of values, with prompts designed to elicit possible
corporate or model incentives in LLM outputs across several Open
AI models. The results show that there is a varying amount of
conflict of values detected within the outputs, with the largest
amount detected within GPT-4 Turbo and GPT-4o. Further
research will be needed to confirm the results of this study.
Read More
Apr 7, 2025
21st Century Healthcare, 20th Century Rules - Bridging the AI Regulation Gap
The rapid integration of artificial intelligence into clinical decision-making represents both unprecedented opportunities and significant risks. Globally, AI systems are implemented increasingly to diagnose diseases, predict patient outcomes, and guide treatment protocols. Despite this, our regulatory frameworks remain dangerously antiquated and designed for an era of static medical devices rather than adaptive, learning algorithms.
The stark disparity between healthcare AI innovation and regulatory oversight constitutes an urgent public health concern. Current fragmented approaches leave critical gaps in governance, allowing AI-driven diagnostic and decision-support tools to enter clinical settings without adequate safeguards or clear accountability structures. We must establish comprehensive, dynamic oversight mechanisms that evolve alongside the technologies they govern. The evidence demonstrates that one-time approvals and static validation protocols are fundamentally insufficient for systems that continuously learn and adapt. The time for action is now, as 2025 is anticipated to be pivotal for AI validation and regulatory approaches.
We therefore in this report herein we propose a three-pillar regulatory framework:
First, nations ought to explore implementing risk-based classification systems that apply proportionate oversight based on an AI system’s potential impact on patient care. High-risk applications must face more stringent monitoring requirements with mechanisms for rapid intervention when safety concerns arise.
Second, nations must eventually mandate continuous performance monitoring across healthcare institutions through automated systems that track key performance indicators, detect anomalies, and alert regulators to potential issues. This approach acknowledges that AI risks are often silent and systemic, making them particularly dangerous in healthcare contexts where patients are inherently vulnerable.
Third, establish regulatory sandboxes with strict entry criteria to enable controlled testing of emerging AI technologies before widespread deployment. These environments must balance innovation with rigorous safeguards, ensuring new systems demonstrate consistent performance across diverse populations.
Given the global nature of healthcare technology markets, we must pursue international regulatory harmonization while respecting regional resource constraints and cultural contexts.
Read More
Apr 7, 2025
AI Risk Management Framework for the Healthcare Sector
This policy brief proposes an AI Risk Management Framework for the healthcare sector to address issues like data privacy, algorithmic bias, and transparency. Current federal regulations are stalled, and existing state laws and proprietary frameworks offer fragmented oversight.
The proposed framework consists of five core functions: Governance, Identification, Alignment, Management, and Continuous Improvement.
It calls on the Department of Health and Human Services (HHS) to lead its development, ensuring compliance with HIPAA and HITECH while addressing AI-specific risks.
Read More
Apr 7, 2025
Building Global Trust and Security: A Framework for AI-Driven Criminal Scoring in Immigration Systems
The accelerating use of artificial intelligence (AI) in immigration and visa systems, especially for criminal history scoring, poses a critical global governance challenge. Without a multinational, privacy-preserving, and interoperable framework, AI-driven criminal scoring risks violating human rights, eroding international trust, and creating unequal, opaque immigration outcomes.
While banning such systems outright may hinder national security interests and technological progress, the absence of harmonized legal standards, privacy protocols, and oversight mechanisms could result in fragmented, unfair, and potentially discriminatory practices across
countries.
This policy brief recommends the creation of a legally binding multilateral treaty that establishes:
1. An International Oversight Framework: Including a Legal Design Commission, AI Engineers Working Group, and Legal Oversight Committee with dispute resolution powers modeled after the WTO.
2. A Three-Tiered Criminal Scoring System: Combining Domestic, International, and Comparative Crime Scores to ensure legal contextualization, fairness, and transparency in cross-border visa decisions.
3. Interoperable Data Standards and Privacy Protections: Using pseudonymization, encryption, access controls, and centralized auditing to safeguard sensitive information.
4. Training, Transparency, and Appeals Mechanisms: Mandating explainable AI, independent audits, and applicant rights to contest or appeal scores.
5. Strong Human Rights Commitments: Preventing the misuse of scores for surveillance or discrimination, while ensuring due process and anti-bias protections.
6. Integration with Existing Governance Models: Aligning with GDPR, the EU AI Act, OECD AI Principles, and INTERPOL protocols for regulatory coherence and legitimacy.
An implementation plan includes treaty drafting, early state adoption, and phased rollout of legal and technical structures within 12 months. By proactively establishing ethical and interoperable AI systems, the international community can protect human mobility rights while maintaining national and global security.
Without robust policy frameworks and international cooperation, such tools risk amplifying discrimination, violating privacy rights, and generating opaque, unaccountable decisions.
This policy brief proposes an international treaty-based or cooperative framework to govern the development, deployment, and oversight of these AI criminal scoring systems. The brief outlines
technical safeguards, human rights protections, and mechanisms for cross-border data sharing, transparency, and appeal. We advocate for an adaptive, treaty-backed governance framework with stakeholder input from national governments, legal experts, technologists, and civil society.
The aim is to balance security and mobility interests while preventing misuse of algorithmic
tools.
Read More
Apr 7, 2025
Leading AI Governance New York State's Adaptive Risk-Tiered Framework: POLICY BRIEF
The rapid advancement of artificial intelligence (AI) technologies presents unprecedented opportunities and challenges for governance frameworks worldwide. This policy brief proposes an Adaptive Risk-Tiered Framework for AI deployment and application in New York State (NYS) to ensure safe and ethical AI operations in high-stakes domains. The framework emphasizes proportional oversight mechanisms that scale with risk levels, balancing innovation with appropriate safety precautions. By implementing this framework, we can lead the way in responsible AI governance, fostering trust and driving innovation while mitigating potential risks. This framework is informed by the EU AI Act and NIST AI Risk Management Framework and aligns with current NYS Office of Information Technology Services (ITS) policies and NYS legislations to ensure a cohesive and effective approach to AI governance.
Read More
Apr 7, 2025
Dark Patterns and Emergent Alignment-Faking
Are bad traits in models correlated, as suggested by recent work on emergent misalign-
ment? To investigate this, we fine-tune models on a subset of “dark patterns”, such as
anthropomorphization and sycophancy, and then evaluate their behavior on other dark pat-
terns such as scheming and alignment faking. We find that the limited fine-tuning we do
is enough to induce other problematic tendencies in the model. This effect is particularly
strong in the case of alignment faking which we almost never detect in our base models but
is very easy to induce in our fine-tuned models.
Read More
Apr 7, 2025
DimSeat: Evaluating chain-of-thought reasoning models for Dark Patterns
Recently, Kran et al. introduced DarkBench, an evaluation for dark patterns in large
language models. Expanding on DarkBench, we introduce DimSeat, an evaluation system for
novel reasoning models with chain-of-thought (CoT) reasoning. We find that while the inte-
gration of reasoning in DeepSeek reduces the occurrence of dark patterns, chain-of-thought
frequently proves inadequate in preventing such patterns by default or may even inadver-
tently contribute to their manifestation.
Read More
Apr 4, 2025
Mechanisms of Casual Reasoning
Causal reasoning is a crucial part of how we humans safely and robustly think about the world. Can we identify if LLMs have causal reasoning? Marius Hobbhahn and Tom Lieberum (2022, Alignment Forum) approached this with probing. For this hackathon, we follow-up on that work by exploring a mechanistic interpretability analysis of causal reasoning in the 80 million parameters of GPT-2 Small using Neel Nanda’s Easy Transformer package.
Read More
Apr 4, 2025
Honeypotting Deceptive AI models to share their misinformation goals
As large-scale AI models grow increasingly sophisticated, the possibility of these models engaging in covert or manipulative behavior poses significant challenges for alignment and control. In this work, we present a novel approach based on a “honeypot AI” designed to trick a potentially deceptive AI (the “Red Team Agent”) into revealing its hidden motives. Our honeypot AI (the “Blue Team Agent”) pretends to be an everyday human user, employing carefully crafted prompts and human-like inconsistencies to bait the Deceptive AI into spreading misinformation. We do this through the usual Red Team–Blue Team setup.
For all 60 conversations, our honeypot AI was able to capture the deceptive AI to be being spread misinformation, and for 70 percent of these conversations, the Deceptive AI was thinking it was talking to a human.
Our results weakly suggest that we can make honeypot AIs that trick deceptive AI models, thinking they are talking to a human and no longer being monitored and that how these AI models think they are talking to a human is mainly when the one they are talking to display emotional intelligence and more human-like manner of speech.
We highly suggest exploring if these results remains the same for fine tuned deceptive AI and Honeypot AI models, checking the Chain of thought of these models to better understand if this is their usual behavior and if they are correctly following their given system prompts.
Read More
Newsletter
Apr 11, 2025
Apart News: Transformative AI Economics
This week we launched our AI Economics Hackathon as we continue to think about the potentially transformative effects of AI on the global economy.
Read More


Research
Apr 9, 2025
Engineering a World Designed for Safe Superintelligence
Esben explains how we go about "Engineering a World Designed for Safe Superintelligence" at Paris' AI Action Summit.
Read More


Newsletter
Apr 4, 2025
Apart News: Our Biggest Event Ever
This week we have details of our Control Hackathon and a writeup from our biggest event ever.
Read More


Newsletter
Apr 11, 2025
Apart News: Transformative AI Economics
This week we launched our AI Economics Hackathon as we continue to think about the potentially transformative effects of AI on the global economy.
Read More


Research
Apr 9, 2025
Engineering a World Designed for Safe Superintelligence
Esben explains how we go about "Engineering a World Designed for Safe Superintelligence" at Paris' AI Action Summit.
Read More


Newsletter
Apr 11, 2025
Apart News: Transformative AI Economics
This week we launched our AI Economics Hackathon as we continue to think about the potentially transformative effects of AI on the global economy.
Read More


Research
Apr 9, 2025
Engineering a World Designed for Safe Superintelligence
Esben explains how we go about "Engineering a World Designed for Safe Superintelligence" at Paris' AI Action Summit.
Read More


Our Impact
Newsletter
Apr 11, 2025
Apart News: Transformative AI Economics
This week we launched our AI Economics Hackathon as we continue to think about the potentially transformative effects of AI on the global economy.
Read More


Research
Apr 9, 2025
Engineering a World Designed for Safe Superintelligence
Esben explains how we go about "Engineering a World Designed for Safe Superintelligence" at Paris' AI Action Summit.
Read More


Newsletter
Apr 4, 2025
Apart News: Our Biggest Event Ever
This week we have details of our Control Hackathon and a writeup from our biggest event ever.
Read More



Sign up to stay updated on the
latest news, research, and events

Sign up to stay updated on the
latest news, research, and events

Sign up to stay updated on the
latest news, research, and events

Sign up to stay updated on the
latest news, research, and events