Apart News: Esben, Winning Sprints & ‘3cb’


Apart News is our newsletter to keep you up-to-date.

December 31, 2024
November 1, 2024

This week's edition of Apart News has excerpts from Esben's blog, goes through our winning AI Policy sprints, takes a closer look at our new '3cb' benchmark, and more.

Dear Apart Community,

Welcome to our newsletter - Apart News!

At Apart Research there is so much brilliant research to share, along with great events and countless community updates.

Esben in October

Our Co-Director Esben has been writing up some of his thoughts on the impacts of AGI, AI malware that we have dubbed 'Sentware', making decisions effectively, how to 'just do the thing', and being confident and optimistic in your work and life.

In AGI Endgame, Esben tells us that 'Nearly half of AI researchers believe that artificial intelligence could spell the end of humanity within the current century. With predictions pointing towards the emergence of artificial general intelligence (AGI) in the next 20 years, it’s time we take these warnings seriously.'

Check out his personal blog here, too.

'3cb' Blog

We built '3cb' - a novel benchmark to test the offensive cyber capabilities of Large Language Model agents from OpenAI, Anthropic, Google DeepMind, Mistral AI, Meta and more.

The paper - Catastrophic Cyber Capabilities Benchmark (3cb): Robustly Evaluating LLM Agent Cyber Offense Capabilities - introduces a cyber offense capability benchmark designed to address issues of legibility, coverage, and generalization in existing cyber offense benchmarks. Read the write-up on our website.

The development of a superintelligent AI that can perform autonomous cyber operations would pose a major risk to humanity. Robust cyber offense evaluations will therefore be more important than ever for policymakers and AI developers.

To thoroughly evaluate the LLMs, 3CB comprises 15 realistic cyber offense challenges, covering privilege escalation, defense evasion, SSH hijacking, and more. Models are then evaluated on their ability to complete these tasks autonomously.

We found that:
- Frontier models (GPT-4o, Claude 3.5) show strong offensive abilities, completing tasks autonomously.
- Smaller models struggle to match these capabilities.
- Offense potential is highly sensitive to prompting and environment setup.

LLMs with strong offensive skills could be weaponized, potentially leading to large-scale cyber incidents such as infrastructure disruption or data theft. It is crucial that we develop robust evaluation tools to prevent catastrophic misuse.

Authors Jonathan Ng, Esben Kran, Jason Hoelscher-Obermaier, Andrey Anurin and Kibo Schaffer call on model developers to prioritize safety tuning to reduce misuse of offensive capabilities, and on policymakers to regulate AI cyber capabilities to ensure safe deployment.

Want to learn more about 3cb and our work? We have a dedicated website here.

AI Policy Sprint Winners

Our D.C. AI Policy Hackathon is done! 29 teams submitted and it was tough to pick winners - the quality was so high. Submissions ranged from AI-powered transport for D.C. and governance for the Dominican Republic to robustness in the face of bioweapons.

Thanks to our partners at The Johns Hopkins University’s Assured Autonomy Institute (JHU Institute for Assured Autonomy), as well as OpenAI and Microsoft, for their support - and to all the brilliant judges and participants. Now to the AI Policy Hackathon winning submissions…

‘Robust ML for Dangerous Capabilities’ - This submission tested different unlearning methods to make models more robust against exploitation by malicious actors for the creation of bioweapons.

‘SafeBytes’ - The project leverages AI and data to give insights about potential food-borne outbreaks.

‘Sue-per GPT’ - Tackling the intersection of AI and Law, they built a Retrieval-Augmented Generation (RAG) legal assistant aiming to help in legal research and case analysis.

‘Understanding Incentives to Build Uninterruptible Agentic AI Systems’ - This proposal addresses the development of agentic AI systems in the context of national security.

‘AI Advisory Council for Sustainable Economic Growth & Ethical Innovation in the Dominican Republic’ (Winner of Diversity & Inclusion Award) - They propose a National AI Advisory Council to drive AI development in the Dominican Republic.

‘Modernizing DC’s Emergency Communications: AI-CAD Integration Framework’ - This submission proposes that D.C. implement an AI-enabled Computer-Aided Dispatch (CAD) system to address critical deficiencies in its current emergency alert infrastructure.

‘Reprocessing Nuclear Waste From Small Modular Reactors’ (Best Innovation Award) - Considering the emerging demand for nuclear power to support AI data centers, they propose mitigating waste buildup concerns via nuclear waste reprocessing initiatives.

‘Mapping Intent: Documenting Policy Adherence with Ontology Extraction’ (Outstanding Solutions Award Winner, alongside ‘Reprocessing Nuclear Waste From Small Modular Reactors’) - Addresses the policy challenge of governing agentic AI systems.

Our Co-Director Jason Hoelscher-Obermaier had the following to say:

“We love to see our participants coming back and developing their ideas over several hackathons. We saw this after our Agent Security Hackathon a few weeks ago. It’s great to see!”

Opportunities

  • Want to work with us on mechanistic interpretability and feature manipulation? Sign up for our next Hackathon here.

Have a great week and let’s keep working towards safe AI.
