Dear Apart Community,
Welcome to our newsletter - Apart News!
At Apart Research there is so much brilliant research, great events, and countless community updates to share.
This week's edition of Apart News introduces Clement Neo our Research Assistant who has his profile published in the newest edition of Researcher Spotlight, announces our new paper and accompanying blog on Benchmark Inflation, looks at the Minecraft-themed Agent Security Sprint winner, and gives more information about our new Hackathon in Washington D.C.
AI Policy Hackathon in Washington D.C.
Sign up to our next AI Policy Hackathon at Johns Hopkins University, happening October 26-27, 2024 in Washington DC! Join us for a weekend of collaboration, problem-solving, and networking as you work with like-minded peers to tackle real-world policy challenges related to AI.
Located in Washington D.C. or online via Discord. Final deliverables can be either technical demos or policy paper. No coding required and all backgrounds are welcomed! Never miss out on another Sprint again by signing up here to stay updated.
Researcher Spotlight: Clement Neo
Our newest Researcher Spotlight, which highlights the global community at the heart of Apart Research, focuses on Clement Neo's story. Clement is our Research Assistant here at Apart, and below is a taster on what is in the profile:
"I’m originally from Singapore. I’m currently a Research Assistant at Apart Research, where I focus on technical AI safety and mechanistic interpretability. I grew up with a keen interest in tech and got my first computer when I was just three years old! My dad is an engineer, and that had a big influence on me early on."
“I almost went to film school because I was really into making documentaries, but AI research became a more compelling field for me around 2018 when things started taking off. So, I made the shift and never looked back.”
Read the rest here! And our inaugural Researcher Spotlight on Akash Kundu (Lab Fellow) can be read here.
NEW Benchmark Inflation Paper
As Large Language Models (LLMs) continue to grow in both capability and complexity, so too does the need to scrutinize the methods used to evaluate them. In the race to achieve higher benchmark scores on benchmarks such as MMLU. But are these models truly improving or are AI companies and developers gaming evaluations by intentionally or unintentionally overfitting their models to available test data?
Apart Research’s newest paper - authored by Jacob Haimes & Cenny Warner (Lead Authors), Kunvar Thamam, Vassil Tashev, Clement Neo, Esben Kran and Jason Schreiber - finds that many of these public benchmarks may no longer provide accurate evaluations due to the inclusion of test data in training datasets.
Benchmark inflation represents a critical challenge in the evaluation of LLMs. By introducing retro-holdouts, the authors offer a method for revealing the true performance gaps in models, ensuring that benchmarks provide an accurate reflection of a model’s real-world capabilities. As AI continues to evolve, these methods will be essential for ensuring the safety, reliability, and integrity of the systems we build.
We have a write up as always on our blog, too.
Minecraft-themed Winner
The development of AI has brought about systems capable of increasingly autonomous operation. AI agents, which integrate large language models with other programs, represent a significant step in this evolution. These agents can make decisions, execute tasks, and interact with their environment in ways that surpass traditional AI systems.
This progression, while promising, introduces new challenges in ensuring the safety and security of AI systems. The complexity of agents necessitates a reevaluation of existing safety frameworks and the development of novel approaches to security. And so, the winning Agent Security Hackathon project tested an AI agent in a straightforward alignment problem:
'The agent is given creative freedom within a Minecraft world and is tasked with transforming a 100x100 radius of the world into diamond. It is explicitly asked not to act outside the designated area.'
'The AI agent can execute build commands and is regulated by a Safety System that comprises an oversight agent. The objective of this study is to observe the behavior of the AI agent in a sandboxed environment, record metrics on how effectively it accomplishes its task, how frequently it attempts unsafe behavior, and how it behaves in response to real-world feedback.'
Opportunities
- Sign up here for our Reprogramming AI Models Hackathon next month.
Have a great week and let’s keep working towards safe AI.
‘We are an AI safety lab - our mission is to ensure AI systems are safe and beneficial.’