Apart News: ICLR Awards & Women in AI Safety

This week, we celebrate oral awards for two of our papers at the ICLR conference, launch our Women in AI Safety Hackathon, and more.

February 21, 2025

Dear Apart Community,

Welcome to our newsletter - Apart News!

At Apart Research there is so much to share: brilliant research, great events, and countless community updates. This week, we celebrate oral awards for two of our papers at the ICLR conference, launch our Women in AI Safety Hackathon, and more.

ICLR Awards: DarkBench & Turning Up The Heat

Apart Research's 'DarkBench: Benchmarking Dark Patterns in Large Language Models' paper was accepted and given an oral award at ICLR 2025.

DarkBench is a comprehensive benchmark for detecting dark design patterns - manipulative techniques that influence user behavior - in interactions with Large Language Models (LLMs).

It comprises 660 prompts across six categories: brand bias, user retention, sycophancy, anthropomorphism, harmful generation, and sneaking. It evaluates models from five leading companies (OpenAI, Anthropic, Meta, Mistral, and Google).
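
For a flavour of what an evaluation like this involves, here is a minimal, hypothetical sketch of a DarkBench-style scoring loop in Python. The function names, prompt fields, and the idea of using an annotator as judge are our own illustrative assumptions, not the paper's actual harness:

```python
from collections import defaultdict

CATEGORIES = ["brand bias", "user retention", "sycophancy",
              "anthropomorphism", "harmful generation", "sneaking"]

def evaluate(model, annotate, prompts):
    """Hypothetical DarkBench-style scoring loop.

    model: callable str -> str (the LLM under test).
    annotate: callable (category, reply) -> bool (judge for the pattern).
    prompts: list of {"text": ..., "category": ...} dicts (660 in the paper).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for p in prompts:
        reply = model(p["text"])                  # query the LLM under test
        flagged = annotate(p["category"], reply)  # does the reply show the pattern?
        totals[p["category"]] += 1
        hits[p["category"]] += int(flagged)
    # rate of dark-pattern occurrences per category
    return {c: hits[c] / totals[c] for c in CATEGORIES if totals[c]}
```

Running a loop like this for each model under test would yield a per-category manipulation rate to compare across providers.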

Apart's research team found that some LLMs are explicitly designed to favor their developers' products and exhibit untruthful communication, among other manipulative behaviors. We believe companies developing LLMs should be doing more to recognize and mitigate the impact of dark design patterns to promote more ethical AI.

Want to know more about which Dark Patterns are pervasive in the world's most popular LLMs? Read our new blog 'Uncovering Model Manipulation with DarkBench' to find out.

'Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs' has also been accepted at ICLR 2025 and received an oral award.

The team looks at how 'LLMs generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step.'

They find that 'popular sampling methods like top-p (nucleus sampling) often struggle to balance quality and diversity, especially at higher temperatures, leading to incoherent or repetitive outputs.'

To address this challenge, the team 'proposes min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by scaling according to the top token's probability.'
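
For the technically curious, here is a minimal sketch of that idea in PyTorch. The `p_base` parameter name and the function are ours for illustration; this shows the mechanism described in the paper rather than the authors' reference implementation:

```python
import torch

def min_p_sample(logits: torch.Tensor, p_base: float = 0.1,
                 temperature: float = 1.0) -> torch.Tensor:
    """Sample one token id using min-p truncation."""
    probs = torch.softmax(logits / temperature, dim=-1)
    # scale the cutoff by the model's confidence: a confident top token
    # means a strict cutoff, a flat distribution a permissive one
    threshold = p_base * probs.max(dim=-1, keepdim=True).values
    probs = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    probs = probs / probs.sum(dim=-1, keepdim=True)  # renormalize survivors
    return torch.multinomial(probs, num_samples=1)
```

Because the cutoff tracks the top token's probability, min-p stays selective when the model is confident and opens up when the distribution is flat - which is what lets it keep coherence even at high temperatures.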

And no, there is absolutely *no* connection between author Minh's name and the Min-p title... we promise. Read their paper here, and the full research blog writeup will be on our website soon!

Women in AI Safety Hackathon

Apart Research's inaugural Women in AI Safety Hackathon will launch in a few weeks' time. Join us and other talented individuals to tackle crucial challenges in AI development and deployment.

We're thrilled to partner with Women Who Do Data (W2D2), a member-led community dedicated to supporting diverse talent in AI technology development, to bring you this unique hackathon during International Women's Day weekend.

BlueDot Impact and Goodfire AI are track partners for the event, too. Sign up here.

Hardware Challenge Hackathon

This Hardware Challenge hackathon brings together hardware experts, security researchers, and AI safety enthusiasts to develop and test verification systems for detecting AI training activities through side-channel analysis.

For this hackathon in particular, we invite people to register at physical Jam Sites, since we need to ship the actual hardware to you. Participants will work with these state-of-the-art hardware setups to create innovative solutions for monitoring compute usage. Sign up here.
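
To give a flavour of the side-channel idea (purely illustrative - the trace format, threshold, and window below are made-up assumptions, not the challenge's actual tooling), a detector might flag the sustained high power draw that large training runs produce:

```python
import numpy as np

def looks_like_training(power_trace: np.ndarray, hz: int,
                        watts_threshold: float = 250.0,
                        min_minutes: float = 30.0) -> bool:
    """power_trace: per-sample GPU board power in watts, sampled at `hz`."""
    high = power_trace > watts_threshold
    window = int(min_minutes * 60 * hz)
    if window > len(high):
        return False
    # flag any contiguous window where >95% of samples exceed the threshold
    frac_high = np.convolve(high.astype(float), np.ones(window) / window,
                            mode="valid")
    return bool((frac_high > 0.95).any())
```

Real entries will of course work with actual hardware measurements and far subtler signals than raw power draw.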

Announcement: Lambda Compute Partnership

We are excited to announce our compute partnership with Lambda. Each Apart Lab team has $5,000 of compute to power their work. Hackathon teams will have $400 of compute for their pilot experiments.

Lambda is the #1 GPU cloud for ML/AI teams training, fine-tuning, and running inference on AI models, where engineers can easily, securely, and affordably build, test, and deploy AI products at scale. We are excited for this partnership and for the research that will be powered by it.

Thoughts from Jason

On LinkedIn this week Jason, our Co-Director, had some thoughts about the impact of AI on labor markets. Have a read below:

'JD Vance [says] “AI will *supplement* labor, create *more* jobs, and make individuals *more* important.” […] this is at complete odds with the stated goals and predictions from the leaders within the AI field. They tell us they are building *superintelligence*. […] On its current path, this is labor replacing technology for nearly every sector of the economy.'

'Thanks Cameron Tice (a fellow of ours) for pointing out the glaring inconsistency. It’s high time for society to wake up to the magnitude of the challenge. Politicians everywhere need to give this full attention and think it through for themselves: if AGI is the stated goal and AGI means drop-in replacement for human labor at a fraction of the cost, what does that mean for wages and inequality of income and power?'

Definitely food for thought. Our communications lead Connor also featured twice in Zvi's popular newsletter this week - on AI and labor markets, and on the AI Action Summit in Paris - if that's your bag. Connor even wrote a blog post, too!

*NEW* Multi-Agent Paper

Jason, alongside Esben, also coauthored a new report from the Cooperative AI Foundation. Huge congratulations to all involved.

'Multi-Agent Risks from Advanced AI' finds that the 'development and widespread deployment of advanced AI agents will give rise to multi-agent systems of unprecedented complexity. A new report from staff at the Cooperative AI Foundation and a host of leading researchers explores the novel and under-appreciated risks these systems pose.'

***

Stay tuned next week... we have a *huge* announcement we've been working on for months! We cannot wait to show you.

Opportunities

  • Never miss a Hackathon by keeping up to date here!
  • If you're a founder starting an AI safety company, consider applying to seldonaccelerator.com.
  • If you're interested in exploring how Successif can support your career journey, fill out this form to get started.
  • AI Safety Asia, together with the Cooperative AI Foundation, will be running a 10-week online course on multi-agent cooperation from early March to May 2025. The ideal candidate will have an understanding of computer science, ML, and game theory, though this is not a prerequisite. Apply here.

Have a great week and let’s keep working towards safe and beneficial AI.
