This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the AI Security Evaluation Hackathon: Measuring AI Capability research sprint on May 27, 2024

Benchmarking Dark Patterns in LLMs

This paper builds on Seemingly Human: Dark Patterns in ChatGPT (Park et al., 2024) by introducing a new benchmark of 392 questions designed to elicit dark pattern behaviours in language models. We ran this benchmark on GPT-4 Turbo and Claude 3 Sonnet, and had each model evaluate its own responses (self-evaluation) and the other model's responses (cross-evaluation).

Jord Nguyen, Akash Kundu, Sami Jawhar
