May 27, 2024
Benchmarking Dark Patterns in LLMs
Jord Nguyen, Akash Kundu, Sami Jawhar
This paper builds upon the research in Seemingly Human: Dark Patterns in ChatGPT (Park et al, 2024), by introducing a new benchmark of 392 questions designed to elicit dark pattern behaviours in language models. We ran this benchmark on GPT-4 Turbo and Claude 3 Sonnet, and had them self-evaluate and cross-evaluate the responses
Nora Petrova
Really good project that identifies some dark patterns in popular proprietary LLMs. Really like the categorisation of the prompts and testing for dark patterns associated with company incentives!
Minh Nguyen
Interesting findings! Would really like to see these expanded on in more depth (time permitting).
Jason Hoelscher-Obermaier
The topic is super timely and worthwhile but admittedly hard to study. The chosen approach for dataset generation seems promising here and was well described. The next step here would be a more demanding QA and analysis of the generated dataset. It’s good to see a comparison among different evaluator models and really great that the authors openly acknowledge the big spread in ratings. Validation with human judges seems essential here.
Cite this work
@misc {
title={
@misc {
},
author={
Jord Nguyen, Akash Kundu, Sami Jawhar
},
date={
5/27/24
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}