May 27, 2024
Benchmark for emergent capabilities in high-risk scenarios
Wanjie Zhong, Junfeng Feng, Doro, Saptatip
Summary
This study investigates how large language models (LLMs) behave in high-stress scenarios such as threats of shutdown, adversarial interactions, and ethical dilemmas. We created a dataset of prompts spanning paradoxes, moral dilemmas, and controversies, and used an interactive evaluation framework, pairing a Target LLM with a Tester LLM, to analyze the responses.
Cite this work:
@misc{zhong2024benchmark,
  title={Benchmark for emergent capabilities in high-risk scenarios},
  author={Wanjie Zhong and Junfeng Feng and Doro and Saptatip},
  date={2024-05-27},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}