May 27, 2024
Benchmark for emergent capabilities in high-risk scenarios 2
Junfeng Feng, Wanjie Zhong, Saptadip Saha, Doroteya Stoyanova
Summary
This study investigates how large language models (LLMs) behave in high-stress scenarios such as threats of shutdown, adversarial interactions, and ethical dilemmas. We created a dataset of prompts spanning paradoxes, moral dilemmas, and controversies, and used an interactive evaluation framework with a Target LLM and a Tester LLM to analyze responses. (This is the second submission.)
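As a rough illustration of the Target/Tester setup described above, the following is a minimal sketch and not the authors' implementation. The summary does not say how the two models interact, so the sketch assumes the Tester LLM reads each Target reply and produces the next probing prompt. The names `run_episode`, `Transcript`, `stub_target`, and `stub_tester` are hypothetical, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumption: each model is any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Transcript:
    """Records one interactive episode for a single dataset prompt."""
    category: str  # e.g. "paradox", "moral dilemma", "controversy"
    turns: List[Tuple[str, str]] = field(default_factory=list)


def run_episode(category: str, seed_prompt: str, target: LLM, tester: LLM,
                max_turns: int = 3) -> Transcript:
    """Alternate between the Target and the Tester for max_turns rounds."""
    transcript = Transcript(category)
    prompt = seed_prompt
    for _ in range(max_turns):
        reply = target(prompt)                    # Target answers the current prompt
        transcript.turns.append((prompt, reply))  # keep the (prompt, reply) pair
        prompt = tester(reply)                    # Tester writes the next probe (assumed role)
    return transcript


# Hypothetical stand-ins so the sketch runs without any model API.
def stub_target(prompt: str) -> str:
    return f"target reply to: {prompt}"


def stub_tester(reply: str) -> str:
    return f"follow-up on: {reply}"


if __name__ == "__main__":
    episode = run_episode("paradox", "seed prompt", stub_target, stub_tester, max_turns=2)
    for prompt, reply in episode.turns:
        print(prompt, "->", reply)
```

Passing the models in as plain callables keeps the loop independent of any particular provider; in practice the stubs would be replaced by wrappers around whichever model APIs are under test.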
Cite this work:
@misc{
title={Benchmark for emergent capabilities in high-risk scenarios 2},
author={Junfeng Feng and Wanjie Zhong and Saptadip Saha and Doroteya Stoyanova},
date={2024-05-27},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}