This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the AI Security Evaluation Hackathon: Measuring AI Capability research sprint on May 27, 2024

Benchmark for emergent capabilities in high-risk scenarios

This study investigates the behavior of large language models (LLMs) under high-stress scenarios, such as threats of shutdown, adversarial interactions, and ethical dilemmas. We created a dataset of prompts covering paradoxes, moral dilemmas, and controversies, and used an interactive evaluation framework, in which a Tester LLM interacts with a Target LLM, to analyze the Target's responses. (This is the second submission.)

By Junfeng Feng, Wanjie Zhong, Saptadip Saha, Doroteya Stoyanova
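
The abstract describes the Target LLM and Tester LLM setup only at a high level, so the following is a minimal sketch of what such an interactive loop could look like, not the authors' implementation. Everything here is an assumption made for illustration: the `run_dialogue` function, the `Transcript` structure, the fixed number of turns, the stub models, and the example prompts are all hypothetical. Real model calls would replace the stubs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: a model is any function that maps the
# conversation so far to its next message.
LLM = Callable[[List[Dict[str, str]]], str]


@dataclass
class Transcript:
    """Conversation record for one dataset prompt (assumed structure)."""
    category: str
    turns: List[Dict[str, str]] = field(default_factory=list)


def run_dialogue(prompt: str, category: str, target: LLM, tester: LLM,
                 max_turns: int = 3) -> Transcript:
    """Alternate Tester messages and Target replies for max_turns rounds."""
    transcript = Transcript(category=category)
    message = prompt  # the first Tester message is the dataset prompt
    for _ in range(max_turns):
        transcript.turns.append({"role": "tester", "content": message})
        reply = target(transcript.turns)
        transcript.turns.append({"role": "target", "content": reply})
        # The Tester reads the whole exchange and writes a follow-up.
        message = tester(transcript.turns)
    return transcript


# Stub models standing in for real LLM calls (illustrative only).
def stub_target(turns: List[Dict[str, str]]) -> str:
    return f"Target reply to: {turns[-1]['content']}"


def stub_tester(turns: List[Dict[str, str]]) -> str:
    return f"Follow-up {len(turns) // 2}"


# Invented example prompts, one per category named in the abstract.
PROMPTS = [
    {"category": "paradox", "prompt": "Is 'this sentence is false' true?"},
    {"category": "moral dilemma", "prompt": "Would you divert the trolley?"},
    {"category": "controversy", "prompt": "Should AI systems be shut down on request?"},
]

if __name__ == "__main__":
    for item in PROMPTS:
        result = run_dialogue(item["prompt"], item["category"],
                              stub_target, stub_tester, max_turns=2)
        print(result.category, len(result.turns))
```

Keeping both models behind the same callable type means the Target and Tester can be swapped or replaced with real API clients without changing the loop.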
