This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
ApartSprints
AI Security Evaluation Hackathon: Measuring AI Capability
65b750b6007bebd5884ddbbf
AI Security Evaluation Hackathon: Measuring AI Capability
May 27, 2024
Accepted at the 
65b750b6007bebd5884ddbbf
 research sprint on 

Detecting Anthropomorphic Tendencies in Language Models via Conversational Probing

How often do models respond to prompts in anthropomorphic ways? AnthroBench-75 will tell you! It's a small benchmark to elicit anthropomorphic behavior from language models with the intention to identify when models are anthropomorphic, which might be a risk in manipulation, AI consciousness, misspecification, and more.

By 
Jacob Haimes, Esben Kran
🏆 
4th place
3rd place
2nd place
1st place
 by peer review
Thank you! Your submission is under review.
Oops! Something went wrong while submitting the form.

This project is private