This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
AI Security Evaluation Hackathon: Measuring AI Capability
May 27, 2024
Accepted at the AI Security Evaluation Hackathon: Measuring AI Capability research sprint on May 27, 2024

LLM Benchmarking with Single-Agent Stochastic Dynamic Simulations

A benchmark for evaluating the performance of SOTA LLMs in dynamic real-world scenarios.

By Sai Joseph, Anita Beroza, Eleni Angelou, Sofia Mendez, Evelyn Ciara
