This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
AI Safety Entrepreneurship Hackathon
January 20, 2025
Accepted at the AI Safety Entrepreneurship Hackathon research sprint on January 20, 2025

AntiMidas: Building Commercially-Viable Agents for Alignment Dataset Generation

AI alignment lacks the high-quality, real-world preference data needed to align agentic superintelligent systems. Our technical innovation builds on the breakthrough by Pacchiardi et al. (2023) in detecting AI deception through black-box analysis. We adapt their classification methodology to identify intent misalignment between agent actions and true user intent, enabling real-time correction of agent behavior and generation of valuable alignment data. We commercialise this by building a workflow that incorporates a classifier that runs on live trajectories as a user interacts with an agent in commercial contexts. This creates a virtuous cycle: our alignment expertise produces superior agents, driving commercial adoption, which generates increasingly valuable alignment datasets of trillions of trajectories labelled with human feedback: an invaluable resource for AI alignment.

By Jacob Arbeid, Jay Bailey, Sam Lowenstein, Jake Pencharz
