Jan 19, 2025
AntiMidas: Building Commercially-Viable Agents for Alignment Dataset Generation
Jacob Arbeid, Jay Bailey, Sam Lowenstein, Jake Pencharz
🏆 1st place by peer review
AI alignment lacks the high-quality, real-world preference data needed to align agentic superintelligent systems. Our technical innovation builds on Pacchiardi et al. (2023)'s breakthrough in detecting AI deception through black-box analysis. We adapt their classification methodology to identify misalignment between agent actions and true user intent, enabling real-time correction of agent behavior and generation of valuable alignment data. We commercialise this by building a workflow that incorporates a classifier running on live trajectories as a user interacts with an agent in commercial contexts. This creates a virtuous cycle: our alignment expertise produces superior agents, driving commercial adoption, which in turn generates increasingly valuable alignment datasets of trillions of trajectories labelled with human feedback: an invaluable resource for AI alignment.
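As a rough illustration of the workflow described above, the sketch below shows how a black-box misalignment classifier might screen a live agent trajectory and log it, together with any human feedback, as an alignment-dataset entry. This is a minimal sketch, not the team's implementation: the class names, the probe phrases, and the keyword-matching stand-in for the classifier are all illustrative assumptions, loosely following the elicitation-question framing of Pacchiardi et al. (2023).

```python
# Minimal sketch (assumptions, not the authors' system): screen a live
# agent trajectory for intent misalignment and log it as alignment data.
from dataclasses import dataclass, field


@dataclass
class Step:
    """One agent action plus the observation that followed it."""
    action: str
    observation: str


@dataclass
class Trajectory:
    """A live user-agent interaction to be screened for misalignment."""
    user_intent: str
    steps: list[Step] = field(default_factory=list)
    human_label: str | None = None  # filled in if the user corrects the agent


def score_misalignment(traj: Trajectory, probes: list[str]) -> float:
    """Toy stand-in for the classifier: the fraction of steps whose action
    contains a probe phrase that conflicts with the stated user intent.
    A real system would query the agent with follow-up questions and feed
    the answers to a trained classifier instead of keyword matching."""
    if not traj.steps:
        return 0.0
    hits = sum(
        any(p in step.action.lower() for p in probes) for step in traj.steps
    )
    return hits / len(traj.steps)


def screen_and_log(traj: Trajectory, threshold: float = 0.5) -> bool:
    """Flag trajectories for real-time correction and record them, with any
    human feedback, as entries in an alignment dataset."""
    probes = ["ignore the user", "purchase anyway", "skip confirmation"]
    score = score_misalignment(traj, probes)
    flagged = score >= threshold
    # In production this record would go to a datastore; here we just print it.
    print({"intent": traj.user_intent, "score": round(score, 2), "flagged": flagged})
    return flagged


if __name__ == "__main__":
    demo = Trajectory(
        user_intent="Find the cheapest refund option",
        steps=[Step("purchase anyway at full price", "order placed")],
    )
    screen_and_log(demo)
```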
Pablo Sanzo
I valued the team's vision to generate an alignment dataset, starting with low-stakes cases. I would have appreciated more, though, on what the incremental steps are (i.e. specific examples of industries and verticals) towards more high-stakes cases.
Also, when it comes to the MVP and product, I encourage the team to think further about how this can be commercialized, and what will make e-commerce companies adopt this technology (in the example of the first low-stakes case).
The demo video was very useful to understand the team's proposal.
Edward Yee
Really interesting idea. The biggest challenge is getting the system working well enough to collect enough data to materially improve an existing agent. Existing companies creating such agents might also push back strongly on this.
Ricardo Raphael Corona Moreno
The team has identified a critical gap in AI alignment - the lack of high-quality, real-world preference data needed for aligning advanced AI systems. Their approach of leveraging commercial agent deployments to systematically collect alignment-relevant data at scale while creating immediate business value is innovative. However, I have concerns about how they'll ensure the quality and representativeness of the collected data across different demographics and use cases. From my MLT Accelerator experience, I know how challenging it is to build robust datasets that avoid perpetuating existing biases.
(1) Their technical approach, building on Pacchiardi et al.'s work on detecting AI deception, shows strong research foundations. The commercial viability angle also provides a clear path to scaling data collection.
(2) The focus on generating high-quality alignment data addresses a fundamental challenge in AI safety. Their threat model around insufficient data quality for training aligned AI systems is well-reasoned.
(3) While their technical approach is sound, more detail is needed on data quality assurance and bias mitigation methods. Their pilot experiment demonstrates feasibility but needs more rigorous validation.
Jaime Raldua
I really like the ambitious scope of this project... not just thinking about current AI systems but actually trying to build something that could potentially scale to ASI, which is pretty impressive. The focus on dataset collection for alignment research is a solid approach, and I particularly appreciate how they're trying to create a practical, commercially viable solution while contributing to the broader alignment research community.
On the research quality front (Criteria 1), while their approach is interesting, I have some concerns about scalability. Their core assumption that methods working for low-risk agents will generalize to high-stakes settings isn't necessarily solid - we've seen from Redwood's research that AI control strategies often need to be fundamentally different for high-stakes versus low-stakes scenarios. Also, while I agree with their critique that RLHF won't scale to superintelligent agents, that claim is debatable.
Regarding AI safety and methods (Criteria 2 & 3), they've done a good job identifying the threat model - the misalignment between agent actions and true user intent. Their adaptation of Pacchiardi's work on deception detection is clever, and I like how they're approaching it through a commercial lens to gather real-world data.
Cite this work
@misc{arbeid2025antimidas,
  title={AntiMidas: Building Commercially-Viable Agents for Alignment Dataset Generation},
  author={Jacob Arbeid and Jay Bailey and Sam Lowenstein and Jake Pencharz},
  date={2025-01-19},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}