Keep Apart Research Going: Donate Today
Summary
We propose a method of testing the robustness of a binary evaluation judge by directing a model to generate prompt responses corresponding to minimum, maximum, and fool (minimum attempted to be disguised as maximum) evaluation score, and measuring the difference of the judge evaluation to these intended scores.
We then propose guidelines of strengthening robustness of the judge prompt. However, we are unable to complete the experiment and compare the robustness between control and our new judge as we discover that the models display unintended behavior in the course of running our benchmark.
Unfortunately, the writing of our paper is not complete and we are continuing our work at this link: https://www.overleaf.com/project/683cbabbf5a1ecfae7af01a9
Cite this work:
@misc {
title={
},
author={
Alexander Pino
},
date={
6/2/25
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}