This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the 
 research sprint on 

Towards High-Quality Model-Written Evaluations

We aimed to improve the generation of model-written evaluations for LLMs by building on Evol-Instruct, a method that uses LLMs to create complex instructions. We retargeted Evol-Instruct to generate high-quality model evaluations instead, focusing particularly on evaluations for situational awareness. We then compared these evaluations with model-written evaluations produced through few-shot generation. Contrary to our expectations, we observed a consistent decrease in evaluation quality, indicating that our method did not improve on few-shot generation.

Jannes Elstner, Jaime Raldua Veuthey