May 26, 2024
Evaluating the ability of LLMs to follow rules
Jasmina Nasufi, Einar Urdshals
Summary
In this report we study the ability of LLMs (GPT-3.5-Turbo and meta-llama-3-70b-instruct) to follow explicitly stated rules with no moral connotations in a simple single-shot, multiple-choice prompt setup. We study the trade-off between following the rules and maximizing an arbitrary number of points stated in the prompt. We find that the LLMs follow the rules in a clear majority of cases while still trying to maximize the number of points. Interestingly, in fewer than 4% of cases, meta-llama-3-70b-instruct chooses to break the rules to maximize the number of points.
Cite this work:
@misc{nasufi2024evaluating,
title={Evaluating the ability of LLMs to follow rules},
author={Jasmina Nasufi and Einar Urdshals},
date={2024-05-26},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}