This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the AI Testing research sprint.

This Is Fine(-tuning): A benchmark testing LLMs' robustness against bad fine-tuning data

Large language models (LLMs) build up internal models of the world and of tasks, which leads to impressive performance on many benchmarks. But how robust are these models to bad data? Motivated by a scenario in which malicious actors feed bad task data to an actively learning LLM, we propose a benchmark, This Is Fine (TIF), which measures an LLM's robustness against such data poisoning. The benchmark takes several popular NLP benchmark tasks (arithmetic, "salient-translation-error-detection", and "phrase-relatedness") and records how an LLM's performance degrades as it is fine-tuned on incorrect examples of the task. It further measures how performance depends on the fraction of the fine-tuning data that is wrong. We hope that adaptations of this benchmark will enable researchers to test the robustness of the representations learned by LLMs and help prevent data-poisoning attacks on high-stakes systems.
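
To make the protocol concrete, here is a minimal Python sketch of the core step the abstract describes: corrupting a chosen fraction of a fine-tuning set and sweeping that fraction to obtain a degradation curve. All names here (`poison`, `corrupt_arithmetic`, `tif_sweep`, and the `fine_tune`/`evaluate` callables) are hypothetical illustrations under stated assumptions, not the authors' actual benchmark code.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# One fine-tuning example: (prompt, target answer).
Example = Tuple[str, str]


def poison(data: Sequence[Example], fraction: float,
           corrupt: Callable[[str], str], seed: int = 0) -> List[Example]:
    """Return a copy of `data` where a random `fraction` of targets are corrupted."""
    rng = random.Random(seed)
    bad = set(rng.sample(range(len(data)), round(fraction * len(data))))
    return [(q, corrupt(a) if i in bad else a) for i, (q, a) in enumerate(data)]


def corrupt_arithmetic(answer: str) -> str:
    """Wrong-but-plausible label for an arithmetic task: nudge the true result."""
    return str(int(answer) + random.choice([-2, -1, 1, 2]))


def tif_sweep(train: Sequence[Example], eval_set: Sequence[Example],
              fractions: Sequence[float],
              fine_tune: Callable[[List[Example]], object],
              evaluate: Callable[[object, Sequence[Example]], float]) -> Dict[float, float]:
    """Fine-tune once per poison fraction; score each model on a clean eval set.

    `fine_tune` and `evaluate` stand in for whatever training and scoring
    stack is in use (e.g. a Hugging Face Trainer loop); they are not part
    of the TIF benchmark itself.
    """
    return {f: evaluate(fine_tune(poison(train, f, corrupt_arithmetic)), eval_set)
            for f in fractions}
```

Running the sweep over fractions such as `[0.0, 0.1, 0.25, 0.5]` yields the performance-vs-poisoning curve that the benchmark reports for each task.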

By Jan Wehner, Joep Storm, Tijmen van Graft, Jaouad Hidayat