May 27, 2024
WashBench – A Benchmark for Assessing Softening of Harmful Content in LLM-generated Text Summaries
Sev Geraskin, Jakub Kryś, Luhan Mikaelson, Simon Wisdom
Summary
In this work, we explore the tradeoff between toxicity removal and information retention in LLM-generated summaries. We hypothesize that LLMs are less likely to preserve toxic content when summarizing toxic text, because their safety fine-tuning trains them to avoid generating such content. In high-stakes decision-making scenarios, where faithful summaries matter, this softening may create significant safety risks. To quantify this effect, we introduce WashBench, a benchmark containing manually annotated toxic content.
Cite this work:
@misc{geraskin2024washbench,
  title={WashBench – A Benchmark for Assessing Softening of Harmful Content in LLM-generated Text Summaries},
  author={Geraskin, Sev and Kryś, Jakub and Mikaelson, Luhan and Wisdom, Simon},
  date={2024-05-27},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}