May 6, 2024
Subtle and Simple Ways to Shift Political Bias in LLMs
Chris DiGiano, Vassil Tashev, Aysh Segulguzel
An informed user knows that an LLM sometimes has a political bias in their responses, but there’s an additional threat that this bias can drift over time, making it even harder to rely on LLMs for an objective perspective. Furthermore we speculate that a malicious actor can trigger this shift through various means unbeknownst to the user.
Nina Rimsky
Threat model is sound, would be interested in experiments demonstrating your suggested mitigation efficacy
Jason Hoelscher-Obermaier
The question how in-context information could change the political bias of a model is super interesting and relevant to several potential risk scenarios including indirect prompt injection attacks, data poisoning attacks, or just sycophantic feedback loops in human-ai interactions. To improve this project we would need more control conditions to ensure the observed shifts are significant, extending the study to biases in different directions and to different LLMs to see how much results for one LLM generalize.
Bart Bussmann
Interesting project! Cool demonstration of how one can subtly change political biases in LLMs. If you continue this project, I would lov to see more of how you would expect this to influence democracy and concrete examples of how users might ask the LLM relatively apolitical questions, but how the context can steer the user to a particular political side!
Cite this work
@misc {
title={
@misc {
},
author={
Chris DiGiano, Vassil Tashev, Aysh Segulguzel
},
date={
5/6/24
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}