This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
ApartSprints
AI and Democracy Hackathon: Demonstrating the Risks
65b750920b4aeb478958fb32
AI and Democracy Hackathon: Demonstrating the Risks
May 6, 2024
Accepted at the 
65b750920b4aeb478958fb32
 research sprint on 

Political Bias Vulnerabilities in LLMs

This project investigates the mechanisms by which existing political bias in large language models (LLMs) can be adversarially adjusted, highlighting vulnerabilities and opportunities for intervention. Our rudimentary experiment demonstrates that even without explicit directives, conservative contextual snippets can shift the political orientation of Llama 2 70B Chat, highlighting the model’s sensitivity to prompt injections. A recent study confirms that political biases cluster similarly across LLMs. Concurrent studies in activation engineering show the precise manipulation of biases within LLMs. We advocate for leveraging these techniques to detect and counter both inherent and deliberately induced model bias. Our findings highlight the critical need for robust measures to mitigate bias as LLMs become more central to information processing, decision-making, and search. These strategies are essential not only to manage potential risks but also to ensure the operational safety and ethical integrity of AI systems, thereby protecting democratic integrity worldwide

By 
Chris DiGiano, Vassil Tashev, Aysh Segulguzel
🏆 
4th place
3rd place
2nd place
1st place
 by peer review
Thank you! Your submission is under review.
Oops! Something went wrong while submitting the form.

This project is private