This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Reprogramming AI Models Hackathon
November 25, 2024
Accepted at the Reprogramming AI Models Hackathon research sprint.
Improving Llama-3-8B-Instruct Hallucination Robustness in Medical Q&A Using Feature Steering

This paper addresses the risks of hallucinations in LLMs within critical domains like medicine. It proposes methods to (a) reduce hallucination probability in responses, (b) inform users of hallucination risks and model accuracy for specific queries, and (c) display hallucination risk through a user interface. Steered model variants demonstrate reduced hallucinations and improved accuracy on medical queries. The work bridges interpretability research with practical AI safety, offering a scalable solution for the healthcare industry. Future efforts will focus on identifying and removing distractor features in classifier activations to enhance performance.
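The feature-steering approach summarized above can be illustrated with a minimal sketch: a direction in activation space associated with an unwanted feature (e.g. a hallucination-linked feature) is suppressed by adding a scaled steering vector to a layer's output via a forward hook. This is a toy stand-in, not the project's implementation; the block, the feature direction, and the coefficient are all hypothetical, and the real work steers features inside Llama-3-8B-Instruct.

```python
import torch
import torch.nn as nn

# Toy stand-in for one transformer block's output projection
# (hypothetical; the actual project operates on Llama-3-8B-Instruct).
class Block(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)

    def forward(self, x):
        return self.linear(x)

def add_steering_hook(module: nn.Module,
                      feature_direction: torch.Tensor,
                      coeff: float):
    """Register a forward hook that shifts activations along a feature
    direction. A negative coeff suppresses the feature (e.g. one
    associated with hallucinations); a positive coeff amplifies it."""
    def hook(_module, _inputs, output):
        # Returning a tensor from a forward hook replaces the output.
        return output + coeff * feature_direction
    return module.register_forward_hook(hook)

torch.manual_seed(0)
d = 16
block = Block(d)

# Assumed unit vector for a "hallucination" feature direction.
direction = torch.zeros(d)
direction[0] = 1.0

x = torch.randn(1, d)
baseline = block(x)

handle = add_steering_hook(block, direction, coeff=-4.0)
steered = block(x)
handle.remove()  # restore unsteered behavior
```

Because the hook adds a fixed vector, the steered output differs from the baseline only along the chosen direction, scaled by the coefficient; in practice the direction would come from an interpretability method such as a sparse autoencoder feature, and the coefficient would be tuned to trade off suppression strength against response quality.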

By Diego Sabajo, Eitan Sprejer, Matas Zabaljauregui, Oliver Morris
