Nov 25, 2024
Improving Llama-3-8b Hallucination Robustness in Medical Q&A Using Feature Steering
Diego Sabajo, Eitan Sprejer, Matas Zabaljauregui, Oliver Morris
Summary
This paper addresses hallucinations in large language models (LLMs) within critical domains like medicine. It proposes and demonstrates methods to:
Reduce Hallucination Probability: By steering features of Llama-3-8B-Instruct, the study achieves lower hallucination rates and higher accuracy on medical queries than the unsteered model.
Advise Users on Risk: Provide tools to estimate the hallucination risk and expected model accuracy for a specific query.
Visualize Risk: Display hallucination risks for queries via a user interface.
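Feature steering of the kind used above is typically implemented by adding a scaled feature direction to a model's hidden activations. The sketch below illustrates only that arithmetic; the dimensions, the random "feature" vector, and the steering coefficient are hypothetical stand-ins, not the paper's actual features.

```python
import numpy as np

def steer_activation(activation, feature_direction, coefficient):
    """Add a scaled, normalized feature direction to a hidden activation.

    Illustrative only: in practice this would be applied via a forward hook
    on the model's residual stream at a chosen layer.
    """
    direction = feature_direction / np.linalg.norm(feature_direction)
    return activation + coefficient * direction

rng = np.random.default_rng(0)
activation = rng.normal(size=4096)  # e.g. Llama-3-8B hidden size
feature = rng.normal(size=4096)     # hypothetical "hallucination" feature
# A negative coefficient suppresses the feature's contribution.
steered = steer_activation(activation, feature, coefficient=-4.0)
```

A negative coefficient pushes the activation away from the feature direction; sweeping the coefficient and measuring downstream accuracy is how a useful steering strength is usually found.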
The research bridges interpretability and AI safety, offering a scalable, trustworthy solution for healthcare applications. Future work includes refining feature activation classifiers to remove distractors and enhance classification performance.
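A feature activation classifier of the kind mentioned above can be sketched as a logistic regression from per-query feature activations to a hallucination label. Everything below is synthetic for illustration: the data, labels, and feature count are made up, not the paper's classifier.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: each query is represented by the activations of a few
# candidate features; the label marks whether the model hallucinated on it.
X = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
y = (X @ w_true + rng.normal(scale=0.5, size=200) > 0).astype(float)

# Minimal logistic regression trained by batch gradient descent.
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

risk = 1.0 / (1.0 + np.exp(-(X @ w)))   # per-query hallucination risk
accuracy = ((risk > 0.5) == y).mean()
```

In this framing, "removing distractors" corresponds to pruning features whose learned weights contribute little, so the classifier relies only on features genuinely predictive of hallucination.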
Cite this work:
@misc{sabajo2024improving,
  title={Improving Llama-3-8b Hallucination Robustness in Medical Q\&A Using Feature Steering},
  author={Sabajo, Diego and Sprejer, Eitan and Zabaljauregui, Matas and Morris, Oliver},
  year={2024},
  month={nov},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={\url{https://apartresearch.com}}
}