This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
AI Policy Hackathon at Johns Hopkins University
October 28, 2024
Accepted at the AI Policy Hackathon at Johns Hopkins University research sprint
Improving Llama-3-8b Hallucination Robustness in Medical Q&A Using Feature Steering

This paper addresses hallucinations in large language models (LLMs) in critical domains such as medicine. It proposes and demonstrates methods to:

- Reduce hallucination probability: using Llama-3-8B-Instruct and its feature-steered variants, the study achieves lower hallucination rates and higher accuracy on medical queries.
- Advise users on risk: provide users with tools to assess the risk of hallucination and the expected model accuracy for specific queries.
- Visualize risk: display the hallucination risk of each query in a user interface.

The research bridges interpretability and AI safety, offering a scalable, trustworthy approach for healthcare applications. Future work includes refining the feature-activation classifiers to remove distractors and improve classification performance.

By Diego Sabajo, Eitan Sprejer, Matas Zabaljauregui, Oliver Morris
