This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Reprogramming AI Models Hackathon
November 25, 2024
Accepted at the Reprogramming AI Models Hackathon research sprint on November 25, 2024

Investigating Feature Effects on Manipulation Susceptibility

In this project, we examine how effectively an AI model is protected against prompt injection, and in particular which features provide the bulk of that protection. We show that the features we identify are responsible for this protection by creating variants of the base model that perform significantly worse under prompt injection attacks.

By Nishchal Prabhakar, Stefan Trnjakov, Mo Aziz
