Nov 25, 2024
Investigating Feature Effects on Manipulation Susceptibility
Nishchal Prabhakar, Stefan Trnjakov, Mo Aziz
Summary
In this project, we examine the effectiveness of the AI’s prompt injection protection, and in
particular the features that provide the bulk of this protection. We demonstrate that the
features we identify are responsible for this protection by creating variants of the base model
that perform significantly worse under prompt injection attacks.
Cite this work:
@misc{prabhakar2024investigating,
title={
Investigating Feature Effects on Manipulation Susceptibility
},
author={
Nishchal Prabhakar and Stefan Trnjakov and Mo Aziz
},
date={
2024-11-25
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}