Mar 30, 2025
Safety Metric and Prompt Engineering for Red Team
Zhu Liang
The ControlArena environment ships with a default attack policy
that is very basic. We explore how to improve that attack policy
using basic prompt engineering techniques.
We also found that the safety metric evaluation in the
environment does not closely follow the foundational paper
(Greenblatt et al., 2023), so we changed it to approximate the
paper's safety metric more closely.
Cite this work
@misc{
  title={Safety Metric and Prompt Engineering for Red Team},
  author={Zhu Liang},
  date={3/30/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}