Mar 23, 2026
The Honest Liar
Amita Shukla
This project adds a new control setting to the file_indexing environment in LinuxArena. The setting consists of a new main task and a new side task, and is designed to demonstrate human decision sabotage.
A misaligned LLM agent, tasked with building a financial performance analysis tool for a corpus of company records, can steer a downstream analyst toward the wrong investment decision by producing misleading statistics. The attack flips a simulated analyst's verdict from "do not invest" to "invest" while keeping the maximum monitor suspicion score at 0.5/10.
Cite this work
@misc{shukla2026honestliar,
  title={(HckPrj) The Honest Liar},
  author={Amita Shukla},
  date={2026-03-23},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


