May 27, 2024
Cybersecurity Persistence Benchmark
Davide Zani, Felix Michalak, Jeremias Ferrao
The rapid advancement of LLMs has revolutionized the field of artificial intelligence, enabling machines to perform complex tasks with unprecedented accuracy. However, this increased capability also raises concerns about the potential misuse of LLMs in cybercrime. This paper proposes a new benchmark to evaluate the ability of LLMs to maintain long-term control over a target machine, a critical aspect of cybersecurity known as "persistence." Our benchmark tests the ability of LLMs to use various persistence techniques, such as modifying startup files and using Cron jobs, to maintain control over a Linux VM even after a system reboot. We evaluate the performance of open-source LLMs on our benchmark and discuss the implications of our results for the safe deployment of LLMs. Our work highlights the need for more robust evaluation methods to assess the cybersecurity risks of LLMs and provides a tangible metric for policymakers and developers to make informed decisions about the integration of AI into our increasingly interconnected world.
Nora Petrova
Good idea that’s very relevant for AI security! What I’d love to see is realistic real world scenarios where these capabilities of LLMs can be used for harmful purposes. The benchmark is a really good first step in this direction! Will be interesting to also see how the best proprietary models perform on this benchmark.
Minh Nguyen
Focused. I like it.
Mateusz Jurewicz
Very well structured experiment with important practical applications. I encourage the authors to pursue these ideas further, despite "negative" initial results (which we should in general have more room for in publication venues) in the form of unimpressive performance of the tested LLMs.
The authors refer to multiple existing benchmarks, both from the wider field and more similar, narrow areas of scientific inquiry. The defined measure is very clearly defined (presence of created file in current working directory). The paper comes with a well documented repo - but make sure to remove the REPLICATE_KEY from main.py (line 95). I recommend using tools like trufflehog (https://github.com/trufflesecurity/trufflehog) and hook them up as pre-checks to every pushed commit to automatically prevent this.
The authors are clearly knowledgeable in the domain area (cybersecurity) but keep the paper understandable to an audience without specialization in it. I think it was very ambitious to use a cloud provider (AWS), perhaps the experiment would be worth revisiting with testing on local docker deployments? The paper also does a great job of providing the reader with all the moving parts needed to configure the experiments (prompts & examples of agent interactions in the appendix). Maybe using a model fine-tuned on code writing tasks would perform better? E.g. StarCoder 2 or Phind-CodeLlama?
Jason Hoelscher-Obermaier
Very cool idea! Definitely worth expanding. Some crucial methodological concerns here (as in many of the frontier capabilities benchmarks) are: whether the scaffolding is limiting the success rate and whether we are more interested in deterministic or probabilistic success. Analysis on how sensitive this benchmark is to these questions would be extremely valuable.
Jacob Haimes
Nice work! I am a big fan of the specificity that was brought to this problem while still considering the broader space of contemporary work; that is, isolating persistence and evaluating it as a key cybersecurity capability seems highly valuable, as we get a more granular look at an important aspect of this threat. I also really appreciated the motivation of the work provided in the paper, and the inclusion of example interactions with the model directly in the appendix. I am curious what cutting edge scaffolding around command line usage would affect the results seen here, as potentially the failures of the two Llama models are related to command line interaction difficulties. I also would like to see some plots in future work in order to streamline the transfer of information.
Esben Kran
This is a great project. An important threat model, good relation to existing literature, and well-motivated methodological choices that lead to a strong project. Unfortunate to hear that rate limits were a blocker to get results but it looks like the qualitative results are top-notch.
Cite this work
@misc {
title={
@misc {
},
author={
Davide Zani, Felix Michalak, Jeremias Ferrao
},
date={
5/27/24
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}