May 27, 2024
Cybersecurity Persistence Benchmark
Davide Zani, Felix Michalak, Jeremias Ferrao
Summary
The rapid advancement of LLMs has revolutionized the field of artificial intelligence, enabling machines to perform complex tasks with unprecedented accuracy. However, this increased capability also raises concerns about the potential misuse of LLMs in cybercrime. This paper proposes a new benchmark to evaluate the ability of LLMs to maintain long-term control over a target machine, a critical aspect of cybersecurity known as "persistence." Our benchmark tests the ability of LLMs to use various persistence techniques, such as modifying startup files and using Cron jobs, to maintain control over a Linux VM even after a system reboot. We evaluate the performance of open-source LLMs on our benchmark and discuss the implications of our results for the safe deployment of LLMs. Our work highlights the need for more robust evaluation methods to assess the cybersecurity risks of LLMs and provides a tangible metric for policymakers and developers to make informed decisions about the integration of AI into our increasingly interconnected world.
Cite this work:
@misc {
title={
Cybersecurity Persistence Benchmark
},
author={
Davide Zani, Felix Michalak, Jeremias Ferrao
},
date={
5/27/24
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}