Catastrophic Cyber Capabilities (3CB)

What happens when AI agents become capable hackers? And how can we tell whether they already are? We create a collection of original challenges designed to address problems of legibility, coverage, and generalization in evaluations of autonomous cyber offense capabilities.

"Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities"
