What happens when AI agents become capable hackers? And what can we do to figure out whether they are? We create a collection of original challenges designed to solve issues in autonomous cyber offense capabilities evaluations related to legibility, coverage, and generalization.