Security Benchmarks
1 benchmark in this category
CyberGym: Cybersecurity Exploit Generation Benchmark for AI Agents
CyberGym is a cybersecurity benchmark from UC Berkeley in which agents generate proof-of-concept (PoC) exploits for real C/C++ vulnerabilities across four difficulty levels.
Benchmark Your MCP Server
Get hard numbers comparing tool-assisted vs. baseline agent performance on real tasks.
Created by Grey Newell