Code Understanding Benchmarks
1 benchmark in this category
RepoQA: Long-Context Code Understanding & Function Search
RepoQA evaluates long-context code understanding by testing whether agents can find and identify specific functions within large repository codebases.
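To make the task concrete, here is a minimal sketch of a function-search check in Python. It is an illustration only and is not RepoQA's actual harness or scoring method: the helper names `list_functions` and `is_correct`, the sample `REPO_SOURCE`, and the exact-name matching rule are all assumptions made for this example.

```python
import ast


def list_functions(source: str) -> dict[str, str]:
    """Map each function name defined in `source` to its source text."""
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions[node.name] = ast.get_source_segment(source, node)
    return functions


def is_correct(source: str, target_name: str, agent_answer: str) -> bool:
    """Hypothetical check: the answer must name a real function and match the target."""
    functions = list_functions(source)
    answer = agent_answer.strip()
    return answer in functions and answer == target_name


# Stand-in for a large repository file (assumed sample data).
REPO_SOURCE = '''
def parse_config(path):
    with open(path) as handle:
        return handle.read()

def retry(action, attempts=3):
    for _ in range(attempts):
        try:
            return action()
        except Exception:
            continue
    return None
'''

if __name__ == "__main__":
    # The agent is asked to find the function that re-runs an action on failure.
    print(is_correct(REPO_SOURCE, target_name="retry", agent_answer="retry"))
```

A real evaluation would draw the context from a full repository and would likely score answers more leniently than an exact name match; this sketch only shows the shape of the check.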
Benchmark Your MCP Server
Get hard numbers comparing tool-assisted vs. baseline agent performance on real tasks.
Created by Grey Newell