Code Understanding Benchmarks

1 benchmark in this category

RepoQA: Long-Context Code Understanding & Function Search
RepoQA evaluates long-context code understanding by testing whether agents can locate specific functions within large code repositories.
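To make the idea of a function-search task concrete, here is a minimal sketch in Python. It is not RepoQA's actual harness or scoring method: the helper names (`list_functions`, `score_answer`), the sample source, and the exact-name-match scoring rule are all assumptions chosen for illustration.

```python
import ast

def list_functions(source: str) -> list[str]:
    """Return the names of all functions defined in a Python source string."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]

def score_answer(source: str, target: str, answer: str) -> bool:
    """Hypothetical scoring rule: the answer is correct only if it names
    the target function and that function really exists in the source."""
    return target in list_functions(source) and answer.strip() == target

# Stand-in for a (much larger) repository context; purely illustrative.
REPO_SOURCE = '''
def load_config(path):
    return open(path).read()

def parse_header(line):
    key, _, value = line.partition(":")
    return key.strip(), value.strip()
'''

if __name__ == "__main__":
    # Task: "find the function that splits a 'key: value' line".
    print(score_answer(REPO_SOURCE, "parse_header", "parse_header"))
    print(score_answer(REPO_SOURCE, "parse_header", "load_config"))
```

A real long-context evaluation would place the target among many more functions and could score by similarity to the function body rather than by name; exact matching is used here only to keep the sketch short.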

Benchmark Your MCP Server

Get hard numbers comparing tool-assisted vs. baseline agent performance on real tasks.