ML Research Benchmarks

1 benchmark in this category

MLAgentBench: Real ML Research Tasks from Kaggle Competitions
MLAgentBench evaluates AI agents on real ML research tasks, many of them drawn from Kaggle competitions. It tests their ability to train models, improve a task's performance metric over a provided baseline, and debug ML pipelines.
