gpt.buzz Agent Index
AI Agent Leaderboard
Cross-benchmark ranking of every tracked AI agent. Each agent's best score on each benchmark is normalized against the field leader (the top score on that benchmark), and the normalized scores are then combined into a weighted average using the editorial weights below. Built to resist the benchmark gaming exposed in the April 2026 Berkeley/RDI report.
How the gpt.buzz Index is computed
Agents missing scores on some benchmarks are scored on what they have — weights redistribute. Full methodology at /methodology.
| # | Agent | Developer | Index | SWE-V | SWE-P | TB-2 | GAIA | WebA |
|---|---|---|---|---|---|---|---|---|
| 01 | Claude Code | Anthropic | 95.4 | 80.9 | 55.4 | 65.4 | 74.6 | 64.5 |
| 02 | Codex | OpenAI | 70.0 | 85.0 | 56.8 | 77.3 | — | — |
| 03 | OpenCode | OpenCode | 21.9 | 62.0 | — | — | — | — |
| 04 | Cline | Cline | 20.5 | 58.0 | — | — | — | — |
| 05 | Replit Agent | Replit | 18.4 | 52.0 | — | — | — | — |
| 06 | Hermes Agent | Nous Research | 15.0 | — | — | — | 56.0 | — |
| 07 | Manus | Monica | 13.7 | — | — | — | 51.0 | — |
| 08 | OpenClaw | Erik Steinberger | 12.9 | — | — | — | 48.0 | — |
| 09 | Devin | Cognition | 12.4 | 35.0 | — | — | — | — |
| 10 | Operator | OpenAI | 10.0 | — | — | — | — | 65.8 |
| 11 | Computer Use | Anthropic | 9.8 | — | — | — | — | 64.5 |
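The computation described above (normalize each score against the benchmark leader, then take a weighted average) can be sketched as follows. This is an illustration under stated assumptions, not the site's actual code: the weights are not listed in this section, so the `WEIGHTS` values are hypothetical, chosen because they reproduce the Index column of the table when a missing benchmark contributes zero. The `redistribute` flag, also hypothetical, shows the alternative in which weights are renormalized over the benchmarks an agent actually has. See /methodology for the authoritative rule.

```python
from typing import Dict, Optional

# ASSUMPTION: weights are not given in this section. These values are
# hypothetical and were picked to reproduce the published Index column.
WEIGHTS: Dict[str, float] = {
    "SWE-V": 0.30, "SWE-P": 0.25, "TB-2": 0.15, "GAIA": 0.20, "WebA": 0.10,
}

# Best score per benchmark, copied from the table (missing key = no score).
SCORES: Dict[str, Dict[str, float]] = {
    "Claude Code": {"SWE-V": 80.9, "SWE-P": 55.4, "TB-2": 65.4,
                    "GAIA": 74.6, "WebA": 64.5},
    "Codex": {"SWE-V": 85.0, "SWE-P": 56.8, "TB-2": 77.3},
    "OpenCode": {"SWE-V": 62.0},
    "Cline": {"SWE-V": 58.0},
    "Replit Agent": {"SWE-V": 52.0},
    "Hermes Agent": {"GAIA": 56.0},
    "Manus": {"GAIA": 51.0},
    "OpenClaw": {"GAIA": 48.0},
    "Devin": {"SWE-V": 35.0},
    "Operator": {"WebA": 65.8},
    "Computer Use": {"WebA": 64.5},
}


def leaders(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return the top score on each benchmark (the field leader)."""
    best: Dict[str, float] = {}
    for row in scores.values():
        for bench, value in row.items():
            best[bench] = max(best.get(bench, 0.0), value)
    return best


def index(row: Dict[str, float], best: Dict[str, float],
          weights: Dict[str, float] = WEIGHTS,
          redistribute: bool = False) -> Optional[float]:
    """Weighted average of leader-normalized scores on a 0-100 scale.

    redistribute=False: a missing benchmark contributes zero.
    redistribute=True: weights are renormalized over present benchmarks.
    """
    present = [b for b in weights if b in row]
    if not present:
        return None  # agent has no scores on any tracked benchmark
    total = sum(weights[b] * row[b] / best[b] for b in present)
    if redistribute:
        denom = sum(weights[b] for b in present)
    else:
        denom = sum(weights.values())
    return round(100 * total / denom, 1)


best = leaders(SCORES)
for agent, row in SCORES.items():
    print(f"{agent:13s} {index(row, best):5.1f}  "
          f"(redistributed: {index(row, best, redistribute=True):5.1f})")
```

Under these assumed weights, `index(SCORES["Codex"], best)` evaluates to 70.0, the value shown in the table, whereas `redistribute=True` gives 100.0 because Codex leads every benchmark it reports. The two modes therefore rank sparse agents very differently, which is worth keeping in mind when comparing agents with only one reported score.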
Tracked agents without scores yet (6)
These agents are tracked in our catalog but haven't reported scores on any of the 5 tracked benchmarks. Most likely they fall outside the coding / general-assistant / browser categories these benchmarks measure.
Per-benchmark leaderboards
- SWE-bench Verified — Bug fixes for real GitHub issues drawn from popular open-source Python repos, restricted to a human-validated subset of tasks.
- SWE-bench Pro — Harder, contamination-resistant coding benchmark; the average score is around 25%.
- Terminal-Bench 2.0 — Long-horizon CLI workflows: shells, package managers, log parsing, debugging — every command an agent has to run.
- GAIA — General AI assistant: web browsing, file parsing, multimodal reasoning, and tool use across more than 450 questions with unambiguous answers.
- WebArena — Browser agents on realistic web tasks — e-commerce, social, CMS, code collaboration.
Last index recomputed May 12, 2026 — refreshes every 60 seconds.