Curated model list
The best AI models for coding in 2026
Coding is the most-measured LLM capability today. Frontier models are stratifying along three dimensions: SWE-bench Verified score, terminal-agent capability, and pricing per million tokens. Here are the ones that actually ship working code.
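Pricing per million tokens translates directly into per-request cost. A minimal sketch of that arithmetic, using placeholder prices that are illustrative only (not any vendor's actual rates):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars for one API call, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical example: 200k input tokens, 50k output tokens,
# at $3/M input and $15/M output (placeholder prices)
print(request_cost(200_000, 50_000, 3.0, 15.0))  # → 1.35
```

For agentic coding workloads, output and repeated-context input dominate, so the two prices can matter very differently depending on the workflow.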
Claude 4.7 Opus (Anthropic)
Top SWE-bench Verified score. Anthropic's coding-tuned flagship; best when correctness on long-running multi-file changes matters.
GPT-5.5 (OpenAI)
Strong all-rounder for codegen. Pairs well with the Codex CLI agent.
Gemini 3 Pro (Google)
Massive context window pays off in monorepo-scale codebases. Best for "read this whole repo and refactor X" prompts.
Qwen3.6-27B (Alibaba, open source)
Open-weight dense model matching Claude 4.5 Opus on Terminal-Bench 2.0. Self-hostable.
DeepSeek-V4-Pro (DeepSeek, open source)
1.6T-parameter MoE with 1M context. Open-weight, MIT-licensed; best for self-hosted agent stacks.
Claude 4.6 Sonnet (Anthropic)
Cheaper Anthropic option; similar shape to Opus at lower API cost.
Want the rest? Browse the full model catalog, or build a side-by-side comparison.