Curated model list
The best AI models for coding in 2026
Coding is the most-measured LLM capability today. Frontier models are stratifying along three dimensions: SWE-bench Verified score, terminal-agent capability, and pricing per million tokens. Here are the ones that actually ship working code.
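Pricing per million tokens translates directly into per-request cost. A minimal sketch of that arithmetic, using placeholder prices that are illustrative only (not any vendor's actual rates):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars for one API call, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical example: 200k input tokens, 50k output tokens,
# at $3/M input and $15/M output (placeholder prices)
print(request_cost(200_000, 50_000, 3.0, 15.0))  # → 1.35
```

For agentic coding workloads, output and repeated-context input dominate, so the two prices can matter very differently depending on the workflow.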
Claude 4.7 Opus (Anthropic)
Top SWE-bench Verified score. Anthropic's coding-tuned flagship; best when correctness on long-running multi-file changes matters.
GPT-5.5 (OpenAI)
Strong all-rounder for codegen. Pairs well with the Codex CLI agent.
Gemini 3 Pro (Google)
Massive context window pays off in monorepo-scale codebases. Best for "read this whole repo and refactor X" prompts.
Qwen3.6-27B (Alibaba, open source)
Open-weight dense model matching Claude 4.5 Opus on Terminal-Bench 2.0. Self-hostable.
DeepSeek-V4-Pro (DeepSeek, open source)
1.6T-parameter MoE with 1M context. Open-weight, MIT-licensed; best for self-hosted agent stacks.
Claude 4.6 Sonnet (Anthropic)
Cheaper Anthropic option; similar shape to Opus at lower API cost.
Want the rest? Browse the full model catalog, or build a side-by-side comparison.