DeepSeek-V4-Pro
DeepSeek's flagship open-weight mixture-of-experts (MoE) model: 1.6T total parameters with 49B activated per token, a 1M-token context window, and a hybrid attention scheme (CSA + HCA) that runs long-context inference at ~27% of V3.2's FLOPs.
Specifications
- Context window: 1,000,000 tokens
- Parameters: 1.6T (49B active)
- Modality: text
- License: MIT
- Family: DeepSeek
- Release date: 2026-04-22
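As a quick check on the parameter figures above, the share of weights active for each token follows directly from the two listed numbers. This is a minimal sketch: the constant and function names are illustrative and do not come from any DeepSeek tooling.

```python
# Figures taken from the specification list above.
TOTAL_PARAMS = 1.6e12   # 1.6T total parameters
ACTIVE_PARAMS = 49e9    # 49B parameters activated per token


def active_fraction(active: float, total: float) -> float:
    """Return the share of total parameters used for a single token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share per token: {frac:.2%}")  # Active share per token: 3.06%
```

Roughly 3% of the weights participate in any single forward step, which is why per-token compute tracks the 49B figure rather than the 1.6T total.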
Timeline
- Released: initial public availability.
API pricing
No API pricing recorded yet for DeepSeek-V4-Pro.
Benchmarks
| Benchmark | Category | Score | Setting | Measured | Source |
|---|---|---|---|---|---|
| MMLU-Pro | general knowledge | 84.2% | 5-shot CoT | 2026-04-22 | source ↗ |
| GPQA Diamond | reasoning | 82.4% | 0-shot CoT | 2026-04-22 | source ↗ |
| HumanEval | coding | 95.1% | pass@1 | 2026-04-22 | source ↗ |
| Aider Polyglot | coding | 80.1% | edit + test | 2026-04-25 | source ↗ |
| AIME 2025 | math | 88.6% | CoT extended | 2026-04-22 | source ↗ |
| LiveCodeBench | coding | 72.4% | pass@1, post-cutoff | 2026-04-30 | source ↗ |
Infrastructure context
Compute, silicon, and capex events that shape DeepSeek-V4-Pro's economics.
- Compute cluster (direct)
DeepSeek operates ~50k H800 GPU fleet despite export controls
DeepSeek's training fleet, believed to be ~50,000 H800 GPUs at peak, was assembled before the October 2023 extension of US export controls. DeepSeek-V4-Pro's training run reportedly used fewer than 8M GPU-hours in total, less than 10% of GPT-5's estimated budget, relying on the hybrid CSA+HCA attention scheme to reduce FLOPs.
Accelerators: 50k · H800
Location: Hangzhou, China
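To put the reported figures in perspective, the GPU-hour budget can be converted into wall-clock time. The sketch below assumes, purely for illustration, that the entire ~50,000-GPU fleet ran the job continuously; the source states neither how many GPUs were used nor the utilization, and 8M GPU-hours is an upper bound, so the result is a rough floor on duration for a full-fleet run and not a reported number.

```python
# Reported figures from the entry above; both are estimates.
GPU_HOURS_UPPER_BOUND = 8_000_000  # "<8M GPU-hours total"
FLEET_SIZE = 50_000                # "~50,000 H800 GPUs at peak"


def wall_clock_hours(gpu_hours: float, gpus: int) -> float:
    """Elapsed hours if `gpus` devices run in parallel for the whole job."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    return gpu_hours / gpus


# Assumption: the full fleet is dedicated to this single training run.
hours = wall_clock_hours(GPU_HOURS_UPPER_BOUND, FLEET_SIZE)
days = hours / 24
print(f"{hours:.0f} hours (~{days:.1f} days)")  # 160 hours (~6.7 days)
```

A run on a fraction of the fleet scales inversely: on 10,000 GPUs, the same budget would take five times as long.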