DeepSeek
Infrastructure intelligence
Compute deals, data centers, silicon, and capex that shape DeepSeek's training and inference economics.
- Compute cluster · reported
DeepSeek operates ~50k H800 GPU fleet despite export controls
DeepSeek's training fleet, believed to be ~50,000 H800 GPUs at peak, was assembled before the October 2023 extension of US export controls. DeepSeek-V4-Pro's training run reportedly used fewer than 8M GPU-hours in total, less than 10% of GPT-5's estimated budget, relying on the hybrid CSA+HCA attention scheme to cut FLOPs.
Accelerators: 50k · H800
Location: Hangzhou, China
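To put the reported figures in perspective, the short sketch below converts a GPU-hour budget into wall-clock time for a given fleet size. It is back-of-the-envelope arithmetic only: the function name `wall_clock_days` and the `utilization` parameter are illustrative assumptions, and the inputs are the reported estimates above (an upper bound of 8M GPU-hours and a peak fleet of ~50,000 GPUs), not confirmed numbers.

```python
def wall_clock_days(gpu_hours: float, gpus: int, utilization: float = 1.0) -> float:
    """Days needed to spend `gpu_hours` across `gpus` devices running concurrently.

    `utilization` is a hypothetical knob: the fraction of the fleet's capacity
    that actually goes to this one training run.
    """
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    hours = gpu_hours / (gpus * utilization)
    return hours / 24.0


# Reported estimates from the entry above, not confirmed figures.
GPU_HOURS_UPPER_BOUND = 8_000_000  # "fewer than 8M GPU-hours"
FLEET_SIZE = 50_000                # "~50,000 H800 GPUs at peak"

full_fleet_days = wall_clock_days(GPU_HOURS_UPPER_BOUND, FLEET_SIZE)
# Hypothetical scenario: only half of the fleet's capacity goes to this run.
half_fleet_days = wall_clock_days(GPU_HOURS_UPPER_BOUND, FLEET_SIZE, utilization=0.5)

print(f"full fleet: under {full_fleet_days:.1f} days")
print(f"half fleet: under {half_fleet_days:.1f} days")
```

Under these assumptions the whole budget fits in roughly a week of the full fleet's time (8,000,000 / 50,000 = 160 hours). Real runs rarely occupy an entire fleet at once, so this is a lower bound on duration rather than an estimate of the actual schedule.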
Models
DeepSeek-V4-Flash
released 2026-04-22
Smaller, faster sibling to DeepSeek-V4-Pro. Same 1M context window with a much lighter 284B / 13B-active MoE.
- Context: 1,000,000 tokens
- Params: 284B (13B active)
- License: MIT
- Source: open
DeepSeek-V4-Pro
released 2026-04-22
DeepSeek's flagship open-weight MoE. 1.6T parameters with 49B activated, 1M-token context, and a hybrid attention scheme (CSA + HCA) that delivers long-context inference at ~27% of V3.2's FLOPs.
- Context: 1,000,000 tokens
- Params: 1.6T (49B active)
- License: MIT
- Source: open
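The "active" figures listed for the two V4 models can be read as the share of weights used for any single token in a mixture-of-experts model. The sketch below computes that share from the parameter counts listed on this page; the names `MODELS` and `active_fraction` are illustrative, and the calculation says nothing about quality or real-world throughput.

```python
# Total and active parameter counts as listed on this page.
MODELS = {
    "DeepSeek-V4-Flash": (284e9, 13e9),
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
}


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a MoE model's parameters that are activated per token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    if not 0 < active_params <= total_params:
        raise ValueError("active_params must be in (0, total_params]")
    return active_params / total_params


fractions = {name: active_fraction(total, active) for name, (total, active) in MODELS.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

By this measure V4-Pro is the sparser of the two (about 3.1% active versus about 4.6% for V4-Flash), even though it activates more parameters in absolute terms.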
DeepSeek-V3.1
released 2025-08-21
Large MoE open-weight model. Predecessor to DeepSeek-V4.
- Context: 128,000 tokens
- Params: 671B
- License: MIT
- Source: open
DeepSeek-R1
released 2025-01-20
Reasoning-focused open-weight model.
- Context: 128,000 tokens
- Params: 671B
- License: MIT
- Source: open