fireworks/models/deepseek-v3-0324

Common Name: Deepseek V3 03-24

Released on Oct 16, 2025 12:00 AMSupportedTool Invocation

A strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token from Deepseek. Updated checkpoint.

Specifications

Context

160K

Inputtext

Outputtext

Performance (7-day Average)

Collecting…

Pricing

Input$0.99/MTokens

Output$0.99/MTokens

Availability Trend (24h)

Performance Metrics (24h)

Similar Models

GLM-5

$1.10/$3.52/M

ctx203Kmax—avail—tps—

InOutCap

Z.ai's state-of-the-art mixture-of-experts model with 40B active parameters out of 744B total. Optimized for complex systems engineering and long-horizon agentic tasks, using Deepseek Sparse Attention for efficient long-context processing.

Kimi K2 Instruct 0905

$0.66/$2.75/M

ctx256Kmax—avail—tps—

InOutCap

Kimi K2 0905 is an updated version of Kimi K2, a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Kimi K2 0905 has improved coding abilities, a longer context window, and agentic tool use, and a longer (262K) context window.

Kimi K2.5

$0.66/$3.30/M

ctx262Kmax—avail—tps—

InOutCap

Kimi K2.5 is Moonshot AI's flagship agentic model and a new SOTA open model. It unifies vision and text, thinking and non-thinking modes, and single-agent and multi-agent execution into one model. Kimi K2.5 is a mixture-of-experts (MoE) language model with 1 trillion total parameters and a 262K context window.

DeepSeek V3.1

$0.62/$1.85/M

ctx160Kmax—avail—tps—

InOutCap

DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.