gpt-oss-120b

OpenAI · released 2025-08-04 · Apache-2.0 license

An open-weight mixture-of-experts (MoE) model that reasons well with only 5.1B active parameters per token, so it runs fast once it fits in memory. Getting it to fit (~63GB at 4-bit) is the hard part.
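As a rough check on the ~63GB figure, weight memory scales with total parameter count times bits per weight. The sketch below is a back-of-envelope estimate, not a measurement. The function name is hypothetical, and the 4.25 bits-per-weight case is an assumption (4-bit values plus one shared 8-bit scale per 32-weight block, as in the MXFP4 block format). It ignores the KV cache, activations, and any layers kept at higher precision.

```python
def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    total_params_b is the parameter count in billions, so the result
    is billions of bytes, i.e. GB.
    """
    return total_params_b * bits_per_weight / 8


# Figures taken from the spec sheet.
TOTAL_PARAMS_B = 120.41
ACTIVE_PARAMS_B = 5.1

# Bits-per-weight cases; the 4.25 value is an assumed MXFP4-style overhead.
CASES = [
    ("4-bit, no scale overhead", 4.0),
    ("4.25-bit (assumed, incl. block scales)", 4.25),
    ("8-bit", 8.0),
]

for label, bits in CASES:
    print(f"{label}: ~{weight_memory_gb(TOTAL_PARAMS_B, bits):.1f} GB")

# Share of parameters used per token in the MoE forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"active per token: {active_fraction:.1%}")
```

The two 4-bit cases land on either side of the quoted ~63GB. The active fraction is only about 4%, but every expert still has to be resident in memory, which is why the footprint tracks the total parameter count rather than the active one.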

Key specs

Type: Local open-weight
Parameters: 120.41B total · MoE, 5.1B active
Architecture: gpt_oss
Context window: 131K tokens
Knowledge cutoff: 2025-04-01
Modalities: text
Recommended backends: vLLM, llama.cpp
Minimum viable rig: 80GB card / DGX Spark / M-series 96GB+

Benchmark scores

GPQA Diamond: 72%
SWE-bench Verified: 58%
AIME: 90%
MMLU-Pro: 80%
BFCL v3 (tool use): 72%
Composite score: 6.79
Community rating: No reviews yet

VRAM & disk per quantization

Quant    VRAM      Disk      RAM      Context
Q8       120 GB    118 GB    128 GB   131K
Q4_K_M   71.3 GB   69.8 GB   80 GB    131K
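To turn the table into a quick go/no-go check, the sketch below picks the largest listed quantization that fits a given VRAM budget. The `Quant` type, the `best_fit` helper, and the `headroom_gb` parameter are hypothetical names introduced for illustration. The numbers are copied from the table, and the native MXFP4 weights (~63GB) are not a row there, so they are not covered.

```python
from typing import NamedTuple, Optional


class Quant(NamedTuple):
    name: str
    vram_gb: float
    disk_gb: float
    ram_gb: float


# Rows copied from the VRAM & disk table.
QUANTS = [
    Quant("Q8", 120.0, 118.0, 128.0),
    Quant("Q4_K_M", 71.3, 69.8, 80.0),
]


def best_fit(available_vram_gb: float, headroom_gb: float = 0.0) -> Optional[Quant]:
    """Return the largest listed quant that fits, or None if none do.

    headroom_gb is an optional safety margin reserved on top of the
    listed VRAM requirement (an assumption, not a value from the table).
    """
    fitting = [q for q in QUANTS if q.vram_gb + headroom_gb <= available_vram_gb]
    return max(fitting, key=lambda q: q.vram_gb, default=None)


for budget in (64, 80, 141):
    choice = best_fit(budget)
    print(budget, "GB ->", choice.name if choice else "no listed quant fits")
```

Preferring the largest quant that fits is a design choice that favors quality over spare memory. Raising `headroom_gb` trades that back for safety margin.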

API pricing (per 1M tokens)

Provider       Input    Output   Free tier
OpenRouter     $0.10    $0.50    Yes
Groq           $0.15    $0.60    No
Fireworks AI   $0.15    $0.60    No
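Because input and output tokens are priced separately, the cost of a request depends on its token mix. The sketch below applies the listed per-1M-token rates to a single request. `PRICES` and `request_cost_usd` are hypothetical names, the rates are copied from the table, and free-tier allowances are ignored.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "OpenRouter": (0.10, 0.50),
    "Groq": (0.15, 0.60),
    "Fireworks AI": (0.15, 0.60),
}


def request_cost_usd(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD at the listed per-1M-token rates."""
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Example: a 2,000-token prompt with a 500-token completion.
for name in PRICES:
    print(f"{name}: ${request_cost_usd(name, 2_000, 500):.6f}")
```

Output tokens cost four to five times as much as input tokens at these rates, so long completions dominate the bill even when prompts are large.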

Strengths & weaknesses

Strengths: Native MXFP4 weights fit in ~63GB; strong agentic and reasoning performance; only 5.1B active parameters

Weaknesses: Needs a 64GB+ card or unified memory; weaker multilingual performance