gpt-oss-120b
OpenAI · released 2025-08-04 · Apache-2.0 license
An open-weight MoE model that reasons well with only 5.1B active parameters per token, so it runs fast once it fits in memory. Fitting it is the hard part: the weights alone take ~63GB at 4-bit.
Key specs
| Spec | Value |
|---|---|
| Type | Local open-weight |
| Parameters | 120.41B total · MoE, 5.1B active |
| Architecture | gpt_oss |
| Context window | 131K tokens |
| Knowledge cutoff | 2025-04-01 |
| Modalities | text |
| Recommended backends | vLLM, llama.cpp |
| Minimum viable rig | 80GB card / DGX Spark / M-series 96GB+ |
Benchmark scores
| Benchmark | Score |
|---|---|
| GPQA Diamond | 72% |
| SWE-bench Verified | 58% |
| AIME | 90% |
| MMLU-Pro | 80% |
| BFCL v3 (tool use) | 72% |
| Composite score | 6.79 |
| Community rating | No reviews yet |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q8 | 120 GB | 118 GB | 128 GB | 131K |
| Q4_K_M | 71.3 GB | 69.8 GB | 80 GB | 131K |
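As a rough way to read the table against a given machine, the sketch below checks which quantizations fit a VRAM (or unified memory) budget. The figures are copied from the table above; the function name `quants_that_fit` and the `headroom_gb` margin are assumptions made for illustration, not part of any backend's tooling.

```python
# VRAM, disk, and RAM figures copied from the quantization table (GB, at 131K context).
QUANTS = {
    "Q8": {"vram_gb": 120.0, "disk_gb": 118.0, "ram_gb": 128.0},
    "Q4_K_M": {"vram_gb": 71.3, "disk_gb": 69.8, "ram_gb": 80.0},
}


def quants_that_fit(available_vram_gb, headroom_gb=2.0):
    """Return the quant names whose listed VRAM need, plus a safety margin, fits.

    headroom_gb is a hypothetical margin for driver and runtime overhead;
    tune it for the actual setup.
    """
    return [
        name
        for name, need in QUANTS.items()
        if need["vram_gb"] + headroom_gb <= available_vram_gb
    ]


if __name__ == "__main__":
    for budget_gb in (64, 80, 96, 128):
        print(f"{budget_gb} GB:", quants_that_fit(budget_gb) or "nothing fits")
```

The table lists requirements at the full 131K context; shorter contexts may need less memory, which this sketch does not model.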
API pricing (per 1M tokens)
| Provider | Input | Output | Free tier |
|---|---|---|---|
| OpenRouter | $0.10 | $0.50 | Yes |
| Groq | $0.15 | $0.60 | No |
| Fireworks AI | $0.15 | $0.60 | No |
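To turn the per-million-token prices into a budget for a concrete workload, a minimal sketch follows. The prices are the ones listed in the table; the helper name `request_cost_usd` and the example token counts are assumptions for illustration, and free-tier allowances are ignored.

```python
# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "OpenRouter": (0.10, 0.50),
    "Groq": (0.15, 0.60),
    "Fireworks AI": (0.15, 0.60),
}


def request_cost_usd(provider, input_tokens, output_tokens):
    """Estimate the cost in USD for a given number of input and output tokens."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for provider in PRICES:
        cost = request_cost_usd(provider, 2_000_000, 500_000)
        print(f"{provider}: ${cost:.2f}")
```

Because output tokens cost four to five times as much as input tokens at every listed provider, long generations (for example, extended reasoning traces) tend to dominate the bill.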
Strengths & weaknesses
Strengths: Native MXFP4 weights fit in ~63GB; Strong agentic and reasoning performance; Only 5.1B active parameters per token
Weaknesses: Needs more than 64GB of VRAM or unified memory (an 80GB card or a 96GB+ M-series machine in practice); Weaker multilingual performance
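The ~63GB weight figure and the small active share can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes every parameter is stored in MXFP4 (4-bit elements plus one shared 8-bit scale per block of 32, so 4.25 bits per weight); that is a simplification, since some layers may be kept at higher precision, so treat the result as an estimate only. The function and constant names are hypothetical.

```python
# Parameter counts taken from the Key specs table.
TOTAL_PARAMS = 120.41e9
ACTIVE_PARAMS = 5.1e9

# MXFP4: 4-bit elements plus one 8-bit shared scale per 32-element block.
MXFP4_BITS_PER_WEIGHT = 4 + 8 / 32  # 4.25


def weight_size_gb(total_params, bits_per_weight):
    """Approximate weight storage in decimal GB, ignoring KV cache and activations."""
    return total_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    size_gb = weight_size_gb(TOTAL_PARAMS, MXFP4_BITS_PER_WEIGHT)
    active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
    print(f"Estimated MXFP4 weights: {size_gb:.1f} GB")
    print(f"Active parameters per token: {active_fraction:.1%} of total")
```

Under these assumptions the estimate lands close to the quoted ~63GB, and only about 4% of the parameters are active per token, which is why throughput is high once the full set of weights fits in memory.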