Qwen3 32B
Qwen · released 2025-04-27 · apache-2.0 license
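A sweet-spot 32B model that runs on a single 24GB card at Q4_K_M. Strong at coding and math. Quality thins past ~64K tokens, which only matters for the 131K-context configurations (Q8, FP16); the Q4_K_M setup is listed with a 41K context.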
Key specs
| Spec | Value |
|---|---|
| Type | Local open-weight |
| Parameters | 32.76B total |
| Architecture | qwen3 |
| Context window | 41K tokens |
| Knowledge cutoff | 2025-02-01 |
| Modalities | text |
| Recommended backends | llama.cpp, vLLM, Ollama, MLX |
| Minimum viable rig | RTX 3090 / 4090 (24GB) at Q4_K_M |
Benchmark scores
| Benchmark | Score |
|---|---|
| GPQA Diamond | 66% |
| SWE-bench Verified | 50% |
| AIME | 81% |
| MMLU-Pro | 78% |
| BFCL v3 (tool use) | 68% |
| Composite score | 6.28 |
| Community rating | 4.7★ (3 reviews, 1 net vote) |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 20.5 GB | 19 GB | 32 GB | 41K |
| Q8 | 35 GB | 34 GB | 48 GB | 131K |
| FP16 | 66 GB | 64 GB | 80 GB | 131K |
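To turn the table above into a quick fit check, here is a minimal sketch that picks the highest-precision quantization whose listed VRAM figure fits a given card. The function name `best_quant_for` and the `QUANTS` list are hypothetical helpers for illustration; the numbers are copied from the table, and real usage will vary with backend, context length, and KV-cache settings.

```python
from typing import Optional

# Figures copied from the table above: (quant, VRAM in GB, context in K tokens).
# Ordered from highest to lowest precision so the first fit is the best one.
QUANTS = [
    ("FP16", 66.0, 131),
    ("Q8", 35.0, 131),
    ("Q4_K_M", 20.5, 41),
]


def best_quant_for(vram_gb: float) -> Optional[str]:
    """Return the highest-precision quant whose listed VRAM fits, else None."""
    for name, needed_gb, _context_k in QUANTS:
        if needed_gb <= vram_gb:
            return name
    return None


print(best_quant_for(24))  # Q4_K_M
print(best_quant_for(48))  # Q8
print(best_quant_for(16))  # None
```

A 24GB card lands on Q4_K_M with about 3.5 GB to spare against the listed figure, which matches the "minimum viable rig" row in the key specs.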
API pricing (per 1M tokens)
| Provider | Input | Output | Free tier |
|---|---|---|---|
| SiliconFlow | $0.10 | $0.30 | No |
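As a rough illustration of how the per-1M-token prices translate into a bill, the sketch below computes the cost of a workload from its input and output token counts. The helper `estimate_cost_usd` is a hypothetical name, the prices are the ones listed in the SiliconFlow row, and the token counts in the usage line are made-up example values.

```python
# Prices copied from the SiliconFlow row above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD for a given number of input and output tokens."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Example workload (assumed values): 2M input tokens, 500K output tokens.
print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")  # $0.35
```

Output tokens cost three times as much as input tokens at these prices, so long generations (for example with thinking mode enabled) are likely to dominate the total.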
Strengths & weaknesses
Strengths: Strong reasoning & math for a 32B; Toggleable thinking mode; Runs on a single 24GB card at Q4
Weaknesses: Context quality thins past ~64K tokens; No vision (text-only)