Qwen3 32B

Qwen · released 2025-04-27 · apache-2.0 license

A sweet-spot 32B you can run on a single 24GB card at Q4. Strong at coding and math; keep prompts under ~64k for best quality.

Key specs

TypeLocal open-weight
Parameters32.76B total
Architectureqwen3
Context window41K tokens
Knowledge cutoff2025-02-01
Modalitiestext
Recommended backendsllama.cpp, vLLM, Ollama, MLX
Minimum viable rigRTX 3090 / 4090 (24GB) at Q4_K_M

Benchmark scores

GPQA Diamond66%
SWE-bench Verified50%
AIME81%
MMLU-Pro78%
BFCL v3 (tool use)68%
Composite score6.28
Community rating4.7★ (3 reviews, 1 net votes)

VRAM & disk per quantization

QuantVRAMDiskRAMContext
Q835 GB34 GB48 GB131K
FP1666 GB64 GB80 GB131K
Q4_K_M20.5 GB19 GB32 GB41K

API pricing (per 1M tokens)

ProviderInputOutputFree tier
SiliconFlow$0.1$0.3No

Strengths & weaknesses

Strengths: Strong reasoning & math for a 32B; Toggleable thinking mode; Runs on a single 24GB card at Q4

Weaknesses: Context quality thins past ~64k; No vision