Qwen3 30B A3B
Qwen · Qwen3 family · released 2025-04-27 · apache-2.0 license
A tiny-active (3.3B) MoE that reasons well above its weight. Fits a 24GB card at Q4; toggle thinking mode for hard math/code.
Key specs
| Type | Local open-weight |
|---|---|
| Parameters | 30.53B total · MoE, 3.3B active |
| Architecture | qwen3_moe |
| Context window | 41K tokens |
| Knowledge cutoff | — |
| Modalities | text |
| Recommended backends | — |
| Minimum viable rig | — |
Benchmark scores
| GPQA Diamond | 65.8% |
|---|---|
| SWE-bench Verified | — |
| AIME | 70.9% |
| MMLU-Pro | 78.5% |
| BFCL v3 (tool use) | 69.1% |
| Composite score | 7.26 |
| Community rating | 2.5★ (2 reviews, 2 net votes) |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 19.2 GB | 17.7 GB | 32 GB | 41K |
Strengths & weaknesses
Strengths: Only 3.3B active params — very efficient for its quality; Strong reasoning/coding in thinking mode for the active size; Hybrid thinking / non-thinking switch
Weaknesses: Weaker complex multi-step agentic tool-use vs frontier; Static YaRN can degrade short-context if force-enabled; No vision