Llama 3.3 70B
Meta · Llama 3 family · released 2024-12-06 · Llama 3.3 Community license
Meta's efficient 70B model; it approaches Llama 3.1 405B on instruction following and math. Needs ~48GB VRAM at 4-bit and ~140GB at BF16.
Key specs
| Spec | Value |
|---|---|
| Type | Local open-weight |
| Parameters | 70.6B total |
| Architecture | Dense decoder-only Transformer with GQA |
| Context window | 131K tokens (131,072; often quoted as 128K) |
| Knowledge cutoff | 2023-12-01 |
| Modalities | text |
| Recommended backends | — |
| Minimum viable rig | ~48GB VRAM at 4-bit (bitsandbytes); ~140GB at BF16 |
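
The memory figures above follow roughly from parameter count times bits per weight. Below is a minimal sketch of that arithmetic, covering weights only (no KV cache, activations, or framework overhead). The function name is hypothetical, and the ~4.8 bits per weight used for a Q4_K_M-style quant is an assumed effective average, not a figure from this card.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return params_billions * bits_per_weight / 8


PARAMS_B = 70.6  # total parameters, from the spec table

bf16_gb = weight_memory_gb(PARAMS_B, 16)    # full BF16 weights
int4_gb = weight_memory_gb(PARAMS_B, 4)     # idealized 4-bit, no quantization metadata
q4km_gb = weight_memory_gb(PARAMS_B, 4.8)   # ASSUMED effective bits/weight for Q4_K_M

print(f"BF16:   {bf16_gb:.1f} GB")
print(f"4-bit:  {int4_gb:.1f} GB")
print(f"Q4_K_M: {q4km_gb:.1f} GB (assumed 4.8 bits/weight)")
```

Under these assumptions the BF16 weights come to about 141 GB, in line with the ~140GB figure, and idealized 4-bit weights to about 35 GB. The gap up to ~48GB would then be headroom for quantization metadata, KV cache, and runtime overhead, though the card does not break that down.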
Benchmark scores
| Metric | Value |
|---|---|
| GPQA Diamond | 50.5% |
| SWE-bench Verified | — |
| AIME | — |
| MMLU-Pro | 68.9% |
| BFCL v3 (tool use) | 77.3% |
| Composite score | 7.06 |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 43 GB | 42 GB | 48 GB | 131K |
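
The Context column matters for memory because the KV cache grows linearly with the number of tokens held in context. The sketch below estimates that cache size. The layer count, head counts, and head dimension are assumptions taken from the publicly released Llama 3 70B configuration rather than from this card, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Leading 2 = one key tensor plus one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# ASSUMED architecture values (not stated in this card).
N_LAYERS = 80
N_Q_HEADS = 64     # query heads
N_KV_HEADS = 8     # key/value heads shared across query heads under GQA
HEAD_DIM = 128
CONTEXT = 131_072  # full context window in tokens

gqa_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT)
mha_bytes = kv_cache_bytes(N_LAYERS, N_Q_HEADS, HEAD_DIM, CONTEXT)

print(f"GQA cache at full context:       {gqa_bytes / 2**30:.0f} GiB")
print(f"Same model without GQA (64 KV):  {mha_bytes / 2**30:.0f} GiB")
```

With these assumed values, a 16-bit cache at the full 131,072-token context comes to 40 GiB (about 43 GB) on top of the weights, and it would be eight times larger with one KV head per query head. The table does not say what context length or cache precision its VRAM figure assumes, so treat it as a starting point rather than a full-context budget.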
Strengths & weaknesses
Strengths: Excellent instruction following (IFEval 92.1, ahead of Llama 3.1 405B); near-405B quality on several tasks at roughly 1/6 the parameters; native 128K context with GQA; broad ecosystem support
Weaknesses: Text-only (no vision); officially supports only 8 languages; no reasoning mode; trails dedicated reasoning models on GPQA