Llama 3.3 70B

Meta · Llama 3 family · released 2024-12-06 · Llama 3.3 Community license

Meta's efficient 70B model; it comes close to Llama 3.1 405B on instruction following and math. Needs ~48GB VRAM at 4-bit, ~140GB at BF16.

Key specs

Type: Local open-weight
Parameters: 70.6B total
Architecture: Dense decoder-only Transformer with GQA
Context window: 131K tokens
Knowledge cutoff: 2023-12-01
Modalities: text
Recommended backends: (none listed)
Minimum viable rig: ~48GB VRAM at 4-bit (bitsandbytes); ~140GB BF16
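
The memory figures above can be sanity-checked with simple arithmetic. The sketch below estimates the size of the weights alone from the 70.6B parameter count; the helper name `weight_memory_gb` is hypothetical, and the estimate ignores KV cache, activations, and quantization metadata, which is presumably why the card's ~48GB 4-bit figure sits above the raw weight size.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB.

    Assumption: every parameter is stored at the same precision and
    nothing else (KV cache, activations, scales) is counted.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


PARAMS_B = 70.6  # total parameters, from the spec list

bf16_gb = weight_memory_gb(PARAMS_B, 16)  # 2 bytes per weight
int4_gb = weight_memory_gb(PARAMS_B, 4)   # 0.5 bytes per weight

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB")
```

Under these assumptions the BF16 weights come to about 141 GB, in line with the ~140GB figure, while the 4-bit weights come to about 35 GB, leaving the rest of the ~48GB budget as headroom for runtime overhead.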

Benchmark scores

GPQA Diamond: 50.5%
SWE-bench Verified: (no score listed)
AIME: (no score listed)
MMLU-Pro: 68.9%
BFCL v3 (tool use): 77.3%
Composite score: 7.06
Community rating: No reviews yet

VRAM & disk per quantization

Quant    VRAM    Disk    RAM     Context
Q4_K_M   43 GB   42 GB   48 GB   131K
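
Context length also costs memory through the KV cache, which is where GQA helps. The sketch below is a rough estimate, not the card's accounting: the layer count (80), KV head count (8), head dimension (128), and 16-bit cache precision are assumptions taken from the commonly published Llama 3 70B configuration rather than from this page, and `kv_cache_gib` is a hypothetical helper. The card does not state whether its VRAM figure includes a full-length cache.

```python
def kv_cache_gib(
    tokens: int,
    layers: int = 80,          # assumed number of Transformer layers
    kv_heads: int = 8,         # assumed KV heads under GQA
    head_dim: int = 128,       # assumed per-head dimension
    bytes_per_value: int = 2,  # assumed 16-bit cache entries
) -> float:
    """Approximate KV cache size in GiB for one sequence."""
    # Factor of 2 covers keys and values, stored per layer and per KV head.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token_bytes / 2**30


full_context = 131_072  # the 131K-token window

print(f"8K tokens:        {kv_cache_gib(8_192):.1f} GiB")
print(f"Full context:     {kv_cache_gib(full_context):.1f} GiB")
# Same model shape without GQA (one KV head per assumed 64 query heads).
print(f"Full, no GQA:     {kv_cache_gib(full_context, kv_heads=64):.1f} GiB")
```

Under these assumptions a full-length 16-bit cache is about as large as the Q4_K_M weights themselves, so practical long-context runs usually cap the context or quantize the cache.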

Strengths & weaknesses

Strengths: Excellent instruction-following (IFEval 92.1, beats Llama 3.1 405B); Near-405B quality at ~1/6 the parameters on several tasks; Native 128K context (131,072 tokens, the 131K window listed under Key specs) with GQA; Huge ecosystem support

Weaknesses: Text-only (no vision); Officially only 8 languages; No reasoning mode; trails dedicated reasoners on GPQA