Llama 3.3 70B

Meta · Llama 3 family · released 2024-12-06 · Llama 3.3 Community license

Meta's efficient 70B; punches near 405B on instructions and math. ~48GB VRAM at 4-bit, ~140GB at BF16.

Key specs

Type	Local open-weight
Parameters	70.6B total
Architecture	Dense decoder-only Transformer with GQA
Context window	131K tokens
Knowledge cutoff	2023-12-01
Modalities	text
Recommended backends	—
Minimum viable rig	~48GB VRAM at 4-bit (bitsandbytes); ~140GB BF16

Benchmark scores

GPQA Diamond	50.5%
SWE-bench Verified	—
AIME	—
MMLU-Pro	68.9%
BFCL v3 (tool use)	77.3%
Composite score	7.06
Community rating	No reviews yet

VRAM & disk per quantization

Quant	VRAM	Disk	RAM	Context
Q4_K_M	43 GB	42 GB	48 GB	131K

Strengths & weaknesses

Strengths: Excellent instruction-following (IFEval 92.1, beats Llama 3.1 405B); Near-405B quality at ~1/6 the params on several tasks; Native 128k context + GQA; huge ecosystem support

Weaknesses: Text-only (no vision); Officially only 8 languages; No reasoning mode; trails dedicated reasoners on GPQA