Gemma 3 27B
Google · released 2025-03-12 · Gemma license
A capable open-weight multimodal (text and image) 27B dense model with strong multilingual quality. Its 16 KV heads make the KV cache comparatively large, so budget extra VRAM when running long contexts.
Key specs
| Spec | Value |
|---|---|
| Type | Local open-weight |
| Parameters | 27B total |
| Architecture | dense |
| Context window | 131K tokens |
| Knowledge cutoff | 2024-08-01 |
| Modalities | text, image |
| Recommended backends | llama.cpp, Ollama, MLX, vLLM |
| Minimum viable rig | RTX 3090 / 4090 (24GB) at Q4_K_M |
Benchmark scores
| Metric | Score |
|---|---|
| GPQA Diamond | 52% |
| SWE-bench Verified | 30% |
| AIME | 40% |
| MMLU-Pro | 67% |
| BFCL v3 (tool use) | 55% |
| Composite score | 4.93 |
| Community rating | 4.0★ (1 review, 0 net votes) |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 17 GB | 16 GB | 28 GB | 131K |
| Q8 | 29 GB | 28 GB | 40 GB | 131K |
| FP16 | 55 GB | 54 GB | 72 GB | 131K |
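The disk figures above can be sanity-checked with simple arithmetic: parameter count times bits per weight, divided by 8. The sketch below does that in Python. The bits-per-weight values are assumed rough averages for each format (they are not taken from this page), so treat the results as ballpark estimates rather than exact file sizes.

```python
PARAMS = 27e9  # "27B total" from the key specs table

# Assumed average bits per weight for each quantization (approximate, not exact format specs).
ASSUMED_BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8": 8.5, "FP16": 16.0}


def weight_gb(params, bits_per_weight):
    """Weight footprint in decimal GB: parameters x bits per weight, converted to bytes."""
    return params * bits_per_weight / 8 / 1e9


estimates = {q: weight_gb(PARAMS, b) for q, b in ASSUMED_BITS_PER_WEIGHT.items()}

for quant, gb in estimates.items():
    print(f"{quant}: ~{gb:.1f} GB of weights")
```

Under these assumptions each estimate lands within about 1 GB of the Disk column. The VRAM column sits roughly 1 GB above Disk in every row, which leaves little room for the KV cache, so long contexts need headroom beyond these numbers.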
API pricing (per 1M tokens)
| Provider | Input | Output | Free tier |
|---|---|---|---|
| Together AI | $0.20 | $0.40 | No |
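As a quick illustration of how per-million-token pricing turns into a per-request cost, here is a small Python sketch using the Together AI rates from the table. The token counts in the example call are hypothetical.

```python
INPUT_USD_PER_M = 0.20   # input price per 1M tokens, from the table above
OUTPUT_USD_PER_M = 0.40  # output price per 1M tokens, from the table above


def request_cost_usd(input_tokens, output_tokens):
    """Cost of one request in USD given input and output token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: a 50K-token prompt with a 2K-token reply.
cost = request_cost_usd(50_000, 2_000)
print(f"${cost:.4f} per request")
```

For that hypothetical request the cost comes to roughly one cent, with the prompt accounting for most of it despite the lower input rate.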
Strengths & weaknesses
Strengths: Excellent multilingual coverage; Native vision; Strong instruction following
Weaknesses: Weaker at hard math; Large KV cache (16 KV heads)
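To put the KV-cache weakness in numbers, the following Python sketch applies the standard sizing formula: two tensors (K and V) per layer, per KV head, per token. Only the 16 KV heads come from this page. The layer count (62) and head dimension (128) are assumptions, and the formula models a plain full-attention FP16 cache on every layer, so read the result as a rough upper bound. Backends that use sliding-window layers or quantize the cache would need less.

```python
KV_HEADS = 16            # from this page
ASSUMED_LAYERS = 62      # assumption, not stated on this page
ASSUMED_HEAD_DIM = 128   # assumption, not stated on this page


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """Naive KV-cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


for tokens in (8_192, 32_768, 131_072):
    size = kv_cache_bytes(tokens, ASSUMED_LAYERS, KV_HEADS, ASSUMED_HEAD_DIM)
    print(f"{tokens:>7} tokens -> {size / 2**30:.1f} GiB (FP16, full attention on all layers)")
```

Because the size scales linearly with the number of KV heads, a model with the same assumed shape but 8 KV heads would need half as much cache at any given context length.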