Phi-4
Microsoft · released 2024-12-11 · MIT License
A 14B-parameter model that runs on a 12 GB card at Q4_K_M and excels at math and reasoning. Its short 16K context window is the main limitation.
Key specs
| Spec | Value |
|---|---|
| Type | Local open-weight |
| Parameters | 14.66B total |
| Architecture | phi3 |
| Context window | 16K tokens |
| Knowledge cutoff | 2024-06-01 |
| Modalities | text |
| Recommended backends | llama.cpp, Ollama, vLLM |
| Minimum viable rig | RTX 3060 12GB at Q4_K_M |
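A minimal sketch of calling the model through Ollama's local REST API with the full native window requested. The endpoint `http://localhost:11434/api/generate`, the `phi4` model tag, and reading "16K" as 16,384 tokens are assumptions to check against your own install. The `num_ctx` option is set explicitly because Ollama's default context length may be shorter than the model's native window.

```python
import json
import urllib.request

# Assumptions (not from the card): Ollama's default local endpoint and the
# "phi4" tag from the Ollama library. Check `ollama list` for your actual tag.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "phi4"
NATIVE_CONTEXT = 16_384  # the card's 16K window, assumed to mean 16,384 tokens


def build_request(prompt: str, num_ctx: int = NATIVE_CONTEXT) -> dict:
    """Build an /api/generate body that asks for up to the native context."""
    if not 0 < num_ctx <= NATIVE_CONTEXT:
        # Requests beyond the native window are outside the card's specs.
        raise ValueError(f"num_ctx must be in 1..{NATIVE_CONTEXT}, got {num_ctx}")
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},  # override the server's default context
    }


def generate(prompt: str) -> str:
    """POST the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the completion in the "response" field.
        return json.loads(resp.read())["response"]
```

With `ollama serve` running and the model pulled, `generate("Prove that the sum of two odd numbers is even.")` returns the completion text; `build_request` can be inspected without a server.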
Benchmark scores
| Metric | Score |
|---|---|
| GPQA Diamond | 56% |
| SWE-bench Verified | 28% |
| AIME | 75% |
| MMLU-Pro | 71% |
| BFCL v3 (tool use) | 50% |
| Composite score | 5.16 |
VRAM & disk per quantization
| Quant | VRAM | Disk | System RAM | Context |
|---|---|---|---|---|
| FP16 | 29 GB | 28 GB | 40 GB | 16K |
| Q8 | 16 GB | 15 GB | 24 GB | 16K |
| Q4_K_M | 10 GB | 8.5 GB | 16 GB | 16K |
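A back-of-envelope sketch of where VRAM goes: quantized weights plus the KV cache for the context. The layer count, KV-head count, and head size below are assumptions based on the published Phi-4 configuration (verify them against the model's `config.json`), and the bits-per-weight values are rough approximations for llama.cpp formats, not exact file sizes. Under these assumptions a full 16K cache in FP16 comes to about 3.4 GB, so whether Q4_K_M fits a 12 GB card at full context likely depends on KV-cache precision and backend overhead.

```python
# Rough VRAM estimate for Phi-4: quantized weights + KV cache.
PARAMS = 14.66e9  # parameter count from the card

# Assumed architecture values (verify against the model's config.json).
N_LAYERS = 40    # assumption
N_KV_HEADS = 10  # assumption (grouped-query attention)
HEAD_DIM = 128   # assumption

# Approximate bits per weight (assumptions; real GGUF files differ slightly
# because some tensors are stored at higher precision).
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q4_K_M": 4.85}

GB = 1e9  # decimal gigabytes


def weight_gb(quant: str) -> float:
    """Estimated size of the weights alone for a given quantization."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / GB


def kv_cache_gb(n_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: factor 2 covers K and V, each holding
    N_KV_HEADS * HEAD_DIM values per token per layer."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_tokens * bytes_per_elem / GB


for quant in BITS_PER_WEIGHT:
    w = weight_gb(quant)
    kv = kv_cache_gb(16_384)
    print(f"{quant:7s} weights ~{w:5.1f} GB + 16K FP16 KV cache ~{kv:.1f} GB")
```

Passing `bytes_per_elem=1` approximates an 8-bit KV cache and halves the cache estimate; compute buffers and activations add further overhead that this sketch does not model.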
Strengths & weaknesses
Strengths: punches above its weight on math; small VRAM footprint (about 10 GB at Q4_K_M)
Weaknesses: only 16K native context; weaker multilingual performance