Mistral Small 3.2
Mistral · released 2025-06-20 · Apache-2.0 license
A fast, inexpensive 24B dense model with native vision and strong tool calling. Grouped-query attention (GQA) with 8 KV heads keeps the KV cache small, which makes long contexts easier to fit in memory.
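To make the KV-cache point concrete, here is a rough sizing sketch. Only the 8 KV heads and the 131K context come from this page. The layer count (40), head dimension (128), query-head count (32), the reading of 131K as 131,072 tokens, and an FP16 cache are assumptions for illustration and should be checked against the model's published config.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size of the KV cache: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# Assumed architecture values (not stated on this page).
LAYERS = 40
HEAD_DIM = 128
QUERY_HEADS = 32

# Stated on this page: 8 KV heads, 131K context (taken here as 131,072 tokens).
KV_HEADS = 8
CONTEXT = 131_072

gqa = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM)
# Hypothetical baseline: full multi-head attention, one KV head per query head.
mha = kv_cache_bytes(CONTEXT, LAYERS, QUERY_HEADS, HEAD_DIM)

print(f"GQA: {gqa / 2**30:.1f} GiB, MHA: {mha / 2**30:.1f} GiB")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads (4x here). A full-length FP16 cache is still sizeable, so shorter contexts or a quantized cache may be needed on smaller GPUs.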
Key specs
| Spec | Value |
|---|---|
| Type | Local open-weight |
| Parameters | 24B total |
| Architecture | dense |
| Context window | 131K tokens |
| Knowledge cutoff | 2025-03-01 |
| Modalities | text, image |
| Recommended backends | llama.cpp, vLLM, Ollama, MLX |
| Minimum viable rig | RTX 3060 12 GB (Q4); 16 GB for headroom |
Benchmark scores
| Metric | Score |
|---|---|
| GPQA Diamond | 50% |
| SWE-bench Verified | 36% |
| AIME | 48% |
| MMLU-Pro | 70% |
| BFCL v3 (tool use) | 62% |
| Composite score | 5.2 |
| Community rating | 5.0★ (1 review, 0 net votes) |
VRAM & disk per quantization
| Quant | VRAM | Disk | RAM | Context |
|---|---|---|---|---|
| Q4_K_M | 15 GB | 14 GB | 24 GB | 131K |
| Q8 | 26 GB | 25 GB | 40 GB | 131K |
| FP16 | 49 GB | 48 GB | 64 GB | 131K |
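The Disk column can be roughly reproduced from the parameter count alone. The sketch below multiplies 24B parameters by an approximate bits-per-weight figure for each quantization. The bits-per-weight values (about 4.85 for Q4_K_M and 8.5 for Q8) are assumed ballpark numbers, not figures from this page, and real files differ slightly because of metadata and mixed-precision tensors.

```python
PARAMS = 24e9  # 24B total parameters, from the specs table

# Approximate effective bits per weight (assumed values for illustration).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q8": 8.5,
    "FP16": 16.0,
}

def weight_gb(params, bits_per_weight):
    """Estimated weight size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

estimates = {quant: weight_gb(PARAMS, bits) for quant, bits in BITS_PER_WEIGHT.items()}

for quant, size in estimates.items():
    print(f"{quant}: ~{size:.1f} GB of weights")
```

The VRAM column sits about 1 GB above these estimates, which is consistent with a small allowance for runtime buffers on top of the weights.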
API pricing (per 1M tokens)
| Provider | Input | Output | Free tier |
|---|---|---|---|
| Mistral | $0.10 | $0.30 | Yes |
| Together AI | $0.20 | $0.60 | No |
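As a quick way to compare the two providers, the following sketch turns the per-1M-token prices above into a cost for a given workload. The function name and the example token counts are hypothetical, and free-tier allowances are ignored.

```python
# (input, output) price in USD per 1M tokens, taken from the table above.
PRICES = {
    "Mistral": (0.10, 0.30),
    "Together AI": (0.20, 0.60),
}

def workload_cost(provider, input_tokens, output_tokens):
    """Cost in USD for the given token counts at the listed rates."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical workload: 2M input tokens and 500K output tokens.
for provider in PRICES:
    cost = workload_cost(provider, 2_000_000, 500_000)
    print(f"{provider}: ${cost:.2f}")
```

Because both providers charge three times as much for output as for input, workloads that generate long responses are dominated by the output rate.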
Strengths & weaknesses
Strengths: Very fast & cheap to run; Strong function-calling; Native vision
Weaknesses: Weaker at the hardest reasoning; Smaller knowledge base