Mistral Small 3.2

Mistral · released 2025-06-20 · Apache-2.0 license

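A fast, inexpensive 24B dense model with native vision and strong function calling. Grouped-query attention (GQA) with 8 KV heads keeps the KV cache small relative to full multi-head attention, which makes long contexts more practical to fit in memory.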
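To make the KV-cache claim concrete, here is a rough back-of-the-envelope sketch of how cache size scales with the number of KV heads. Only the 8 KV heads and the 131K context come from this page; the layer count (40), head dimension (128), the 32-head multi-head baseline, and the 16-bit cache precision are assumptions used for illustration and should be checked against the model's actual config.

```python
def kv_cache_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for `tokens` positions."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens


# Assumed values (not stated on this page): 40 layers, head_dim 128,
# and a hypothetical 32-KV-head multi-head baseline for comparison.
N_LAYERS = 40
HEAD_DIM = 128
CONTEXT = 131_072  # "131K" read as 128 * 1024 tokens (also an assumption)

gqa = kv_cache_bytes(CONTEXT, N_LAYERS, n_kv_heads=8, head_dim=HEAD_DIM)
mha = kv_cache_bytes(CONTEXT, N_LAYERS, n_kv_heads=32, head_dim=HEAD_DIM)

GIB = 1024 ** 3
print(f"GQA (8 KV heads):  {gqa / GIB:.1f} GiB")
print(f"MHA (32 KV heads): {mha / GIB:.1f} GiB")
print(f"Reduction: {mha / gqa:.0f}x")
```

Under these assumptions the cache is 4x smaller than it would be with one KV head per query head. Real memory use also depends on the cache precision the runtime uses and on the context length actually configured.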

Key specs

Type: Local open-weight
Parameters: 24B total
Architecture: dense
Context window: 131K tokens
Knowledge cutoff: 2025-03-01
Modalities: text, image
Recommended backends: llama.cpp, vLLM, Ollama, MLX
Minimum viable rig: RTX 3060 12 GB (Q4); 16 GB for headroom

Benchmark scores

GPQA Diamond: 50%
SWE-bench Verified: 36%
AIME: 48%
MMLU-Pro: 70%
BFCL v3 (tool use): 62%
Composite score: 5.2
Community rating: 5.0★ (1 review, 0 net votes)

VRAM & disk per quantization

Quant    VRAM   Disk   RAM    Context
Q4_K_M   15 GB  14 GB  24 GB  131K
Q8       26 GB  25 GB  40 GB  131K
FP16     49 GB  48 GB  64 GB  131K
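As a small illustration of how the table above might be used, the sketch below picks the highest-precision quantization whose listed VRAM figure fits a given GPU. The VRAM numbers are copied from the table; the function name and the `headroom_gb` parameter are hypothetical helpers, not part of any backend.

```python
# VRAM figures in GB, copied from the table above.
QUANT_VRAM_GB = {"Q4_K_M": 15, "Q8": 26, "FP16": 49}


def largest_quant_that_fits(vram_gb, headroom_gb=0.0):
    """Return the highest-precision quant whose listed VRAM fits, or None."""
    budget = vram_gb - headroom_gb
    # Pair each quant with its requirement so max() picks the largest that fits.
    fitting = [(need, name) for name, need in QUANT_VRAM_GB.items() if need <= budget]
    return max(fitting)[1] if fitting else None


for gpu_gb in (12, 24, 32, 80):
    print(gpu_gb, "GB ->", largest_quant_that_fits(gpu_gb))
```

A `None` result means no listed quantization fits entirely in VRAM by these figures, in which case partial CPU offload or a shorter context would be needed.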

API pricing (per 1M tokens)

Provider     Input  Output  Free tier
Mistral      $0.10  $0.30   Yes
Together AI  $0.20  $0.60   No
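Since prices are quoted per 1M tokens, the cost of a workload is a simple weighted sum. The sketch below uses the prices listed above; the function name and the example token counts are illustrative assumptions, and current provider pricing should be confirmed before budgeting.

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES = {
    "Mistral": (0.10, 0.30),
    "Together AI": (0.20, 0.60),
}


def request_cost_usd(provider, input_tokens, output_tokens):
    """Estimated cost in USD for the given token counts."""
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost_usd(name, 2_000_000, 500_000):.2f}")
```

For this hypothetical workload the estimate is $0.35 on Mistral and $0.70 on Together AI.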

Strengths & weaknesses

Strengths: Very fast & cheap to run; Strong function-calling; Native vision

Weaknesses: Weaker on the hardest reasoning tasks; Smaller knowledge base than larger models