Small Models That Run on Phones: Choices and Constraints in 2026
The main on-device LLMs in 2026 are Gemma 3n, Qwen 3.5 Small, Llama 3.2, Phi-4-mini, Ministral 3, and SmolLM3. Sub-3B quantized models can hit 30-50 tokens/sec on phones with 8GB RAM, but RAM, thermal throttling, and context window remain hard constraints.