#mobilellm

1 post
ai guide

Small Models That Run on Phones: Choices and Constraints in 2026

The main on-device LLMs in 2026 are Gemma 3n, Qwen 3.5 Small, Llama 3.2, Phi-4-mini, Ministral 3, and SmolLM3. Quantized models under 3B parameters can reach 30-50 tokens/sec on phones with 8 GB of RAM, but available memory, thermal throttling, and context window length remain hard constraints.
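
To make the RAM constraint concrete, here is a rough back-of-envelope sketch of how much memory the weights alone would need at different quantization levels. The function name `weight_memory_gb` and the 3B parameter count are illustrative assumptions, not figures from any specific model above, and the estimate deliberately ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width.
    Ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical 3B-parameter model at common bit widths.
    for bits in (16, 8, 4):
        print(f"3B params at {bits}-bit: ~{weight_memory_gb(3, bits):.1f} GB")
```

Under these assumptions, a 3B model drops from about 6 GB at 16-bit to about 1.5 GB at 4-bit, which suggests why quantization matters so much on a phone where the OS and other apps share the same 8 GB.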