For side projects, toy demos, and RAG prototypes, nobody wants to swipe a credit card on day one. This is a verified roundup of 40+ LLM inference providers still operating as of May 2026, tiered by whether the free allowance auto-replenishes or is a one-time grant. Each entry notes credit-card requirements, supported models, paid starting prices, and catches. Chinese providers including Zhipu GLM (permanently free), Doubao (2M tokens/day), Kimi, and DashScope are covered, as is the local Ollama option.
OpenCode is a free, open-source CLI agent written in Go with 95K+ GitHub stars. It supports 75+ model providers including local Ollama, allows authentication via Copilot/ChatGPT accounts, and lets you switch models mid-session without losing context.
Q1 2026 saw an explosion of open-source models: among LLMs, GLM-5, Kimi K2.5, and Qwen3.5 caught up with closed-source models; embedding and reranker models are dominated by Qwen3 and BGE; speech has Voxtral TTS and Whisper V3; image generation has FLUX.2; and video has Wan 2.2, which rivals Sora. This is the complete navigation map.
Pi is a minimalist coding agent built in TypeScript by Mario Zechner, featuring just 4 core tools (read, write, edit, bash) and a 300-word system prompt. It's extensible via Extensions, Skills, and Prompt Templates, runs on the Bun runtime, and ships with built-in Ollama support via `ollama launch pi`.
Ollama wraps llama.cpp in a Docker-style CLI and REST API, letting you run LLMs locally with a single command. This post covers core concepts, installation, the API, hardware requirements, Modelfile customization, and what this tool is, and isn't, good for.
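
As a rough illustration of the REST API mentioned above, the following is a minimal sketch of a non-streaming call to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes an Ollama server is running on the default local port 11434 and that the model has already been pulled. The helper names `build_generate_request` and `generate` are hypothetical, chosen for this example.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for a single, non-streaming completion."""
    payload = {
        "model": model,    # must already be pulled locally
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a chunk stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With stream=False the generated text is in the "response" field.
    return body["response"]
```

With the server running, something like `generate("llama3.2", "Why is the sky blue?")` would return the completion as a string. The model name here is only an example; substitute whichever model is installed locally.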