Ollama tightens tool-calling and cloud

Ollama released v0.20.6 with improved Gemma 4 tool-calling that enables local runs and integrations like OpenClaw and Claude Code, plus new cloud options for immediate testing. The company also expanded cloud capabilities for models such as GLM-5.1 and Gemma 4 to meet rising demand from AI coding tool usage. (x.com) (x.com)

Ollama is pushing its software further into coding assistants, pairing local tool use with more cloud-hosted models for developers who need bigger context windows. (ollama.com)

The company said on January 16, 2026 that Ollama v0.14.0 and later works with Anthropic’s Messages application programming interface, which lets tools such as Claude Code run against open models through Ollama. Ollama’s docs now list cloud options including GLM-5:cloud, Kimi-K2.5:cloud and MiniMax-M2.7:cloud for that setup. (ollama.com) (docs.ollama.com)

On January 23, 2026, Ollama added `ollama launch`, a one-command setup for Claude Code, OpenCode and Codex with either local or cloud models. The post said coding tools work best with at least 64,000 tokens of context, and that Ollama’s cloud runs models at full context length with a five-hour coding session window. (ollama.com)

Tool calling is the feature that lets a model ask software to do work outside the chat window, such as reading files, searching the web or running commands. Ollama added general tool support in 2024 and streaming tool calls in May 2025, turning its local model runner into a base layer for agent-style apps. (ollama.com 1) (ollama.com 2)

That helps explain the focus on Gemma 4 and GLM-5.1. Ollama’s model pages tag both as tool-capable, and both are available in cloud form, which gives developers a way to test agent workflows immediately even when a laptop cannot hold the larger model in memory. (ollama.com 1) (ollama.com 2) (ollama.com 3)

Gemma 4 is Google’s open model family, and Ollama’s library page says its 31 billion parameter cloud version and smaller edge variants are aimed at reasoning, coding and agentic workflows. The same page says Gemma 4 adds native system prompt support, which gives developers tighter control over instructions before any tool call happens. (ollama.com)

GLM-5.1 is positioned even more directly at coding agents. Ollama’s library page says the model is built for “agentic engineering,” supports tools and cloud deployment, and is meant to stay effective over “hundreds of rounds and thousands of tool calls” in longer sessions. (ollama.com)

Ollama is also wiring those models into OpenClaw, its integration for assistants that bridge messaging apps to coding agents. Ollama’s OpenClaw documentation says the setup can connect services including WhatsApp, Telegram, Slack, Discord and iMessage, then install a gateway daemon plus web search and fetch plugins around a selected local or cloud model. (docs.ollama.com)

The cloud piece is newer than the local runner. Ollama introduced cloud models in preview on September 19, 2025, saying they use datacenter hardware for models that would not fit on a personal computer while keeping the same command-line interface developers already use locally. (ollama.com)

The result is a product that now spans both ends of the market: run an open model on your own machine when it fits, or swap to a hosted one when coding tools need more memory, longer context and steadier tool use. (ollama.com 1) (ollama.com 2)
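For readers who want to see what tool calling looks like in practice, the following is a minimal Python sketch of the request-and-dispatch pattern against Ollama's native chat endpoint. It is an illustration under stated assumptions, not Ollama's own code: it assumes a local server on the default port, the `read_file` tool and the helper names are hypothetical, and the model tag is left to the caller because the exact tags for Gemma 4 or GLM-5.1 should be taken from Ollama's library pages.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def read_file(path: str) -> str:
    """Hypothetical tool: return the text of a file on disk."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Registry mapping tool names to the Python functions that implement them.
TOOLS = {"read_file": read_file}

# JSON-schema description of the tool, sent to the model with each request.
TOOL_SPECS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a UTF-8 text file and return its contents.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
]


def build_request(model, messages):
    """Build a non-streaming chat payload that advertises the tools."""
    return {"model": model, "messages": messages,
            "tools": TOOL_SPECS, "stream": False}


def run_tool_calls(assistant_message, tools=TOOLS):
    """Run each tool call the model asked for; return 'tool' role messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        name = function["name"]
        handler = tools.get(name)
        if handler is None:
            output = f"error: unknown tool {name!r}"
        else:
            # Ollama's native API returns arguments as a JSON object.
            output = str(handler(**(function.get("arguments") or {})))
        # Assumption: "tool_name" is the field current Ollama docs use to
        # label a tool result; check the docs for the version you run.
        results.append({"role": "tool", "tool_name": name, "content": output})
    return results


def chat(payload, url=OLLAMA_CHAT_URL):
    """POST the payload to Ollama and return the assistant message."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]
```

A full agent loop would call `chat(build_request(model, messages))`, append the assistant message and the output of `run_tool_calls` to `messages`, and repeat until the model replies without requesting a tool. Nothing in the sketch contacts the server until `chat` is called, so the payload and dispatch logic can be checked offline.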
