122B model runs locally

A social post reports an open‑source 122B model running Claude Code locally on a MacBook for free, claiming no API limits and unlimited local development use. The post suggests this setup enables local dev workflows outside paid APIs (X/Twitter) (x.com).

A 122 billion-parameter open model is being pitched as a way to run Anthropic’s Claude Code locally on a Mac, without sending prompts to a paid application programming interface. (github.com) (code.claude.com) Claude Code is Anthropic’s coding tool for the terminal, and Anthropic’s own setup page says it requires an internet connection to run in its standard form. A new crop of third-party tools instead exposes an Anthropic-compatible “Messages” endpoint on a local machine, so Claude Code can talk to a local model as if it were a remote service. (code.claude.com) (docs.ollama.com) That compatibility layer is now documented by multiple vendors. Ollama says its Anthropic-compatible application programming interface can connect “tools like Claude Code,” and its Claude Code integration page lists open models including Qwen 3.5. (docs.ollama.com 1) (docs.ollama.com 2) The specific repository tied to the social post says it runs “Claude Code 100% on-device” on Apple Silicon and lists three local model options: Gemma 4 31B, Llama 3.3 70B, and Qwen 3.5 122B A10B. Its README says the 122B option uses a Mixture of Experts design, meaning only about 10 billion parameters are active for each token, which cuts the amount of work the Mac has to do. (github.com) (huggingface.co) That design detail is the reason a “122B” model can fit into a consumer machine at all. The Hugging Face model card for one Apple Silicon build of Qwen3.5-122B-A10B says the base model has 122 billion total parameters but 10 billion active, while the GitHub project claims roughly 75 gigabytes of memory and 65 gigabytes of disk for its 4-bit build. (huggingface.co) (github.com) The performance and hardware claims are narrower than the social post makes them sound. The repository advertises 41 to 65 tokens per second for its Qwen setup and says a 96 gigabyte Mac is the minimum for that launcher, while another Mac-native inference server, oMLX, says 64 gigabytes or more is the “sweet spot” for larger local coding models and publishes its own benchmarks on a 512 gigabyte M3 Ultra. (github.com) (omlx.ai) The “free” part also needs qualification. Running locally can eliminate per-token charges from Anthropic’s application programming interface, but Claude Code itself is still Anthropic software, and Anthropic’s setup and platform docs describe it as an internet-connected product with account-based access rather than a fully detached open-source tool. (code.claude.com) (platform.claude.com) Developers have been moving in this direction for months as local model runners got better at mimicking cloud interfaces. A January 2026 guide noted that Ollama version 0.14.0 added Anthropic Messages compatibility, removing some of the older proxy hacks that early adopters used to wire Claude Code into local models. (dev.to) (docs.ollama.com) There is still a tradeoff between privacy, cost, and reliability. Local setups keep code on the device and avoid usage caps from remote model providers, but they depend on large downloads, enough unified memory, and third-party compatibility layers that Anthropic does not control. (github.com) (omlx.ai) (code.claude.com) What the post captures accurately is the shift in plumbing: Claude Code no longer has to mean Claude’s own hosted models. With Anthropic-style endpoints now showing up in local runners, the terminal tool can be pointed at open models on a Mac instead of a metered cloud backend. (docs.ollama.com) (omlx.ai)

122B model runs locally

Get your own daily briefing