Run a private LLM inside VS Code

A new guide shows how to integrate local LLMs (for example, via Ollama) into VS Code so that code, prompts, and completions never leave the developer's machine, a practical privacy shift for developer tooling. Local integrations change the tradeoffs interviewers may probe around privacy, latency, and operational cost. (thenewstack.io)

The New Stack how-to was published April 1, 2026 and is credited to Jack Wallen. (thenewstack.io) The "VSCode Ollama" extension by warm3snow shows roughly 38,815 installs on the Visual Studio Marketplace, and its source is hosted on GitHub under an MIT license. (marketplace.visualstudio.com)

Ollama runs as a background service and by default exposes a local REST API for completions, chat, and embeddings. (docs.ollama.com) Its VS Code integration can be wired up through the Copilot Chat model picker (Add Models → Ollama) or through community extensions such as Continue and Ollama Agent, which direct completions to the local Ollama endpoint. (docs.ollama.com)

Ollama's public model library lists many models developers can pull locally, including Code Llama 7B (~3.8GB), Mistral 7B (~4.1GB), and Llama 2 70B (~39GB), along with coding-specialized families such as Code Llama and Qwen. (ollama-operator.ayaka.io) Several library entries explicitly include image sizes and tags, for example Mixtral builds listed at ~26GB and ~80GB and Llama 2 70B at ~39GB, which indicates the disk and memory footprint needed to run larger local models. (ollama-operator.ayaka.io)

Marketplace listings for local tools advertise feature parity with cloud copilots: the Ollama Agent extension highlights AI completions, Quick Edit, Code Review, Undo/Redo, and a snippets library while operating entirely on a developer's machine. (marketplace.visualstudio.com)
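To make the local REST API concrete, here is a minimal Python sketch of the kind of request an editor extension might send to a local Ollama service. It is an illustration under stated assumptions, not the method used by any extension named above. The base URL `http://localhost:11434` is Ollama's documented default rather than something stated in this briefing, the model tag `codellama:7b` is just one example of a pulled model, and the helper names (`is_loopback`, `build_generate_request`, `complete`) are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: Ollama's documented default address; adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434"

# Hostnames treated as "this machine" for the privacy check below.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_loopback(url: str) -> bool:
    """Return True when the URL's host is a loopback name or address."""
    return urlparse(url).hostname in LOOPBACK_HOSTS


def build_generate_request(prompt: str,
                           model: str = "codellama:7b",  # example tag, an assumption
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST to Ollama's /api/generate endpoint without sending it."""
    if not is_loopback(base_url):
        # Refuse to build requests that would send the prompt off the machine.
        raise ValueError(f"refusing non-local endpoint: {base_url}")
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the prompt to the local service and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

With the Ollama service running and the model already pulled, `complete("Write a function that reverses a string.")` would return the model's text. The loopback check is a deliberately simple guard that reflects the privacy claim: it rejects any endpoint that is not on the developer's own machine before a prompt is ever serialized.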
