Cheap VPS beats DIY rigs
Some practitioners recommend renting a cheap $8/month Hetzner VPS and calling a cloud API (e.g., for Opus 4.6) rather than building a local setup that costs hundreds of dollars in hardware, plus the time to assemble it (x.com). The pitch is practical: lower upfront cost and faster access to better models, without the VRAM limits and maintenance burden of a local machine (x.com).
The pitch making the rounds is simple: rent a low-cost virtual private server and buy model calls by the token instead of buying a local artificial intelligence box. (hetzner.com) A virtual private server is just a rented Linux machine in a data center, and Hetzner’s CX22 plan starts at €4.51 a month with 2 virtual central processing units, 4 gigabytes of random-access memory, and 40 gigabytes of storage. (hetzner.com) The argument is that the rented server handles the coding environment, automation tools, and file syncing, while the heavy model inference runs through an application programming interface from a provider such as Anthropic or through a broker such as OpenRouter. (anthropic.com) (openrouter.ai)
That changes the upfront math. An NVIDIA GeForce RTX 4090 launched at $1,599 and carries 24 gigabytes of memory, while many larger open-weight models need far more than that to run at full precision. (nvidia.com) (huggingface.co) Hugging Face’s Llama 3.1 guide says the 70-billion-parameter model needs about 140 gigabytes of memory for full-precision inference, and the 405-billion-parameter version needs more than 800 gigabytes. (huggingface.co) That is why local setups usually lean on quantization, which shrinks a model the way a compressed video file shrinks a movie, trading some quality or flexibility for lower memory use. Ollama says 70-billion-parameter models generally require at least 64 gigabytes of memory. (ollama.com)
The cloud route swaps hardware limits for usage billing. OpenRouter lists Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens, which lets a developer pay for bursts of use instead of owning an idle graphics card. (openrouter.ai)
That does not make local rigs obsolete. A local model can be cheaper for heavy daily workloads, can keep sensitive data on-device, and does not depend on an outside provider’s rate limits or outages. (ollama.com) (anthropic.com) The tradeoff is that a cheap virtual private server plus an application programming interface gives small teams access to frontier models on day one, while a do-it-yourself machine still has to fit the model into memory, stay cooled, and stay maintained. (hetzner.com) (nvidia.com)
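The memory figures quoted above follow from simple arithmetic, which a short Python sketch can make concrete. It assumes "full precision" means 16 bits per weight, since that reproduces the roughly 140-gigabyte figure for the 70-billion-parameter model, and it counts the weights only. The function name `weights_memory_gb` and the list of quantization widths are illustrative choices, not taken from any of the cited sources.

```python
def weights_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Estimate memory for the weights alone, in gigabytes (1 GB = 10^9 bytes).

    Assumption: ignores the key-value cache, activations, and runtime overhead,
    so real requirements are higher than this figure.
    """
    bytes_per_weight = bits_per_weight / 8
    # (billions of parameters) x (bytes per parameter) = billions of bytes = GB
    return params_billions * bytes_per_weight


GPU_MEMORY_GB = 24  # RTX 4090 memory, as quoted in the text

for params in (70, 405):
    # 16-bit baseline (assumed "full precision"), then two illustrative
    # quantization widths
    for bits in (16, 8, 4):
        need = weights_memory_gb(params, bits)
        verdict = "fits" if need <= GPU_MEMORY_GB else "does not fit"
        print(f"{params}B at {bits}-bit: ~{need:.0f} GB of weights, "
              f"{verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions, even a 4-bit version of the 70-billion-parameter model needs about 35 gigabytes for its weights, more than a single 24-gigabyte card holds, which is consistent with the article's point that fitting the model into memory is the binding constraint for local rigs.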
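The cost side of the tradeoff can be sketched the same way. The example below uses the token prices and graphics card price quoted above. Everything else is an assumption for illustration: the monthly token volumes are a hypothetical workload, the $5 server cost is a rough placeholder for the €4.51 plan rather than a real exchange rate, and electricity, the rest of the local machine, and any quality gap between models are all left out.

```python
# Prices quoted in the text
INPUT_USD_PER_MTOK = 5.0     # per million input tokens
OUTPUT_USD_PER_MTOK = 25.0   # per million output tokens
GPU_PRICE_USD = 1599.0       # RTX 4090 launch price

# Assumption: rough dollar placeholder for the EUR 4.51 plan, not a real rate
VPS_USD_PER_MONTH = 5.0


def monthly_api_cost(input_mtok: float, output_mtok: float) -> float:
    """Usage bill for one month, given token volumes in millions."""
    return input_mtok * INPUT_USD_PER_MTOK + output_mtok * OUTPUT_USD_PER_MTOK


def months_to_break_even(hardware_usd: float, monthly_cloud_usd: float) -> float:
    """Months of cloud spending that add up to the hardware price."""
    if monthly_cloud_usd <= 0:
        raise ValueError("monthly cloud cost must be positive")
    return hardware_usd / monthly_cloud_usd


# Hypothetical workload: 10M input and 2M output tokens per month
api = monthly_api_cost(input_mtok=10, output_mtok=2)
cloud_total = api + VPS_USD_PER_MONTH
months = months_to_break_even(GPU_PRICE_USD, cloud_total)
print(f"API ${api:.2f} + server ${VPS_USD_PER_MONTH:.2f} = ${cloud_total:.2f} per month")
print(f"Cloud spending matches the GPU price after about {months:.1f} months")
```

For this hypothetical workload the cloud bill takes roughly 15 months to reach the price of the card alone. Heavier daily use shortens that window, which is the case where the article says a local model can come out cheaper.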