Cloud vs on‑prem: placement

Recent reporting and briefs argue the decision to use cloud or on‑prem is really a workload placement choice: keep deterministic, proximity‑sensitive execution in colo/on‑prem and put research, training and non‑real‑time work in cloud. That split is being driven not just by latency but by supply, security and policy considerations as compute becomes a strategic asset. (domain-b.com)

The old cloud argument was simple. Rent compute when you need it, buy servers when you do not trust anyone else. AI broke that frame. The real question now is not cloud or on‑prem in the abstract. It is which workload goes where, and why. Recent industry briefs have started saying that plainly: use the platform that fits the job, because latency, sovereignty, resilience, and intellectual property matter as much as raw price. (domain-b.com)

That shift is happening because AI workloads are not all alike. Training a model, fine‑tuning it, batch processing a mountain of documents, and serving a live inference request are different technical problems. Training wants huge bursts of compute and can often tolerate delay. Production inference is the opposite. It needs predictable response times, stable data paths, and tight control over where sensitive information moves. That is why more organizations are treating cloud as a place for research, experimentation, and elastic capacity, while keeping deterministic execution closer to their own systems in private data centers or colocation sites. (deloitte.com)

Latency is the obvious reason, but it is no longer the only one, or even always the main one. Many enterprise AI systems sit next to operational databases, factory systems, call centers, hospital networks, or trading infrastructure. Moving those data flows out to a distant public cloud can add delay, cost, and risk all at once. Inference also tends to run continuously, not in occasional bursts, so the economics change. A cloud bill is easy to justify for an experiment. It is harder to justify for a permanent, high‑volume service that never sleeps. (csquare.com)

Then there is the supply problem. AI compute is no longer just another IT purchase. It is a constrained industrial resource, tied to chips, power, cooling, and construction schedules. Domain‑b’s report says global AI spending is projected to exceed $2 trillion in 2026, with long‑term infrastructure investment potentially reaching $3 trillion to $4 trillion by the end of the decade. Nvidia’s Jensen Huang has made the same larger point in public: the industry is headed toward trillions in infrastructure spend. When compute becomes scarce, placement stops being a software architecture debate and starts looking like capacity triage. (domain-b.com)

That scarcity is why colocation has moved back into the center of the story. Colo used to sound like a compromise from an earlier era. Now it looks like a practical answer to a very current problem. Enterprises want physical control, cloud on‑ramps, dense connectivity, and faster access to power than they can get by building a facility from scratch. Colocation offers a middle ground: closer to owned infrastructure than public cloud, but less rigid than a fully self‑built site. For inference systems that must stay near users or regulated data, that middle ground matters. (csquare.com)

Security and policy push in the same direction. Sovereignty rules are no longer niche concerns for governments. They are shaping mainstream AI design. Microsoft’s guidance for sovereign AI workloads says controls have to cover the full lifecycle, from training through inference and retirement. Google and IBM are both selling sovereign cloud offerings built around residency, administrative control, and isolated environments. That does not eliminate cloud. It changes what kind of cloud is acceptable, and which parts of the pipeline can leave home at all. (learn.microsoft.com)

The biggest infrastructure projects make the split even clearer. OpenAI, Oracle, and SoftBank said in September 2025 that Stargate had expanded with new US data center sites and a planned $500 billion, 10‑gigawatt commitment. That is cloud at industrial scale. But the same buildout also underlines the opposite truth: if frontier training is concentrating into giant power‑hungry campuses, then many production workloads will have to be placed elsewhere, nearer to the businesses and institutions that actually use the models. One side of AI is becoming centralized. The other is spreading into colo cages, enterprise racks, and tightly governed private environments. (openai.com)
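
The placement logic running through this piece can be sketched as a small rule set. This is only an illustration, not a method taken from any of the cited briefs: the `Workload` fields, the `place` function, the three placement labels, and the order in which the rules fire are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # All fields are hypothetical attributes chosen for this sketch.
    name: str
    latency_sensitive: bool   # needs predictable, low response times
    runs_continuously: bool   # always-on service rather than occasional bursts
    regulated_data: bool      # residency or sovereignty rules apply


def place(w: Workload) -> str:
    """Return a coarse placement label for a workload (illustrative only)."""
    # Proximity-sensitive execution stays close to the systems it serves.
    if w.latency_sensitive:
        return "colo/on-prem"
    # Regulated data narrows the options to sovereign or private environments.
    if w.regulated_data:
        return "sovereign cloud or private environment"
    # Always-on, high-volume services are harder to justify on a cloud bill.
    if w.runs_continuously:
        return "colo/on-prem"
    # Bursty research, training, and other non-real-time work fits elastic cloud.
    return "cloud"


if __name__ == "__main__":
    for w in (
        Workload("model training", False, False, False),
        Workload("live inference", True, True, False),
        Workload("regulated batch processing", False, False, True),
    ):
        print(f"{w.name}: {place(w)}")
```

A real placement decision would also weigh the supply side described above, such as available power and hardware lead times, which this sketch leaves out.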

