Kloia offers elastic GPU service

Kloia announced a GPU‑as‑a‑Service product running on OpenShift and AWS intended to give elastic AI infrastructure and avoid fixed‑capacity overruns. (x.com) The offering frames cloud‑native OpenShift deployments as a route to scale GPU access without committing to long‑term hardware. (x.com)

Renting graphics processing units is the cloud version of leasing heavy machinery, and Kloia says it now has a service to do that on demand with Red Hat OpenShift AI on Amazon Web Services. (kloia.com) Kloia described the offer in a March 24, 2026 post as an enterprise “GPU-as-a-Service” architecture built on Amazon Web Services, with separate machine sets for NVIDIA A100, H100 and H200 graphics processing unit instances. (kloia.com) The company said the setup is meant to stop teams from buying dedicated graphics processing unit capacity that sits idle, then filing tickets when another team needs the same hardware. Kloia said users can request graphics processing units through Red Hat OpenShift AI workbenches, training jobs and model-serving endpoints instead. (kloia.com) The basic problem is cost and scarcity. Red Hat said in a November 10, 2025 post that graphics processors are expensive, often underused, and harder to scale when each team controls its own pool. (redhat.com) Red Hat’s answer is to treat graphics processors like a shared utility inside Kubernetes, the software layer that schedules containers across servers. Its OpenShift AI platform uses queueing and autoscaling tools so jobs can wait, start and release hardware as demand changes. (redhat.com) Kloia’s version runs on Red Hat OpenShift Service on Amazon Web Services, a managed offering that Amazon says supports pay-as-you-go hourly or annual billing on a single Amazon Web Services invoice. Amazon says the service is jointly supported by Red Hat and Amazon Web Services. (aws.amazon.com) Kloia said its reference design spreads the OpenShift control plane across three master nodes in three Amazon Web Services availability zones in the United Arab Emirates region, while graphics processing unit worker nodes scale separately by chip type. The company said that layout is meant to keep the management layer running if one availability zone fails. (kloia.com) Inside the cluster, Kloia said inference endpoints expose OpenAI-compatible application programming interfaces, so an internal app can call one standard interface without knowing which model runtime is behind it. The company also said it uses a centralized model registry to track versions and deployments. (kloia.com) The timing lines up with changes in Red Hat’s own platform. Red Hat said on March 25, 2026 that Dynamic Resource Allocation became generally available in OpenShift 4.21, adding a newer way to request accelerators by attributes such as model or memory instead of only by count. (developers.redhat.com) That matters for mixed fleets like Kloia’s A100, H100 and H200 pool, where a training run may need a specific class of chip rather than any available graphics processor. Red Hat said older device plug-ins could not express those differences cleanly or help the cluster autoscaler reason about them. (developers.redhat.com) Kloia is pitching the service as part of a broader Amazon Web Services and artificial intelligence practice. The company says it has operated since 2015, works with customers in 19 countries, and holds Amazon Web Services Premier Tier Services Partner status. (kloia.com 1) (kloia.com 2) The immediate test is whether customers buy shared access instead of reserving fixed graphics processing unit capacity up front. Kloia’s pitch is that enterprises can keep the chips busy, keep billing variable, and avoid owning more graphics processors than their busiest week requires. (kloia.com)

Get your own daily briefing

Scout delivers personalized news, insights, and conversations tailored to your role and industry.

Download on the App Store

Shared from Scout - Be the smartest in the room.