Zyphra launches MI355X cloud service

- Zyphra launched Zyphra Cloud on May 4, built on AMD Instinct MI355X GPUs via TensorWave, starting with a serverless inference product for open models.
- The launch model list is unusually explicit: DeepSeek V3.2, Kimi K2.6, and GLM 5.1, with Zyphra pitching long-context, low-latency agent workloads.
- It matters because AMD hardware is moving from raw GPU supply into usable AI cloud wrappers where software, uptime, and pricing decide winners.

AI cloud is not just about who has the fastest chip anymore. It is about who can turn that chip into a service developers will actually use, without making them fight drivers, provisioning, and weird deployment plumbing. That is the real news in Zyphra’s launch today. The company rolled out Zyphra Cloud, built on AMD’s Instinct MI355X GPUs through TensorWave infrastructure, and opened with a serverless inference product aimed at frontier open-weight models.

### What actually launched?

Zyphra Cloud is the umbrella platform. The first live product inside it is Zyphra Inference, a serverless inference service for running open-weight models without reserving dedicated GPU boxes yourself. Zyphra says the stack combines model serving, agent infrastructure, and scalable compute in one platform, which is basically a pitch that this is more than a bare-metal rental business.

### Why is “serverless inference” the important part?

Because most teams do not want to buy a pile of accelerators just to test or deploy one model endpoint. Serverless inference means the cloud provider handles capacity, scaling, and provisioning behind the scenes, and the customer pays for usage instead of babysitting hardware. Zyphra is trying to sell convenience, not just silicon access.

### Which models is Zyphra putting front and center?

The launch lineup matters because it tells you who the service is for. Zyphra is explicitly advertising DeepSeek V3.2, Kimi K2.6, and GLM 5.1, and its own site frames them around coding, multimodal agents, and long-horizon reasoning tasks. That is a strong hint that Zyphra wants developers building agent systems, research tools, and workflow automation, not just generic chatbots.

### Why use AMD MI355X here?

The MI355X is AMD’s top-end Instinct part in the MI350 family, built on CDNA 4 with 288 GB of HBM3E memory, 8 TB/s of bandwidth, and support for low-precision formats like MXFP6 and MXFP4. In plain English, it is designed for dense AI and HPC workloads where memory capacity and throughput really matter. Big-context inference is one of the places where those specs stop being brochure fluff and start affecting cost and latency.

### Where does TensorWave fit?

TensorWave is the infrastructure layer under this launch. Zyphra is not claiming it built every rack and cluster from scratch; it is building a cloud product on top of TensorWave’s AMD-focused infrastructure. That is important because it shows an emerging stack: AMD supplies the accelerator, TensorWave supplies the GPU cloud substrate, and Zyphra adds the developer-facing service layer.

### So is this really an AMD story?

Partly, yes, but not in the usual benchmark-war way. AMD has already been trying to prove its chips can compete technically. The harder commercial step is getting those chips packaged into services that feel production-ready. Developers do not buy TFLOPS in isolation. They buy APIs, uptime, model availability, and pricing that makes sense at scale. Zyphra is one more sign AMD is getting that wrapper.

### What is the catch?

The catch is that a launch announcement is not the same thing as broad market adoption. Nvidia still has the stronger software moat, the bigger ecosystem, and the default position in most AI deployments. So the question is not whether MI355X can run these models. It can. The question is whether Zyphra can make AMD-backed inference easy enough, cheap enough, and reliable enough that developers switch habits. That part takes time.

### Bottom line?

This launch is a small but real shift in the AI cloud stack. Zyphra is taking AMD’s MI355X out of the “available hardware” bucket and putting it into the “usable product” bucket. And it turns out that is where the next fight is: not just whose GPU is fastest, but whose cloud feels easiest to build on.
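
### A quick back-of-envelope on those MI355X numbers

For readers who want to see why 288 GB and 8 TB/s matter for big-model inference, here is a rough Python sketch. Only the two hardware figures come from the article. The function names are made up for illustration, and the math assumes peak bandwidth is fully sustained, that MXFP4 and MXFP6 weights cost exactly 4 and 6 bits each (ignoring per-block scale overhead), and that the KV cache and activations take no memory. Real deployments will land below these ceilings, so treat it as an upper-bound estimate, not a benchmark.

```python
# Back-of-envelope only: the two hardware figures are from the article,
# everything else is a simplifying assumption.
HBM_CAPACITY_GB = 288    # MI355X HBM3E capacity
HBM_BANDWIDTH_TBPS = 8   # MI355X peak memory bandwidth


def full_sweep_ms(capacity_gb: float, bandwidth_tbps: float) -> float:
    """Time to read all of HBM once at peak bandwidth.

    GB divided by TB/s gives milliseconds directly (1 TB = 1000 GB).
    """
    return capacity_gb / bandwidth_tbps


def max_params_billions(capacity_gb: float, bits_per_param: float) -> float:
    """Upper bound on weights that fit in HBM, in billions of parameters.

    Ignores KV cache, activations, and format scale overhead.
    """
    bytes_per_param = bits_per_param / 8
    return capacity_gb / bytes_per_param


if __name__ == "__main__":
    print(f"Full memory sweep: {full_sweep_ms(HBM_CAPACITY_GB, HBM_BANDWIDTH_TBPS):.0f} ms")
    for label, bits in [("16-bit", 16), ("6-bit (MXFP6-like)", 6), ("4-bit (MXFP4-like)", 4)]:
        ceiling = max_params_billions(HBM_CAPACITY_GB, bits)
        print(f"{label}: up to ~{ceiling:.0f}B parameters per GPU")
```

Under these assumptions, a single pass over a completely full memory takes about 36 ms, and the weight ceiling per GPU rises from roughly 144B parameters at 16 bits to 576B at 4 bits. That is the sense in which capacity and low-precision support translate into cost and latency.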
