Zyphra Cloud runs DeepSeek on AMD MI355X
- Zyphra launched Zyphra Cloud on May 4, with a serverless inference service serving DeepSeek V3.2 and other open-weight models on AMD MI355X GPUs. (morningstar.com)
- The stack runs on TensorWave infrastructure and targets long-context, production workloads like agentic coding, deep research, and workflow automation. (morningstar.com)
- It matters because MI355X is now showing up in real commercial inference, not just benchmarks or vendor demos. (morningstar.com)


AI cloud infrastructure is the story here, not just another model launch. The interesting part is that Zyphra says it has put DeepSeek V3.2 into a production [...] (morningstar.com) [...]l gap in the market: lots of companies say their model works on non-Nvidia hardware, while far fewer actually sell it as a live service people can use today. Zyphra’s May 4 launch turns that claim into a commercial product. (morningstar.com)

### What actually launched?

Zyphra launched Zyphra Cloud [...] (morningstar.com). The company frames it as a full-stack platform that bundles model serving, agent infrastructure, and scalable compute rather than just renting raw GPUs. (morningstar.com)

### Why is DeepSeek the headline model?

DeepSeek matters because it has become one of the clearest stress tests for modern inference stacks. These models are big, memory-hungry, and often aimed at reasoning or long-context use cases, s[...] (morningstar.com) [...]-Exp and DeepSeek-R1, so Zyphra is plugging into a model family AMD already treats as strategic. (amd.com)

### Why does MI355X matter so much?

The MI355X is AMD’s flagship AI accelerator in the MI350 series. The hardware pitch [...] (morningstar.com) [...] guarantee software maturity, but it does mean AMD finally has a part built to compete in the same conversation as Nvidia’s top inference hardware. (amd.com)

### So is this about hardware or software?

Mostly software. The hard part in AI inference is not just owning fast chips; it is getting kernels, parallelism, scheduling, and memory movement tuned well enough that real workloads stay fast and cheap. Zyphra [...] (amd.com) [...]zes tuning on AMD Infinity Fabric for throughput and latency. Basically, the news is that someone is trying to sell the whole stack, not just the silicon. (finviz.com)

### Where does TensorWave fit?

TensorWave is the infrastructure layer underneath this launch.
Zyphra is the platform and software face; TensorWave p[...] (amd.com) [...] because alternative AI clouds have struggled with a chicken-and-egg problem: developers want mature tooling before they move, but tooling gets better only when somebody deploys real workloads at scale. (morningstar.com)

### Is this really different from a benchmark blog?

Yes, that is the whole point. AMD already has benchmark material showing MI355X performance on models like DeepSeek-R1, GPT-OSS-120B, [...] (finviz.com) [...] inference endpoint serving named frontier models is messier and more important, because customers care about uptime, latency, and cost more than a vendor chart. (rocm.blogs.amd.com)

### What does this change?

It does not mean Nvidia is suddenly in trouble. But it does mean the “AMD for production inference” story is getting more concrete. Zyphra had already [...] (morningstar.com) [...]s to look less like one hardware lane and more like a real platform fight. (zyphra.com)

### Bottom line

The important shift is not that DeepSeek can run on AMD; that was already known. The shift is that Zyphra is selling that setup as a live cloud service, on MI355X hardware, for production-style workloads. That is how an alternative stack stops being a lab project and starts becoming a market. (morningstar.com)