DigitalOcean adds 25+ models
- DigitalOcean said April 28 it launched an Inference Engine that bundles serverless, dedicated, batch, and routed AI inference into one production platform.
- The company said customers have seen up to 67% lower inference costs, with batch jobs priced 50% lower and router users cutting spend 40%.
- The release extends DigitalOcean’s AI push beyond hosting into model routing and multimodal inference. (digitalocean.com)
An inference engine is the layer that takes an AI request and decides which model should answer it, where it should run, and what it should cost. DigitalOcean said on April 28 it is now selling that layer as a product. (digitalocean.com)

DigitalOcean’s new Inference Engine combines four pieces: Inference Router, Batch Inference, Serverless Inference, and Dedicated Inference. The company said the goal is to let developers run, scale, and optimize AI workloads without piecing together separate vendors. (digitalocean.com)

Serverless inference is the simplest version: send a request to a model through an application programming interface, or API, and pay only when it runs. DigitalOcean said its serverless tier gives developers a single API key for dozens of models and adds what it called off-peak pricing. (digitalocean.com 1) (digitalocean.com 2)

Dedicated inference is the opposite tradeoff: reserve graphics processing unit (GPU) capacity so response times stay predictable under heavy use. DigitalOcean said that option is aimed at sustained, high-scale workloads that do not fit shared infrastructure. (digitalocean.com 1) (digitalocean.com 2)

The new piece is the router. DigitalOcean said its router uses a mixture-of-experts model to sort requests by task and priority, then sends each one to a cheaper or faster model instead of defaulting to the most expensive option every time. (digitalocean.com)

DigitalOcean attached concrete savings claims to that pitch. It said customers have reported up to 67% lower inference costs overall, that batch inference cuts the cost of offline jobs by 50%, and that legal technology startup LawVo reduced inference costs by more than 40% with routing. (digitalocean.com)

The model catalog is broader than the headline suggests. DigitalOcean’s documentation says its AI platform offers both open-source and commercial foundation models through DigitalOcean API keys, and also lets customers bring their own provider keys for commercial models. (digitalocean.com)

Bring Your Own Model, or BYOM, is a separate path for teams that want to import their own weights instead of calling a hosted model. DigitalOcean’s docs say those imports can come from Hugging Face or a Spaces bucket, but support is limited to Safetensors files and dedicated-inference-compatible architectures including Qwen2ForCausalLM and Qwen3ForCausalLM. (digitalocean.com)

The product also reaches beyond chatbots. DigitalOcean’s inference docs say its multimodal models can process or generate text, images, audio, and video, including vision-language tasks, text-to-speech, and text-to-video generation. (digitalocean.com 1) (digitalocean.com 2)

This launch lands as DigitalOcean is recasting itself around AI infrastructure. In April it bought Katanemo Labs, and on April 28 it separately unveiled what it called an AI-Native Cloud built for inference and agentic workloads. (businesswire.com) (digitalocean.com)

The bet is that smaller developers want one control plane for AI the way they once wanted one dashboard for virtual machines and databases. DigitalOcean is now trying to make model choice, pricing, and capacity look like another cloud setting instead of a separate engineering project. (digitalocean.com) (digitalocean.com)
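For readers who want a concrete picture of the routing idea described above, here is a minimal sketch in Python. It is not DigitalOcean’s implementation: the company says its router uses a mixture-of-experts model, while this example swaps in a plain rule-based lookup. The model names, prices, task labels, and the `route` function are all hypothetical, invented only to show how sorting requests by task and priority can steer work away from the most expensive model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical label, not a real catalog entry
    cost_per_1k_tokens: float  # made-up price, for illustration only
    tasks: frozenset           # task types this model is allowed to serve
    fast: bool                 # whether it counts as low latency


# Hypothetical catalog: names, prices, and task labels are invented.
CATALOG = [
    Model("small-chat", 0.10, frozenset({"chat", "summarize"}), True),
    Model("mid-general", 0.50, frozenset({"chat", "summarize", "code"}), True),
    Model("large-reasoning", 2.00,
          frozenset({"chat", "summarize", "code", "legal"}), False),
]


def route(task: str, priority: str) -> Model:
    """Pick the cheapest model that can serve the task.

    High-priority requests are limited to fast models when at least one
    fast model supports the task; otherwise every capable model is kept.
    """
    candidates = [m for m in CATALOG if task in m.tasks]
    if not candidates:
        raise ValueError(f"no model supports task {task!r}")
    if priority == "high":
        fast = [m for m in candidates if m.fast]
        candidates = fast or candidates
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    for task, priority in [("chat", "low"), ("code", "high"), ("legal", "high")]:
        chosen = route(task, priority)
        print(f"{task}/{priority} -> {chosen.name}")
```

In this toy setup a routine chat request never touches the largest model, which is the intuition behind the savings claims: the expensive option is reserved for the requests that actually need it.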