Akamai AI Grid

Akamai unveiled an AI orchestration layer to run inference across some 4,400 edge sites using Nvidia GPUs, aiming to reduce latency and cost for distributed AI workloads. The push is framed as an option for workloads that don’t need hyperscale central compute but do need geographically close inference. (edgeir.com)

Akamai is turning its edge network into a distributed artificial intelligence serving layer, routing inference jobs across more than 4,400 locations instead of sending them all to centralized cloud clusters. (akamai.com) The company said on March 16, 2026 that its Akamai Inference Cloud is the first global-scale implementation of Nvidia’s AI Grid reference design. Akamai said the system uses an orchestration layer to steer requests across edge, regional, and core infrastructure based on latency, cost, and performance. (akamai.com)

Akamai is rolling out thousands of Nvidia RTX Pro 6000 Blackwell Server Edition graphics processing units across its network and pairing them with dedicated graphics processing unit clusters in core cloud sites. Data Center Dynamics reported the deployment spans Akamai’s 4,400 edge locations and is aimed at real-time inference rather than model training. (datacenterdynamics.com)

Inference is the step where a trained model answers a prompt, classifies a video frame, or decides what to do next. Akamai launched Inference Cloud on October 28, 2025 as a service built to move that response step closer to users and devices at the edge of the internet. (prnewswire.com)

That edge setup targets applications that lose value when every request has to travel back to a faraway data center. Akamai said examples include real-time video, physical artificial intelligence systems, fraud detection, secure payments, and highly personalized services with many simultaneous users. (prnewswire.com)

Akamai is not arguing that every artificial intelligence workload belongs at the edge. The company said centralized “artificial intelligence factories” still make more sense for training and frontier-scale models, while its grid is meant for inference jobs that need local responsiveness. (telecomramblings.com)

The selling point is what cloud providers call lower latency: less delay between a user request and the first token or action. Akamai said its control plane acts as a broker for requests and tries to improve cost per token, time to first token, and throughput by choosing where each job runs. (akamai.com)

This push also extends Akamai’s shift from content delivery network operator into cloud and compute infrastructure. Data Center Dynamics reported the company initially targeted about 20 locations for its inference cloud launch in October 2025 before expanding toward the broader edge footprint it is now marketing. (datacenterdynamics.com)

Edge Infrastructure Review reported on April 14, 2026 that Akamai is framing the grid as a way to improve response time and economics for distributed workloads that do not need hyperscale central compute. The next test is whether developers actually move production inference to thousands of smaller sites instead of keeping most traffic in big regional clouds. (edgeir.com)
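
For readers who want a concrete picture of the brokering idea, the following is a minimal sketch of how a control plane might weigh time to first token, cost per token, and throughput when choosing a site. It is not Akamai's implementation: the `Site` fields, the `pick_site` function, the weighting scheme, and all the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical candidate site; all fields are illustrative assumptions."""
    name: str
    tier: str                  # "edge", "regional", or "core"
    ttft_ms: float             # estimated time to first token, milliseconds
    cost_per_1k_tokens: float  # estimated cost, arbitrary currency units
    tokens_per_sec: float      # estimated throughput


def pick_site(sites, w_latency, w_cost, w_throughput):
    """Return the site with the lowest weighted score.

    Each metric is normalized by the largest value among the candidates,
    so the weights compare like with like. Latency and cost add to the
    score; throughput subtracts from it. Lower is better.
    """
    if not sites:
        raise ValueError("no candidate sites")
    max_ttft = max(s.ttft_ms for s in sites)
    max_cost = max(s.cost_per_1k_tokens for s in sites)
    max_tps = max(s.tokens_per_sec for s in sites)
    if min(max_ttft, max_cost, max_tps) <= 0:
        raise ValueError("metrics must be positive")

    def score(s):
        return (w_latency * s.ttft_ms / max_ttft
                + w_cost * s.cost_per_1k_tokens / max_cost
                - w_throughput * s.tokens_per_sec / max_tps)

    return min(sites, key=score)


# Invented numbers: the edge site is closest, the core site is cheapest
# per token and has the highest throughput.
SITES = [
    Site("edge-a", "edge", ttft_ms=40, cost_per_1k_tokens=0.004, tokens_per_sec=60),
    Site("regional-a", "regional", ttft_ms=90, cost_per_1k_tokens=0.003, tokens_per_sec=120),
    Site("core-a", "core", ttft_ms=220, cost_per_1k_tokens=0.002, tokens_per_sec=300),
]

# A latency-sensitive request (for example, real-time video) weights delay heavily.
realtime = pick_site(SITES, w_latency=0.8, w_cost=0.1, w_throughput=0.1)

# A bulk job that tolerates delay weights cost and throughput instead.
batch = pick_site(SITES, w_latency=0.1, w_cost=0.5, w_throughput=0.4)
```

Under these made-up inputs, the latency-heavy weighting selects the edge site and the cost-heavy weighting selects the core site, which mirrors the article's point that not every workload belongs at the edge.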

