Intel and Google expand AI CPU work
Intel and Google are expanding work on AI‑optimized CPUs, a sign that major cloud and chip players are betting some AI workloads will be cheaper or faster on specialized silicon rather than only on GPUs. That’s important for teams planning cost and latency tradeoffs for production AI services. (x.com)
Most people think artificial intelligence runs on one kind of chip: the graphics processing unit, the part built to do huge numbers of math operations in parallel. Intel and Google are pushing a different idea in April 2026: some serving jobs work better when the central processing unit does more of the work. (intel.com)
A central processing unit is the general manager chip in a server. It handles the operating system, moves data between memory and storage, and runs the parts of an artificial intelligence service that are irregular and branchy instead of one giant block of repeated math. (cloud.google.com) A graphics processing unit is closer to a warehouse full of identical workers doing the same motion at once. That is why it dominates model training, where the same matrix math gets repeated across enormous batches of data. (blog.google)
The surprise is that production artificial intelligence is not one job. A live service has to receive a request, fetch context, route it to the right model, generate tokens, apply rules, and send a response back within a latency budget measured in milliseconds. (intel.com) That is where Intel and Google say the central processing unit still matters. Intel said on April 9, 2026, that Google will keep using Intel Xeon processors across artificial intelligence, inference, and general-purpose cloud workloads, while the two companies also expand co-development on custom infrastructure processing units. (intel.com)
An infrastructure processing unit is a traffic-control chip for the data center. Google uses its Titanium system to offload networking, storage, and virtualization work so the main server chips spend more time on customer workloads instead of housekeeping. (cloud.google.com)
Google has already been putting newer Intel chips into its cloud fleet. Its C4 virtual machines, now generally available, run on Intel’s sixth-generation Xeon chips, code-named Granite Rapids, and Google said they deliver up to 60% better performance on machine-learning recommendation workloads than the prior generation. (cloud.google.com) Intel is also arguing that newer central processing units can handle more artificial intelligence work directly than people assume. In its Google Cloud tuning work, Intel has highlighted native half-precision support, 12 memory channels, and clock speeds up to 4.2 gigahertz on Xeon 6, the kinds of features that help a server run smaller models and generate tokens without a graphics processing unit attached. (community.intel.com)
Google is not backing away from accelerators while doing this. Its Ironwood tensor processing unit, announced for Google Cloud in late 2025, is built for high-volume, low-latency inference, and Google said it offers more than 4 times the per-chip performance of the prior generation for both training and inference. (blog.google) So the real shift is not “central processing unit instead of graphics processing unit.” It is a more mixed stack: accelerator chips for the densest math, infrastructure chips for moving data, and central processing units for orchestration, memory-heavy steps, and some inference jobs where cost per request matters more than peak benchmark speed. (intel.com)
That mix becomes more attractive as companies move from demos to steady traffic. Intel said this week that “agentic” systems are exposing the limits of graphics-processing-unit-only designs, and its separate April 8 deal with SambaNova paired graphics processors for prefill (processing the input prompt), custom accelerators for decode (generating output tokens one at a time), and Xeon 6 as the host and action central processing unit. (intel.com) For anyone running a real artificial intelligence product, the bet here is simple: the cheapest fast answer may come from using the right chip for each step, not the flashiest chip for every step.
Intel and Google are spending engineering time on that idea now, which usually means large cloud customers have already started asking for it. (intel.com)
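The “right chip for each step” bet can be framed as a small assignment problem: each step of a request has a few candidate device classes, each with a latency and a cost, and the goal is the cheapest assignment whose total latency still fits the service’s budget. The Python sketch below is a hypothetical illustration, not a method Intel or Google has described. The step names loosely follow the request path described earlier (with prefill and decode split out as in the SambaNova design), the device labels `cpu`, `ipu`, and `accelerator` are generic, and the names `Option`, `PIPELINE`, `cheapest_plan`, and `fastest_plan` are made up for this example. Every latency and cost figure is invented to show the arithmetic, not measured on any real hardware.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Every number below is HYPOTHETICAL, chosen only to make the arithmetic visible.
# None of it is a measurement of any Intel, Google, or other real hardware.


@dataclass(frozen=True)
class Option:
    chip: str        # generic device class: "cpu", "ipu", or "accelerator"
    latency_ms: int  # assumed time this step takes on this device class
    cost: int        # assumed cost units charged for this step on this device class


# Hypothetical request steps; prefill/decode are split out as in the SambaNova example.
PIPELINE: dict[str, list[Option]] = {
    "fetch_context": [Option("cpu", 4, 1), Option("ipu", 2, 2)],
    "route_to_model": [Option("cpu", 1, 1)],
    "prefill": [Option("cpu", 80, 6), Option("accelerator", 20, 8)],
    "decode": [Option("cpu", 150, 10), Option("accelerator", 50, 20)],
    "apply_rules": [Option("cpu", 2, 1)],
}

Plan = tuple[int, int, dict[str, str]]  # (total cost, total latency in ms, step -> chip)


def cheapest_plan(pipeline: dict[str, list[Option]], budget_ms: int) -> Optional[Plan]:
    """Try every chip assignment; return the cheapest one whose total latency fits the budget."""
    steps = list(pipeline)
    best: Optional[Plan] = None
    for choice in product(*(pipeline[step] for step in steps)):
        latency = sum(opt.latency_ms for opt in choice)
        if latency > budget_ms:
            continue  # this assignment misses the latency budget, so skip it
        cost = sum(opt.cost for opt in choice)
        if best is None or cost < best[0]:
            best = (cost, latency, {step: opt.chip for step, opt in zip(steps, choice)})
    return best  # None means no assignment fits the budget


def fastest_plan(pipeline: dict[str, list[Option]]) -> Plan:
    """Baseline: take the lowest-latency option at every step, ignoring cost."""
    picks = {step: min(opts, key=lambda opt: opt.latency_ms) for step, opts in pipeline.items()}
    cost = sum(opt.cost for opt in picks.values())
    latency = sum(opt.latency_ms for opt in picks.values())
    return (cost, latency, {step: opt.chip for step, opt in picks.items()})


if __name__ == "__main__":
    print("fastest option everywhere:", fastest_plan(PIPELINE))
    print("cheapest within 200 ms:", cheapest_plan(PIPELINE, 200))
    print("cheapest within 100 ms:", cheapest_plan(PIPELINE, 100))
```

With these invented numbers, the fastest plan finishes in 75 ms at 32 cost units, while the cheapest plan that fits a 200 ms budget keeps decode on the CPU and costs 21 units; tighten the budget to 100 ms and decode moves back to the accelerator at 31 units. The exhaustive search is fine for a handful of steps, but the number of assignments multiplies with every added step and option, so a larger pipeline would call for something like dynamic programming over the latency budget.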