Nvidia reframes AI economics
Nvidia urged the industry to judge AI data centers by “cost per token,” the expense of producing useful model output, rather than by raw FLOPS, shifting the conversation toward unit economics. Coverage highlights Nvidia's comparisons between its Hopper and Blackwell architectures and presents cost per token as the metric procurement teams should use to weigh total-cost-of-ownership (TCO) claims (datacenterknowledge.com; wccftech.com).
A token is a small chunk of text, like a few characters or part of a word, and Nvidia says AI data centers should now be judged by the cost to produce those chunks. The company made that case in an April 15 blog post that argues “cost per token” matters more than raw compute metrics such as floating point operations per second, or FLOPS. (nvidia.com)

Nvidia says older measures track inputs, not output. Its framing splits the math into compute cost, FLOPS per dollar, and cost per token, then argues only the last one captures the all-in cost of generating usable model output during inference, the stage when a model answers a prompt. (nvidia.com)

The company’s pitch is tied to a shift in workload. Data Center Knowledge reported on April 16 that Nvidia now describes AI facilities as “token factories,” because inference, not training, is becoming the primary job in many deployments. (datacenterknowledge.com)

Nvidia’s comparison between Hopper and Blackwell is the center of the sales argument. Data Center Knowledge said Nvidia claims Blackwell systems cost about twice as much per compute hour as Hopper, but deliver up to 65 times more tokens per second per graphics processing unit, about 50 times more tokens per megawatt, and roughly 35 times lower cost per million tokens on DeepSeek-R1. (datacenterknowledge.com)

The denominator in that equation is throughput: how many tokens a system actually delivers. Nvidia says buyers who focus only on hourly graphics processing unit rates miss the effects of interconnects, lower-precision math formats such as FP4, decoding tricks, key-value cache management, and overall utilization. (nvidia.com; datacenterknowledge.com)

That argument arrives as model providers are serving more reasoning-heavy requests, which generate more tokens and keep chips busy longer. Nvidia said in October 2025 that newer benchmarks such as InferenceMAX were designed to measure total cost of compute across real-world inference scenarios, not just peak speed. (nvidia.com)

Nvidia has also been using customer names to show the metric in practice. In a February 12 post, it said Baseten, DeepInfra, Fireworks AI, and Together AI were cutting cost per token by up to 10 times on Blackwell with optimized inference software and open-weight models. (nvidia.com)

Outside Nvidia, the broader direction is similar even if the exact claims are disputed. A 2025 paper on AI price-performance trends found the cost of a given level of frontier-model benchmark performance had been falling by about 5 times to 10 times per year, reflecting hardware, algorithmic, and market changes. (arxiv.org)

Analysts are not treating Nvidia’s metric as neutral. Data Center Knowledge reported that some see cost per token as more relevant for hyperscale operators running inference at high utilization than for enterprise information-technology buyers with smaller, burstier workloads. (datacenterknowledge.com)

The immediate effect is a change in what vendors want procurement teams to ask. Instead of starting with chip specs or rental rates, Nvidia wants buyers to ask how many useful tokens a system can ship for each dollar, watt, and rack they commit. (nvidia.com; datacenterknowledge.com)
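
For readers who want to check the Hopper and Blackwell arithmetic reported above, the unit math can be sketched in a few lines of Python. This is an illustrative sketch, not Nvidia's own calculation. The Hopper hourly rate and baseline throughput are hypothetical placeholders chosen only to make the ratios visible; the 2x cost multiple and 65x throughput multiple are the rounded figures Data Center Knowledge attributed to Nvidia.

```python
def cost_per_million_tokens(cost_per_gpu_hour: float, tokens_per_second: float) -> float:
    """Dollars to generate one million tokens on one GPU at steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_gpu_hour / tokens_per_hour * 1_000_000


# Hypothetical baseline, NOT from the article: picked only for illustration.
HOPPER_COST_PER_GPU_HOUR = 2.00    # assumed dollars per GPU-hour
HOPPER_TOKENS_PER_SECOND = 100.0   # assumed tokens per second per GPU

# Rounded multiples as reported: about 2x the hourly cost, up to 65x the throughput.
BLACKWELL_COST_MULTIPLE = 2.0
BLACKWELL_THROUGHPUT_MULTIPLE = 65.0

hopper = cost_per_million_tokens(HOPPER_COST_PER_GPU_HOUR, HOPPER_TOKENS_PER_SECOND)
blackwell = cost_per_million_tokens(
    HOPPER_COST_PER_GPU_HOUR * BLACKWELL_COST_MULTIPLE,
    HOPPER_TOKENS_PER_SECOND * BLACKWELL_THROUGHPUT_MULTIPLE,
)

# The baseline values cancel out here; only the two multiples matter.
improvement = hopper / blackwell

print(f"Hopper:    ${hopper:.2f} per million tokens")
print(f"Blackwell: ${blackwell:.2f} per million tokens")
print(f"Cost per token is {improvement:.1f}x lower")
```

With those rounded inputs the ratio works out to 65 divided by 2, or 32.5, which is in the same range as the reported figure of roughly 35 times. The gap is expected because both multiples are approximations, and the sketch ignores the utilization, power, and networking effects the article says a full cost-per-token figure should capture.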