io.net aggregates idle GPUs to cut inference costs

- io.net said on May 15 it aggregates idle GPUs through a decentralized marketplace to offer lower-cost inference capacity than major cloud providers.
- A December 2025 io.net-backed study said RTX 4090 clusters cut token costs by up to 75% while reaching 62%-78% of H100 throughput.
- io.net’s website and docs list current cloud and model pricing, with model-level rates available through its GET /models API.

io.net is pitching a simple answer to one of AI’s most expensive problems: inference. The company says its network aggregates underused GPUs from data centers, crypto miners and independent suppliers, then routes that capacity through a decentralized physical infrastructure network, or DePIN, to serve AI workloads at lower cost than centralized cloud providers.

On May 15, the company’s website said customers can access GPU clusters in more than 130 countries and save up to 70% versus AWS and Google Cloud for on-demand compute. io.net has tied that pitch increasingly to inference, where per-token economics can determine whether an AI feature reaches production.

### How does io.net say the marketplace works?

io.net’s March 2 blog post described the network as a compute layer that aggregates underutilized hardware from data centers, crypto miners and independent suppliers. The company said the DePIN model shifts hardware supply away from a single cloud operator and coordinates distributed participants with on-chain payments and verification. (io.net)

The company’s main website said that supply is presented to customers as on-demand GPU clusters, with options to mix GPU types, scale up for training and scale down for inference. On May 15, the site listed configurations ranging from single-GPU instances to multi-GPU H100 and A100 clusters, with posted hourly prices for each.

### Why is inference the part io.net keeps emphasizing?

A November 2025 io.net explainer said inference is the stage where trained models generate outputs for users and where cost per inference becomes central to unit economics. (io.net)

A separate io.net payments guide said current model rates are exposed through its GET /models API using input and output token prices, underscoring that billing is increasingly measured at the token level rather than only by rented hardware. (io.net)

io.net’s recent marketing has leaned on that point.
A blog post published last week said infrastructure can consume up to 60% of an AI budget, and described startup users whose monthly bills rose sharply under centralized cloud setups before moving workloads. That account is the company’s characterization, but it matches the commercial problem io.net is trying to solve: making inference cheap enough for smaller teams to keep products live. (io.net)

### What evidence has io.net offered for lower inference costs?

A Dec. 5, 2025 io.net post cited a peer-reviewed paper, accepted at AIBC 2025, that benchmarked heterogeneous GPU clusters on io.net’s infrastructure. The company said RTX 4090 clusters achieved 62% to 78% of H100 throughput at roughly half the operational cost, and that token costs for batch and latency-tolerant workloads fell by as much as 75%. (io.net)

The same post said a four-by-RTX 4090 setup delivered $0.111 to $0.149 per million tokens and represented the best cost-performance ratio in the study. Aline Almeida, head of research at IOG Foundation and the paper’s lead author, said hybrid routing across enterprise and consumer GPUs offered “a pragmatic balance between performance, cost and sustainability.” (io.net)

### Who is the company aiming this at?

KayOS, a New York-based startup featured in an io.net case study published on Dec. 3, 2025, said it cut compute costs by about 60% to roughly $1,000 per month per customer after moving to io.intelligence from a multi-provider setup. David Weinstein, KayOS’s chief executive, said the prior cost structure would not have been economically viable for the two-person company. (io.net)

io.net’s own website broadens that target beyond startups. The company says enterprise teams can use the same network for training, fine-tuning and inference without long contracts or approval processes, while its March 5 market overview said buyers are increasingly focused on cost per token at P99 latency and avoiding lock-in.
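
The KayOS and benchmark figures quoted above can be turned into rough dollar amounts with simple arithmetic. The sketch below is only a back-of-envelope illustration: it treats "about 60%" as exactly 0.60, the 1 billion tokens per month is a hypothetical volume, and the function names are made up for this example rather than taken from io.net.

```python
def implied_prior_cost(current_cost: float, reduction: float) -> float:
    """Cost before a fractional reduction, given the cost after it."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return current_cost / (1 - reduction)


def cost_for_tokens(tokens: int, usd_per_million: float) -> float:
    """Dollar cost of a token volume at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


# KayOS: about 60% lower, to roughly $1,000 per month per customer.
prior = implied_prior_cost(1_000, 0.60)

# Four-by-RTX 4090 range from the study, applied to a hypothetical
# volume of 1 billion tokens per month (the volume is an assumption).
low = cost_for_tokens(1_000_000_000, 0.111)
high = cost_for_tokens(1_000_000_000, 0.149)

print(f"implied prior cost: ${prior:,.0f} per month per customer")
print(f"1B tokens: ${low:,.0f} to ${high:,.0f}")
```

On those assumptions, the KayOS figures imply a prior bill of about $2,500 per month per customer, and the study's per-token range works out to between $111 and $149 for 1 billion tokens.
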

### How much of this is company marketing, and what can be checked?

Most of the claims around spare capacity, cost savings and enterprise demand in this story come from io.net’s own website, blog posts and customer case studies. (io.net)

The most concrete figures that can be checked directly on May 15 are the company’s posted infrastructure prices, its claim of availability across more than 130 countries, and its documentation showing that current model pricing is exposed through an API. (io.net)

The next public checkpoints are also on io.net’s own properties. The company’s website continues to update listed GPU instance prices, and its developer documentation says the latest model-level inference pricing can be retrieved through the GET /models endpoint. (io.net)
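
For readers who want to see how token-level billing of this kind adds up, here is a minimal sketch of the arithmetic. It does not call io.net's API. The field names, the per-million-token unit and the rates are all placeholders assumed for illustration, and the actual shape of the GET /models response should be taken from io.net's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Per-model token prices, in USD per one million tokens.

    Field names and units are assumptions for illustration; they are not
    taken from io.net's documentation.
    """
    name: str
    input_usd_per_million: float
    output_usd_per_million: float


def request_cost(rate: ModelRate, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request billed separately on input and output tokens."""
    return (
        input_tokens / 1_000_000 * rate.input_usd_per_million
        + output_tokens / 1_000_000 * rate.output_usd_per_million
    )


# Placeholder rates, not real io.net prices.
example = ModelRate(
    "example-model",
    input_usd_per_million=0.20,
    output_usd_per_million=0.60,
)

# One request with 2,000 prompt tokens and 500 completion tokens.
cost = request_cost(example, input_tokens=2_000, output_tokens=500)
print(f"estimated cost per request: ${cost:.6f}")
```

Because input and output tokens are priced separately, the same model can cost noticeably more for long answers than for long prompts, which is why per-token rates matter more to inference budgets than hourly hardware prices alone.
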
