ASIC cloud for AI agents
- General Compute announced an ASIC‑first inference cloud built for autonomous AI agents, using purpose‑built accelerators instead of GPUs.
- The product aims to optimise inference economics for persistent, high‑volume agent workloads rather than generic GPU instances.
- If agentic software becomes persistent at scale, inference cost‑per‑task will be a competitive axis for cloud and model providers. (wingerdaily.com)
An application-specific integrated circuit is a chip built for one job, the way a toaster is built for bread instead of every kitchen task. General Compute said on April 18 it is building an inference cloud around those chips for autonomous AI agents, not around graphics processing units. (accessnewswire.com)

Inference is the step where a model answers a prompt, picks a tool, or writes code after training is already done. General Compute said its service is aimed at agents that make high volumes of large language model calls and tool calls, with general availability scheduled for May 15, 2026. (accessnewswire.com)

The company said its system splits “prefill” and “decode,” two parts of serving a model response, so each can be scaled separately. It also said agents can sign up, create application programming interface keys, and provision inference programmatically without a human in the loop. (accessnewswire.com)

General Compute’s website pitches that design as a speed and cost play: “purpose-built ASICs,” “1,000 tokens per second,” and “7x faster inference.” The site also says developers can switch by changing the base URL because the service uses OpenAI-compatible endpoints. (generalcompute.com)

The pitch lands at a moment when cloud demand is shifting from training models to serving them continuously. McKinsey wrote in December 2025 that inference is projected to make up a little more than half of AI workloads by 2030, pushing data center design toward low-latency, energy-efficient sites. (mckinsey.com)

TrendForce said on January 20, 2026 that global AI server shipments are expected to grow more than 28% year over year in 2026, with ASIC-based systems reaching 27.8% of shipments. The firm said the second half of 2025 brought a shift toward inference services as companies chased revenue from agents, Llama-based apps, and copilots. (trendforce.com)

That is the bet behind an “ASIC-first” cloud: if agents run all day, the key metric stops being access to a generic graphics processor and starts being the cost of each completed task. General Compute is trying to sell infrastructure for that narrower, steadier workload instead of the broader market for rented GPU instances. (accessnewswire.com)
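
To make the cost-per-task metric concrete, here is a minimal Python sketch of the arithmetic. Every number and name in it is a hypothetical assumption for illustration: the task shape, the per-token prices, and the `AgentTask` and `Pricing` types are not from General Compute or any other provider. The only figure taken from the reporting above is the "1,000 tokens per second" claim on the company's site, used here as an assumed decode rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentTask:
    """Hypothetical shape of one completed agent task."""
    llm_calls: int               # model calls needed to finish the task
    input_tokens_per_call: int   # prompt tokens processed during prefill
    output_tokens_per_call: int  # tokens generated during decode


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices, in dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def cost_per_task(task: AgentTask, pricing: Pricing) -> float:
    """Return the dollar cost of one completed task."""
    input_cost = task.input_tokens_per_call * pricing.input_per_million / 1_000_000
    output_cost = task.output_tokens_per_call * pricing.output_per_million / 1_000_000
    return task.llm_calls * (input_cost + output_cost)


def decode_seconds_per_task(task: AgentTask, tokens_per_second: float) -> float:
    """Return time spent generating output, ignoring prefill and tool latency."""
    return task.llm_calls * task.output_tokens_per_call / tokens_per_second


if __name__ == "__main__":
    # Assumed workload: 20 calls per task, 4,000 tokens in and 500 out per call.
    task = AgentTask(llm_calls=20, input_tokens_per_call=4_000, output_tokens_per_call=500)
    # Assumed prices, chosen only to keep the arithmetic easy to follow.
    pricing = Pricing(input_per_million=0.50, output_per_million=2.00)

    print(f"cost per task: ${cost_per_task(task, pricing):.4f}")
    print(f"decode time per task: {decode_seconds_per_task(task, 1_000):.1f} s")
```

Under these assumed inputs, one task costs about $0.06 and spends about 10 seconds in decode. An agent fleet finishing a million such tasks a day would spend roughly $60,000 daily, which is why small changes in per-token price or throughput matter more for persistent workloads than for occasional ones.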