AI infra sprint

Agile Robots, which says it has installed more than 20,000 robotic systems, is partnering to embed DeepMind's Gemini Robotics models in its hardware, a sign that robotics foundation models are moving toward production at scale. (x.com) Meanwhile, cloud and chip moves are accelerating: Taalas's HC1 chip is claimed at ~17K tokens/sec and '10x' faster than Cerebras, Crusoe is expanding its Abilene, Texas AI factory campus to a projected ~2.1 GW to support Microsoft, FlashAttention‑4 reached about 71% of theoretical peak on Blackwell B200, and OpenAI is discontinuing Sora, reportedly to refocus on reasoning. (x.com) (x.com)

Agile Robots, a Munich-based industrial robotics firm that says it has installed more than 20,000 robotic systems worldwide, signed a strategic research partnership to embed DeepMind’s Gemini Robotics foundation models into its hardware for electronics, automotive, data‑center and logistics use cases. (TechCrunch) DeepMind’s Gemini Robotics page describes models tuned for perception, multi‑step reasoning and tool use that can be fine‑tuned from simulated to physical environments, and DeepMind says those models are intended to enable robots “of any shape and size” to carry out new real‑world tasks autonomously. (DeepMind)

Taalas’s HC1 demonstrator is specified as a TSMC 6nm, 815 mm² chip with ~53 billion transistors and a 2.5 kW server form factor, and the company reports 17,000 tokens/sec running a Llama 3.1 8B workload. (Taalas) Independent coverage and technical commentary note that the 17,000 tokens/sec figure applies to a narrow, low‑concurrency run on a small model and that comparisons to wafer‑scale or GPU systems mix different workloads and precisions. The commentary warns that the headline “10× faster” ratios reflect benchmark choices rather than broad performance parity across models and concurrency. (CTOL Digital)

Crusoe announced a new 900 MW AI factory campus in Abilene, Texas that will sit adjacent to its existing site and raise total projected capacity at Abilene to approximately 2.1 gigawatts to support Microsoft AI infrastructure, with land work underway and the first new building slated to be energized in mid‑2027. (Crusoe press release)

The FlashAttention‑4 paper and accompanying research posts report kernel and pipeline changes that reach up to ~1,613 TFLOPs/s on NVIDIA Blackwell B200 hardware, about 71% of theoretical peak on that device, and claim 1.3× speedups over cuDNN 9.13 and 2.7× over Triton for attention kernels. (arXiv)

OpenAI announced Sora’s discontinuation in a public post and its Help Center lists concrete timelines: Sora’s web and app experiences will be discontinued on April 26, 2026, and the Sora API will be discontinued on September 24, 2026. Media coverage also links the shutdown to a stalled Disney licensing arrangement. (OpenAI Help Center) (CNBC)
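
Two of the figures above can be sanity-checked with simple arithmetic. The sketch below is only a back-of-envelope check using the reported numbers as inputs; the function names are illustrative, and the results are implied values, not figures stated by NVIDIA, Crusoe or the FlashAttention‑4 authors.

```python
# Back-of-envelope checks on reported figures. Inputs are the numbers quoted
# in the briefing; outputs are implied values, not vendor-published specs.

def implied_peak_tflops(achieved_tflops: float, utilization: float) -> float:
    """Peak throughput implied by an achieved rate and a utilization fraction."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return achieved_tflops / utilization


def implied_existing_capacity_mw(total_mw: float, new_mw: float) -> float:
    """Capacity already planned at a site, given the new total and the addition."""
    return total_mw - new_mw


# FlashAttention-4: ~1,613 TFLOPs/s reported as about 71% of theoretical peak.
fa4_implied_peak = implied_peak_tflops(1613, 0.71)

# Abilene: a 900 MW addition bringing the projected total to roughly 2.1 GW.
abilene_existing_mw = implied_existing_capacity_mw(2100, 900)

print(f"Implied B200 peak for this workload: ~{fa4_implied_peak:.0f} TFLOPs/s")
print(f"Implied existing Abilene capacity: ~{abilene_existing_mw:.0f} MW")
```

Because both source figures are approximate ("~1,613", "about 71%", "approximately 2.1 gigawatts"), the implied values should be read as rough estimates only.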
