Chips rally, software counts

- Chip stocks posted a record winning streak as investors priced in sustained AI-driven demand for semiconductors.
- Bloomberg noted that the Philadelphia Semiconductor Index logged its longest streak of daily gains amid AI optimism.
- That market view makes software-side efficiency (quantization, caching and routing) an increasingly valuable engineering differentiator. (bloomberg.com)

Semiconductor stocks just logged a record winning streak, with the Philadelphia Semiconductor Index rising for a 16th straight session on April 22 as investors kept betting that artificial-intelligence spending would climb further. (bloomberg.com) The index was up 1.9% at one point on Wednesday and finished the session up 2.72% at 9,909.27, according to Nasdaq's SOX index data. (nasdaqomx.com) Bloomberg reported that the 16-day run was the longest such streak in records going back to 1994. (bloomberg.com)

Over that stretch, the chip gauge climbed about 37%, putting April on pace for its biggest monthly gain since February 2000, Bloomberg reported. The move extended a market pattern in which Nvidia, Broadcom, Advanced Micro Devices and Micron have become shorthand for AI demand. (bloomberg.com) (moneycontrol.com)

A chip rally signals that investors expect more spending on hardware, but an AI system does not run on chips alone. It also depends on software that cuts how much memory, time and electricity each query consumes. (developer.nvidia.com) (nvidia.github.io)

Quantization is one of those software levers: it stores numbers in fewer bits, much as shrinking files lets more of them fit in the same space. Nvidia said its NVFP4 quantization of the key-value (KV) cache, the memory where a model keeps attention keys and values for tokens it has already processed, cuts cache memory use by 50% versus FP8 and can double context length or batch size on Blackwell graphics processors, with less than 1% accuracy loss on the benchmarks it cited. (developer.nvidia.com)

Caching is another lever: save work once, reuse it later. OpenAI says prompt caching works automatically on recent models for repeated prompt prefixes, and Nvidia says KV cache reuse can shorten the time to the first generated token by reusing prior computation for requests that begin with the same prompt. (developers.openai.com) (nvidia.github.io)

Routing decides where a request goes after it reaches the system.
Google Cloud said its Google Kubernetes Engine Inference Gateway routes traffic using AI-specific signals, including pending prompt requests and key-value cache utilization. Google researchers have separately described a different kind of routing that happens inside the model itself: “mixture of experts” designs send each token through only part of a model instead of the whole network. (cloud.google.com) (research.google)

Those techniques do not replace demand for chips; they change how far each chip can go. If hardware investors are pricing in years of AI build-out, the next contest is likely to include the companies whose software lets the same graphics processor serve more users, handle longer prompts and deliver responses at lower cost. (bloomberg.com) (developer.nvidia.com)
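To make Nvidia's quantization claim concrete, here is a rough Python sketch of key-value cache sizing. The model shape (32 layers, 8 KV heads, 128-dimension heads) and the 16 GiB memory budget are hypothetical assumptions, not Nvidia's benchmark setup. The 4-bit case also ignores the small per-block scale factors that a real 4-bit format stores. Under those assumptions, halving the bits per cached value doubles the context that fits in the same memory.

```python
# Hypothetical model shape (assumption): not taken from Nvidia's NVFP4 benchmarks.
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(seq_len: int, batch: int, bits_per_value: float) -> float:
    """Bytes needed to cache keys and values for seq_len tokens in each of `batch` sequences."""
    values_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM  # 2 = one key + one value
    return values_per_token * seq_len * batch * bits_per_value / 8


def max_context(budget_bytes: float, batch: int, bits_per_value: float) -> int:
    """Longest per-sequence context whose KV cache fits in a fixed memory budget."""
    return int(budget_bytes // kv_cache_bytes(1, batch, bits_per_value))


budget = 16 * 2**30  # hypothetical 16 GiB reserved for the KV cache
fp8_ctx = max_context(budget, batch=8, bits_per_value=8)
fp4_ctx = max_context(budget, batch=8, bits_per_value=4)  # ignores per-block scale factors
print(fp8_ctx, fp4_ctx)  # 32768 65536
```

Running the same arithmetic on batch size at a fixed context length gives the same 2x. That is the "context length or batch size" trade-off in Nvidia's description.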
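Quantization can be sketched as block-wise rounding: each small block of numbers keeps one scale factor, and each number becomes a small integer code. This is a swapped-in toy, a symmetric 4-bit integer scheme for illustration only. It is not NVFP4, which is a 4-bit floating-point format. The function names and the block size are assumptions.

```python
def quantize_block(values: list[float]) -> tuple[float, list[int]]:
    """Toy symmetric 4-bit quantization: one float scale plus a code in [-7, 7] per value."""
    peak = max(abs(v) for v in values)
    scale = peak / 7 if peak > 0 else 1.0  # the largest magnitude maps to code +/-7
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Approximate the original values from the stored codes and scale."""
    return [c * scale for c in codes]


def quantize(values: list[float], block_size: int = 16) -> list[tuple[float, list[int]]]:
    """Split a flat list into blocks that each get their own scale."""
    return [quantize_block(values[i:i + block_size]) for i in range(0, len(values), block_size)]


data = [0.12, -0.5, 0.33, 0.9, -0.07, 0.61, -0.88, 0.05]
blocks = quantize(data, block_size=4)
restored = [x for scale, codes in blocks for x in dequantize_block(scale, codes)]
# Each restored value is within half a quantization step (scale / 2) of the original.
print(max(abs(a - b) for a, b in zip(data, restored)))
```

Per-block scales let each region of a tensor use its own range, so one large outlier does not flatten precision everywhere. Smaller blocks track local ranges more closely, but they cost more stored scales.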
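Prefix reuse of the kind OpenAI and Nvidia describe can be sketched with a toy cache. It splits a prompt into fixed-size token blocks and keys each block by the entire prefix ending at it, so a block is reused only when everything before it also matches. The class name, the 4-token block size and the string placeholders standing in for real attention tensors are all hypothetical. Production systems differ in block size, hashing and eviction.

```python
BLOCK_SIZE = 4  # hypothetical tokens per cache block; real systems typically use larger blocks


class PrefixCache:
    """Toy prefix cache: each block's saved state is keyed by the entire prefix ending there."""

    def __init__(self) -> None:
        self.saved: dict[tuple[int, ...], str] = {}

    def process(self, tokens: list[int]) -> int:
        """Handle one request and return how many leading tokens were reused from cache."""
        reused = 0
        missed = False
        full_len = len(tokens) - len(tokens) % BLOCK_SIZE  # a trailing partial block is not cached
        for end in range(BLOCK_SIZE, full_len + 1, BLOCK_SIZE):
            key = tuple(tokens[:end])  # the same block after a different prefix gets a different key
            if not missed and key in self.saved:
                reused += BLOCK_SIZE
            else:
                missed = True  # later blocks depend on this one, so compute and store them too
                self.saved[key] = f"kv-state[{end - BLOCK_SIZE}:{end}]"  # stand-in for real tensors
        return reused


cache = PrefixCache()
system_prompt = [101, 102, 103, 104, 105, 106, 107, 108]  # shared 8-token prefix
first = cache.process(system_prompt + [1, 2, 3, 4])
second = cache.process(system_prompt + [9, 8, 7, 6])
print(first, second)  # 0 8
```

The second request skips recomputing the shared system prompt. This is why keeping a long, fixed instruction block at the start of every prompt tends to make caching more effective.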
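Routing on signals like the ones Google Cloud lists can be sketched as a scoring rule. The rule skips replicas whose KV cache is nearly full, then prefers the shortest queue and the most free cache. The threshold, the weights and the `Replica` fields are hypothetical assumptions. This is not the GKE Inference Gateway's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    pending_requests: int  # requests waiting on this model server
    kv_cache_utilization: float  # fraction of KV-cache memory in use, 0.0 to 1.0


def pick_replica(
    replicas: list[Replica],
    max_kv_utilization: float = 0.9,  # hypothetical saturation threshold
    queue_weight: float = 1.0,  # hypothetical weights; a real gateway has its own policy
    kv_weight: float = 10.0,
) -> Replica:
    """Prefer replicas with short queues and free KV-cache memory."""
    healthy = [r for r in replicas if r.kv_cache_utilization < max_kv_utilization]
    pool = healthy or replicas  # if every replica is near full, fall back to all of them
    return min(
        pool,
        key=lambda r: queue_weight * r.pending_requests + kv_weight * r.kv_cache_utilization,
    )


fleet = [
    Replica("a", pending_requests=2, kv_cache_utilization=0.95),  # filtered out: cache nearly full
    Replica("b", pending_requests=3, kv_cache_utilization=0.40),  # score about 7
    Replica("c", pending_requests=1, kv_cache_utilization=0.80),  # score about 9
]
print(pick_replica(fleet).name)  # b
```

Plain round-robin would ignore these signals. Sending work to the replica with the most free cache leaves room for longer prompts and avoids evicting cached prefixes.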
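Mixture-of-experts routing works inside the model rather than across servers. A common form is top-k gating: a small router scores every expert for each token, only the k best-scoring experts run, and their outputs are mixed by softmax weights. The sketch below uses scalars and hand-written experts as hypothetical stand-ins for learned layers. It illustrates the gating step only, not any specific Google model.

```python
import math
from typing import Callable


def top_k_gate(logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


def moe_forward(x: float, logits: list[float], experts: list[Callable[[float], float]], k: int = 2) -> float:
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_gate(logits, k))


calls: list[int] = []  # records which experts actually ran


def make_expert(idx: int, fn: Callable[[float], float]) -> Callable[[float], float]:
    def expert(x: float) -> float:
        calls.append(idx)
        return fn(x)
    return expert


experts = [
    make_expert(0, lambda x: x + 1),
    make_expert(1, lambda x: 2 * x),
    make_expert(2, lambda x: x * x),
    make_expert(3, lambda x: -x),
]
gate_logits = [0.1, 2.0, 1.0, -1.0]  # hypothetical router scores for one token
y = moe_forward(3.0, gate_logits, experts)
print(sorted(calls))  # [1, 2]: only two of the four experts ran
```

The compute per token scales with k, not with the total number of experts. That is how this kind of routing lets a model grow its parameter count without running the whole network for every token.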
