AMD chiplets target HFT workloads

- AMD’s low-latency networking papers and tuning guides show its EPYC 9005 “Turin” chips and Solarflare X4 adapters are being pitched directly at electronic trading and other sub-10-microsecond server workloads.
- The key feature is Smart Data Cache Injection (SDCI), which on Zen 5 EPYC lets a network card place inbound packets and metadata straight into a core’s cache instead of DRAM.
- AMD is pairing that with NUMA-nodes-per-socket (NPS) tuning such as NPS4 and with Onload kernel bypass, extending a long-running push into exchange-adjacent infrastructure. (docs.amd.com)

Low-latency trading systems are a race to move market data from a network card to a CPU core with as little delay as possible. AMD’s current pitch is that EPYC 9005 processors and Solarflare X4 adapters can shorten that path. (docs.amd.com) (amd.com)

In a standard server path, an Ethernet adapter uses direct memory access (DMA) to write packets into DRAM, and the CPU then has to pull that data back into cache before software can act on it. AMD’s Smart Data Cache Injection, or SDCI, changes that by steering eligible inbound traffic directly into the cache of the core running the application. (amd.com) (docs.amd.com)

AMD says the mechanism is available on recent EPYC server CPUs and that its Solarflare driver stack, libraries and adapters have been updated to use it. The company’s Onload user guide says that when SDCI is enabled, the adapter can write received packets, events and metadata directly to CPU caches. (docs.amd.com 1) (docs.amd.com 2)

The audience is explicit. AMD’s 2025 Solarflare X4 low-latency paper says financial technology workloads need low latency, low and predictable jitter, and specialized multicast handling, and it frames SDCI as part of that stack. (amd.com)

That matters because feed handlers, order gateways and pre-trade checks spend much of their time on small, hot data structures that are touched over and over. If the packet payload and metadata arrive in cache instead of DRAM, the CPU can skip a DRAM fetch, one of the slowest steps in the handoff. (docs.amd.com) (amd.com)

AMD is not selling this as a CPU feature alone. Its white papers tie SDCI to PCI Express TLP Processing Hints (TPH) in the processor, a latency-optimized network interface card, and OpenOnload kernel-bypass software, which together reduce both receive and transmit delay. (docs.amd.com) (amd.com) The NUMA-nodes-per-socket (NPS) setting known as NPS4 supplies the placement piece of that story.
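To make the point about small, hot data structures concrete, here is a minimal sketch of the kind of record a feed handler might overwrite on every quote. The `TopOfBook` layout, its field names and the 64-byte cache-line size are assumptions for illustration, not anything AMD documents. Production handlers would do this in C or C++; Python's `ctypes` is used only to show that the whole record fits in a single cache line, so one cache-resident write covers every field the hot path reads.

```python
import ctypes

# Assumption: a 64-byte cache line, typical of current x86-64 server CPUs.
CACHE_LINE_BYTES = 64


class TopOfBook(ctypes.Structure):
    """Hypothetical hot record a feed handler updates on every quote."""

    _fields_ = [
        ("seq", ctypes.c_uint64),             # feed sequence number
        ("exchange_ts_ns", ctypes.c_uint64),  # exchange timestamp in nanoseconds
        ("bid_px", ctypes.c_int64),           # prices stored as integer ticks
        ("bid_qty", ctypes.c_int64),
        ("ask_px", ctypes.c_int64),
        ("ask_qty", ctypes.c_int64),
    ]


def fits_in_cache_line(struct_type) -> bool:
    """True if the struct's size is at most one cache line (alignment is not checked)."""
    return ctypes.sizeof(struct_type) <= CACHE_LINE_BYTES


def apply_quote(book: TopOfBook, seq: int, ts_ns: int,
                bid_px: int, bid_qty: int, ask_px: int, ask_qty: int) -> bool:
    """Overwrite the record in place; ignore stale or duplicate sequence numbers."""
    if seq <= book.seq:
        return False
    book.seq = seq
    book.exchange_ts_ns = ts_ns
    book.bid_px, book.bid_qty = bid_px, bid_qty
    book.ask_px, book.ask_qty = ask_px, ask_qty
    return True
```

Six 8-byte fields give a 48-byte record. Fitting in a line is necessary but not sufficient: a real implementation would also align the record to a cache-line boundary so it never straddles two lines.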
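The kernel-bypass side of the stack usually means the application spins on its receive path instead of sleeping in the kernel until a packet arrives. The sketch below shows that loop shape with ordinary Python sockets and nothing Onload-specific; the helper names are hypothetical. Onload is designed to accelerate unmodified BSD-socket code, typically by launching the program under the `onload` wrapper, but a Python loop will not get near microsecond latencies. The point here is the structure of a busy-poll receive, not its speed.

```python
import socket
import time
from typing import Optional


def make_udp_receiver(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Bind a UDP socket and switch it to non-blocking mode."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))   # port 0 lets the OS pick a free port
    sock.setblocking(False)   # recv() now raises instead of sleeping
    return sock


def spin_recv(sock: socket.socket, bufsize: int = 2048,
              timeout_s: float = 1.0) -> Optional[bytes]:
    """Busy-poll for one datagram; return None if none arrives before the deadline."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            return sock.recv(bufsize)
        except BlockingIOError:
            continue          # nothing queued yet: keep spinning on this core
    return None
```

The trade-off is deliberate: the polling thread burns a whole core so that it never pays a wakeup or context switch when market data arrives, which is why such threads are normally pinned to dedicated cores.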
AMD’s NUMA guidance says NPS4 splits one socket into four NUMA domains, with each PCI Express device local to one quadrant, which can make memory and I/O access more predictable when software is pinned carefully. (docs.amd.com)

AMD has been courting this market for years. Its 2018 low-latency tuning note said financial trading and real-time processing often need consistent response times under 10 microseconds, and its newer Solarflare materials say the adapters are built for high-frequency trading, risk analytics and financial data flows. (docs.amd.com) (amd.com)

The recent shift is that the cache-injection path is now documented as a feature of Zen 5-era EPYC and recent Turin BIOS builds. Linux support has also been moving forward: patches adding Smart Data Cache Injection Allocation Enforcement (SDCIAE) to the kernel’s resctrl resource-control interface show the software stack catching up with the hardware. (docs.amd.com) (lkml.org) (phoronix.com)

So the story is less that a social post uncovered a hidden AMD feature than that AMD has been assembling a complete low-latency trading stack in public documents: EPYC 9005 CPUs, NPS4 placement, Solarflare X4 adapters, Onload kernel bypass and SDCI cache injection. (docs.amd.com) (amd.com)
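Acting on the NPS4 placement guidance above starts with finding which NUMA node a network adapter is attached to and pinning the hot threads to that node's cores. A minimal sketch for Linux, assuming the standard sysfs files `/sys/class/net/<ifname>/device/numa_node` and `/sys/devices/system/node/node<N>/cpulist`; the helper names are hypothetical, and tools such as `numactl` cover the same ground from the shell.

```python
import os
from pathlib import Path
from typing import Optional, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse the kernel's cpulist format, e.g. '0-3,8,10-11'."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_numa_node(ifname: str) -> Optional[int]:
    """NUMA node of the PCI device behind ifname, or None if unknown."""
    path = Path(f"/sys/class/net/{ifname}/device/numa_node")
    try:
        node = int(path.read_text())
    except (OSError, ValueError):
        return None           # virtual interface, missing sysfs, non-Linux host
    return node if node >= 0 else None  # -1 means no locality information


def node_cpus(node: int) -> Set[int]:
    """CPUs belonging to a NUMA node, read from sysfs."""
    return parse_cpulist(Path(f"/sys/devices/system/node/node{node}/cpulist").read_text())


def pin_to_nic_local_cpus(ifname: str) -> Optional[Set[int]]:
    """Restrict this process to CPUs on the NIC's NUMA node (Linux-only affinity calls)."""
    node = nic_numa_node(ifname)
    if node is None:
        return None
    cpus = node_cpus(node) & os.sched_getaffinity(0)
    if not cpus:
        return None
    os.sched_setaffinity(0, cpus)
    return cpus
```

A feed handler might call `pin_to_nic_local_cpus("eth0")` at startup, where `eth0` is a placeholder interface name. With NPS4, each node spans only a quarter of the socket, so this keeps the polling thread in the same quadrant as the adapter.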
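The "consistent response under 10 microseconds" target cited above is a statement about tail latency, not averages. A minimal sketch of how a team might check it: the 10 µs budget comes from the text, while judging it at the 99th percentile, using a nearest-rank percentile and defining jitter as p99 minus p50 are all assumptions chosen for illustration.

```python
import math
from typing import Sequence

# Budget from the 10-microsecond figure; judging it at p99 is an assumption.
BUDGET_NS = 10_000


def percentile_ns(samples: Sequence[int], pct: float) -> int:
    """Nearest-rank percentile of latency samples given in nanoseconds."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def jitter_ns(samples: Sequence[int]) -> int:
    """Spread between tail (p99) and median (p50) latency."""
    return percentile_ns(samples, 99) - percentile_ns(samples, 50)


def meets_budget(samples: Sequence[int], budget_ns: int = BUDGET_NS,
                 pct: float = 99.0) -> bool:
    """True if the chosen tail percentile stays under the latency budget."""
    return percentile_ns(samples, pct) < budget_ns
```

With a synthetic set of 98 samples at 2,000 ns plus one at 9,000 ns and one at 50,000 ns, the mean is 2,550 ns and p99 is 9,000 ns, so the budget check passes, yet one message still took 50 µs. That gap is why trading teams track tail percentiles and jitter rather than averages.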
