Onload challenges FPGA spend

Published April 23, 2026 by The Daily Scout

- Social posts highlight Solarflare Onload kernel-bypass software achieving microsecond-class performance suitable for trading strategies.
- Onload is reported to deliver about 750–800 nanoseconds of software latency, inside the sub-microsecond range some HFT firms need.
- Engineers are re-evaluating expensive FPGA cards for 1–5 microsecond strategies because kernel-bypass software may offer a cheaper path to low latency (x.com).

Why it matters

Moving packets through the Linux kernel is like sending every trade through airport security. AMD Solarflare’s Onload software skips that detour: in AMD’s latest guide, it posts a median one-way latency of 768 nanoseconds for a 4-byte TCP message at 10 gigabits per second. (docs.amd.com) That is under one microsecond, comfortably inside the 1-to-5 microsecond budget where many electronic trading strategies compete, and far closer to custom hardware than ordinary kernel networking. AMD’s January 22, 2026 Onload guide lists 590 nanoseconds for ef_vi UDP, 768 nanoseconds for Onload TCP, and 967 nanoseconds for TCPDirect in the same 4-byte, 10 gigabit test. (docs.amd.com)

Onload gets there by moving the network stack into user space, next to the application, instead of bouncing each packet through the operating system kernel. AMD says the software works through the standard BSD sockets interface, so existing Linux applications can use it without code changes. (docs.amd.com)

That is the part prompting engineers to revisit hardware budgets. If a software stack on an AMD Solarflare card can stay under a microsecond for tiny messages, some firms may not need an FPGA card for strategies that can tolerate a few microseconds of end-to-end delay. (docs.amd.com)

The comparison is not apples to apples. AMD markets its Alveo UL3524 trading FPGA at less than 3 nanoseconds of transceiver latency and says the card is built for custom algorithms, pre-trade risk checks, and market-data delivery in hardware. (amd.com) The two figures also measure different things: transceiver latency covers only the card’s physical interface, while the Onload figure covers a message’s trip through the software stack. FPGA speed also comes with a different cost profile. The UL3524 requires hardware design flows in Vivado, special licensing, and custom logic work, while OpenOnload is distributed as source code and can accelerate TCP and UDP applications that already use Linux sockets. (amd.com) (github.com)

AMD’s own documentation frames the tradeoff in operational terms, not just raw speed. The company says Onload is designed to cut kernel overhead, reduce jitter, and preserve compatibility with existing applications, while the FPGA line is aimed at firms that want deterministic logic directly in hardware. (docs.amd.com) (amd.com)

Recent activity also suggests the software is still being maintained for newer Linux environments. AMD’s UG1586 guide was updated on January 22, 2026, and the public Xilinx-CNS Onload repository shows commits as recently as April 2026. (docs.amd.com) (github.com)

That leaves firms with a narrower question than “software or hardware.” For strategies operating a few microseconds from the market, the current benchmark is forcing a fresh calculation: is sub-microsecond software fast enough, and does FPGA spend still buy enough extra time to matter? (docs.amd.com) (amd.com)
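Because AMD says Onload works through the standard BSD sockets interface, the programs it targets are ordinary socket applications. The sketch below is a minimal UDP ping-pong written only with Python’s standard `socket` module, using a 4-byte payload to mirror the guide’s message size. It estimates median one-way latency as half the median round-trip time, a common ping-pong convention; AMD’s exact test methodology may differ. It runs over loopback so it works on any machine, which means it measures the kernel path plus Python overhead, not Onload or a Solarflare card. The function names `echo_server` and `median_one_way_ns` are hypothetical, and launching such a program unmodified under OpenOnload’s `onload` wrapper on a Solarflare interface is an assumption that is not exercised here.

```python
import socket
import statistics
import threading
import time


def echo_server(sock: socket.socket, count: int) -> None:
    """Send each received datagram straight back to its sender."""
    for _ in range(count):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)


def median_one_way_ns(iterations: int = 1000, payload: bytes = b"ping") -> float:
    """Estimate median one-way latency as half the median round-trip time.

    Uses only standard BSD-style socket calls (socket, bind, sendto,
    recvfrom); nothing here is Onload-specific.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    server.settimeout(2.0)  # avoid hanging forever if a datagram is lost
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    addr = server.getsockname()

    thread = threading.Thread(target=echo_server, args=(server, iterations))
    thread.start()
    rtts_ns = []
    try:
        for _ in range(iterations):
            start = time.perf_counter_ns()
            client.sendto(payload, addr)
            client.recvfrom(64)  # block until the echo comes back
            rtts_ns.append(time.perf_counter_ns() - start)
    finally:
        thread.join()
        client.close()
        server.close()
    # Half the round trip approximates one direction on a symmetric path.
    return statistics.median(rtts_ns) / 2


if __name__ == "__main__":
    print(f"median one-way estimate: {median_one_way_ns():.0f} ns")
```

Using the median rather than the mean keeps a few slow outliers, such as scheduler hiccups, from distorting the headline number; tail percentiles would be the natural next measurement for jitter.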

Key numbers

About 750–800 nanoseconds: Onload software latency cited in social posts. (x.com)
768 nanoseconds: AMD’s median one-way latency for a 4-byte Onload TCP message at 10 gigabits per second. (docs.amd.com)
590 nanoseconds: ef_vi UDP in the same 4-byte, 10 gigabit test. (docs.amd.com)
967 nanoseconds: TCPDirect in the same test. (docs.amd.com)
Less than 3 nanoseconds: transceiver latency AMD markets for its Alveo UL3524 trading FPGA. (amd.com)
1 to 5 microseconds: the strategy latency band where engineers are re-evaluating FPGA spend. (x.com)
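The latency figures cited in this article can be put in budget terms with simple arithmetic. The sketch below assumes a strategy’s whole latency budget is either 1 or 5 microseconds and treats each published one-way figure as the only cost, which is a simplification: a real end-to-end path also includes strategy logic and the exchange link. The dictionary `FIGURES_NS` and the functions `budget_share` and `report` are hypothetical names; the numbers come from the article’s AMD sources.

```python
# Published figures cited in this article, in nanoseconds.
# The UL3524 value is an upper bound ("less than 3 ns") on transceiver
# latency, which measures a narrower path than the software stack figures.
FIGURES_NS = {
    "ef_vi UDP": 590,
    "Onload TCP": 768,
    "TCPDirect": 967,
    "UL3524 transceiver (upper bound)": 3,
}


def budget_share(latency_ns: float, budget_ns: float) -> float:
    """Fraction of a latency budget consumed by one component."""
    if budget_ns <= 0:
        raise ValueError("budget must be positive")
    return latency_ns / budget_ns


def report(budgets_ns=(1_000, 5_000)) -> dict:
    """Map each figure to its share of each budget, rounded to 4 places."""
    return {
        name: {b: round(budget_share(ns, b), 4) for b in budgets_ns}
        for name, ns in FIGURES_NS.items()
    }


if __name__ == "__main__":
    for name, shares in report().items():
        cells = ", ".join(f"{share:.1%} of {b} ns" for b, share in shares.items())
        print(f"{name}: {cells}")
```

On these assumptions, Onload TCP’s 768 nanoseconds would consume 76.8 percent of a 1 microsecond budget but only about 15.4 percent of a 5 microsecond one, which is why the software path looks most viable toward the looser end of the 1-to-5 microsecond band.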

What happens next

Engineers running 1-to-5 microsecond strategies are weighing AMD’s sub-microsecond Onload figures against the cost of FPGA design work before committing new hardware budgets. (x.com) (docs.amd.com)
If a software stack on an AMD Solarflare card can stay under a microsecond for tiny messages, some firms may not need an FPGA card for strategies that can tolerate a few microseconds of end-to-end delay. (docs.amd.com)
Firms that want deterministic logic directly in hardware remain the target market for FPGA cards such as the Alveo UL3524. (amd.com)
Continued maintenance, including April 2026 commits to the public Xilinx-CNS Onload repository, suggests the software path will keep tracking newer Linux environments. (github.com)

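Sources

docs.amd.com: AMD Onload user guide (UG1586), updated January 22, 2026
amd.com: AMD Alveo UL3524 product information
github.com: Xilinx-CNS Onload public repository
x.com: social posts on Onload latency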

Quick answers

What happened with Onload and FPGA spend?

Social posts highlighted AMD Solarflare’s Onload kernel-bypass software delivering about 750–800 nanoseconds of software latency. That has engineers re-evaluating expensive FPGA cards for 1–5 microsecond trading strategies, because kernel-bypass software may offer a cheaper path to low latency. (x.com)

Why does it matter for FPGA budgets?

AMD’s January 22, 2026 guide puts Onload TCP at a median one-way latency of 768 nanoseconds for a 4-byte message at 10 gigabits per second, and AMD says existing Linux socket applications can use Onload without code changes. (docs.amd.com) FPGA cards such as the Alveo UL3524 are far faster, at less than 3 nanoseconds of transceiver latency, but require Vivado design flows, special licensing, and custom logic work. (amd.com) For strategies that can tolerate a few microseconds of end-to-end delay, software may now be fast enough. (docs.amd.com)