Kernel bypass still a selective win

Recent guidance says kernel bypass (user‑space packet processing, RDMA patterns, DPDK) pays off when the kernel network stack is the bottleneck and packet rates are predictable, but not when GC pauses, serialization, or service hops dominate latency. The recommendation is to require a latency‑attribution report showing NIC, kernel, serialization and app contributions before investing in bypass work. (x.com)
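
A latency-attribution report of the kind recommended here could be sketched as follows. This is a minimal illustration, not a standard tool: the `LatencyBreakdown` type, the `bypass_candidate` helper, the 50 percent threshold, and every sample number are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class LatencyBreakdown:
    """Per-request latency components in microseconds (hypothetical measurements)."""
    nic_us: float
    kernel_us: float
    serialization_us: float
    app_us: float

    def total_us(self) -> float:
        return self.nic_us + self.kernel_us + self.serialization_us + self.app_us

    def shares(self) -> dict[str, float]:
        """Return each component's fraction of total latency."""
        total = self.total_us()
        if total <= 0:
            raise ValueError("total latency must be positive")
        return {
            "nic": self.nic_us / total,
            "kernel": self.kernel_us / total,
            "serialization": self.serialization_us / total,
            "app": self.app_us / total,
        }


def bypass_candidate(breakdown: LatencyBreakdown, kernel_share_threshold: float = 0.5) -> bool:
    """Flag a path for bypass work only if the kernel dominates.

    The 0.5 threshold is an arbitrary assumption; pick one that fits the workload.
    """
    return breakdown.shares()["kernel"] >= kernel_share_threshold


# Hypothetical packet-forwarding path: the kernel stack is most of the delay.
packet_path = LatencyBreakdown(nic_us=2, kernel_us=30, serialization_us=3, app_us=5)

# Hypothetical RPC service: serialization and application time dominate.
rpc_service = LatencyBreakdown(nic_us=2, kernel_us=30, serialization_us=120, app_us=900)

for name, b in [("packet_path", packet_path), ("rpc_service", rpc_service)]:
    kernel_share = b.shares()["kernel"]
    print(f"{name}: kernel share {kernel_share:.1%}, bypass candidate: {bypass_candidate(b)}")
```

Both sample paths spend the same 30 microseconds in the kernel, but only the first would qualify, because in the second the kernel is a small slice of a much larger total.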

Kernel bypass moves packet handling out of the operating system and into the application, and current guidance says it pays only when the kernel is the delay. (docs.kernel.org)

The main tools all chase the same goal in different ways. Data Plane Development Kit, or DPDK, gives user-space programs direct packet access, while Address Family Express Data Path, or AF_XDP, uses Linux Express Data Path hooks to steer packets into user-space rings. (dpdk.org; docs.kernel.org) Remote Direct Memory Access, or RDMA, goes further by letting one machine read or write another machine’s memory without the remote operating system or central processor handling each transfer. NVIDIA’s documentation says that cuts context switches and central processor overhead. (docs.nvidia.com)

That work is expensive to build and operate, so engineers are being told to prove where the time goes before they rewrite a network path. The proposed test is a latency-attribution report that breaks delay into network card, kernel stack, serialization, and application time. (docs.kernel.org; docs.nvidia.com)

The distinction matters because many slow requests are not waiting on the kernel at all. Oracle’s Java documentation says the Garbage-First collector is designed around pause-time goals, a reminder that runtime pauses can dominate tail latency even when packet handling is fast. (docs.oracle.com) The same is true for data formatting and service fan-out. Jeff Dean’s widely cited latency table puts a same-datacenter round trip at about 500 microseconds and sending 1 kilobyte over a 1 gigabit link at about 10 microseconds, so extra hops can swamp savings from shaving a few kernel transitions. (gist.github.com)

Kernel bypass still has clear territory. Cloudflare wrote that vanilla Linux handled about 1 million packets per second in its 2015 tests, far below what modern 10 gigabit network cards could process, making bypass attractive for fixed, high-rate packet workloads. (blog.cloudflare.com)

Linux’s own documentation now frames AF_XDP more narrowly: it can increase performance in certain use cases, not all of them. The xdp-project tutorial makes the same point from the kernel side, saying Express Data Path itself is an in-kernel fast path and only reaches user-space bypass when packets are redirected into AF_XDP sockets. (docs.ebpf.io; github.com)

That leaves a more selective rule for 2026 deployments. If packet rates are steady and the kernel is the bottleneck, bypass can buy lower latency and lower central processor cost; if garbage collection, serialization, or extra service hops dominate, the rewrite mostly moves complexity around. (docs.kernel.org; docs.nvidia.com; docs.oracle.com)
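
The hop arithmetic behind that rule can be worked through in a few lines. The 500 microsecond round trip comes from the latency table cited above; the three-hop request and the 10 microseconds of kernel time saved per hop are hypothetical figures chosen only to show the proportions, not measurements.

```python
# Same-datacenter round trip in microseconds, from the latency table cited above.
DC_ROUND_TRIP_US = 500.0


def bypass_saving_us(hops: int, kernel_saving_per_hop_us: float) -> float:
    """Total time saved if bypass trims a fixed amount of kernel time at every hop."""
    return hops * kernel_saving_per_hop_us


def hop_removal_saving_us(hops_removed: int) -> float:
    """Total time saved by eliminating whole service hops."""
    return hops_removed * DC_ROUND_TRIP_US


hops = 3                       # hypothetical request that fans out across three services
assumed_kernel_saving = 10.0   # hypothetical per-hop saving from bypass, in microseconds

baseline = hops * DC_ROUND_TRIP_US
bypass = bypass_saving_us(hops, assumed_kernel_saving)
one_hop = hop_removal_saving_us(1)

print(f"bypass saves {bypass / baseline:.1%}, removing one hop saves {one_hop / baseline:.1%}")
# prints: bypass saves 2.0%, removing one hop saves 33.3%
```

Under these assumptions, collapsing a single service hop is worth more than an order of magnitude more than bypassing the kernel on all three, which is why the attribution step comes first.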
