Kernel‑bypass vs microkernels
A debate is emerging about whether modern microkernels can beat traditional kernel‑bypass stacks like DPDK for deterministic packet handling. Some practitioners argue that microkernels with optimized IPC, zero‑copy DMA and userspace drivers can outperform Linux/DPDK in NIC saturation tests, which could reshape choices around user‑space networking and NIC offload strategies (x.com).
For most of the last decade, the answer to fast packet handling looked settled. If you wanted line-rate networking with low jitter, you stepped around the kernel. DPDK became the standard move. Its poll-mode drivers run in user space, poll NIC queues directly, and avoid interrupts on the fast path. That is why cloud vendors still document DPDK as the tool for “fast packet processing,” “low latency,” and “consistent performance” on modern VMs (doc.dpdk.org, cloud.google.com).
That old story is now getting crowded by a newer one. The challenge is not coming from the classic Linux kernel stack. It is coming from designs that look more like microkernels, where the kernel does very little and the networking pieces live in isolated user-space components. Google described this approach in Snap, a “microkernel approach to host networking” that has been running in production for years. Snap’s modules handle switching, virtualization, traffic shaping, and a reliable messaging service in user space, and Google reported more than a 3x improvement in gigabits per second per core over a kernel networking stack on RPC workloads (dl.acm.org).
That does not mean microkernels suddenly beat DPDK at its own game. It means the boundary moved. Snap is “microkernel” in the architectural sense, not in the strict seL4 sense. The point is that people stopped assuming that every extra boundary crossing is fatal. If the components are small, the queues are well designed, and the scheduler cooperates, modularity does not have to cost throughput. In Google’s case, it bought something DPDK systems often struggle with: the ability to evolve networking services quickly without dragging applications around with them (dl.acm.org).
The reason this debate is flaring now is that the enabling pieces have improved all at once. Modern fast paths lean on direct DMA into user memory, careful ownership of buffers, and fewer cache-disrupting copies.
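To make "careful ownership of buffers" concrete, here is a minimal sketch of a zero-copy handoff between a producer and a consumer. It is an illustration only, not code from DPDK, Snap, or any other system mentioned here: `BufferPool`, `fake_nic_receive`, and `consume` are hypothetical names, a `bytearray` write stands in for the NIC's DMA, and a `deque` stands in for a descriptor ring.

```python
from collections import deque


class BufferPool:
    """Fixed set of equally sized buffers carved out of one backing bytearray."""

    def __init__(self, count: int, size: int) -> None:
        self.backing = bytearray(count * size)
        self.size = size
        self.free = deque(range(count))  # indices of buffers nobody owns

    def acquire(self):
        """Take ownership of a free buffer index, or None if none is left."""
        return self.free.popleft() if self.free else None

    def release(self, index: int) -> None:
        """Return ownership of a buffer to the pool."""
        self.free.append(index)

    def view(self, index: int, length: int) -> memoryview:
        """Window onto the backing memory; no bytes are copied."""
        start = index * self.size
        return memoryview(self.backing)[start:start + length]


def fake_nic_receive(pool: BufferPool, ring: deque, payload: bytes) -> bool:
    """Stand-in for a NIC DMA write: fill a buffer, then post a descriptor."""
    if len(payload) > pool.size:
        return False
    index = pool.acquire()
    if index is None:
        return False  # no free buffer, so the packet is dropped
    pool.view(index, len(payload))[:] = payload  # simulated DMA into the pool
    ring.append((index, len(payload)))  # only the descriptor crosses the boundary
    return True


def consume(pool: BufferPool, ring: deque, handler) -> int:
    """Drain the ring, lending each buffer to the handler, then recycle it."""
    handled = 0
    while ring:
        index, length = ring.popleft()
        handler(pool.view(index, length))  # handler reads the data in place
        pool.release(index)  # ownership goes back to the pool
        handled += 1
    return handled
```

Only the small `(index, length)` descriptor moves between the two sides; the payload stays where it was written, and exactly one party owns each buffer at any moment. When the pool runs dry the sketch drops the packet, which is one simple form of backpressure.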
Research systems like Demikernel are explicit about this shift. They treat the old kernel as too expensive for microsecond-scale I/O and build a datapath OS around zero-copy APIs and DMA-capable memory instead (dl.acm.org). Linux itself has been moving in the same direction. The kernel now documents io_uring zero-copy receive, which lets NICs DMA packet payloads straight into user-space memory: the kernel still processes headers through its normal network stack, but it no longer copies every byte across the boundary (docs.kernel.org, lwn.net).
That convergence matters because it weakens the old binary choice. It used to be Linux stack or kernel bypass. Now there is a third pattern: keep a minimal kernel, move drivers and protocol machinery into isolated user processes, and use zero-copy paths so the crossings are cheap enough to tolerate.
That is also why the loudest claims in this new argument focus on deterministic packet handling, not just raw average throughput. Polling loops can hit huge numbers, but saturation tests are often won or lost on tail behavior, queue ownership, and whether one noisy component can disturb another. A microkernel-style design can make those boundaries explicit instead of accidental (doc.dpdk.org, dl.acm.org).
Still, the strongest version of the claim has not been proved in public. There is plenty of evidence that user-space, microkernel-like networking can be fast. There is solid evidence that optimized IPC and zero-copy can erase a lot of traditional microkernel overhead. There is not, at least in the open literature, a clean and broadly accepted set of apples-to-apples results showing a modern microkernel stack consistently beating a tuned Linux-plus-DPDK stack in NIC saturation tests across hardware and workloads. Even the seL4 world, which has pushed user-space drivers and strong isolation furthest, is better documented on assurance and architecture than on public head-to-head packet benchmarks of that kind (sel4.systems, dl.acm.org).
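One reason apples-to-apples results are hard to produce is that averages hide the tail behavior that deterministic packet handling depends on. The sketch below assumes per-packet latency samples in microseconds have already been collected; `quantile` and `summarize` are hypothetical helper names, and the two sample sets in the usage note are invented to make the point, not measurements from any real stack.

```python
def quantile(samples, num: int, den: int):
    """Nearest-rank quantile num/den (99/100 is p99), using integer math only."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (len(ordered) * num + den - 1) // den  # ceil(n * num / den)
    return ordered[max(rank, 1) - 1]


def summarize(latencies_us):
    """Report the mean alongside the tail so neither can hide the other."""
    return {
        "mean": sum(latencies_us) / len(latencies_us),
        "p50": quantile(latencies_us, 50, 100),
        "p99": quantile(latencies_us, 99, 100),
        "max": max(latencies_us),
    }


# Invented data: two stacks with the same mean but very different tails.
stack_a = [10] * 980 + [60] * 20   # mostly fast, with occasional stalls
stack_b = [11] * 1000              # slightly slower, but perfectly steady

if __name__ == "__main__":
    print("stack_a", summarize(stack_a))
    print("stack_b", summarize(stack_b))
```

Both invented sample sets average 11 µs, yet their 99th percentiles differ by more than 5x. A saturation benchmark that publishes only mean throughput or mean latency would score them as equal, which is why a convincing head-to-head comparison needs the tail reported too.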
What has changed is the burden of proof. DPDK no longer gets to win by default just because it bypasses the kernel. Once the kernel becomes tiny, the drivers live in user space, and the NIC can DMA straight into application-owned buffers, “bypass” stops being a special trick and starts looking like one implementation detail among several. Google’s host network already runs that way on over half its fleet, which is a more concrete fact than any argument on X (dl.acm.org).