Axboe boosts kernel I/O 50%

- Jens Axboe has posted a Linux kernel proof of concept for direct (O_DIRECT) storage I/O that keeps the normal kernel path but strips out much of the per-request setup.
- The claimed gain is roughly 50% to 60% more I/O per CPU core, achieved by pre-registering DMA-mapped buffers and reusing bios instead of rebuilding them for every request.
- If the approach holds up, it could narrow the efficiency gap that makes user-space bypass systems like SPDK attractive compared with the standard Linux storage stack.
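
To read the headline number in per-request terms, here is a worked conversion. It assumes the submitting core is fully busy, so I/O per core is simply the inverse of CPU time per I/O; that assumption is ours, not a detail of the proof of concept. If per-I/O CPU time drops from $c$ to $c'$, the per-core gain $g$ satisfies:

$$
1 + g = \frac{c}{c'} \quad\Longrightarrow\quad \frac{c'}{c} = \frac{1}{1+g}
$$

$$
g = 0.5 \;\Rightarrow\; \frac{c'}{c} = \frac{1}{1.5} \approx 0.667, \qquad g = 0.6 \;\Rightarrow\; \frac{c'}{c} = \frac{1}{1.6} = 0.625
$$

Under that assumption, 50% to 60% more I/O per core means cutting roughly 33% to 37.5% of the CPU time spent on each request. On its own it says nothing about raw device throughput, which is still capped by the drive.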

Storage I/O is one of those places where modern hardware keeps getting faster than the software path feeding it. NVMe drives can complete enormous numbers of operations per second, but the CPU still burns cycles setting up each request. That setup cost is exactly what Jens Axboe is targeting with a new Linux kernel proof of concept. The idea is simple in spirit: keep the kernel in charge, but stop rebuilding the same plumbing for every direct I/O request.

### What changed?

Axboe posted a proof of concept that targets Linux O_DIRECT I/O, the path applications use when they want to bypass the page cache and talk to storage more directly. The claim is big: about 50% to 60% more in-kernel I/O throughput per CPU core on the tested path, without going all the way to a user-space storage stack.

### What is the kernel wasting time on?

A lot of direct I/O overhead is bookkeeping. For each request, the kernel has to pin or validate the user memory, map it so the device can DMA into or out of it, build bios (the block I/O structures that describe what to read or write), and push those requests down the stack. None of that is the actual storage operation; it is the cost of preparing the operation.

That overhead mattered less when drives were slower. With fast NVMe, the prep work can become the bottleneck.

### So what is the trick?

Basically, Axboe is front-loading the expensive parts. Instead of mapping user buffers for DMA on every request, the application would register buffers ahead of time, and the kernel can then keep those buffers in an I/O-ready state. On top of that, the proof of concept prepares bios in advance, so they behave more like ready-to-submit objects than work units freshly assembled for each request.

Think of it like airport security. The normal path is taking your shoes off for every flight. This path is closer to PreCheck: identity checked once, bags known, less repeated ceremony.

### Why does O_DIRECT matter here?

Because O_DIRECT already opts into a stricter, lower-level contract.
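
Part of that contract is alignment. The sketch below is a minimal illustration, assuming a 4096-byte alignment (a common NVMe logical block size; the real requirement depends on the device, filesystem, and kernel version, and recent kernels can report it through `statx` with `STATX_DIOALIGN`). The helpers `dio_request_ok` and `direct_read` are hypothetical names, and nothing here reflects Axboe's patches. It only shows the user-space rules an O_DIRECT caller already lives with: buffer address, file offset, and length all aligned.

```python
import mmap
import os

# Assumption: 4096-byte alignment. The real value depends on the device's
# logical block size, the filesystem, and the kernel version.
ALIGN = 4096


def dio_request_ok(buf_addr: int, offset: int, length: int, align: int = ALIGN) -> bool:
    """Hypothetical helper: True if buffer address, file offset and length
    are all multiples of `align`, the classic O_DIRECT precondition."""
    return buf_addr % align == 0 and offset % align == 0 and length % align == 0


def direct_read(path: str, offset: int, length: int) -> bytes:
    """Hypothetical helper: read `length` bytes at `offset`, bypassing the
    page cache with O_DIRECT (Linux only)."""
    if length <= 0 or offset % ALIGN or length % ALIGN:
        raise ValueError("O_DIRECT offset and length must be positive and aligned")
    # Anonymous mmap memory starts on a page boundary, which satisfies the
    # buffer-address rule for alignments up to the page size.
    buf = mmap.mmap(-1, length)
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            # Read straight into the aligned buffer; no page-cache copy.
            n = os.preadv(fd, [buf], offset)
            return buf[:n]
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Using an anonymous `mmap` instead of a `bytearray` is deliberate: a `bytearray` carries no alignment guarantee. Under the 4096-byte assumption, `dio_request_ok(0, 512, 4096)` is `False` because the offset is only 512-byte aligned, and filesystems that reject O_DIRECT or a misaligned request typically fail the call with `EINVAL`.
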
Applications using it are usually databases, storage engines, and latency-sensitive services that do not want the kernel page cache in the middle. That makes it a good place to squeeze out overhead: its callers already care more about predictable, high-throughput I/O than about convenience.

### Is this kernel bypass?

No, and that is the interesting part. Systems like SPDK get huge performance gains by moving storage handling into user space and bypassing large chunks of the kernel. That can work very well, but it also adds integration costs, operational complexity, and a different programming model. Axboe’s pitch is narrower: keep Linux’s normal block stack and safety model, but make the hot path much cheaper.

### Why is that a big deal?

Because the choice between “plain Linux” and “full bypass stack” has often been too binary. If the kernel path gets dramatically leaner, more workloads can stay on the standard stack and still hit their targets. That matters for operators who want kernel features, filesystems, observability, and simpler deployment, but do not want to pay extra CPU for every I/O.

### What is the catch?

It is still a proof of concept. Showing a benchmark jump is the easy part. The hard part is making the model safe, general, and maintainable across filesystems, block drivers, memory management, and all the weird edge cases Linux has accumulated over decades.

### Bottom line?

This is not “Linux storage just got 60% faster”, at least not yet. But it is a serious signal from the block-layer maintainer behind io_uring, who has already reshaped Linux I/O more than once. If the idea survives contact with real workloads and upstream review, Linux could get much closer to bypass-class efficiency without applications ever leaving the kernel I/O path.
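
The register-once, reuse-many idea from “So what is the trick?” can be illustrated with a toy counting model. This is a user-space sketch, not kernel code: `Request` is a hypothetical stand-in for a bio, the `maps` and `builds` counters stand in for pinning/DMA-mapping and bio assembly, and the pool design is an assumption rather than Axboe's implementation. It only shows why setup work stops scaling with the number of I/Os once buffers and request objects are prepared up front.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Stats:
    maps: int = 0     # simulated buffer pin + DMA-map operations
    builds: int = 0   # request descriptors assembled from scratch
    submits: int = 0  # requests handed to the (simulated) driver


@dataclass
class Request:
    """Hypothetical stand-in for a bio: which buffer, where, how much."""
    buf_index: int
    sector: int = 0
    length: int = 0


def per_request_path(stats: Stats, sectors: Iterable[int], length: int = 4096) -> None:
    """Baseline: every I/O maps its buffer and builds a fresh request."""
    for sector in sectors:
        stats.maps += 1
        stats.builds += 1
        _req = Request(buf_index=0, sector=sector, length=length)
        stats.submits += 1  # real code would submit _req to the driver here


class RegisteredPool:
    """Register once, reuse many: map buffers and build requests up front."""

    def __init__(self, stats: Stats, n_buffers: int) -> None:
        self.stats = stats
        self.free: List[Request] = []
        for i in range(n_buffers):
            stats.maps += 1    # one-time mapping per registered buffer
            stats.builds += 1  # one-time request construction
            self.free.append(Request(buf_index=i))

    def submit(self, sector: int, length: int = 4096) -> Request:
        if not self.free:
            raise RuntimeError("all registered buffers are in flight")
        req = self.free.pop()
        req.sector, req.length = sector, length  # only per-I/O fields change
        self.stats.submits += 1
        return req

    def complete(self, req: Request) -> None:
        self.free.append(req)  # back in the pool, still mapped and prebuilt


# 1000 I/Os of 4 KiB each (8 x 512-byte sectors apart).
naive = Stats()
per_request_path(naive, range(0, 8000, 8))

pooled = Stats()
pool = RegisteredPool(pooled, n_buffers=32)
for sector in range(0, 8000, 8):
    pool.complete(pool.submit(sector))

print(naive)   # Stats(maps=1000, builds=1000, submits=1000)
print(pooled)  # Stats(maps=32, builds=32, submits=1000)
```

In this model, setup work is paid once per registered buffer (32 here) instead of once per I/O (1,000 here). The pool size also caps how many requests can be in flight at once, a trade-off any design of this shape has to size for.
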
