Linux kernel explains CPU scheduling

- Linux’s own kernel docs now frame CPU scheduling around EEVDF, the newer policy replacing CFS in version 6.6, with tuning implications for latency-sensitive hosts.
- The key idea is virtual time: Linux tracks lag, deadlines, affinity, and priority classes so the least-served runnable task gets CPU next.
- That matters because bad performance often starts outside CPU math, in NUMA memory placement, blocking I/O, and wake-up paths.

Linux CPU scheduling sounds like one narrow kernel topic. It isn’t. It is the control plane for how work moves through a machine: which thread runs, which thread waits, and how memory and I/O delays turn into latency spikes. The useful change here is that the Linux kernel’s own docs now explain this world through EEVDF, the scheduler Linux started moving to in 6.6, instead of treating CFS as the whole story. (docs.kernel.org)

### What does the scheduler actually decide?

The scheduler picks the next runnable thread for a CPU. That sounds simple, but it is balancing fairness, latency, throughput, affinity, and a stack of policy rules at once. Linux exposes this through per-thread scheduling policies and controls like `nice`, `sched_setaffinity`, and real-time classes, because the kernel is scheduling threads, not abstract “apps.” (man7.org)

### Why did Linux move beyond CFS?

CFS was built around a clean idea: pretend you have an ideal CPU that runs every task at once, then approximate that with virtual runtime on real hardware. The task that has run the least, after normalization, goes next. That model worked well for years, but current kernel docs say CFS is making room for EEVDF, which keeps the fairness idea while adding a better way to reason about eligibility and deadlines. (kernel.org)

### So what is EEVDF doing differently?

EEVDF means Earliest Eligible Virtual Deadline First. Linux still wants equal CPU share for equal-priority runnable tasks, but it now tracks virtual lag and virtual deadlines so wakeups and latency-sensitive work can be handled more precisely. Think of it less like a queue and more like a calendar: not just who is owed time, but who is eligible to be served now without breaking fairness. (docs.kernel.org)

### Why doesn’t “high CPU” explain bad latency?

Because a thread can miss its target long before it burns CPU. It can wake on the wrong core, fault in memory from a remote NUMA node, block on storage, or sleep on a lock.
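One rough way to see the gap between burning CPU and waiting for it is to compare a task's on-CPU time with the time it spent runnable but queued. The sketch below assumes a Linux host whose kernel exposes `/proc/<pid>/schedstat` (three counters: on-CPU nanoseconds, run-queue wait nanoseconds, and number of timeslices). The helper names `parse_schedstat`, `wait_share`, and `read_schedstat` are illustrative, not part of any library.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class SchedStat(NamedTuple):
    on_cpu_ns: int     # time the task actually ran on a CPU
    runq_wait_ns: int  # time it was runnable but waiting for a CPU
    timeslices: int    # number of timeslices it has run


def parse_schedstat(text: str) -> SchedStat:
    """Parse the three whitespace-separated counters of a schedstat line."""
    on_cpu, wait, slices = (int(field) for field in text.split()[:3])
    return SchedStat(on_cpu, wait, slices)


def wait_share(stat: SchedStat) -> float:
    """Fraction of (running + queued) time that was spent queued."""
    total = stat.on_cpu_ns + stat.runq_wait_ns
    return stat.runq_wait_ns / total if total else 0.0


def read_schedstat(pid: str = "self") -> Optional[SchedStat]:
    """Return the counters for a pid, or None if the file is unavailable."""
    try:
        return parse_schedstat(Path(f"/proc/{pid}/schedstat").read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    stat = read_schedstat()
    if stat is None:
        print("schedstat not available on this system")
    else:
        print(f"on CPU: {stat.on_cpu_ns} ns, queued: {stat.runq_wait_ns} ns, "
              f"queued share: {wait_share(stat):.1%}")
```

A high queued share points at CPU contention or placement. Neither counter includes time spent blocked on I/O or a lock, so a slow task with small numbers in both columns is waiting on something other than the scheduler.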
The scheduler only sees runnable threads. If the thread is stalled on memory or I/O, your bottleneck is already somewhere else, and the CPU graph can look deceptively calm. (docs.kernel.org)

### Where does memory placement enter the picture?

On modern servers, memory is often NUMA: not all RAM is equally close to all CPUs. Linux documents this directly: local memory is cheaper, remote memory is slower, and policy can steer allocations toward a home node or nearby node. That means scheduler decisions and memory decisions are coupled. Pin a workload to one socket but let its memory sit on another node, and you end up making every cache miss more expensive. (docs.kernel.org)

### What about IPC and wake-up paths?

A lot of Linux performance is really wake-up performance. Futexes are the classic example: most lock operations stay in userspace, and the kernel only gets involved when a thread actually has to block or wake. Event loops use `epoll` because it scales to large numbers of file descriptors, and `eventfd` gives user space a lightweight way to signal events. When those paths go wrong, the scheduler is cleaning up after a bad handoff. (man7.org)

### When do real-time policies matter?

Only when you really mean it. Linux has normal policies like `SCHED_OTHER`, `SCHED_BATCH`, and `SCHED_IDLE`, then real-time policies like `SCHED_FIFO` and `SCHED_RR`, plus `SCHED_DEADLINE` for explicit runtime and period guarantees. Real-time threads outrank normal ones, which is powerful but dangerous: misuse can starve the rest of the box. (man7.org)

### What should developers and sysadmins take from this?

Don’t tune the scheduler in isolation. Start by asking four concrete questions: is the thread runnable, is it on the right CPU, is its memory local, and is it waking efficiently? Linux’s current docs make that clearer than the old “just learn CFS” mental model. The scheduler matters, but the bigger lesson is that latency is usually a chain, not a single knob.
(docs.kernel.org)

The bottom line is that Linux scheduling is no longer best understood as just CFS fairness math. The modern kernel story is EEVDF plus NUMA plus wake-up mechanics, and that is a much more honest map of where real server performance goes. (docs.kernel.org)
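To make the EEVDF selection rule described above concrete, here is a toy sketch of "earliest eligible virtual deadline first". It is a simplified model and not kernel code: the `Task` fields, the integer virtual-time units, and the function names are assumptions made for illustration, and the real scheduler uses its own fixed-point arithmetic and data structures. A task counts as eligible when its lag is non-negative, meaning its virtual runtime is at or below the weighted average, and its virtual deadline is its virtual runtime plus its requested slice divided by its weight.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    weight: int     # relative CPU share; higher means more CPU
    vruntime: int   # weighted service received so far (virtual time units)
    slice_len: int  # requested slice, in wall-clock units

    def virtual_deadline(self) -> float:
        # Heavier tasks consume virtual time more slowly, so the same
        # slice translates into an earlier virtual deadline.
        return self.vruntime + self.slice_len / self.weight


def is_eligible(task: Task, tasks: List[Task]) -> bool:
    # lag >= 0 means vruntime <= weighted average vruntime. Compared
    # without division: sum(w_j * (v_j - v_i)) >= 0.
    return sum(t.weight * (t.vruntime - task.vruntime) for t in tasks) >= 0


def pick_next(tasks: List[Task]) -> Task:
    # Among eligible tasks, run the one with the earliest virtual deadline.
    eligible = [t for t in tasks if is_eligible(t, tasks)]
    return min(eligible, key=Task.virtual_deadline)


if __name__ == "__main__":
    runqueue = [
        Task("batch", weight=1, vruntime=8, slice_len=12),
        Task("interactive", weight=1, vruntime=10, slice_len=4),
        Task("greedy", weight=2, vruntime=13, slice_len=1),
    ]
    print(pick_next(runqueue).name)
```

In this run queue the least-served task is `batch` and the earliest deadline overall belongs to `greedy`, but `greedy` has already received more than its share and is not eligible. The rule therefore picks `interactive`, the eligible task with the earliest deadline, which is the "calendar, not queue" behaviour the article describes.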
