What happened to disk performance in 2.6.39
An explanation of the problem requires just a bit of background, and, in particular, the definition of a couple of terms. "Readahead" is the process of speculatively reading file data into memory with the idea that an application is likely to want it soon. Reasonable performance when reading a file sequentially depends on proper readahead; that is the only way to ensure that reading and consuming the data can be done in parallel. Without readahead, applications will spend more time than necessary waiting for data to be read from disk.
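As an aside, an application that knows it will read a file sequentially can nudge the kernel's readahead decisions from user space. The following is a minimal, hypothetical sketch using posix_fadvise(); the file path is made up for illustration:

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical file used only for illustration */
	int fd = open("/var/tmp/big-file", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Declare sequential access over the whole file; on Linux this
	 * encourages a larger readahead window for this descriptor.
	 */
	posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

	/* ... read the file sequentially ... */

	close(fd);
	return 0;
}
```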
"Plugging," instead, is the process of stopping I/O request submissions to the low-level device for a period of time. The motivation for plugging is to allow a number of I/O requests to accumulate; that lets the I/O scheduler sort them, merge adjacent requests, and apply any sort of fairness policy that might be in effect. Without plugging, I/O requests would tend to be smaller and more scattered across the device, reducing performance even on solid-state disks.
Now imagine that we have a process about to start reading through a long file, as indicated by your editor's unartistic rendering here:
Once the application starts reading from the beginning of the file, the kernel will set about filling the first readahead window (which is 128KB for larger files) and submit I/O for the second window, so the situation will look something like this:
Once the application reads past 128KB into the file, the data it needs will hopefully be in memory. The readahead machinery starts up again, initiating I/O for the window starting at 256KB; that yields a situation that looks something like this:
This process continues indefinitely, with the kernel always working to stay ahead of the application so that the data is already in memory by the time the application gets around to reading it.
The 2.6.39 kernel saw some significant changes to how plugging is handled, with the result that the plugging and unplugging of queues is now explicitly managed in the I/O submission code. So, starting with 2.6.39, the readahead code will plug the request queue before it submits a batch of read operations, then unplug the queue at the end. The function that handles basic buffered file I/O (generic_file_aio_read()) also now does its own plugging. And that is where the problems begin.
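The plugging interface itself is simple: a submitter declares an on-stack plug, submits a batch of requests, then unplugs. The blk_start_plug() and blk_finish_plug() calls below are the real interface; the surrounding function is only a simplified sketch, not actual kernel code:

```c
#include <linux/blkdev.h>

/* Sketch of an I/O submission path plugging around a batch of requests */
static void submit_read_batch(void)
{
	struct blk_plug plug;

	blk_start_plug(&plug);	/* hold back dispatch while requests are batched */

	/* ... submit a series of read requests here ... */

	blk_finish_plug(&plug);	/* push the accumulated requests to the device */
}
```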
Imagine a process that is doing large (1MB) reads. As the first large read gets into generic_file_aio_read(), that function will plug the request queue and start working through the file pages already in memory. When it gets to the end of the first readahead window (at 128KB), the readahead code will be invoked as described above. But there's a problem: the queue is still plugged by generic_file_aio_read(), which is still working on that 1MB read request, so the I/O operations submitted by the readahead code are not passed on to the hardware; they just sit in the queue.
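To make the nesting concrete, here is a rough sketch of how the two plugs stack up. The helper names are invented; what is real is blk_start_plug()/blk_finish_plug() and the fact that requests submitted under a nested plug accumulate on the task's outermost plug:

```c
#include <linux/blkdev.h>

/* Invented helper standing in for the readahead submission path */
static void readahead_next_window(void)
{
	struct blk_plug ra_plug;

	/*
	 * The task already holds the outer plug, so this nested plug is
	 * not installed; requests keep accumulating on the outer one.
	 */
	blk_start_plug(&ra_plug);

	/* ... submit reads for the next 128KB window ... */

	/*
	 * Flushes only ra_plug's (empty) list; the reads stay queued
	 * behind the outer plug and do not reach the hardware here.
	 */
	blk_finish_plug(&ra_plug);
}

/* Invented stand-in for the 2.6.39 generic_file_aio_read() path */
static void buffered_read_1mb(void)
{
	struct blk_plug plug;

	blk_start_plug(&plug);		/* the problematic top-level plug */

	/*
	 * Copy cached pages to user space; crossing a readahead window
	 * boundary triggers readahead_next_window().
	 */
	readahead_next_window();

	blk_finish_plug(&plug);		/* only now do the reads go out */
}
```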
So, when the application gets to the end of the second readahead window, we see a situation like this:
At this point, everything comes to a stop: the application needs data that has not yet been read from disk, so it must block. Going to sleep causes the queue to be unplugged, allowing the readahead I/O requests to be executed at last, but it is too late; the application has to sit and wait for that I/O to complete. That wait is enough to hammer performance, even on solid-state devices.
The fix is to simply remove the top-level plugging in generic_file_aio_read() so that readahead-originated requests can get through to the hardware. Developers who have been able to reproduce the slowdown report that this patch makes the problem go away, so this issue can be considered solved. Look for this fix to appear in a stable kernel release sometime soon.
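In terms of the earlier sketch, the fix amounts to dropping the top-level plug from the buffered-read path, so the readahead code's own unplug is the one that actually dispatches its requests:

```c
/* After the fix: no top-level plug in the buffered-read path */
static void buffered_read_1mb_fixed(void)
{
	/*
	 * Copy cached pages to user space; when a readahead window
	 * boundary is crossed, readahead_next_window() now installs
	 * the only plug, so its blk_finish_plug() really submits the
	 * reads and they reach the device while the copy continues.
	 */
	readahead_next_window();
}
```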