Friday, August 8, 2008
Over the last six months, Google has sponsored Gelato@UNSW to take a close look at the disk schedulers in Linux, particularly when combined with RAID.
We benchmarked the four standard Linux disk schedulers using several different tools (see our wiki for full details) and a wide range of workloads. The tests covered single SCSI and SATA disks, hardware RAID arrays of two to eight spindles, and software RAID arrays of up to twenty spindles, at RAID levels 0 through 6.
We had to fix some of the benchmarking tools (the fixes are now upstream), and we developed a new one: a Markov-chain-based replay tool, which characterises a workload and then generates a similar one.
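To give a feel for the idea, here is a minimal sketch of Markov-chain workload replay. It is not the Gelato tool itself: the state encoding (an operation paired with an access pattern) and the function names `characterise` and `generate` are assumptions made purely for illustration. The first function counts transitions between consecutive request states in an observed trace; the second walks those counts at random to produce a statistically similar trace.

```python
import random
from collections import Counter, defaultdict


def characterise(trace):
    """Count transitions between consecutive states in an observed trace."""
    transitions = defaultdict(Counter)
    for current, following in zip(trace, trace[1:]):
        transitions[current][following] += 1
    return transitions


def generate(transitions, start, length, rng=None):
    """Random-walk the chain to produce a synthetic trace of up to `length` states."""
    rng = rng or random.Random()
    state = start
    out = [state]
    while len(out) < length:
        successors = transitions.get(state)
        if not successors:
            break  # dead end: this state was never followed by another
        states = list(successors)
        weights = [successors[s] for s in states]
        # Pick the next state in proportion to how often it followed this one.
        state = rng.choices(states, weights=weights, k=1)[0]
        out.append(state)
    return out


# Hypothetical trace: each state is (operation, access pattern).
observed = [
    ("read", "seq"), ("read", "seq"), ("write", "rand"), ("read", "seq"),
    ("read", "rand"), ("write", "rand"), ("read", "seq"),
]
model = characterise(observed)
synthetic = generate(model, observed[0], 10, random.Random(0))
```

A real replay tool would need a much richer state (block offsets, request sizes, inter-arrival times) and would issue actual I/O rather than return a list; the sketch only shows the characterise-then-generate structure.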
We found bugs in all the schedulers. We fixed the ones in the deadline and anticipatory schedulers, and the current kernel.org kernel has our fixes in it. CFQ's problems are harder to fix; we are continuing to work on them.
The work was presented at the Linux Storage and Filesystem Workshop, and in January at the linux.conf.au 2008 Kernel Mini-Conference. See our Talks wiki page for links to slides and video.
Our major finding is that the best I/O scheduler to use depends heavily on your workload. The deadline scheduler seems to give a good compromise between bandwidth and bounded latency, but for particular workloads on small numbers of disks the anticipatory scheduler (AS) and CFQ can outperform it by a long way. In our measurements on hardware RAID, the benefits of anticipation are negligible with more than three or four spindles, and CFQ's worst-case performance (which seems to be very easy to trigger) is orders of magnitude worse than that of any other scheduler.
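If you want to try different schedulers against your own workload, Linux exposes the per-device choice through sysfs, in the file /sys/block/<device>/queue/scheduler, where the active scheduler is shown in square brackets. The sketch below parses that format; the helper names and the default device name `sda` are assumptions for illustration, and the sample line is typical of kernels of this era rather than taken from our test machines.

```python
from pathlib import Path


def parse_scheduler_line(text):
    """Return (active, available) from a line such as 'noop deadline [cfq]'."""
    active = None
    available = []
    for token in text.split():
        # The kernel marks the scheduler currently in use with brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_scheduler(device="sda"):
    """Read the scheduler line for a block device (device name is an assumption)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())


# Parse a sample line rather than touching the real system.
active, available = parse_scheduler_line("noop anticipatory deadline [cfq]")
print(active, available)
```

Switching schedulers is done by writing one of the listed names to the same file as root, for example `echo deadline > /sys/block/sda/queue/scheduler`. The change takes effect immediately and is not persistent across reboots.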
The most interesting results are outlined on our wiki; full results will be published later this year.