Booting from remote storage
In the only storage-only LSFMM 2017 session that LWN was able to attend—it was scheduled opposite the one-and-only filesystem and memory management combined session—Lee Duncan explored some of the questions and problems he sees in booting from remote storage. He said that he wanted to get feedback from the assembled developers to see where solutions might lie.
Ethernet booting works just fine, he said, as long as everything is configured correctly. Some of the hardware makes that difficult, however. For example, there are Broadcom network cards that use the same MAC address on multiple ports and other network adapters that can be used for iSCSI, but not for general networking. When things are misconfigured, systems do not boot and it is hard to diagnose and fix them.
There is a standard for booting over iSCSI from Microsoft that is loosely followed, he said. But there is nothing for Fibre Channel over Ethernet (FCoE) so various hacks are used to make things work.
Error handling is a sore spot as well. Disconnection and reconnection need to be handled by either user space or the kernel; which one depends on the transport. For iSCSI, it is done in user space, but it is sometimes difficult to determine which errors indicate that a reconnection should be tried. In addition, some iSCSI cards fail over to another card or port in some cases, but it is not clear what those cases are.
Multipath I/O is also problematic. Systemd handles the sequencing of setting up the paths; if it gets the sequence wrong, the system will hang on start or stop.
There is a need for standards in this area and for more testing. The vendors do not run their hardware on enterprise kernels, he said, so the distributions should be qualifying this hardware. But Jes Sorensen objected that testing cannot capture all the different scenarios; even if a hardware device is qualified, you could "put it in a data center with slightly different routing and it falls over".
Systemd is part of the problem because it changes so frequently, Sorensen said. Hannes Reinecke said that systemd maintainer Lennart Poettering told him that multipath will never work because "that is not how they designed systemd".
It may make sense to have some standard like the Linux Standard Base (LSB) for what the distribution vendors are willing to support for remote boot, Sorensen said. It would require getting the right distribution representatives together in a room to determine what the officially supported processes for remote booting would be. Representatives from Red Hat, SUSE, Ubuntu, and others would need to come together, but he did not see enough of those right people in the room, he said. James Bottomley said that it is a problem that impacts the kernel developers but not one they can solve; distributions need to fix it.
The md (software RAID) unit tests are available for some of this testing, Sorensen said. He is interested in making them test more functionality and in having them used more widely. Right now they are quite md-specific and run only on loopback devices, but they could be extended, and options could be added to choose the storage device used.
But Mike Snitzer thought that all of those layers should come into play after booting. The remote booting case should be simplified and tested that way; local boot should be used to test the more complex stacking options. Duncan agreed that the stacking of these different layers was not the main problem; the most fundamental problem is just to get the system to boot from remote storage.
There is a "common set of problems booting from the HBAs [host bus adapters] of the world", Martin Petersen said. Duncan said that the problem often manifests itself after an installation from CD; when the system tries to reboot, it fails and complains about a missing device. Sorensen said that it is often a driver or firmware file missing from the Dracut initramfs image, some of which could perhaps be simulated in virtual machines (VMs). Bottomley said that there is no coherent information to give to the user to help them diagnose the problem; the Dracut image needs to have that so that it can report it.
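The missing-driver failure Sorensen described could, in principle, be caught before the reboot by comparing the modules a boot device needs against the files packed into the initramfs. The following is only a rough sketch under stated assumptions, not a tool discussed in the session: `module_name` and `missing_modules` are hypothetical helpers, the list of file paths is assumed to have been extracted from the image by some other means, and firmware files are not checked at all.

```python
import posixpath

# Module file suffixes commonly seen in an initramfs (uncompressed and compressed).
_SUFFIXES = (".ko", ".ko.gz", ".ko.xz", ".ko.zst")


def module_name(path):
    """Return the kernel module name for a module file path, or None.

    Hyphens are folded to underscores, since the kernel treats the two
    as equivalent in module names.
    """
    base = posixpath.basename(path)
    for suffix in _SUFFIXES:
        if base.endswith(suffix):
            return base[: -len(suffix)].replace("-", "_")
    return None


def missing_modules(needed, image_paths):
    """List the entries of `needed` that have no module file in `image_paths`."""
    present = {module_name(p) for p in image_paths}
    return [m for m in needed if m.replace("-", "_") not in present]
```

A report built this way would name the absent driver directly, which is closer to the kind of information Bottomley wanted the image to be able to give the user than a complaint about a missing device.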
Creating a virtual machine environment to do this testing is going to take a lot of work, Sorensen said. But canned tests in VMs would be helpful, Duncan said. What it takes to boot versus what it takes to bring up all of the attached storage should be tackled separately, Bottomley said. Otherwise there is a combinatorial explosion of different options to test. It would be useful to be able to create a list of what modules are needed to bring a given block device online, Petersen said.
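Petersen's idea of listing the modules needed for a given block device can be roughed out from sysfs. What follows is a minimal sketch rather than anything proposed at the session: it assumes the usual sysfs layout, in which `/sys/class/block/<dev>` is a symbolic link into the device tree and each bound device has a `driver` link whose `module` link names the owning module. The function name `modules_for_block_device` and its `sysfs_root` parameter are hypothetical. Built-in drivers, stacked layers such as multipath or md, filesystem modules, and firmware files do not appear in this parent chain and would be missed.

```python
import os


def modules_for_block_device(dev, sysfs_root="/sys"):
    """Collect module names found while walking up a block device's sysfs parents.

    Returns names in order from the disk itself up toward the bus root.
    """
    root = os.path.realpath(sysfs_root)
    # Resolve the class link to the device's real location in the tree.
    path = os.path.realpath(os.path.join(root, "class", "block", dev))
    modules = []
    while path != root and path.startswith(root + os.sep):
        # A bound device has driver -> <driver dir>, which has module -> <module dir>.
        link = os.path.join(path, "driver", "module")
        if os.path.islink(link):
            name = os.path.basename(os.path.realpath(link))
            if name not in modules:
                modules.append(name)
        path = os.path.dirname(path)
    return modules
```

Calling `modules_for_block_device("sda")` on a machine with a locally attached disk would be expected to list the disk and controller drivers, though the exact names depend on the hardware. Taking the sysfs root as a parameter lets the walk be exercised against a fake tree, which fits the canned-test approach Duncan asked for.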
There was some discussion about having a microconference at the 2017 Linux Plumbers Conference to start the process of defining what will be supported and how it will be tested. There was general agreement that it might be the right venue to advance the process of rationalizing remote boot for Linux.
| Index entries for this article | |
|---|---|
| Conference | Storage, Filesystem, and Memory-Management Summit/2017 |