OLS: Xen and UML
Xen
A full house turned out to hear Xen hacker Ian Pratt discuss his project. Xen is riding high; the software is cool and getting cooler, the venture money is flowing in, and there is no lack of buzz. Ian's talk, while mostly technical in nature, showed the signs of an up-and-coming business: slick, animated slides, and a good marketing pitch ("virtualization in the enterprise") on why virtualization is a useful thing in the first place. This was worth seeing; it is easy to understand why something like Xen is cool technology, but it can be harder to get a handle on why investors are lining up to throw money at it.
Virtualization is not a particularly new idea. Your editor first experienced it on an IBM mainframe over twenty years ago; we shared files by sending them out our virtual card punch into a co-worker's virtual card reader. Given that the alternative, in that particular time and place, was a real card reader, this looked pretty good. Every now and then things would go weird, and we would have to reboot CMS on our virtual CPU. Things have changed little since then - and that was all old stuff even in those days.
In the Linux world, virtualization takes one of three forms. In the "single operating system image" mode, as used by the Linux-vserver project (or a simple chroot() setup, for that matter), multiple user-space instances run within resource containers on a single kernel. Getting strong isolation is hard with this approach. Full virtualization runs an unmodified operating system in a complete virtual machine; systems like VMware and Qemu work this way. The problem with full virtualization is that it can be hard to do in a way which is both secure and efficient, especially on current x86 hardware. Finally, there is para-virtualization, where the guest operating system kernel is explicitly ported to a virtual machine architecture; both Xen and user-mode Linux are para-virtualized systems.
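As a very rough illustration of that first, single-image approach, here is a minimal chroot() jail in C; the /srv/jail path is made up for the example, and a real container system adds resource limits, namespace separation, and much more on top of this.

```c
/* Minimal sketch of the "single system image" approach: confine a
 * process to one subtree of the filesystem with chroot().  Must be
 * run as root; /srv/jail is a hypothetical directory containing a
 * minimal root filesystem (including /bin/sh). */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	if (chroot("/srv/jail") != 0 || chdir("/") != 0) {
		perror("chroot");
		return EXIT_FAILURE;
	}
	/* From here on, this process (and its children) see /srv/jail as "/" */
	execl("/bin/sh", "sh", (char *)NULL);
	perror("execl");
	return EXIT_FAILURE;
}
```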
So why bother with all of this? One reason is server consolidation: move all of those servers onto fewer actual boxes, with the resulting savings in floor space, power, air conditioning, and hardware maintenance. If you can move virtual machines between physical hosts, you can shift them around to avoid downtime; when the disk drive starts to squeal, the administrator can evacuate the virtual systems to working hardware and deal with the problem. Migration also allows workload balancing; it is easier to put more virtual systems on each physical host if they can be shifted around to keep the load on all of those hosts about the same.
One other use for virtualization is security: putting a process within a virtual machine encapsulates it nicely. Even if that process is compromised, there are limits to the damage it can do - as long as it remains trapped within its virtual host. It is also possible to monitor the behavior of the virtual hosts themselves; if one starts doing unusual things, there is a good chance it has been compromised. In this sense, virtualization achieves the same broad goal as SELinux: it puts walls between applications running on the same host. The virtualization approach has the advantage of relative simplicity for situations where all users of a host are to be completely isolated from each other.
Xen is currently at version 2.0.6. It provides secure isolation, resource control, quality of service guarantees, live migration of virtual machines, and an execution speed which is "close to native" on the x86 architecture. As a para-virtualization system, Xen requires that the guest kernel be ported to its virtual architecture; ports exist for NetBSD, FreeBSD, Plan 9, Solaris, and, of course, Linux. The first virtual machine ("domain 0") is special; it is used for a number of Xen configuration tasks and often provides services to other virtual hosts.
Xen itself runs as a thin layer between the hardware and the guest operating systems. Guests normally run autonomously in their own domains; they must call into the hypervisor for privileged operations. The number of modifications to the guest kernel is relatively small; beyond the privileged calls, the guest must be aware that there is a difference between the time it spends running and how much time passes in the real world. There is also an interface for the guest to find out what resources (memory and such) have been allocated to it, so that it can optimize its behavior accordingly.
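As a conceptual sketch of what that porting effort looks like, consider a page-table update; the names below (hypercall2(), HYPERCALL_update_pte) are invented for illustration and do not match the real Xen interface, but the shape is the same: a privileged operation becomes an explicit request to the hypervisor.

```c
/* Conceptual sketch (not the actual Xen interface) of what
 * para-virtualization means for the guest kernel: privileged
 * operations become explicit calls into the hypervisor. */

extern long hypercall2(int op, unsigned long arg1, unsigned long arg2);
#define HYPERCALL_update_pte	1	/* hypothetical operation number */

/* Native kernel: write the page-table entry directly - a privileged
 * operation that only works when the kernel owns the real page tables. */
static inline void native_set_pte(unsigned long *pte, unsigned long val)
{
	*pte = val;
}

/* Para-virtualized kernel: ask the hypervisor to make the change, so
 * that it can validate the update and keep the guest within the memory
 * it has been allocated. */
static inline void xen_set_pte(unsigned long *pte, unsigned long val)
{
	hypercall2(HYPERCALL_update_pte, (unsigned long)pte, val);
}
```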
There is an interface which allows guest systems to access devices on the host. This interface provides virtualized access to the PCI configuration space, intermediated by the hypervisor; guests can also map device MMIO space into their address spaces. Interrupts are delivered by way of the hypervisor. Virtual systems can perform DMA; this can be a security problem if the host system (like most x86 systems) lacks an I/O memory management unit. For this reason, and others, devices are often handled by the "domain 0" guest and exported to other guests.
The Xen developers are clearly proud of the virtual machine migration feature. The migration code has been carefully written to minimize the impact on the host system and to avoid creating downtime for the guest. When the decision is made to move a virtual system, Xen will start copying the guest's memory over to the new host while the guest continues to run. The guest will thus continue to create dirty pages, and some pages will be changed after they are copied. So an iterative technique is used; each pass copies (hopefully) fewer pages, and gets closer to creating a full, current copy on the new host. The final stage is to stop the guest, copy any remaining memory and other state, then start the guest on the new system. The actual downtime can be far less than one second; Ian showed traces from a move of a Quake server; the server was stopped for some 50ms, and the players never noticed.
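In rough pseudo-C, the pre-copy loop looks something like the following; all of the types, helpers, and thresholds are placeholders for illustration, not the actual Xen migration code.

```c
/* Sketch of the iterative "pre-copy" loop behind live migration.
 * Every type and helper here is a placeholder invented for this
 * example. */

struct guest;			/* the running virtual machine */
struct host;			/* the migration destination */
struct page_set;		/* a set of guest page frames */

extern struct page_set *all_pages(struct guest *g);
extern struct page_set *dirty_pages_since_last_pass(struct guest *g);
extern unsigned long set_size(struct page_set *s);
extern void copy_pages(struct guest *g, struct host *dest, struct page_set *s);
extern void pause_guest(struct guest *g);
extern void copy_cpu_and_device_state(struct guest *g, struct host *dest);
extern void resume_guest_on(struct host *dest);

#define DIRTY_THRESHOLD	64	/* stop iterating when this few pages remain */
#define MAX_PASSES	30	/* ...or after this many passes */

void precopy_migrate(struct guest *g, struct host *dest)
{
	int pass;

	/* First pass: copy all memory while the guest keeps running. */
	copy_pages(g, dest, all_pages(g));

	/*
	 * The guest dirties pages as it runs, so repeat, copying only the
	 * pages changed since the previous pass.  Each pass should
	 * (hopefully) have less work to do than the one before it.
	 */
	for (pass = 0; pass < MAX_PASSES; pass++) {
		struct page_set *dirty = dirty_pages_since_last_pass(g);

		if (set_size(dirty) < DIRTY_THRESHOLD)
			break;
		copy_pages(g, dest, dirty);
	}

	/*
	 * Final stage: stop the guest, copy the last dirty pages and the
	 * CPU/device state, then resume on the destination.  Only this
	 * step contributes to the guest's visible downtime.
	 */
	pause_guest(g);
	copy_pages(g, dest, dirty_pages_since_last_pass(g));
	copy_cpu_and_device_state(g, dest);
	resume_guest_on(dest);
}
```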
A 3.0 release is in the works. The architecture is being reworked somewhat to move much of the platform initialization code into domain 0, making the hypervisor smaller (and easier to audit). Things like PCI and ACPI initialization will move in this way; that work has already been done in Linux, after all. There will be support for access to video devices from guest systems; this is apparently a plot to force the Xen developers to run it on their desktops and fix bugs more quickly. There will be ports to a number of new platforms, including x86-64, ia64, and (a little later) the PowerPC. Support will be added for the x86 architecture running in PAE mode, allowing Xen to be run on systems with large amounts of memory. Xen will allow the creation of SMP guest operating systems; in fact, it will be possible to add and remove virtual CPUs on the fly. Migration support will be enhanced for tasks like cluster load balancing.
The 3.0 release is heading into a stabilization period now, so the developers are already looking toward 3.1. For that release, work is being done to support Intel's VT (and AMD's "Pacifica") architecture, which will enable full virtualization of unmodified guest operating systems. The control tools will be enhanced, and a great deal of performance tuning will be done. Ian notes that it is currently quite easy to configure Xen for bad performance; it would be better if it could configure itself to perform well. 3.1 will have at least some support for NUMA systems, for direct access to InfiniBand devices, and more.
Looking further ahead, the Xen developers are contemplating whole-system debugging, with an eye toward finding problems in large, distributed applications. "Virtual machine forking" would be useful for the creation of honeypots or for quickly sandboxing untrusted software. "Multi-level Xen" as a secure virtualization technique is also on the list.
User-mode Linux
The user-mode Linux project predates Xen, but, seemingly, has been eclipsed by the publicity Xen has received in the last year. Certainly UML is on the Xen radar; Ian Pratt took pains to mention a few places where Xen was able to claim better performance than UML. Jeff Dike's UML talk, instead, looked at where that project was going without a single mention of the competition. UML is alive, well, and currently undergoing significant development.
UML is adding support for the Intel VT mechanism. Jeff figures that the work should apply well to AMD's Pacifica offering, but VT is the main priority now. (That is not entirely surprising, once one realizes that this work is being done by Intel engineers.) The VT extension allows the creation of a complete virtual processor within the hardware. The virtual system is essentially indistinguishable from the "real" host, but certain privileged operations trap back to the host system, rather than being executed in the guest.
User-mode Linux will, when running under VT, run in ring 0, just like a real kernel. Most system calls made by processes running inside the guest will trap directly into the guest kernel; the host will not be involved at all. When the guest itself must make a call to the host system, it forces a trap with the VMCALL instruction. Despite the fact that UML now runs in ring 0, it is still a user-mode process, and thus still deserves its name.
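The trap itself can be pictured as a tiny wrapper like the one below; VMCALL is the real VT instruction, but the register convention shown (call number in EAX, one argument in EBX) is an assumption made for illustration - the actual convention is whatever the host-side hypervisor defines.

```c
/* Hypothetical sketch of a guest-to-host call under VT.  The register
 * usage here is invented for the example; only the VMCALL instruction
 * itself comes from the hardware specification. */
static inline long host_call(long nr, long arg)
{
	long ret;

	asm volatile("vmcall"		/* trap out of the guest to the host */
		     : "=a" (ret)
		     : "a" (nr), "b" (arg)
		     : "memory");
	return ret;
}
```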
The big benefit of this mode of operation is performance. A number of the things which currently hurt UML, such as the cost of implementing system calls in the guest, just go away. Further work, such as the adoption of some variant of the dynamic tick patch, should also help improve performance.
Actually making this work requires the incorporation of a simple hypervisor into the host system kernel. The hypervisor will handle getting UML started as a guest system, and will be invoked when the guest makes a system call or springs some other sort of trap. This work is essentially complete (Jeff credited Asit Mallick, Suresh Siddha, Gennady Sharapov, and Mikhail Kharitonov for the actual work). By the time systems with VT are available, UML should be close to being in a position to make full use of them.
A virtual conclusion
Virtualization is clearly a hot topic at the moment; no other subject was
covered by so many talks at OLS. Money is being spent, companies have been
formed, and people clearly expect this stuff to go somewhere. Computers
are clearly valuable, as witnessed by the fact that we have created so many
of them. So it makes sense that people will want to create even more
computers in software. When the hype settles and the technology
stabilizes, we'll probably find that, while virtualization has not changed
the world, it has added a tool which proves to be useful in a number of
situations.
| Index entries for this article | |
|---|---|
| Kernel | User-mode Linux |
| Kernel | Virtualization |
| Kernel | Xen |
| Conference | Linux Symposium/2005 |