Memory overcommit in containerized environments
Xie started by pointing out that, while containers are most often seen as a server-side technology, both server and client applications are often run in containerized environments. Those two types of workloads can vary in their execution, though; server applications tend to run constantly and predictably, while clients can be more bursty as they respond to user interactions. For server applications, the focus tends to be on reliability, while clients aim for responsiveness. Proper management of overcommitted memory is important in both cases.
Providing the memory resources that a containerized application needs — and no more — requires understanding what that application's working set is at any given time. The working set can be seen as a sort of histogram, where pages of memory are placed in bins according to some metric, usually the time of last access or some estimate of coldness. These bins can take the form of generations in the multi-generational LRU or the traditional active/inactive-list mechanism used by the kernel for years. Indeed, sometimes the classification of pages can even be done in user space.
One way of controlling the memory available to a container is a balloon device, which can allocate memory within the container (thus "inflating") and return that memory to the host if a container's memory needs to shrink. The balloon can be deflated to give a container more memory. The work under discussion is aimed at collecting working-set data and providing it to the host by way of the balloon device. The host can then use this information to respond to changes in memory use.
Alumbaugh took over to talk about the notification mechanism. In short,
the balloon driver within the container will be informed (by whatever
mechanism is employed to monitor memory use) when a new
working-set report is available, via a shrinker-like callback interface.
That information can then be passed up to the host, which will use it to
implement its resource-management policies. Actions taken in response to
working-set reports can include setting control-group limits, or changing
the balloon size in specific virtual machines.
Patches implementing this mechanism have been posed to the mailing lists, he said, and an associated QEMU patch set will be posted soon. Google's crosvm virtual-machine monitor has already gained support for working-set reports transmitted in this way, and there have been discussions on adding it to the Virtio specification as well.
The only response to the presentation was a comment from David Hildenbrand,
who expressed his dislike for balloon drivers as a way of controlling
memory resources. Without care, balloon inflation can create out-of-memory
situations, which is rarely the desired result. It is better, he said, to
use free-page reporting to the host, which can respond by telling guests to
adjust their working-set sizes. That provides guests with the flexibility
they need to avoid out-of-memory problems. The core idea is the same as
what had been presented, he said, but the mechanism is different.
| Index entries for this article | |
|---|---|
| Kernel | Memory management/Virtualization |
| Conference | Storage, Filesystem, Memory-Management and BPF Summit/2023 |