The future calculus of memory management
Over the last fifty years, thousands of very bright system software engineers, including many of you reading this today, have invested large parts of their careers trying to solve a single problem: How to divide up a fixed amount of physical RAM to maximize a machine's performance across a wide variety of workloads. We can call this "the MM problem."
Because RAM has become incredibly large and inexpensive, because the ratios and topologies of CPU speeds, disk access times, and memory controllers have grown ever more complex, and because the workloads have changed dramatically and have become ever more diverse, this single MM problem has continued to offer fresh challenges and excite the imagination of kernel MM developers. But at the same time, the measure of success in solving the problem has become increasingly difficult to define.
So, although this problem has never been considered "solved", it is about to become much more complex, because those same industry changes have also brought new business computing models. Gone are the days when optimizing a single machine and a single workload was a winning outcome. Instead, dozens, hundreds, thousands, perhaps millions of machines run an even larger number of workloads. The "winners" in the future industry are those that figure out how to get the most work done at the lowest cost in this ever-growing environment. And that means resource optimization. No matter how inexpensive a resource is, a million times that small expense is a large expense. Anything that can be done to reduce that large expense, without a corresponding reduction in throughput, results in greater profit for the winners.
Some call this (disdainfully or otherwise) "cloud computing", but no matter what you call it, the trend is impossible to ignore. Assuming it is both possible and prudent to consolidate workloads, it is increasingly possible to execute those workloads more cost effectively in certain data center environments where the time-varying demands of the work can be statistically load-balanced to reduce the maximum number of resources required. A decade ago, studies showed that only about 10% of the CPU in a typical pizza-box server was being utilized... wouldn't it be nice, they said, if we could consolidate and buy 10x fewer servers? This would not only save money on servers, but would save a lot on power, cooling, and space too. While many organizations had some success in consolidating some workloads "manually", many other workloads broke or became organizationally unmanageable when they were combined onto the same system and/or OS. As a result, scale-out has continued and various virtualization and partitioning technologies have rapidly grown in popularity to optimize CPU resources.
But let's get back to "MM", memory management. The management of RAM has not changed much to track this trend toward optimizing resources. Since "RAM is cheap", the common response to performance problems is "buy more RAM". Sadly, in this evolving world where workloads may run on different machines at different times, this classic response results in harried IT organizations buying more RAM for most or all of the machines in a data center. A further result is that the ratio of total RAM in a data center to the sum of the "working sets" of the workloads is often at least 2x and sometimes as much as 10x. This means that somewhere between half and 90% of the RAM in an average data center is wasted, which is decidedly not cheap. So the obvious question arises: Is it possible to apply similar resource-optimization techniques to RAM?
A thought experiment
Bear with me and open your imagination for the following thought experiment:
Let's assume that the next generation processors have two new instructions: PageOffline and PageOnline. When PageOffline is executed (with a physical address as a parameter), that (4K) page of memory is marked by the hardware as inaccessible and any attempts to load/store from/to that location result in an exception until a corresponding PageOnline is executed. And through some performance registers, it is possible to measure which pages are in the offline state and which are not.
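To make the semantics concrete, here is a toy user-space model of what these two hypothetical instructions and their associated "performance registers" might amount to. Everything here is invented for the sake of the thought experiment: the bitmap, the single counter standing in for the performance registers, and the 8GB machine size describe no real or proposed hardware.

```c
/* A user-space toy model of the hypothetical PageOffline/PageOnline
 * semantics from the thought experiment (not real hardware): a bitmap
 * tracks which 4K pages are offline, and a counter stands in for the
 * "performance registers" that report how many pages are offline. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12				/* 4K pages */
#define TOTAL_PAGES	(8UL << 30 >> PAGE_SHIFT)	/* model an 8GB machine */

static bool offline[TOTAL_PAGES];
static unsigned long nr_offline;		/* the "performance register" */

static int page_offline(uint64_t paddr)
{
	uint64_t pfn = paddr >> PAGE_SHIFT;

	if (pfn >= TOTAL_PAGES || offline[pfn])
		return -1;			/* bad address or already offline */
	offline[pfn] = true;			/* loads/stores would now fault */
	nr_offline++;
	return 0;
}

static int page_online(uint64_t paddr)
{
	uint64_t pfn = paddr >> PAGE_SHIFT;

	if (pfn >= TOTAL_PAGES || !offline[pfn])
		return -1;
	offline[pfn] = false;			/* page is usable again */
	nr_offline--;
	return 0;
}

int main(void)
{
	/* Take the second half of the machine's RAM offline ... */
	for (uint64_t pfn = TOTAL_PAGES / 2; pfn < TOTAL_PAGES; pfn++)
		page_offline(pfn << PAGE_SHIFT);
	/* ... then bring one page back when it is needed again. */
	page_online((TOTAL_PAGES / 2) << PAGE_SHIFT);
	printf("%lu of %lu pages offline\n", nr_offline, TOTAL_PAGES);
	return 0;
}
```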
Let's further assume that John and Joe are kernel MM developers and their employer "GreenCloud" is "green" and enlightened. The employer offers the following bargain to John and Joe and the thousands of other software engineers working at GreenCloud: "RAM is cheap but not free. We'd like to encourage you to use only the RAM necessary to do your job. So, for every page, on "average" over the course of the year, that you have offline, we will add one-hundredth of one cent to your end-of-year bonus. Of course, if you turn off too much RAM, you will be less efficient at getting your job done, which will reflect negatively on your year-end bonus. So it is up to you to find the right balance."
John and Joe quickly do the arithmetic: my machine has 8GB of RAM, so if I keep 4GB offline on average, I will be $100 richer. They immediately start scheming about how to dynamically measure their working set and optimize page offlining.
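For what it's worth, the arithmetic checks out: 4GB of 4K pages is 1,048,576 pages, and at one-hundredth of a cent per page-year that comes to a little over $100. A trivial check, using only the numbers from the story:

```c
/* Back-of-the-envelope check of John and Joe's arithmetic: 4GB of 4K pages
 * kept offline, at one-hundredth of a cent per page-year. */
#include <stdio.h>

int main(void)
{
	unsigned long pages = 4UL << 30 >> 12;	/* 4GB / 4K = 1,048,576 pages */
	double rate = 0.01 / 100;		/* one-hundredth of a cent, in dollars */

	printf("%lu pages * $%.6f = $%.2f per year\n",
	       pages, rate, pages * rate);	/* prints roughly $104.86 */
	return 0;
}
```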
But the employer goes on: "And for any individual page that you have offline for the entire year, we will double that to two-hundredths of a cent. But once you've chosen the "permanent offline" option on a page, you are stuck with that decision until the next calendar year."
John, anticipating the extra $200, decides immediately to try to shut off 4GB for the whole year. Sure, there will be some workload peaks where his machine will get into a swapstorm and he won't get any work done at all, but that will happen rarely and he can pretend he is on a coffee break when it happens. Maybe the boss won't notice.
Joe starts crafting a grander vision; he realizes that, if he can come up with a way to efficiently allow others' machines that are short on RAM capacity to utilize the RAM capacity on his machine, then the "spread" between temporary offlining and permanent offlining could create a nice RAM market that he could exploit. He could ensure that he always has enough RAM to get his job done, but dynamically "sell" excess RAM capacity to those, like John, who have underestimated their RAM needs ... at say fifteen thousandths of a cent per page-year. If he can implement this "RAM capacity sharing capability" into the kernel MM subsystem, he may be able to turn his machine into a "RAM server" and make a tidy profit. If he can do this ...
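To make the "spread" concrete, the rates below are taken straight from the story, but how the payments combine is my own reading of the bargain (in particular, that Joe forgoes the temporary-offline bonus on any page he rents out), not something the story spells out:

```c
/* The "spread" in Joe's scheme, using only the rates from the story.
 * All rates are expressed in dollars per page-year. */
#include <stdio.h>

int main(void)
{
	double temp_offline = 0.01 / 100;	/* one-hundredth of a cent */
	double perm_offline = 0.02 / 100;	/* two-hundredths of a cent */
	double market_rate  = 0.015 / 100;	/* what Joe charges John */

	/* Joe gives up the temporary-offline bonus on a page he rents out,
	 * but collects the market rate instead. */
	printf("Joe's gain per rented page-year:  $%.6f\n",
	       market_rate - temp_offline);
	/* John keeps his permanent-offline bonus and pays the market rate
	 * whenever he needs the capacity back. */
	printf("John's gain per rented page-year: $%.6f\n",
	       perm_offline - market_rate);
	return 0;
}
```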
Analysis
In the GreenCloud story, we have: (1) a mechanism for offlining and onlining RAM one page at a time; (2) an incentive for using less RAM than is physically available; and (3) a market for load-balancing RAM capacity dynamically. If Joe successfully figures out a way to make his excess RAM capacity available to others and get it back when he needs it for his own work, we may have solved (at least in theory) the resource optimization problem for RAM for the cloud.
While the specifics of the GreenCloud story may not be realistic or accurate, some of the same factors do exist in the real world. In a virtual environment, "ballooning" allows individual pages to be onlined/offlined in one VM and made available to other VMs; in a bare-metal environment, the RAMster project provides a similar capability. So, though primitive and not available in all environments, we do have a mechanism. And substantially reducing the total amount of RAM across a huge number of machines in a data center would reduce both capital outlay and power/cooling costs, improving resource efficiency and thus potential profit. So we have an incentive and the foundation for a market.
Interestingly, the missing piece, and where this article started, is that most OS MM developers are laser-focused on the existing problem from the classic single-machine world, which is, you will recall: how to divide up a fixed amount of physical RAM to maximize a single machine's performance across a wide variety of workloads.
The future version of this problem is this: how to vary the amount of physical RAM provided by the kernel and divide it up to maximize the performance of a workload. In the past, this was irrelevant: you own the RAM, you paid for it, it's always on, so just use it. But in this different and future world with virtualization, containers, and/or RAMster, it's an intriguing problem. It will ultimately allow us to optimize the utilization of RAM, as a resource, across a data center.
It's also a hard problem, for three reasons. The first is that we can't predict, but can only estimate, the future RAM demands of any workload. But this is true today; the only difference is whether the result is "buy more RAM" or not. The second is that we need to understand the instantaneous benefit (performance) of each additional page of RAM (cost); my math is very rusty, but this reminds me of differential calculus, where "dy" is performance and "dx" is RAM size. At every point in time, increasing dx past a certain size will have no corresponding increase in dy. Perhaps this suggests control theory more than calculus, but the needed result is a true dynamic representation of "working set" size. The third is that there is some cost for moving capacity efficiently; this cost (and its impact on performance) must somehow be measured and taken into account as well.
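One way to picture the "calculus" is to treat performance as a function of online RAM that flattens out once the working set fits, and to keep adjusting the amount of online RAM until the marginal benefit of another increment drops below the price of a page. The sketch below is purely illustrative: the modeled performance curve, the resize step, the page price, and the absence of any move cost are all inventions, and nothing here corresponds to a real kernel interface.

```c
/* A toy illustration of the "calculus": performance is modeled as a function
 * P(M) of online RAM M that saturates once M covers the working set, and a
 * simple controller adjusts M until the marginal benefit of another page
 * falls below the price of a page.  Every number is invented for
 * illustration. */
#include <stdio.h>

#define WORKING_SET	1200000.0	/* pages the workload actually touches */
#define STEP		16384.0		/* resize granularity, in pages */

/* Modeled performance: grows with RAM, flat once the working set fits. */
static double perf(double pages)
{
	return pages < WORKING_SET ? pages / WORKING_SET : 1.0;
}

int main(void)
{
	double pages = 2097152.0;	/* start with all of an 8GB machine online */
	double page_price = 1e-7;	/* "cost" of keeping one page online */

	for (int i = 0; i < 200; i++) {
		/* Estimate dP/dM by probing one step below the current size. */
		double marginal = (perf(pages) - perf(pages - STEP)) / STEP;

		if (marginal < page_price)
			pages -= STEP;	/* last step bought ~nothing: offline it */
		else
			pages += STEP;	/* RAM is still paying for itself */
	}
	printf("settled at %.0f pages online (working set: %.0f)\n",
	       pages, WORKING_SET);
	return 0;
}
```

Run as written, the loop settles within one step of the working-set size; a real controller would also have to fold in the cost of moving capacity and the noise in any working-set estimate.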
But, in my opinion, this "calculus" is the future of memory management.
I have no answers and only a few ideas, but there are a lot of bright people who know memory management far better than I do. My hope is to stimulate discussion about this very possible future and how the kernel MM subsystem should deal with it.