One example...

Posted Jan 19, 2012 9:35 UTC (Thu) by khim (subscriber, #9252)
In reply to: The future calculus of memory management by alankila
Parent article: The future calculus of memory management

However, I still think that systems can't be overcommitted safely, so I remain skeptical what possible tricks there are that an administrator could pull while not violating service's expectations.

This depends on what services you have available. Google's infrastructure is good example. It runs different kinds of taks on the same machines. Some serve search results and have "no overquota" policy. Some are batch processes, crawlers, exacycle, etc. These can be killed at any time because you can always just start them on another system.

Now, not only batch processes can be run in overcommit mode - Google can even take memory reserved for critical search process! Because if it actually will ask for the memory later you can kill non-critical process with extreme prejudice and give memory to critical process. Not sure if Google actually does it or not, but this is obviously doable.

If you'll think about what real clusters are doing you may be surprised just how much work is done by processes which can actually be killed and restarted. Sadly today such tasks are usually run synchroniously in the context of critical user-facing process thus to use memory efficiently you'll need to do serious refactoring.

to post comments

This is a variant on what once was called "Goal Mode" in resource management

Posted Jan 19, 2012 17:29 UTC (Thu) by davecb (subscriber, #1574) [Link]

Once more, it's "back to the future" time in computer science (;-))

What you describe here is a superset of a problem that we suffered in the days of the mainframe, that of optimizing resource usage against a "success" criteria. One wanted, in those days, to adjust dispatch priority and disk storage to benefit a program that was overloaded, to get it out of trouble.

Resource management initially allowed one to set guaranteed minimums, and to share when one wasn't using all your allocation, but rather statically. IBM then introduced a scheme that allowed it to be done in a way that often diagnosed a slowdown and added more resources, called "goal mode". It still exists on mainframes.

Modern resource management schemes don't go quite that far. We guarantee minimums, provide maximums so as to avoid letting us shoot ourselves in the foot, make sharing of unused resources easy, and selective penalize memory hogs by making them page against themselves.

We need a modern goal mode: as Linux is a hot-bed of resource management research, I wouldn't be unduly surprised to see it happen here.

--dave