Per-file OOM badness

By Jonathan Corbet
June 2, 2022

The kernel tries hard to keep memory available for its present and future needs. Should that effort fail, though, the tool of last resort is the dreaded out-of-memory (OOM) killer, which is tasked with killing processes on the system to free their memory and alleviate the problem. The results of invoking the OOM killer are never going to be good, but they can be distinctly worse if the wrong processes are chosen for an untimely end. As one might expect, choosing the right processes is an ongoing effort. Most recently, Christian König has proposed a new mechanism to address a blind spot in the OOM killer's deliberations.

When the system runs out of memory, the OOM killer's job is to try to resolve the problem while causing the least possible amount of collateral damage; a number of heuristics have been applied to the victim-choosing logic toward that end. One obvious rule is that it is generally better to kill fewer processes than many, and the way to do that is to select the processes that are currently consuming the most memory. Often, a single out-of-control process is responsible for the problem in the first place; if that process can be identified and killed, the system can get back into a more stable condition.

The OOM killer, thus, scans through the set of running processes to find the most interesting target. At the core of this calculation is a function called oom_badness(), which sums up the amount of memory (and swap space) being used by a candidate process. That sum is then adjusted by the process's oom_score_adj value (which is a knob that an administrator can tweak to direct the OOM killer's attention toward or away from specific processes) before being returned as the process's score. The process with the highest score as determined by this function will be the first on the chopping block. Any process's score can be seen at any time by reading its oom_score value from its /proc entry.
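As a rough illustration of the scoring just described, here is a simplified model in Python. It is a sketch rather than the kernel's code: the Proc type and the badness() and pick_victim() helpers are hypothetical names, and the way oom_score_adj is scaled (each unit worth one thousandth of total memory) is an assumption of this model. The read_oom_score() helper reads the real value from /proc on a Linux system.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical stand-in for the per-process data the OOM killer examines."""
    name: str
    rss_pages: int          # resident memory, in pages
    swap_pages: int         # swap space in use, in pages
    oom_score_adj: int = 0  # administrator knob, -1000..1000


def badness(proc, total_pages):
    """Simplified model: memory plus swap, shifted by oom_score_adj.

    Assumption: each unit of oom_score_adj is worth one thousandth of
    the total memory available.
    """
    adjustment = proc.oom_score_adj * (total_pages // 1000)
    return proc.rss_pages + proc.swap_pages + adjustment


def pick_victim(procs, total_pages):
    """Return the process with the highest score."""
    return max(procs, key=lambda p: badness(p, total_pages))


def read_oom_score(pid="self"):
    """Read the kernel's current score for a process (Linux only)."""
    with open(f"/proc/{pid}/oom_score") as f:
        return int(f.read())


procs = [
    Proc("database", rss_pages=500_000, swap_pages=0, oom_score_adj=-500),
    Proc("browser", rss_pages=300_000, swap_pages=50_000),
    Proc("shell", rss_pages=2_000, swap_pages=0),
]
victim = pick_victim(procs, total_pages=1_000_000)
print(victim.name)  # browser
```

In this made-up example the database uses the most memory, but its negative oom_score_adj steers the choice toward the browser instead.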

One problem with this algorithm, as identified by König, is that oom_badness() does not take into account all of the memory used by a process. Specifically, memory associated with files is not counted; consider, for example, any extra memory that a device driver must allocate when a device special file is opened and operated upon. For some workloads, this memory can be significant, with the result that the processes accounting for the most memory use might not look like attractive OOM-killer targets.

As a simple example, he said in the patch-series cover letter, a malicious process can call memfd_create(), then just write indefinitely to the resulting memfd; the memory consumed by the memfd will not be seen as belonging to the offending process so, when the memfd ends up consuming all of the available memory, the OOM killer will pass over that process. This sequence "can bring down any standard desktop system within seconds". Another problem area, he said, is graphics applications that allocate significant amounts of memory within the kernel for graphical resources.
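To make the memfd case concrete, the following Python sketch creates a memfd and writes a small, bounded amount of data to it. The write_to_memfd() helper and the "demo" name are hypothetical; os.memfd_create() is only available on Linux with Python 3.8 or later. The point is that the written data lives in memory attached to the file rather than in the process's own heap.

```python
import os


def write_to_memfd(total_bytes, chunk_size=1 << 20):
    """Create a memfd, write total_bytes of zeroes to it, and return its size.

    The data ends up in shared memory attached to the file, not in this
    process's heap. Unlike the attack described above, the loop is bounded.
    """
    fd = os.memfd_create("demo")  # Linux only, Python 3.8+
    try:
        chunk = bytes(chunk_size)
        written = 0
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            written += os.write(fd, chunk[:n])
        return os.fstat(fd).st_size
    finally:
        os.close(fd)  # dropping the last reference frees the memory


if hasattr(os, "memfd_create"):
    try:
        print(write_to_memfd(4 * 1024 * 1024))
    except OSError as err:
        print("memfd_create failed:", err)
```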

The solution is to give the OOM killer visibility into the memory resources that are consumed in this way. That, in turn, involves adding yet another member to the ever-growing file_operations structure:

    long (*oom_badness)(struct file *file);

Documentation is lacking, but the intent seems to be that this function, if it exists, should return the amount of extra memory attached to the given file, in pages. This function will be called from within the global oom_badness() function to take that extra memory usage into account; if the file involved is shared between processes, the memory usage will be divided equally among those processes.
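The accounting described here can be modeled in a few lines of Python. This is only a sketch of the idea as the article describes it, not the patch itself: the OpenFile class, its sharers field, and the file_badness() helper are all hypothetical, and the callback is modeled as an optional Python callable returning a page count.

```python
class OpenFile:
    """Hypothetical model of an open file and its optional callback."""

    def __init__(self, name, oom_badness=None, sharers=1):
        self.name = name
        self.oom_badness = oom_badness  # callable(file) -> pages, or None
        self.sharers = sharers          # number of processes sharing the file


def file_badness(files):
    """Sum the extra pages charged to one process for its open files."""
    total = 0
    for f in files:
        if f.oom_badness is None:
            continue  # this kind of file reports no extra memory
        total += f.oom_badness(f) // f.sharers
    return total


files = [
    OpenFile("log.txt"),                                   # no callback
    OpenFile("memfd", oom_badness=lambda f: 90_000),       # private
    OpenFile("dmabuf", oom_badness=lambda f: 30_000, sharers=3),
]
print(file_badness(files))  # 100000
```

A process's overall score would then be its own memory and swap usage plus this per-file total.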

Implementations of the new function have been added to the shared-memory filesystem code, the DMA-buf subsystem, and most graphics drivers. With this mechanism in place, the system has a better idea of where the OOM killer's wrath should be directed to maximize the chances of freeing up significant amounts of memory and bringing the system back to a stable state.

Of course, the hazards of any new heuristic can be seen in this claim in the cover letter: accounting for this memory, König says, provides "a quite a bit better system stability in OOM situations, especially while running games". Accounting for memory used by graphics drivers is likely to point the finger at graphics-intensive applications — games, for example — as the source of an out-of-memory problem. Having the OOM killer take its vengeance on that game may restore the system, but the user, whose nearly complete quest would be abruptly terminated thereby, might be forgiven for thinking that the situation was better before.

In other words, there is still no truly good solution to the OOM problem other than not getting into that situation in the first place. After all, the OOM killer is still, as Andries Brouwer suggested in 2004, like choosing passengers to toss out of a crashing aircraft. When the system runs out of memory anyway, though, it is important to free memory quickly, and that is most likely to happen if the OOM killer has an accurate picture of which processes are using the most memory. Properly accounting for memory attached to files seems like a useful step in that direction.



Per-file OOM badness

Posted Jun 2, 2022 18:40 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

> As a simple example, he said in the patch-series cover letter, a malicious process can call memfd_create(), then just write indefinitely to the resulting memfd; the memory consumed by the memfd will not be seen as belonging to the offending process so, when the memfd ends up consuming all of the available memory, the OOM killer will pass over that process. This sequence "can bring down any standard desktop system within seconds". Another problem area, he said, is graphics applications that allocate significant amounts of memory within the kernel for graphical resources.

That does not sound like a fixable problem in the general case. Can't the malicious process just create files in /dev/shm (either directly, or via shm_open(3)) instead? I find it hard to believe that the kernel can keep track of who created those files, and OOM killing the process won't even clean them up anyway.

Per-file OOM badness

Posted Jun 2, 2022 20:00 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

> Can't the malicious process just create files in /dev/shm (either directly, or via shm_open(3)) instead?

Files created in /dev/shm/ reside in a named tmpfs filesystem, which sets an upper bound on the memory consumed (50% of RAM by default—Debian doesn't appear to override this). The files created with memfd_create() are also tmpfs files, from an internal, unmounted tmpfs, but they behave more like anonymous shared memory mappings (mmap() with MAP_ANONYMOUS)—so far as I can tell the limits on /dev/shm/ do not apply. There is a limit on the total amount of shared memory, the kernel.shmall sysctl knob, which I think would also affect memfd_create(), but this defaults to "unlimited" (~2**64 pages).
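Both limits mentioned here can be inspected from user space. The Python sketch below is one way to do it on a Linux system; the helper names are hypothetical, and the paths are assumed to exist as on a typical distribution.

```python
import os


def filesystem_capacity_bytes(path):
    """Total size of the filesystem holding path, as reported by statvfs()."""
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize


def read_shmall_pages(path="/proc/sys/kernel/shmall"):
    """Current value of the kernel.shmall sysctl, in pages."""
    with open(path) as f:
        return int(f.read())


probes = (
    ("/dev/shm capacity (bytes)", lambda: filesystem_capacity_bytes("/dev/shm")),
    ("kernel.shmall (pages)", read_shmall_pages),
)
for label, probe in probes:
    try:
        print(label, probe())
    except OSError as err:
        print(label, "unavailable:", err)
```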

Per-file OOM badness

Posted Jun 2, 2022 19:29 UTC (Thu) by developer122 (guest, #152928) [Link] (2 responses)

It's unfortunate that it's impossible to say "this application will only ever need X memory and no more" in the general case. Then you could just assign memory to tasks, and stop adding tasks when you run out of memory. Essentially, no dynamic allocation.

In reality, the amount of memory is tied to subtleties in the input. If you ask them to come up with a number, programmers usually just guess the amount and then the application occasionally runs out of memory and dies anyway. And it also dies in cases when borrowing some memory from the rest of the system would have saved it. Ask someone about setting container memory quotas sometime.

Maybe it's possible to have a compiler say "worst_case_mem_used=f(num_inputs1, num_inputs2)" and assign input constraints, but I doubt that will work in general.

Per-file OOM badness

Posted Jun 2, 2022 20:36 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

It depends on how the application is architected, and to some extent, on what it is designed to do. It is often possible to apply one or more of these strategies:

1. The inputs are normally small (and should always be small). Place an upper bound on how large they can be.
2. The inputs are normally large (or at least, they can be large). Divide the input into constant-size or bounded-above-size pieces and process each piece independently (in separate processes or on separate machines). These separate tasks may need to coordinate with one another, so this is often a more complex design, but it's also more scalable.
3. The application is a cache or denorm layer. Evict data in an LRU pattern, or whatever other pattern your testing shows is optimal.
4. The application is a storage layer (i.e. an RDBMS or something like that). Generally speaking, a properly-designed storage layer should be able to persist or retrieve (as a stream) a larger amount of data than what fits in memory, but certain operations (e.g. sorting) may require moving data back and forth between storage and memory repeatedly, so make indices and benchmark your queries appropriately.
5. The problem is not the size of each input, but the sheer number of inputs (e.g. incoming connections). Place a limit on how large your event queue is allowed to get, and drop or refuse excess inputs. This must be combined with a coherent load-balancing strategy, so that inputs can be redirected to less-overloaded tasks. People often object to this on the grounds that dropping inputs is an unacceptable loss of reliability. In some contexts, that's a valid concern, but if you're just running a regular web service over the internet, you already have much worse unreliability at other layers before the traffic even hits your box. Be realistic about what you can accomplish in a real-world setting.
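Two of the strategies above, LRU eviction (3) and a bounded event queue (5), are small enough to sketch in Python. The LRUCache class and the offer() helper are hypothetical names chosen for illustration; a real service would size these limits from measurements rather than the toy values used here.

```python
import queue
from collections import OrderedDict


class LRUCache:
    """Strategy 3: a cache with a fixed entry budget, evicting in LRU order."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used

    def __len__(self):
        return len(self._data)


def offer(events, item):
    """Strategy 5: enqueue if there is room, otherwise refuse the input."""
    try:
        events.put_nowait(item)
        return True
    except queue.Full:
        return False


cache = LRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # evicts "b"

events = queue.Queue(maxsize=3)
accepted = [offer(events, n) for n in range(5)]
print(accepted)  # [True, True, True, False, False]
```

In both cases the memory used is bounded by a number chosen up front, and excess work is dropped or refused rather than allowed to grow without limit.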

Finally, load test it and record how much memory is actually consumed in a worst-case overload scenario. Repeat those load tests periodically, or at the very least look at your daily peak traffic and how it correlates with your memory consumption, and try to figure out how much worse it could get. That's not perfect, but you can add a safety margin and call it close enough.
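A minimal sketch of that measurement loop, in Python, might look like the following. It uses the standard tracemalloc module, which only sees allocations made through the Python allocators, so it is a stand-in for whatever process-level measurement a real load test would use. The handle_request() workload and the 25% margin are made-up values for illustration.

```python
import tracemalloc


def peak_bytes(workload, *args):
    """Run workload(*args) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        workload(*args)
        _current, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()


def handle_request(payload_size):
    """Hypothetical workload: buffer the whole input in memory."""
    buffer = bytearray(payload_size)
    return len(buffer)


def suggested_limit(peaks, safety_margin=0.25):
    """Worst observed peak plus a safety margin."""
    return int(max(peaks) * (1 + safety_margin))


peaks = [peak_bytes(handle_request, size) for size in (1_000_000, 4_000_000)]
print(suggested_limit(peaks))
```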

At Google, if you use more memory than you asked for, we just kill the whole container. There are tools to help figure out how much memory you should have asked for in the first place, and the memory limit can be quickly or even automatically scaled up if it becomes necessary, but we have been quite successful at saying "this application will only ever need X memory" for a wide variety of applications.

Per-file OOM badness

Posted Jun 2, 2022 21:19 UTC (Thu) by atnot (guest, #124910) [Link]

This is actually a thing in many safety-critical RTOSes. All resources like processes, tasks, memory, etc. are allocated at compile time and there is no way to create any more. Of course that degree of inflexibility is rarely desirable on general purpose machines.

Per-file OOM badness

Posted Jun 3, 2022 3:41 UTC (Fri) by neilbrown (subscriber, #359) [Link] (29 responses)

> In other words, there is still no truly good solution to the OOM problem

Sure there is. It involves applications being written so as to checkpoint important data regularly. i.e. they should be designed to crash.
https://lwn.net/Articles/191059/

Seems to work OK on Android - though some apps are certainly better than others.
My web-browser comes up much where it was before a crash, when that happens.
I don't play "games" (my quest is for better code :-), but I have painful memories of my children on their GameBoy saying "I can't save just now", and how much that annoyed me. You should *always* be able to save, and you should *never* have to because it should be automatic.
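One common way to checkpoint so that a crash at any moment leaves a usable file is to write to a temporary file and rename it into place. The Python sketch below shows that pattern; the function names and the JSON format are assumptions made for the example, not something any particular application is known to use.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".checkpoint-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())    # push the data to stable storage
        os.replace(tmp_path, path)  # atomic rename over the old checkpoint
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path, default=None):
    """Return the last checkpoint, or default if none was ever written."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "quest.json")
    print(load_checkpoint(checkpoint, default={"level": 0}))
    save_checkpoint(checkpoint, {"level": 7, "items": ["lamp"]})
    print(load_checkpoint(checkpoint))
```

An application that calls something like save_checkpoint() regularly can be killed at any point and still restart from its most recent complete state.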

Per-file OOM badness

Posted Jun 3, 2022 5:23 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (5 responses)

Not all games are necessarily amenable to perfect autosaving, although I agree that it should be implemented where feasible. But I can think of quite a few games where it would either detract from the gameplay, or at the very least would provide minimal or no improvement over the status quo:

* Undertale (the save mechanic exists in-universe, and removing it would completely wreck the backstory).
* Outer Wilds (no traditional progression mechanics such as leveling up or collecting items; you progress by learning more about the world and its history, so there's nothing to save other than a log of your discoveries).
* Most visual novels (usually, they automatically track which bits you've seen and let you skip past those; saving is at best a convenience function in most cases).
* Most turn-based RPGs (saving in-battle would be a Bad Idea because battles typically have a lot of highly complicated state that the player needs to be reminded of when they resume the game - better to just restart the battle from scratch since it's probably only a few minutes long anyway).
* Most racing games (individual races are typically a few minutes long, it's just not worth it).
* Any game that you can lose, and that doesn't have permadeath (if an autosave happens right before the player loses, the game may become soft-locked, so *some* amount of manual save-loading is required to exist, or else the game has to be very conservative about when and where it autosaves, and either way, the human playing the game has to think about what has or has not been saved at any given time).

Finally, I feel obligated to point out that kids know perfectly well when they can and cannot save. Depending on their age/maturity, they are either deliberately choosing to enter a section of the game where saving is not supported, or they are failing to plan ahead. I don't presume to tell parents how to raise their children, but in either case, you could choose to use this as an object lesson, if you felt it appropriate. For older or more responsible/mature children, it might also help to give them a 10-minute warning, if feasible. But, obviously, it's up to you to decide what works for your kids and your situation.

(I suppose some people might argue that none of the above games ought to exist. I shudder to think of what a boring world that would be!)

Per-file OOM badness

Posted Jun 3, 2022 5:46 UTC (Fri) by neilbrown (subscriber, #359) [Link]

> or they are failing to plan ahead

Yep, that's the one

> you could choose to use this as an object lesson

Indeed! And the lesson "the world is deliberately designed to manipulate your behaviour, to your own detriment" can be a valuable one. Still annoying though.

Per-file OOM badness

Posted Jun 3, 2022 21:56 UTC (Fri) by bartoc (guest, #124262) [Link]

In fairness you _can_ usually save in outer wilds by finding some way to kill yourself :D

Per-file OOM badness

Posted Jun 8, 2022 14:56 UTC (Wed) by azumanga (subscriber, #90158) [Link] (2 responses)

One solution to (I think) all the autosaving issues you raised, which would still handle OOM / crash / reboots is autosaves which are constantly updated and cannot be copied.

This means you can't use the saves to retry, just pick up where you left off. To me that doesn't feel like it would upset any of the games you discuss, any more than just "leaving them paused overnight" (which I used to do on consoles back in my youth).

Per-file OOM badness

Posted Jun 8, 2022 16:13 UTC (Wed) by nybble41 (subscriber, #55106) [Link] (1 responses)

> autosaves which are constantly updated and cannot be copied

Using some sort of hardware-backed DRM, presumably? Or a mandatory connection to a server? Neither is really acceptable in a single-player game, but I don't see how you could otherwise prevent someone from killing the program to simulate a crash and then making a backup copy of the save—or perhaps the entire game if necessary—which could be restored at will to revert back to that point.

Per-file OOM badness

Posted Jun 8, 2022 16:57 UTC (Wed) by Wol (subscriber, #4433) [Link]

Pause the game, snapshot your VM, and continue. Completely out of the game's hand, and you've just saved a checkpoint.

Cheers,
Wol

Per-file OOM badness

Posted Jun 3, 2022 12:06 UTC (Fri) by tao (subscriber, #17563) [Link]

"You should always be able to save" kind of ignores dedicated multiplayer games, but also this whole new (in my opinion rather annoying) trend that so many games try to squeeze in multiplayer even when the game suffers from it.

Per-file OOM badness

Posted Jun 5, 2022 22:46 UTC (Sun) by marcH (subscriber, #57642) [Link] (21 responses)

A migration to $cloudDocuments at $BIGCORP found a minority of people upset by the "autosave" default. Further investigation showed they were using the "Save" button as an archaic form of version control / undo feature restricted to a single version...

(true story)

Per-file OOM badness

Posted Jun 5, 2022 23:16 UTC (Sun) by neilbrown (subscriber, #359) [Link] (2 responses)

What ..... do you mean that "autosave" actually saved the document just like clicking the "save" button would?
That is totally broken.
Deliberate-save and auto-save are both important, but they are different.

This relates to the comment by NYKevin above:

> if an autosave happens right before the player loses, the game may become soft-locked

That is nonsense. After a crash/power-off you should always be able to choose between what you deliberately saved (if that is ever an option, which sometimes it might not be) and what was autosaved (which must always have happened "recently").

Any other behaviour is a bug.
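The distinction can be sketched as a tiny Python model in which the deliberately saved copy and the autosaved copy are kept apart. The Document class is hypothetical, and autosaving on every edit is an assumption made to keep the example short.

```python
class Document:
    """Hypothetical editor model with separate saved and autosaved copies."""

    def __init__(self, text=""):
        self.text = text       # in-memory working copy
        self.saved = text      # last deliberate save
        self.autosaved = text  # last automatic checkpoint

    def edit(self, new_text):
        self.text = new_text
        self.autosaved = new_text  # autosave never touches self.saved

    def save(self):
        self.saved = self.text
        self.autosaved = self.text

    def recovery_choices(self):
        """After a crash, offer both copies rather than picking one."""
        return {"saved": self.saved, "autosaved": self.autosaved}


doc = Document("table v1")
doc.save()
doc.edit("table v2, half restructured")
print(doc.recovery_choices())
```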

Per-file OOM badness

Posted Jun 6, 2022 1:47 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

As far as video games are concerned: I was specifically discussing a hypothetical in which manual saves don't exist, and the game "always" saves all progress. In this context, there is simply no such thing as "what you deliberately saved."

Per-file OOM badness

Posted Jun 6, 2022 1:49 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

(To be clear, this is a real type of saving that has actually been implemented for some games, most notably Minecraft. Minecraft gets around it because the player respawns after death, unless permadeath is enabled for that world, and so soft-locking is nearly impossible in practice. But there are many games where a respawning player would not fit the narrative or tone of the game.)

autosave

Posted Jun 9, 2022 18:24 UTC (Thu) by giraffedata (guest, #1954) [Link] (17 responses)

I think a better description than "archaic form of version control / undo" is "simple commit/rollback control."

I rely heavily on that; I'd be really pissed if a new version of a document editor started committing my work, overwriting my last good version, when I'm halfway through an update.

Of course, if I were a brand new user who had not yet developed the discipline of frequent checkpointing (saving), I suppose I would appreciate the system clicking "save" for me periodically, and then I could turn that feature off after I learn there's a better way.

autosave

Posted Jun 10, 2022 7:32 UTC (Fri) by Wol (subscriber, #4433) [Link]

As someone who has (at work) found myself moving to cloud documents, yes "autosave" does mean "update the current version of the doc", but it also means "take a checkpoint", "update the doc with other people's changes", and maybe more. It's interesting (and actually very useful), in a meeting, having three or four people update the same document, at the same time ...

You just need to change your mindset, although it's also annoyingly frustrating when you want to make temporary changes, and you can't ...

Cheers,
Wol

autosave

Posted Jun 10, 2022 14:54 UTC (Fri) by marcH (subscriber, #57642) [Link] (15 responses)

Why the need to manually checkpoint at specific times when "undo" checkpoints everything all the time?

This really sounds like a niche use case, are you running a test suite on your Word-like documents?

If you do, Word and others support versioning now, so you can upgrade to "save version", keep autosave on, not risk losing anything if your computer crashes and collaborate in real time. Welcome to the 21st century.

autosave

Posted Jun 10, 2022 18:48 UTC (Fri) by giraffedata (guest, #1954) [Link] (14 responses)

Undo does not checkpoint at meaningful stages.

This is the scenario: I decide to restructure a table. Five minutes into it, I figure out it's not as easy as I thought and I have just made a mess and want to go back to the original table. I was careful not to save because I knew I might need to go back and it's a bigger risk to screw up my document because of my own miscalculation than to lose five minutes of work if the system crashes. How many times do I click Undo to get back to where I was before I started messing with the table? It could be dozens. But I can easily toss out all the changes I haven't committed yet.

And what if the system crashes? Now I have a corrupted table and Undo history probably won't be there after restart.

I work this way as a matter of course when editing almost everything -- documents, code, spreadsheets, ... It's not a niche for me.

autosave

Posted Jun 10, 2022 21:05 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (8 responses)

Just go back to the venerable 1960's way of emulating version control: copy the table and work on the copy and delete the original if you accept it. Sure, those nerds have Git and fancy things, but they're so hard to use; I prefer juggling these bowling balls instead.

(There may or may not be any sarcasm here.)

autosave

Posted Jun 10, 2022 22:05 UTC (Fri) by pebolle (guest, #35204) [Link] (7 responses)

> (There may or may not be any sarcasm here.)

But there sure was trolling.

autosave

Posted Jun 14, 2022 20:39 UTC (Tue) by jschrod (subscriber, #1646) [Link] (6 responses)

Why? Directly below, marcH recommended that method.

I'm glad that I still use emacs for documents, which has both autosave and a different "real" save action. It's much more flexible.

autosave

Posted Jun 14, 2022 23:59 UTC (Tue) by marcH (subscriber, #57642) [Link] (5 responses)

Of course making a copy seems very "primitive" compared to proper version control.

Yet a copy may be faster and more convenient if it's only for a very short term and throw-away safety before doing something quick and a bit "dangerous". Combined with autosave it's also safer than holding back a manual save (no loss on crash or user error) and barely more effort. It's not mutually exclusive with proper version control.

Use the best tool for the job; manual saves offer very little advantage, and only in very few situations.

autosave

Posted Jun 15, 2022 0:27 UTC (Wed) by jschrod (subscriber, #1646) [Link] (4 responses)

I have to apologize for my comment that may be interpreted as a snide remark. It was not meant as that -- though, in hindsight, it looks like one.

Of course, making explicit copies (or intermediate explicit copies) is a most valuable tool. I use it often, myself. Involving git in that moment is too much work afterwards, frankly.

My comment was a reaction to the opinion of pebolle that "making explicit intermediate copies" is a troll argument. Which it is not at all -- and I think, we agree on that. It was a spontaneous reaction and not formulated adequately, please accept my apologies.

What irks me in this discussion: There are really experienced folks who think that the difference between autosave and an explicit save is not necessary; any change is immediately applied to the original document. (I experience that with Google Docs, and it's horrible.) In our software development activities, we now have version control systems that differentiate between commit and push and allow rebasing in between. Do you remember SVN? A bad commit happened and you cannot change it because it was immediately pushed to the central repository. WTF? Today we have better tools.

How can anybody who is used to such a fine and flexible software development workflow, as exemplified by git, think that it's not necessary to have the same control over the change process of one's document content?

autosave

Posted Jun 15, 2022 1:40 UTC (Wed) by giraffedata (guest, #1954) [Link] (2 responses)

Some people here think hardly anyone benefits from explicit save and that hardly anyone wants it. I'm not convinced anyone here knows any more than I do about what everyone wants (and I assume I'm far from alone in highly valuing explicit save), but here is some evidence that explicit savers are legion, in the form of expensive engineering that has been done to support explicit save:

Products that autosave usually provide a way to turn it off. And they usually tell you (warn you) that they are autosaving.

And here's a gem I recently discovered in Microsoft Word: My editing session got interrupted. When I restarted, Word said to me, "I automatically saved some of your work since you last saved. Would you like to keep that?" I said Hell no because I didn't know what half-baked change of mine it might have saved, but on other occasions I surely would have said yes.

Making a copy before starting a change and doing explicit version control with something like Git is impractical. I sometimes hit ctl-S (Emacs ctl-X ctl-S) multiple times a minute.

autosave

Posted Jun 15, 2022 8:43 UTC (Wed) by Wol (subscriber, #4433) [Link]

> And here's a gem I recently discovered in Microsoft Word: My editing session got interrupted. When I restarted, Word said to me, "I automatically saved some of your work since you last saved. Would you like to keep that?" I said Hell no because I didn't know what half-baked change of mine it might have saved, but on other occasions I surely would have said yes.

That's been there a while.

The *problem* with auto-save is that I can no longer "quit without saving"! That is functionality I sometimes *desire*, even before I start my edit. Sometimes I want to play before starting work in earnest. I can no longer do that ...

Cheers,
Wol

autosave

Posted Jun 16, 2022 22:29 UTC (Thu) by nix (subscriber, #2304) [Link]

> Making a copy before starting a change and doing explicit version control with something like Git is impractical. I sometimes hit ctl-S (Emacs ctl-X ctl-S) multiple times a minute.

Automate it! (In particular, see 'magit-wip-mode'.)

autosave

Posted Jun 15, 2022 8:21 UTC (Wed) by pebolle (guest, #35204) [Link]

> My comment was a reaction to the opinion of pebolle that "making explicit intermediate copies" is a troll argument.

My beef was the wording of marcH's comment. It seemed chosen to trigger the recipient - whom I know to be polite and focussed on the subject at hand - into replying over the top too.

(For the record: I actually have no opinion on the pros and cons of autosave.)

autosave

Posted Jun 11, 2022 0:04 UTC (Sat) by marcH (subscriber, #57642) [Link] (4 responses)

Maybe not "niche" but for sure it is not the most common activity in a text document. When undo is not enough, simply use version control or its granddaddy: make a copy. Done.

Like it or not the "save" button is disappearing really fast and very few people struggle to switch to the new, more powerful and more flexible ways.

autosave

Posted Jun 11, 2022 12:18 UTC (Sat) by rschroev (subscriber, #4164) [Link] (3 responses)

> Like it or not the "save" button is disappearing really fast and very few people struggle to switch to the new, more powerful and more flexible ways.

The way I see it, the new ways are more convenient for most use cases and may well be the way forward, but the old way with the save button is the one that is more powerful and more flexible (but requires the user to have a bit more discipline).

autosave

Posted Jun 11, 2022 13:30 UTC (Sat) by Wol (subscriber, #4433) [Link]

> the new ways are more convenient for *MOST* use cases

Emphasis added ...

Cheers,
Wol

autosave

Posted Jun 13, 2022 12:43 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (1 responses)

Of course, there's the issue that showed up on a machine I was helping out with where MS Office lost the ability to "Save As" (the directory selection never showed up). This was…OK, but when it lost even "Save", the only way to save work was:

- autosave to OneDrive (could not choose a local directory AFAICT) behind a Microsoft account
- "Send" the document over email then save the attachment from the draft message
- reinstall MS Office (which forced an upgrade)

There are all kinds of complexities involved in this new mechanism and I'm sure all kinds of new bugs will show up because of it too. I love how companies seem to be ensuring future prospects by deleting working code in preference for new, buggy code.

autosave

Posted Jun 13, 2022 13:30 UTC (Mon) by Wol (subscriber, #4433) [Link]

> I love how companies seem to be ensuring future prospects by deleting working code in preference for new, buggy code.

The problem, of course, is when your working code stops working because of a forced upgrade somewhere else.

I'm now an Excel VBA programmer in practice - oh how I wish I could push all that crap into a decent database ...

(And no, I don't fancy the politics that would entail ...)

Cheers,
Wol

Per-file OOM badness

Posted Jun 3, 2022 7:46 UTC (Fri) by daenzer (subscriber, #7050) [Link]

> if the file involved is shared between processes, the memory usage will be divided equally among those processes.

My reading of the patches is that it's divided equally among all file descriptors referencing the file. So e.g. if process A has one file descriptor referencing a file and process B has two file descriptors, process A will be accounted 1/3 of the file's memory, process B 2/3.
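The per-descriptor split described here can be checked with a short Python sketch. The shares_by_process() helper and the 3000-page file size are made up for the example; exact fractions are used so that the 1/3 and 2/3 shares come out without rounding.

```python
from collections import Counter
from fractions import Fraction


def shares_by_process(fd_owners, file_pages):
    """Split file_pages equally among descriptors, then total per process.

    fd_owners lists the owning process of every descriptor that
    references the file, one entry per descriptor.
    """
    per_fd = Fraction(file_pages, len(fd_owners))
    counts = Counter(fd_owners)
    return {proc: per_fd * n for proc, n in counts.items()}


# Process A holds one descriptor for the file, process B holds two.
shares = shares_by_process(["A", "B", "B"], file_pages=3000)
print(shares["A"], shares["B"])  # 1000 2000
```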

Per-file OOM badness

Posted Jun 3, 2022 8:07 UTC (Fri) by Fowl (subscriber, #65667) [Link]

How is this memory accounted for (and limited!) in cgroups?