

Per-file OOM badness

Posted Jun 2, 2022 20:36 UTC (Thu) by NYKevin (subscriber, #129325)
In reply to: Per-file OOM badness by developer122
Parent article: Per-file OOM badness

It depends on how the application is architected, and to some extent, on what it is designed to do. It is often possible to apply one or more of these strategies:

1. The inputs are normally small (and should always be small). Place an upper bound on how large they can be (see the first sketch after this list).
2. The inputs are normally large (or at least, they can be large). Divide the input into fixed-size or size-capped pieces and process each piece independently (in separate processes or on separate machines). These separate tasks may need to coordinate with one another, so this is often a more complex design, but it's also more scalable.
3. The application is a cache or denormalization layer. Evict data in an LRU pattern, or whatever other pattern your testing shows is optimal (see the second sketch after this list).
4. The application is a storage layer (e.g. an RDBMS). Generally speaking, a properly-designed storage layer should be able to persist or retrieve (as a stream) more data than fits in memory, but certain operations (e.g. sorting) may require moving data back and forth between storage and memory repeatedly, so build appropriate indices and benchmark your queries.
5. The problem is not the size of each input, but the sheer number of inputs (e.g. incoming connections). Place a limit on how large your event queue is allowed to get, and drop or refuse excess inputs (see the third sketch after this list). This must be combined with a coherent load-balancing strategy, so that inputs can be redirected to less-overloaded tasks. People often object to this on the grounds that dropping inputs is an unacceptable loss of reliability. In some contexts that's a valid concern, but if you're just running a regular web service over the internet, you already have much worse unreliability at other layers before the traffic even hits your box. Be realistic about what you can accomplish in a real-world setting.
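As a rough illustration of the first strategy, here's a minimal Python sketch; the 1 MiB cap and the read_bounded helper are hypothetical, not from any particular codebase:

    import io

    MAX_INPUT_BYTES = 1 << 20  # hypothetical cap; pick a bound that fits your workload

    def read_bounded(stream: io.BufferedReader) -> bytes:
        # Read one byte past the cap so "exactly at the limit" and
        # "over the limit" can be told apart without buffering
        # unbounded input.
        data = stream.read(MAX_INPUT_BYTES + 1)
        if len(data) > MAX_INPUT_BYTES:
            raise ValueError("input exceeds %d bytes" % MAX_INPUT_BYTES)
        return data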
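For the third strategy, here's a bare-bones LRU cache on top of OrderedDict (Python's functools.lru_cache does the same job when you only need to memoize function calls). Note that capacity counts entries here; a real cache would usually bound bytes instead:

    from collections import OrderedDict

    class LRUCache:
        """Evicts the least-recently-used entry once capacity is reached."""

        def __init__(self, capacity: int):
            self.capacity = capacity
            self._data: OrderedDict = OrderedDict()

        def get(self, key, default=None):
            if key not in self._data:
                return default
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

        def put(self, key, value):
            self._data[key] = value
            self._data.move_to_end(key)  # newest entry goes to the MRU end
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict the LRU entry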
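And the fifth strategy is essentially a bounded queue plus load shedding; a minimal sketch, where the queue size of 1000 is a made-up number you would actually derive from load testing:

    import queue

    events = queue.Queue(maxsize=1000)  # hypothetical bound; size it from load tests

    def accept(event) -> bool:
        """Enqueue an event, shedding load instead of growing without bound."""
        try:
            events.put_nowait(event)
            return True
        except queue.Full:
            # Refuse the input; the caller (or a load balancer)
            # retries against a less-loaded task.
            return False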

Finally, load test it and record how much memory is actually consumed in a worst-case overload scenario. Repeat those load tests periodically, or at the very least look at your daily peak traffic and how it correlates with your memory consumption, and try to figure out how much worse it could get. That's not perfect, but you can add a safety margin and call it close enough.
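On Linux, one cheap way to record the worst case during such a load test is the kernel's own peak-RSS counter; a sketch using the standard resource module:

    import resource

    def peak_rss_kib() -> int:
        # ru_maxrss is the process's peak resident set size: KiB on
        # Linux, bytes on macOS, so interpret it per platform.
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

Call it at the end of the load test, add your safety margin, and that's the number you provision.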

At Google, if you use more memory than you asked for, we just kill the whole container. There are tools to help figure out how much memory you should have asked for in the first place, and the memory limit can be quickly or even automatically scaled up if it becomes necessary, but we have been quite successful at saying "this application will only ever need X memory" for a wide variety of applications.
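Outside a container you can approximate that discipline in-process. A sketch, with the caveat that RLIMIT_AS caps the virtual address space of a single process and is only a rough stand-in for a container-level memory kill (the 512 MiB figure is made up):

    import resource

    LIMIT_BYTES = 512 * 1024 * 1024  # hypothetical "X memory" for this application

    # Allocations beyond the cap fail (MemoryError in Python) rather
    # than triggering an external OOM kill, but the idea is the same:
    # declare a budget up front and treat exceeding it as a bug.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))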

