Defending against Rowhammer in the kernel

By Jonathan Corbet
October 28, 2016

The Rowhammer vulnerability affects hardware at the deepest levels. It has proved to be surprisingly exploitable on a number of different systems, leaving security-oriented developers at a loss. Since it is a hardware vulnerability, it would appear that solutions, too, must be placed in the hardware. Now, though, an interesting software-based mitigation mechanism is under discussion on the linux-kernel mailing list. The ultimate effectiveness of this defense is unproven, but it does show that there may be hope for a solution that doesn't require buying new computers.

Rowhammer works by reading the same memory location a large number of times in quick succession. With contemporary DRAM, reading a location is a destructive act; the data must be rewritten into that location after each read. Those rewrites can cause neighboring memory cells to discharge slightly; if an attacker causes rewriting to happen too many times before the next regular refresh cycle, the data in those neighboring cells can be corrupted. The result is seemingly random bit flips in nearby memory.
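To get a feel for the numbers involved, the following back-of-the-envelope sketch estimates how many times a single row could be activated between two refreshes. The 64ms refresh interval is the figure cited later in this article; the 50ns row-cycle time is an assumed, illustrative value rather than a measurement of any particular memory part, and the function name is made up for this example.

```python
# Rough upper bound on row activations between two refreshes.
# Integer nanoseconds are used to avoid floating-point rounding.

REFRESH_INTERVAL_NS = 64_000_000  # 64ms refresh window
ROW_CYCLE_NS = 50                 # ASSUMPTION: illustrative activate/precharge time


def max_activations(refresh_interval_ns: int, row_cycle_ns: int) -> int:
    """Upper bound on back-to-back activations within one refresh window."""
    if row_cycle_ns <= 0:
        raise ValueError("row cycle time must be positive")
    return refresh_interval_ns // row_cycle_ns


if __name__ == "__main__":
    print(max_activations(REFRESH_INTERVAL_NS, ROW_CYCLE_NS))
```

Under these assumptions there is room for more than a million activations in a single refresh window; a part with a slower row cycle would allow proportionally fewer.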

This would appear to be a difficult vulnerability to exploit. An attacker must find memory that is known to be adjacent to data of interest, then manage to corrupt that data in a useful way. But attackers can do surprising things; a fair number of Rowhammer exploits have now been posted. That includes the "Drammer" exploit that works on many Android devices. Rowhammer is thus a serious problem. Unfortunately, the only proper solution appears to be to increase the memory refresh rate, something that cannot generally be done in deployed hardware.

An intriguing alternative turned up on the linux-kernel list, though its nature wasn't immediately clear. Pavel Machek asked a question that raised some eyebrows: "I'd like to get an interrupt every million cache misses... to do a printk() or something like that." Developers naturally wondered what he was up to. The answer turns out to be an in-kernel Rowhammer defense.

Contemporary CPUs are generally equipped with performance-monitoring units (PMUs) that can track many aspects of how the system is running. Normally the PMU is used by utilities like perf for system profiling and performance tuning. But one of the events the PMU can track is memory-cache misses. For Rowhammer to work, it must act on main memory; reads from cache will not be effective. That means forcing a cache miss for each of, generally, hundreds of thousands of reads to the same address. If the PMU can be used to detect those cache misses, it might be able to detect — and mitigate — Rowhammer attacks.
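The idea of getting "an interrupt every million cache misses" can be illustrated with a small simulation. This is not PMU or perf code; MissCounter and its callback are hypothetical names invented for this sketch, which only mimics a counter that raises an event each time a fixed number of misses has accumulated.

```python
from typing import Callable


class MissCounter:
    """Toy stand-in for a hardware counter with a sampling period."""

    def __init__(self, period: int, on_overflow: Callable[[int], None]) -> None:
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.on_overflow = on_overflow
        self.pending = 0      # misses accumulated since the last overflow
        self.overflows = 0    # overflow events raised so far

    def record(self, misses: int) -> None:
        """Account for a batch of misses, firing the callback once per full period."""
        self.pending += misses
        while self.pending >= self.period:
            self.pending -= self.period
            self.overflows += 1
            self.on_overflow(self.overflows)


if __name__ == "__main__":
    events = []
    counter = MissCounter(period=1_000_000, on_overflow=events.append)
    counter.record(2_500_000)
    print(events, counter.pending)
```

In a real kernel the callback would run in interrupt context and would have to decide quickly whether the miss rate looks suspicious; the sketch leaves that decision out entirely.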

The patch is evolving rapidly as this is being written; the current version takes the form of a "nohammer" kernel module. It has a (currently hardwired) parameter called dram_max_utilization_factor, which determines the maximum cache-miss rate allowed in the system. If it is set to 8 (the default), then the nohammer module will trigger if the cache-miss rate exceeds 1/8 of the theoretical maximum. When that happens, the CPU will be forced to delay for a period long enough to allow the next DRAM refresh to run; 64ms by default. In theory, this delay should slow down a Rowhammer attack enough to make it ineffective.
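The threshold arithmetic described above might look roughly like the following. Only the name dram_max_utilization_factor, its default of 8, and the 64ms delay come from the patch as described here; the theoretical maximum miss rate is an invented figure, and the function names are hypothetical rather than taken from the module.

```python
DRAM_MAX_UTILIZATION_FACTOR = 8   # default described in the article
REFRESH_DELAY_MS = 64             # stall long enough for a DRAM refresh
MAX_MISSES_PER_MS = 20_000        # ASSUMPTION: invented theoretical maximum


def miss_budget(window_ms: int,
                max_misses_per_ms: int = MAX_MISSES_PER_MS,
                factor: int = DRAM_MAX_UTILIZATION_FACTOR) -> int:
    """Misses allowed in a window: 1/factor of the theoretical maximum."""
    return (max_misses_per_ms * window_ms) // factor


def should_delay(observed_misses: int, window_ms: int) -> bool:
    """True when the observed miss count exceeds the allowed budget."""
    return observed_misses > miss_budget(window_ms)


if __name__ == "__main__":
    print(miss_budget(REFRESH_DELAY_MS))            # budget for one 64ms window
    print(should_delay(200_000, REFRESH_DELAY_MS))  # over budget
    print(should_delay(100_000, REFRESH_DELAY_MS))  # under budget
```

With the assumed maximum, the budget for a 64ms window works out to 160,000 misses; a larger factor lowers that budget and makes the module trigger sooner.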

It's a nice theory, but it still suffers from a number of practical problems at this point. To begin with, a 64ms hard delay will add a huge latency to anything the affected CPU is supposed to be doing. If it happens with any frequency at all, it will be noticed, even on systems that are not highly latency-sensitive. Ingo Molnar has suggested making the delay shorter and more frequent; that would reduce the maximum imposed latency, but doesn't change the overall nature of the defense.

The PMU can detect a high rate of cache misses, but it cannot tell the kernel whether all of those misses involved the same address or not. So it could be triggered by an application that is, for example, reading quickly through a large array of data in memory. Thus, it seems entirely plausible that a number of legitimate workloads will generate high rates of cache misses over time that will be mistaken for Rowhammer attacks. Those workloads will be penalized severely by this patch, for no actual gain. That will quickly lead to people turning the Rowhammer defense off.
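The false-positive problem can be shown with two made-up access traces. This sketch assumes, for simplicity, that every access in a trace misses the cache; the addresses, the trace sizes, and the helper names are all illustrative.

```python
from collections import Counter


def miss_count(trace):
    """What a miss counter sees: how many accesses missed, and nothing more.

    Every access in the trace is assumed to miss the cache.
    """
    return len(trace)


def hottest_address_hits(trace):
    """What the counter cannot see: accesses to the single most-used address."""
    return max(Counter(trace).values()) if trace else 0


# One address read over and over, as a Rowhammer attack would do.
hammer = [0x1000] * 200_000
# A sequential scan through a large array, one cache line at a time.
stream = [0x1000 + 64 * i for i in range(200_000)]

if __name__ == "__main__":
    for name, trace in (("hammer", hammer), ("stream", stream)):
        print(name, miss_count(trace), hottest_address_hits(trace))
```

Both traces produce the same count of 200,000 misses, so a detector that sees only the count treats them identically, even though the array scan touches each address just once.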

The PMU is a per-CPU mechanism, but memory is globally accessible in a multiprocessor system. The patch has some tests for an attack that is conducted by two CPUs simultaneously, but does not scale well to systems with more processors than that. It's not entirely clear how it can be made to work in a setting where, say, eight processors are all pounding the same location simultaneously.
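A simple model shows why per-CPU counting has trouble with a distributed attack. It does not reproduce the patch's logic, which, as noted, has some handling for two CPUs; the per-CPU limit and the miss counts below are invented for illustration.

```python
PER_CPU_LIMIT = 160_000  # ASSUMPTION: invented per-CPU miss budget per window


def cpus_flagged(misses_by_cpu, limit=PER_CPU_LIMIT):
    """Indexes of CPUs whose own miss count exceeds the per-CPU limit."""
    return [cpu for cpu, misses in enumerate(misses_by_cpu) if misses > limit]


def aggregate_exceeds(misses_by_cpu, limit=PER_CPU_LIMIT):
    """Whether the combined miss count across all CPUs exceeds the same limit."""
    return sum(misses_by_cpu) > limit


# Eight CPUs each contribute a modest number of misses to the same location.
spread_attack = [50_000] * 8

if __name__ == "__main__":
    print(cpus_flagged(spread_attack))
    print(aggregate_exceeds(spread_attack))
```

No single CPU crosses its limit here, yet the memory sees 400,000 accesses in total. Catching that would require combining the counts across CPUs, which raises the cost of the check and the risk of false positives.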

Finally, Mark Rutland raised an important point: this mechanism depends entirely on counting cache misses. If the attacker is able to obtain an uncached memory mapping, all operations on that memory will bypass the cache entirely and will not be counted. It would appear that Drammer makes use of just such a mapping, so this module may well not be an effective defense against it. Detecting attacks against uncached memory could prove to be a much harder problem.

So it is far too soon to say that the kernel has a useful defense against Rowhammer attacks. But this work shows that, when one is willing to pay the price, a defense might just be possible, at least for some types of attacks. That is an improvement over a world where the only real defense is to buy new hardware — once the vendors get around to producing Rowhammer-resistant systems. It will be interesting to watch where this work goes and how effective it becomes.



Defending against Rowhammer in the kernel

Posted Oct 28, 2016 16:32 UTC (Fri) by cesarb (subscriber, #6266) [Link]

Interesting.

I wonder if it would be possible with the current perf system calls to tell the kernel "stop this thread if it has too many cache misses". That could be used by, for instance, JavaScript interpreters to protect themselves against Rowhammer attacks attempting to escape the sandbox. In the common scenario of "everything running on this machine is trusted except the JavaScript running in the browser", that might be very useful.

Defending against Rowhammer in the kernel

Posted Oct 28, 2016 21:17 UTC (Fri) by mst@redhat.com (guest, #60682) [Link] (8 responses)

> ... the only proper solution appears to be to increase the memory refresh rate ...
I think ECC memory effectively addresses the problem too - isn't this true?

ECC memory

Posted Oct 28, 2016 21:21 UTC (Fri) by corbet (editor, #1) [Link] (5 responses)

I've run across statements to the effect that, since rowhammer can flip multiple bits, ECC memory is not, by itself, a complete defense. But that's about all I know...

ECC memory

Posted Oct 28, 2016 22:57 UTC (Fri) by nix (subscriber, #2304) [Link] (4 responses)

See the original paper, <https://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf>, section 6.3. Summary: it doesn't help -- well, it may well convert attacks into DoSes for systems that panic on multi-bit errors, but it will definitely cause many uncorrectable errors, since ECCRAM is designed on the assumption of independent, uncorrelated errors, and the errors induced by rowhammer are most definitely neither independent nor uncorrelated.

ECC memory

Posted Oct 28, 2016 23:17 UTC (Fri) by ploxiln (subscriber, #58395) [Link] (3 responses)

With ECC memory, an uncorrectable multi-bit error that causes a crash is much more likely than an undetectable pattern of 3+ simultaneous bit flips. Crashing the system (often with some indication somewhere of "uncorrectable memory error") is a notable improvement over successful exploitation.

ECC memory

Posted Oct 31, 2016 12:09 UTC (Mon) by hmh (subscriber, #3838) [Link] (2 responses)

Actually, it doesn't even have to crash the system. It will report a UE (uncorrectable error), which on some platforms with better RAS, AFAIK, actually results in the kernel looking at what uses that page and force-killing it instead.

Obviously, if the one using that page is the kernel, it has to Oops, but...

ECC memory

Posted Nov 5, 2016 3:28 UTC (Sat) by mikemol (guest, #83507) [Link] (1 responses)

Interesting. That turns Rowhammer into a means of killing someone else's process without the necessary privileges.

ECC memory

Posted Nov 7, 2016 22:37 UTC (Mon) by JanC_ (guest, #34940) [Link]

Maybe we can add a kernel feature that signals the process that something is wrong with its memory, and if it can correct it, it's allowed to go on… ;)

Defending against Rowhammer in the kernel

Posted Oct 28, 2016 23:33 UTC (Fri) by thestinger (guest, #91827) [Link] (1 responses)

The hardware mitigation for rowhammer is LPDDR4's optional TRR feature (target row refresh). The memory manufacturers can still screw things up by caring more about performance (timings) and yields than creating a robust product. ECC is nice as an extra layer, but it's not a direct mitigation. It can often turn rowhammer into a denial of service instead of something worse, but it's not a guarantee.

Defending against Rowhammer in the kernel

Posted Oct 31, 2016 6:35 UTC (Mon) by marcH (subscriber, #57642) [Link]

> The memory manufacturers can still screw things up by caring more about performance (timings) and yields than creating a robust product.

So like software!

(coming next: a car analogy)

Defending against Rowhammer in the kernel

Posted Oct 29, 2016 4:23 UTC (Sat) by pabs (subscriber, #43278) [Link]

Some other comment threads:

https://news.ycombinator.com/item?id=12821019
https://plus.google.com/+AlanCoxLinux/posts/AFqqpTPpKZ5

Rewrite after read is performed internally to DRAM, not by controller

Posted Oct 30, 2016 7:03 UTC (Sun) by brouhaha (subscriber, #1698) [Link]

It is correct that DRAM data reads require a row to be rewritten, just like magnetic core memory did, but for DRAM the rewrite is actually done internally by the DRAM chip itself. The controller, in the North bridge, CPU, or elsewhere outside the DRAM, doesn't take any special action to cause that rewrite.

This distinction doesn't in any way change the nature of the Rowhammer problem, so perhaps I'm being overly pedantic.

With ECC memory, the memory controller may be configured for scrubbing, in which case the memory controller does sweep through the DRAM, reading all locations and rewriting them if there is a correctable error. However, the DRAM still does rewrites internally for all memory read cycles, including scrub reads.

Often the ECC scrub rate is configurable, e.g., in BIOS settings. Unfortunately, even with a high scrub rate, Rowhammer can still trigger uncorrectable errors within the scrub interval. However, a high scrub rate will likely reduce the probability of undetectable errors.

Defending against Rowhammer in the kernel

Posted Oct 30, 2016 12:18 UTC (Sun) by spender (guest, #23067) [Link] (2 responses)

More comments here from people with actual experience:
https://twitter.com/halvarflake/status/792314613568311296

My prediction is it won't matter whether it works or not, it'll be heralded as a success in the same vein as KASLR.

-Brad

Defending against Rowhammer in the kernel

Posted Nov 17, 2016 21:57 UTC (Thu) by mcortese (guest, #52099) [Link] (1 responses)

What a strange comment! Managing, in one sentence, to insinuate skepticism about the patch itself, and bad faith in whoever reports it.

Defending against Rowhammer in the kernel

Posted Nov 17, 2016 22:14 UTC (Thu) by spender (guest, #23067) [Link]

Do you have anything technical to contribute? I don't see any reason for your comment. Maybe you just haven't been here long enough if you don't notice the theme of optimistic upstream exceptionalism in all of the articles, facts be damned. Perhaps you could tell the world why you disagree with one of the key people who discovered and exploited rowhammer? Perhaps you could explain how you'd solve the limitations mentioned about the patch? Or perhaps you could point the rest of us to accurate reporting of KASLR where it's mentioned as a failure and waste of time, given the numerous generic defeats against it that have worked ever since its existence and despite numerous "improvements"?

I'll be waiting!

-Brad

Defending against Rowhammer in the kernel

Posted Nov 1, 2016 9:56 UTC (Tue) by bytelicker (guest, #92320) [Link] (4 responses)

What an interesting article! Enjoyed it very much!

My guess is that in the near future hardware-based security holes will be utilized much more frequently. I think this area has just as many fallacies as software; they're just more hidden in the current state of the hardware exploit history. I'm not even sure how critically security in general hardware is treated?

Does anyone know of other examples of big security holes in hardware imposed through software?

Defending against Rowhammer in the kernel

Posted Nov 1, 2016 21:33 UTC (Tue) by dtlin (subscriber, #36537) [Link]

The F00F bug and the Xbox A20 gate come to mind.

Defending against Rowhammer in the kernel

Posted Nov 10, 2016 22:38 UTC (Thu) by Wol (subscriber, #4433) [Link]

Most disk drives now have a SoC operating system (often Linux, I believe!) which can be compromised.

Cheers,
Wol

Defending against Rowhammer in the kernel

Posted Nov 10, 2016 23:46 UTC (Thu) by dfsmith (guest, #20302) [Link] (1 responses)

One of the qualifying tests for hard drives was to see if you could corrupt nearby sectors when repeatedly rewriting a block. Answer: seen more often than we'd like!
(And this is one of the few areas where SMR would be an advantage.)

Defending against Rowhammer in the kernel

Posted Nov 11, 2016 5:25 UTC (Fri) by magila (guest, #49627) [Link]

There has been code in hard drives for a long time which re-writes adjacent sectors/tracks if there are too many writes to a particular area. If you are seeing corruption due to repeated writes then it is most likely a bug in the firmware and I'm sure the manufacturer would love to know about your workload so they can fix it.

Defending against Rowhammer in the kernel

Posted Nov 11, 2016 4:50 UTC (Fri) by ras (subscriber, #33059) [Link]

Good lord. I've been on the receiving end of "yeah I know my hardware isn't behaving as it should, but it's out there now so it's your [the software guy's] problem", but this takes it to a whole new level.