Subverting HTTPS with BREACH
An attack against encrypted web traffic (i.e. HTTPS) that can reveal sensitive information to observers was presented at the Black Hat security conference. The attack does not actually decrypt HTTPS traffic, but it can nevertheless determine whether certain data is present in the page source. That data might include email addresses, security tokens, account numbers, or other potentially sensitive items.
The attack uses a modification of the CRIME (compression ratio info-leak made easy) technique, but instead of targeting browser cookies, the new attack focuses on the pages served from the web server side. Dubbed BREACH (browser reconnaissance and exfiltration via adaptive compression of hypertext—security researchers are nothing if not inventive with names), the attack was demonstrated on August 1. Both CRIME and BREACH require that the session use compression, but CRIME needs it at the Transport Layer Security (TLS, formerly Secure Sockets Layer, SSL) level, while BREACH only requires the much more common HTTP compression. In both cases, because the data is compressed, just comparing message sizes can reveal important information.
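The size side channel is simple to demonstrate with DEFLATE, the algorithm underlying gzip HTTP compression. A minimal sketch using Python's zlib (the page body and secret are made up for illustration):

```python
import zlib

# Hypothetical page body containing a secret, as a server might serve it.
PAGE = b"<html><body>secret_token=d8f3a91c</body></html>"

def compressed_size(reflected: bytes) -> int:
    """Compressed size of the page with an attacker-chosen string appended."""
    return len(zlib.compress(PAGE + reflected))

# A probe that duplicates a string already present in the page is replaced
# by a short back-reference, so it compresses better than an unrelated
# probe of the same length.
match_size = compressed_size(b"secret_token=d8f3a91c")
other_size = compressed_size(b"jq0wXv9K3pZ7uY4RmB6Td")
print(match_size < other_size)  # the matching probe yields a shorter message
```

An eavesdropper cannot read the encrypted bytes, but TLS does not hide the length of the payload, so this size difference is visible on the wire.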
In order to perform the attack, multiple probes need to be sent from a victim's browser to the web site of interest. That requires that the victim get infected with some kind of browser-based malware that can perform the probes. The usual mechanisms (e.g. email, a compromised web site, or man-in-the-middle) could be used to install the probe. A wireless access point and router would be one obvious place to house this kind of attack, as it has the man-in-the-middle position to see the responses along with the ability to insert malware into any unencrypted web page visited.
The probes are used as part of an "oracle" attack. An oracle attack is one where the attacker can send multiple different requests to the vulnerable software and observe the responses. It is, in some ways, related to the "chosen plaintext" attack against a cryptography algorithm. When trying to break a code, arranging for the "enemy" to encrypt your message in their code can provide a wealth of details about the algorithm. With computers, it is often the case that an almost unlimited number of probes can be made and the results analyzed. The only limit is typically time or bandwidth.
BREACH can only be used against sites that reflect the user input from requests in their responses. That allows the site to, in effect, become an oracle. Because the HTTP compression will replace repeated strings with shorter constructs (as that is the goal of the compression), a probe response with a (server-reflected) string that duplicates one that is already present in the page will elicit a shorter response than a probe for an unrelated string. Finding that a portion of the string is present allows the probing tool to add an additional digit or character to the string, running through all the possibilities checking for a match.
For data that has a fixed or nearly fixed format (e.g. email addresses, account numbers, cross-site request forgery tokens), each probe can try a variant (e.g. "@gmail.com" or "Account number: 1") and compare the length of the reply to that of one without the probe. Shorter responses correlate to correct guesses, because the duplicated string gets compressed out of the response. Correspondingly, longer responses are for incorrect guesses. The researchers report that 30 seconds of probing is enough to essentially brute-force email addresses and other sensitive information.
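The guess-and-compare step can be simulated locally. In this sketch (the page, the reflection mechanism, and the candidate list are all assumptions for illustration), zlib stands in for the server's HTTP compression, and taking `len()` of the result stands in for measuring encrypted response sizes on the wire:

```python
import zlib

# Hypothetical page that reflects attacker input and contains a secret.
PAGE = b"<html>Contact: alice@gmail.com <p>You searched for: %s</p></html>"

def response_size(probe: bytes) -> int:
    # The server reflects the probe into the page and compresses it; the
    # attacker on the wire observes only the (compressed) response length.
    return len(zlib.compress(PAGE.replace(b"%s", probe)))

# Probe with each candidate; the guess that duplicates a string already
# in the page gets compressed out, so the shortest response marks a hit.
candidates = [b"@yahoo.com", b"@gmail.com", b"@hotmail.com"]
best = min(candidates, key=response_size)
print(best.decode())  # -> "@gmail.com"
```

A real attack would repeat this over the network via injected requests from the victim's browser, averaging over noise, but the comparison logic is the same.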
Unlike CRIME, which can be avoided by disabling TLS compression, BREACH will be more difficult to deal with. The researchers behind BREACH list a number of mitigations, starting with disabling HTTP compression. While that is a complete fix for the problem, it is impractical for most sites because of the additional bandwidth that uncompressed responses would require; it would also increase page load times.
Perhaps the most practical solution is to rework applications so that user input is not reflected onto pages with sensitive information. That way, probing will not be effective, but it does mean a potentially substantial amount of work on the web application. Other possibilities like randomizing or masking the sensitive data will also require application rework. At the web server level, one could potentially add a random amount of data to responses (to obscure the length) or rate-limit requests, but both of those are problematic from a performance perspective.
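The length-obscuring idea can be sketched with deterministic padding: rounding every response up to a fixed boundary, which is an illustrative variant of the random-padding approach mentioned above, not something the researchers specifically prescribe. Reusing the toy page from earlier:

```python
import zlib

PAGE = b"<html><body>secret_token=d8f3a91c</body></html>"
BUCKET = 128  # pad every response up to the next 128-byte boundary

def padded_size(reflected: bytes) -> int:
    raw = len(zlib.compress(PAGE + reflected))
    # Round up to the bucket boundary so small size differences vanish.
    return -(-raw // BUCKET) * BUCKET

# Matching and non-matching probes now look identical on the wire,
# at the cost of wasted bandwidth on padding bytes.
size_match = padded_size(b"secret_token=d8f3a91c")
size_other = padded_size(b"jq0wXv9K3pZ7uY4RmB6Td")
print(size_match == size_other)
```

Purely random padding, by contrast, only adds noise: an attacker can average it away by sending more probes, which is why padding is listed as a mitigation rather than a fix.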
Over the years, various attacks against HTTPS have been found. That is to be expected, really, since cryptographic systems always get weaker over time. There's nothing to indicate that HTTPS is fatally flawed, though this side-channel attack is fairly potent. With governments actively collecting traffic—and using malware—it's not much of a stretch to see the two being combined. Governments don't much like encryption or anonymity, and flaws like BREACH will unfortunately be available to help thwart both, now and in the future.