HTML Subresource Integrity
The World Wide Web Consortium (W3C) has approved a new specification intended to thwart cross-site scripting and content-injection attacks in web pages that include content served from multiple sites. Subresource Integrity (SRI) defines a mechanism for browsers to verify that third-party resources like scripts and images match the exact contents expected by the author of the surrounding page.
Injection attacks can take a number of forms. DNS poisoning can be used to redirect HTTP requests to malicious servers, images can be replaced on compromised content-delivery networks (CDNs) or caches, and so on. HTTP Strict Transport Security (HSTS) and browsers that block HTTP elements in pages served over HTTPS together mitigate the most obvious of these risks, but there are still avenues for exploitation.
In particular, the SRI specification notes that a substantial number of sites rely on third-party services to deliver page content, from CDNs to partner sites to open-source JavaScript frameworks. At any particular time, a site administrator may feel reasonably confident about the security of their own servers, but such confidence does not extend to the external, third-party servers involved. Thus, it is in the site owner's interest to be able to attest to the expected content of a resource in a way that the browser can validate.
SRI is designed to combat injection attacks that come through third-party content. The originating site can include cryptographic hashes of third-party script and image files, enabling the user's browser to hash the corresponding files it receives from the third-party servers and verify that the hashes match. It also provides the means for browsers to report validation failures back to site owners. The specification was developed by the W3C Web Application Security Working Group, and received "Recommendation" status (the W3C equivalent of final approval) on June 23.
Content
In its present form, SRI adds an integrity property to the HTML <script> element and to <link> elements of type stylesheet. The expectation is that future revisions of the standard will expand the coverage to include additional HTML elements—perhaps every possible subresource type (images, audio and video elements, iframes, plugin objects, and all hyperlinks).
In its most basic form, the integrity property's value should be a string starting with the hash algorithm used, followed by a dash, then the base64-encoded hash. Support for the SHA-256, SHA-384, and SHA-512 hash functions is required; support for additional functions is optional (although SHA-1 and MD5 are marked as functions that browsers should reject due to their cryptographic weakness).
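The format described above can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the specification: the function name integrity_value and the SUPPORTED table are hypothetical, and only the three required hash functions are accepted.

```python
import base64
import hashlib

# The three hash functions the specification requires. The keys are the
# prefixes used in the integrity value; the table itself is a hypothetical
# helper for this sketch.
SUPPORTED = {
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def integrity_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return '<algorithm>-<base64 digest>' for the given resource bytes."""
    if algorithm not in SUPPORTED:
        # MD5 and SHA-1 end up here and are rejected.
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    digest = SUPPORTED[algorithm](content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

The input must be the exact bytes that the third-party server delivers; any change to the file, even in whitespace, produces a different value.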
So a site owner would hash a script of interest, and add that information to their own site's <script> tag:
<script src="https://example.com/privacy-friendly-analytics.js"
integrity="sha384-H8BRh8j48O9oYatfu5AZzq6A9RINhZO5H16dQZngK7T62em8MUt1FLm52t+eX6xO"
crossorigin="anonymous"></script>
Naturally, SRI only provides integrity protection if the HTML page is retrieved over a secure connection. If the page is sent over unencrypted HTTP, attackers can simply replace the integrity value with whatever they choose.
Access control
The crossorigin property shown in the example, while not part of SRI itself, is required. It comes from the Cross-Origin Resource Sharing (CORS) access-control API, which can be used to restrict access to scripts based on the origin of the request and other information. In a CORS-enabled FETCH request, the origin (the scheme, host, and port) of the surrounding document is sent to the server along with the value of the crossorigin property (either "anonymous" or "use-credentials"). The server can then grant or deny the request based on whatever access controls it has defined. Filtering out requests based on the request origin is simple enough, although attackers would likely forge that header. The use-credentials option supports additional authentication mechanisms, like HTTP cookies.
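On the server side, the origin-based decision might look roughly like the following Python sketch. The allow-list, the function name cors_response_headers, and the allow_credentials flag are all assumptions made for illustration; the header names are the standard CORS response headers, but a real server or framework would normally handle this through its own CORS configuration.

```python
from typing import Dict, Optional

# Hypothetical allow-list of origins permitted to load the resource.
ALLOWED_ORIGINS = {"https://www.example.com"}


def cors_response_headers(origin: Optional[str],
                          allow_credentials: bool = False) -> Dict[str, str]:
    """Return the CORS headers to attach to a response, if any.

    `origin` is the value of the request's Origin header (None if absent).
    An empty dict means the browser will not expose the response to the
    requesting page.
    """
    if origin is None or origin not in ALLOWED_ORIGINS:
        return {}
    headers = {
        # Echo the specific origin; a wildcard cannot be combined with
        # credentialed requests.
        "Access-Control-Allow-Origin": origin,
        # Tell caches that the response depends on the Origin header.
        "Vary": "Origin",
    }
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```

A request from a page on an unlisted origin gets no CORS headers back, so the browser refuses to hand the resource to that page.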
In the context of SRI, CORS is used to protect against a particular type of side-channel attack in which the attacker tries to infer information hidden in a resource by pre-computing hashes. For example, if a stylesheet includes some kind of interesting token (whether that is an API key, session ID, username, or something else), an attacker could compute hashes of likely values and send repeated FETCH requests, logging those requests that do not result in a 404 error.
Using CORS, however, the server hosting the stylesheet can turn on the use-credentials option, enabling HTTP cookie-based authentication of every request. Since the attacker cannot supply valid authentication cookies with its brute-force requests, the stylesheet server will drop those requests silently, preventing any information leaks.
Options and reporting
SRI allows integrity properties to include a space-separated list of several hashes; for example:
<link rel="stylesheet" href="https://example.org/fancy-grid.css"
integrity="sha384-H8BRh8j48O9oYatfu5AZzq6A9RINhZO5H16dQZngK7T62em8MUt1FLm52t+eX6xO
sha384-NWFxpV6Pjs1JsG7lQ/N8EGnddVuWW2ft08xHm/X0rsXB5TrAokLI/BsbADXmXVRX"
crossorigin="anonymous">
If a resource matches any of the supplied hashes, it is regarded as having been validated. Thus, site owners can support multiple algorithms or provide hashes for several variants of the same resource. For instance, the version of a stylesheet returned to the browser might differ based on whether or not the user is logged into the site. There is also a mechanism defined for expressing additional options in the integrity property, though none have yet been defined.
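The matching rule can be sketched in Python as follows. The function names are hypothetical, and the sketch makes two assumptions that go beyond the text above: when hashes for several algorithms are listed, only those using the strongest algorithm present are compared, and a property with no usable hashes is treated as "nothing to check". Both reflect one reading of the specification and should be verified against it.

```python
import base64
import hashlib

# Supported algorithms, ordered from weakest to strongest.
PRIORITY = ["sha256", "sha384", "sha512"]


def parse_integrity(attr: str):
    """Split an integrity value into (algorithm, base64 digest) pairs."""
    entries = []
    for token in attr.split():
        # Anything after '?' is an option string; no options are defined yet.
        expression = token.split("?", 1)[0]
        algorithm, sep, digest = expression.partition("-")
        if sep and algorithm in PRIORITY:
            entries.append((algorithm, digest))
    return entries


def matches_integrity(content: bytes, attr: str) -> bool:
    """Return True if content matches one of the listed hashes."""
    entries = parse_integrity(attr)
    if not entries:
        # Assumption: no usable hashes means there is nothing to enforce.
        return True
    # Assumption: only hashes using the strongest listed algorithm count.
    strongest = max(PRIORITY.index(algorithm) for algorithm, _ in entries)
    for algorithm, expected in entries:
        if PRIORITY.index(algorithm) != strongest:
            continue
        digest = hashlib.new(algorithm, content).digest()
        if base64.b64encode(digest).decode("ascii") == expected:
            return True
    return False
```

With two sha384 hashes, as in the stylesheet example, the resource validates if it matches either one.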
The SRI specification mandates that browsers refuse to load or render any element that fails its validation test. It also requires the browser to return an error response to the server of the originating page. Since the error is a response to the specific FETCH request, servers can catch it for diagnostic purposes as well as provide a fallback resource if necessary.
Moving forward
In strict terms, SRI seems like a rather common-sense addition to HTML. Some may even wonder why it has taken this long to standardize, given how long scripting- and content-injection attacks have plagued web users. To some degree, the answer is simply that the web-development community has historically preferred to innovate rapidly and learn security lessons a bit more slowly.
The longer answer is that the increasingly dynamic nature of the web makes it more difficult to canonically describe how HTTP resources are assembled into a page. An example of this can be found in an issue filed against the SRI specification. As it turns out, SRI in its current form only applies to "first level" subresources like CSS files; any resource linked to from within that CSS file (a sub-subresource), such as an image or font, is not checked by the SRI validation scheme. Moreover, because SRI is defined in terms of FETCH requests, fixing the sub-subresource problem may require altering the CSS specification, which is surely not a simple task.
Nevertheless, SRI is clearly a positive step. The good news for users and developers is that SRI is already supported in Firefox (as of version 45), Chrome (as of version 45), Opera (in Opera 38), and in the newest release of the Android browser. CORS support is available in all major browsers and in most major web servers and frameworks. There is still much to be done to make the full contents of web pages cryptographically verifiable but, with this new specification, matters have taken a big step forward.