

HTML Subresource Integrity


Posted Jul 1, 2016 10:14 UTC (Fri) by oever (guest, #987)
In reply to: HTML Subresource Integrity by pabs
Parent article: HTML Subresource Integrity

In this version the integrity attribute is only allowed on <link> and <script>. However, the intent is to extend that:

A future revision of this specification is likely to include integrity support for all possible subresources, i.e., a, audio, embed, iframe, img, link, object, script, source, track, and video elements.
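For reference, an integrity value is the hash algorithm name joined to the base64-encoded digest of the resource. A minimal sketch of computing one (the example resource contents are made up):

```python
import base64
import hashlib

def sri_value(data: bytes, alg: str = "sha384") -> str:
    """Compute a Subresource Integrity value like 'sha384-...'."""
    digest = hashlib.new(alg, data).digest()
    return f"{alg}-{base64.b64encode(digest).decode()}"

# Hypothetical script contents, just to show the output shape:
value = sri_value(b"alert('hello');")
```

The resulting string is what goes in the integrity attribute of the `<link>` or `<script>` element.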



HTML Subresource Integrity

Posted Jul 1, 2016 10:26 UTC (Fri) by micka (subscriber, #38720) [Link] (11 responses)

But then, for a download, I'm not sure you'd hash the actual resource. Imagine it's 4 GB.
Wouldn't you rather hardcode some kind of challenge in the <a> element that doesn't require downloading and storing the whole thing before checking it?

HTML Subresource Integrity

Posted Jul 1, 2016 10:42 UTC (Fri) by oever (guest, #987) [Link] (10 responses)

Why wouldn't you hash a 4 GB file? Download sites often already publish lists of checksums. An integrity attribute on the link would make the check machine-readable, so the browser could verify the download.

Large files usually do not change often, so the hash does not have to be recalculated often.

If every link on a site used the integrity attribute, then simply checking the hash of /index.html would be enough to know whether the site had changed.

HTML Subresource Integrity

Posted Jul 1, 2016 11:59 UTC (Fri) by micka (subscriber, #38720) [Link] (9 responses)

Well, I'm talking about the user side. If they need to download the resource to check it (you can't hash a file without downloading it in its entirety), then it's really no different from putting both the link and the hash on the HTML page.

HTML Subresource Integrity

Posted Jul 1, 2016 12:09 UTC (Fri) by james (guest, #1325) [Link]

Except if it's done automatically, by your browser, then the end users don't have to worry about it.

It means Linux distributions can put a download link on an HTTPS page, store the actual resource on a mirror network, and tell new users that as long as they're using a recent browser, they don't need to bother running weird checksum utilities to verify their download.

(Yes, there are well-known limitations with HTTPS, but it's what we're ultimately stuck with: telling new users to get a GPG key is no improvement if the page telling them which GPG key to get is served over HTTPS...)

HTML Subresource Integrity

Posted Jul 2, 2016 23:34 UTC (Sat) by lsl (guest, #86508) [Link] (7 responses)

The browser can just tee it into the hash function while the download is running anyway. You don't start to verify gigantic files without having been instructed to download them, of course.
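That kind of incremental hashing is straightforward; a sketch (the chunk iterator stands in for a real network stream, which is an assumption here):

```python
import hashlib

def hash_while_downloading(chunks, alg: str = "sha256") -> str:
    """Feed each downloaded chunk into the hash as it arrives, so no
    second pass over a multi-gigabyte file is needed afterwards."""
    h = hashlib.new(alg)
    for chunk in chunks:      # chunks: an iterator over downloaded bytes
        h.update(chunk)       # the running hash is updated as data streams in
        # ... the chunk would also be written to disk here ...
    return h.hexdigest()

# Simulated download stream:
digest = hash_while_downloading([b"part one, ", b"part two"])
```

The final digest is then compared against the integrity value once the download completes.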

HTML Subresource Integrity

Posted Jul 3, 2016 0:55 UTC (Sun) by flussence (guest, #85566) [Link] (6 responses)

That's true, but I can see this interfering with all the speculative loading browsers do to reduce page load times. After all, you'd want that hash to be valid *before* you go off parsing potentially evil JavaScript/CSS/HTML-imports.

(Of course, it's only a performance issue on first load. Afterwards you can just use the hash as a cache key — and the timing becomes a privacy issue :)

HTML Subresource Integrity

Posted Jul 3, 2016 5:08 UTC (Sun) by ianmcc (guest, #88379) [Link] (5 responses)

Break the file into blocks and hash the blocks separately.

HTML Subresource Integrity

Posted Jul 3, 2016 9:46 UTC (Sun) by oever (guest, #987) [Link] (4 responses)

This is what the BitTorrent protocol does. It's not very popular for browsing yet, but there has been more attention on the distributed web lately, so it might become more popular.

Using checksums for blocks is a nice suggestion for the next version of SRI.

HTML Subresource Integrity

Posted Jul 4, 2016 9:20 UTC (Mon) by hkario (subscriber, #94864) [Link] (3 responses)

The problem is that if you want to verify only part of the file, you need the _whole_ Merkle tree, not only the root node.

And a whole Merkle tree for 4 KiB fragments of a 1 MiB file is 8 KiB long (using SHA-256; double that for SHA-512).
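The arithmetic checks out for the leaf level (a quick sanity check; SHA-256 digests are 32 bytes):

```python
# Leaf-hash size for a 1 MiB file split into 4 KiB blocks:
blocks = (1024 * 1024) // (4 * 1024)    # 256 blocks
leaf_hash_bytes = blocks * 32           # 32 bytes per SHA-256 digest
assert blocks == 256
assert leaf_hash_bytes == 8 * 1024      # 8 KiB of leaf hashes alone
```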

HTML Subresource Integrity

Posted Jul 5, 2016 15:34 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (2 responses)

> The problem is that if you want to verify only part of the file, you need the _whole_ Merkle tree, not only the root node.

You don't necessarily need the whole Merkle tree, if it's organized properly. You just need the root and the siblings of the nodes on the path to the block you want. For example:

Root
|-- A0
|...|-- B0
|...|...|-- C0
|...|...|...|-- D0
|...|...|...|...|-- E0
|...|...|...|...|...|-- Block 0
|...|...|...|...|...\-- Block 1
|...|...|...|...\-- E1
|...|...|...|...|...|-- Block 2
|...|...|...|...|...\-- Block 3
|...|...|...\-- D1 (elided)
|...|...\-- C1 (elided)
|...\-- B1 (elided)
\-- A1 (elided)

The value of each node is a hash of its immediate children. A six-level binary tree can describe up to 2^6=64 blocks (or 256 KiB at 4 KiB per block), but to verify block 0 you only need the hashes of the root, A1, B1, C1, D1, E1, and block 1, for a total of 7 hashes (224 bytes of SHA-256). The intermediate hashes stand in for the parts which weren't downloaded, so in general the more complete subtrees you download the fewer hashes you need. If you download the entire file you only need to know the root hash.
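The audit-path check described above can be sketched as follows (the node layout and helper names are my own, on a small four-block tree rather than the 64-block one):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_block(block: bytes, siblings, root: bytes) -> bool:
    """Verify one block against the root using only the sibling hashes
    along its path (E1, D1, C1, B1, A1 in the example above).
    Each sibling is (hash, is_right): whether it sits to the right."""
    node = H(block)
    for sib, sib_is_right in siblings:
        node = H(node + sib) if sib_is_right else H(sib + node)
    return node == root

# Tiny two-level tree over four blocks:
blocks = [b"b0", b"b1", b"b2", b"b3"]
leaves = [H(b) for b in blocks]
e0, e1 = H(leaves[0] + leaves[1]), H(leaves[2] + leaves[3])
root = H(e0 + e1)

# To verify block 0, only H(block 1) and E1 are needed:
ok = verify_block(blocks[0], [(leaves[1], True), (e1, True)], root)
```

Each step hashes the running node with its sibling, so the path length (and the number of hashes needed) grows only with the depth of the tree, not the number of blocks.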

HTML Subresource Integrity

Posted Jul 7, 2016 8:59 UTC (Thu) by hkario (subscriber, #94864) [Link] (1 responses)

The suggestion was made so that the browser can do partial rendering as soon as the data is read. That means you need all the leaf nodes, since the browser will want to update the rendering each time a new chunk is downloaded and checksummed.

By writing 8 KiB I meant exactly 8192 bytes of data, i.e. only the leaf nodes.

In general, yes: if fetching parts of the tree costs less time (latency) than downloading the data, you don't need the full Merkle tree. But the whole point of using it here is to reduce latency.

HTML Subresource Integrity

Posted Jul 7, 2016 16:57 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

> the suggestion was so that the browser can do partial rendering as soon as the data is read

Sorry, I wasn't looking at the bigger picture, just at the statement that one needs the full Merkle tree to verify part of a file. If you want to download the full file while verifying each part as it's received, then you will need (almost) the full Merkle tree.

Still, it isn't necessary to download the Merkle tree in full before you can start verifying the data. You could stream the hashes from the Merkle tree in the order that they'll be needed to verify each block, e.g.:

(the root hash is already known from the SRI attribute)
Fetch A1, B1, C1, D1, E1, H(Block 1), and Block 0
Compute H(Block 0)
Compute E0 = H(H(Block 0)|H(Block 1))
Compute D0 = H(E0|E1)
Compute C0 = H(D0|D1)
Compute B0 = H(C0|C1)
Compute A0 = H(B0|B1)
Verify H(A0|A1) = root hash
Fetch Block 1
Verify H(Block 1)
Fetch H(Block 3) and Block 2
Compute H(Block 2)
Verify H(H(Block 2)|H(Block 3)) = E1
Fetch Block 3
Verify H(Block 3)
Fetch E3, H(Block 5), and Block 4
Compute H(Block 4)
Compute E2 = H(H(Block 4)|H(Block 5))
Verify H(E2|E3) = D1
Fetch Block 5
Verify H(Block 5)
etc.

The expectation is that the hashes would be provided in this order in a separate file alongside the content. With this approach you can verify each block as it's downloaded. Even better, to do so you only need to download the hashes for the odd-numbered blocks and intermediate nodes; the even-numbered hashes can be computed from the content under the assumption that the even-numbered nodes are downloaded and verified first within each subtree. For larger files this should cut the number of downloaded hashes almost in half.
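The schedule above, shrunk to a four-block tree (names and block contents are illustrative), looks like this; note that only the odd-numbered leaf hashes and E1 ever come over the wire:

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Four blocks; only the root hash is known up front (from the SRI attribute).
blocks = [b"blk0", b"blk1", b"blk2", b"blk3"]

# Hashes the server would stream alongside the content, in the order they
# are needed: E1 and H(block 1) before block 0, H(block 3) before block 2.
leaf = [H(b) for b in blocks]
e0, e1 = H(leaf[0] + leaf[1]), H(leaf[2] + leaf[3])
root = H(e0 + e1)
streamed = {"E1": e1, "h1": leaf[1], "h3": leaf[3]}

# Block 0: the full path to the root also verifies h1 and E1 as a side effect.
assert H(H(H(blocks[0]) + streamed["h1"]) + streamed["E1"]) == root
# Block 1: its hash was already verified above, so check the content directly.
assert H(blocks[1]) == streamed["h1"]
# Block 2: E1 is trusted now, so only h3 had to be fetched.
assert H(H(blocks[2]) + streamed["h3"]) == streamed["E1"]
# Block 3: verified directly against the already-trusted h3.
assert H(blocks[3]) == streamed["h3"]
```

The even-numbered leaf hashes (h0, h2) are computed from the downloaded content rather than fetched, which is the halving of downloaded hashes described above.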


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds