Signing and distributing Gentoo
The compromise of Gentoo's GitHub mirror was certainly embarrassing, but its overall impact on Gentoo users was likely fairly limited. Gentoo and GitHub responded quickly and forcefully to the breach, which greatly limited the damage that could be done; the fact that it was a mirror and not the master copy of Gentoo's repositories made it relatively straightforward to recover from. But the black eye that it gave the project has led some to consider ways to make it even harder for an attacker to add malicious content to Gentoo, even if the distribution's own infrastructure were to be compromised.
Unlike other distributions, Gentoo is focused on each user building the software packages they want using the Portage software-management tool. This is done by using the emerge tool, which is the usual interface to Portage. Software "packages" are stored as ebuilds, which are sets of files that contain the information and code needed by Portage to build the software. The GitHub compromise altered the ebuilds for three packages to add malicious content so that users who pulled from those repositories would get it.
Ebuilds are stored in the /usr/portage directory on each system. That local repository is updated using emerge --sync (which uses rsync under the hood), either from Gentoo's infrastructure or one of its mirrors. Alternatively, users can use emerge-webrsync to get snapshots of the Gentoo repository, which are updated daily. Snapshots are individually signed by the Gentoo infrastructure OpenPGP keys, while the /usr/portage tree is signed by way of Manifest files that list the hash of each file in a directory. The top-level Manifest is signed by the infrastructure team, so following and verifying the chain of hashes down to a particular file (while also making sure there are no unlisted files) ensures that the right files are present in the tree.
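The chain-of-hashes idea can be illustrated with a minimal sketch. It assumes a deliberately simplified, hypothetical Manifest format (one `<sha256> <name>` line per file or sub-directory Manifest), which is not the format Portage actually uses, and it leaves out the OpenPGP check of the top-level Manifest. The names `write_manifest` and `verify_tree` are invented for this illustration.

```python
import hashlib
from pathlib import Path

# Hypothetical, simplified format: each line is "<sha256-hex> <relative-name>".
# Real Portage Manifests use a different, richer format.
MANIFEST_NAME = "Manifest"


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(directory: Path) -> None:
    """Write Manifests bottom-up so each parent records its children's Manifest hashes."""
    lines = []
    for entry in sorted(directory.iterdir()):
        if entry.is_dir():
            write_manifest(entry)
            lines.append(f"{sha256_of(entry / MANIFEST_NAME)} {entry.name}/{MANIFEST_NAME}")
        elif entry.name != MANIFEST_NAME:
            lines.append(f"{sha256_of(entry)} {entry.name}")
    (directory / MANIFEST_NAME).write_text("\n".join(lines) + "\n")


def verify_tree(directory: Path) -> list[str]:
    """Return a list of problems; an empty list means the tree matches its Manifests.

    The caller is assumed to have already checked the signature on the
    top-level Manifest; that step is not shown here.
    """
    problems = []
    listed = {}
    for line in (directory / MANIFEST_NAME).read_text().splitlines():
        if not line:
            continue
        digest, name = line.split(" ", 1)
        listed[name] = digest

    # Every listed entry must exist and hash to the recorded value.
    for name, digest in listed.items():
        target = directory / name
        if not target.is_file():
            problems.append(f"missing: {target}")
        elif sha256_of(target) != digest:
            problems.append(f"hash mismatch: {target}")

    # Nothing may be present that the Manifest does not list.
    for entry in sorted(directory.iterdir()):
        if entry.is_dir():
            key = f"{entry.name}/{MANIFEST_NAME}"
            sub_manifest = entry / MANIFEST_NAME
            if key not in listed:
                problems.append(f"unlisted directory: {entry}")
            elif sub_manifest.is_file() and sha256_of(sub_manifest) == listed[key]:
                # Only descend once the sub-Manifest itself is trusted.
                problems.extend(verify_tree(entry))
        elif entry.name != MANIFEST_NAME and entry.name not in listed:
            problems.append(f"unlisted file: {entry}")
    return problems
```

Because each Manifest vouches for the Manifests below it, a single trusted signature at the top is enough to cover every file, and the unlisted-file check is what stops an attacker from simply adding something new.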
Another mechanism to get a Portage tree is to clone a Git repository that contains one. These Git mirrors (such as the one at GitHub) can be used to create a local /usr/portage tree by doing an emerge --sync while pointing at the clone as the Portage source. Finally, there is also the canonical Portage tree Git repository, which is somewhat less convenient to use, since it does not have everything that is needed. It needs some data repositories and for the Portage cache to be updated; those things are handled by the infrastructure team for the Git mirrors. On the other hand, all commits to the canonical tree are signed by Gentoo developers directly, so the infrastructure keys need not be trusted.
Trustless
Jason A. Donenfeld posted an idea for a "trustless infrastructure" to the gentoo-dev mailing list on July 2. The core of his suggestion is that, instead of having the Gentoo infrastructure team sign the Portage tree that the distribution provides, developers of the ebuilds would sign them directly. That way, if the infrastructure was compromised, there would be no signing keys available to be abused.
His proposal is that every file in an ebuild would be signed by the developer responsible, so that each file would have a corresponding .asc file that would be distributed with the tree as usual. He also suggested that files not end up in /usr/portage until they have had their signatures verified; instead, they should be copied into a shadow directory to do the verification, then put into /usr/portage if it succeeds. A keyring of the public keys of Gentoo developers would be created and disseminated; eventually, the corresponding private keys would hopefully be stored by the developers on some kind of hardware token.
- Signatures are made by developers, not by infra.
- Portage doesn't see any files that haven't yet been verified.
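The shadow-directory step might look something like the following sketch. It is not taken from the proposal: the function name `sync_via_shadow` and the `verify` callback are hypothetical, and the callback merely stands in for an OpenPGP check of each file against its `.asc` signature using the developer keyring.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def sync_via_shadow(source: Path, live: Path,
                    verify: Callable[[Path, Path], bool]) -> bool:
    """Stage `source` in a shadow directory and promote it to `live` only if
    every file has a valid detached signature.

    `verify(data_file, signature_file)` is a placeholder for a real
    OpenPGP verification against the developer keyring.
    """
    # Create the shadow next to the live tree so the final rename stays
    # on one filesystem.
    shadow = Path(tempfile.mkdtemp(prefix="shadow-", dir=live.parent))
    try:
        staged = shadow / "tree"
        shutil.copytree(source, staged)

        for path in staged.rglob("*"):
            if path.is_file() and path.suffix != ".asc":
                signature = path.with_name(path.name + ".asc")
                if not signature.is_file() or not verify(path, signature):
                    return False  # the live tree has not been touched

        # Everything verified: swap the staged tree into place.
        if live.exists():
            live.rename(shadow / "old")
        staged.rename(live)
        return True
    finally:
        # Removes the failed staging copy, or the previous tree after a swap.
        shutil.rmtree(shadow, ignore_errors=True)
```

The point of the design is that a failed verification leaves the existing tree exactly as it was, so the package manager never reads unverified files.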
The reaction to the proposal was somewhat mixed but generally on the negative side. Rich Freeman pointed out that a change of this sort would require a flag day of sorts; it could not easily be added slowly and "grow organically". But he also noted that using the existing Git signatures would provide much of what Donenfeld is looking for. Freeman also thinks that syncing using Git, rather than rsync, should be considered:
Donenfeld's first reply is a bit dismissive; it complains about the length of Freeman's reply, for example, which is not much larger than the proposal itself. Similarly, when Michał Górny asked about how the keyring would be distributed and protected, Donenfeld's reply was terse: "Same model as Arch." He did eventually elaborate on that somewhat, but it did not convince Górny:
Others also poked holes in the proposal, mostly with regard to key management. Hanno Böck posted a number of questions on key and signature management, particularly with regard to expired, revoked, and newly untrusted keys. Is there some kind of re-signing process that would have to be done? How would that be handled? He concluded:
Kristian Fiskerstrand was more pointed: "I'll say it, it is unworkable". He said that there was always going to be a need for some centralized keys to ensure the integrity of the repositories. Ulrich Mueller also said that Donenfeld's proposal was unworkable because it would violate the Gentoo Package Manager Specification: "we cannot change that retroactively, because it would break existing implementations". Furthermore, Mueller wondered whether adding another 100,000 files to the tree made sense; it would result in 400MB of extra space on a 4KB-block filesystem, he said.
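Mueller's estimate follows from the fact that a small file still occupies at least one filesystem block. A back-of-the-envelope check, assuming each signature file fits in a single 4KB block (the 800-byte size below is an illustrative guess) and that the filesystem does not pack small files together:

```python
import math


def extra_space_bytes(file_count: int, avg_file_size: int,
                      block_size: int = 4096) -> int:
    """Disk space consumed when every file is rounded up to whole blocks."""
    blocks_per_file = max(1, math.ceil(avg_file_size / block_size))
    return file_count * blocks_per_file * block_size


# 100,000 signature files of (assumed) 800 bytes each on 4KB blocks
print(extra_space_bytes(100_000, 800) / 1e6)  # 409.6 (MB), i.e. roughly 400MB
```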
Overall, it doesn't seem like the proposal is going anywhere, though there are elements of it that are attractive. In particular, removing the infrastructure-key bottleneck and, thus, danger from a compromise of those keys (and/or repositories) is of interest, but there is a lot of work to be done to get there. And, as always, key management is a difficult problem to solve.
Git versus rsync
In a related thread, William Hubbs picked up on Freeman's thinking and asked why Gentoo still relied on rsync rather than using Git directly. It comes down to a number of factors that Freeman summarized. Currently, doing an emerge --sync from a Git clone will leave the tree in a corrupted state if it doesn't verify. Also, rsync is more bandwidth efficient for less-frequent updates; it is not clear where the crossover point is, but he guessed Git would be more efficient if updates were done more often than weekly. There are more rsync mirrors, as well, though he is not sure that makes much of a difference in practice.
Beyond that, Freeman noted that Git history makes for more disk-space usage. He personally uses Git, and others would like to do so, but the disk-space issue makes that harder. Matt Turner said that he has set aside a 1GB partition for the tree, which works fine for the roughly 600MB needed by rsync, but not for Git. A shallow clone of the Git repository is roughly the same (around 660MB), but each pull adds to that, so without some kind of "auto-trimming", Git will grow quickly, Freeman said.
All of the key-management issues are still present for the Git tree, as well. Even though the commits are signed by the developers, those keys need to be distributed and managed over time.
The GitHub mirror compromise has clearly led to some thinking (and rethinking) within the project about its practices and how they might be improved. It is not clear that there are any real conclusions that have been reached, much less plans made, but considering the various parts of the problem is certainly to the good. One concrete thing that has come out of this incident is a Portage security page on the Gentoo wiki. It explains how to "dispel doubts regarding the security of the portage tree on my system". There are sections for each of the four ways to keep a Portage tree updated that show what needs to be trusted for each (e.g. keys, web of trust, good security practices) and how to test to ensure the integrity of the Portage tree.