
Keysafe, a cloud-based key backup proposal

By Nathan Willis
August 10, 2016

In recent weeks, we have looked at the use of smartcards to securely generate and store cryptographic material, such as the secret half of OpenPGP key pairs. Coincidentally, Joey Hess recently shared his thoughts on an alternative approach to managing secret keys that he calls keysafe. Hess's approach divides the key into shards that are subsequently stored on multiple, independent cloud servers. At present, it is only a proposal, but it is one that he has asked the community to consider and weigh in on.

Hess announced keysafe on his blog in an August 5 post. The impetus, he said, was to overcome users' fear that they would misplace a key and thereby lose access to their data:

I feel that simple backup and restore of gpg keys (and encryption keys generally) is keeping some users from using gpg. If there was a nice automated solution for that, distributions could come preconfigured to generate encryption keys and use them for backups etc.

So Hess set out to define not just an algorithm for safely storing key material on remote servers, but one that could effectively automate the process, hiding complexity from the user. The proposal is up for comment from the rest of the community.

In brief, keysafe would allow users to store a secret (such as an OpenPGP secret key, but perhaps other data types as well) by first encrypting the key and a checksum (or, perhaps, hash), splitting the result into three shards with the Shamir's Secret Sharing algorithm, and generating a unique identifier for each shard. The shards would be uploaded to separate servers through a Tor hidden service endpoint, using only the identifiers as references. In other words, the shards themselves are anonymous, and the servers would not store any user-identifying information with them.

But the shard identifiers are initially generated from a user-supplied password and the key ID of the secret, so that the user can later regenerate them and request the shards from the server pool. The same user-supplied password (plus a random component) would also be used to generate the key that encrypts the secret. That way, the user only has one password to remember.

Adding the random component makes it more difficult for attackers to brute force likely passwords. Furthermore, the cryptographic functions selected (such as Argon2) were chosen because they take some amount of real-world time to run. For a backup process like keysafe, this is acceptable, while attackers will find it an impediment to breaking the system.

The intended use case for the system is users encrypting private data. Consequently, the public key IDs corresponding to the secret keys stored in keysafe would not need to be uploaded to keyservers—in fact, uploading the key IDs would have a detrimental effect, because those IDs are used to generate the shard identifiers. Attackers trying to access keysafe material might start with key IDs that they harvest from public keyservers; if the key ID is available that way, it greatly reduces the search space.

Whether any one key pair is too sensitive to trust to a set of remote cloud servers is, ultimately, a personal question. Hess noted that when he was a Debian Developer, his secret key would have been a high-value target that could be exploited to compromise millions of machines; storing that sort of key on a cloud server would not be advisable.

The steps to store a secret key with ID keyID are:

  1. The user selects a password and an item name of their choosing (e.g., "KeyForFileFoo").
  2. Keysafe generates K = argon2(password, salt=item_name).
  3. Keysafe generates N1 = argon2(keyID, salt=1+item_name).
  4. Keysafe generates N2 = argon2(keyID, salt=2+item_name).
  5. Keysafe generates N3 = argon2(keyID, salt=3+item_name).
  6. Keysafe generates random R.
  7. The secret is encrypted with aes(secret+checksum, key=K+R).
  8. The encrypted payload is sharded with the Shamir algorithm, producing shards S1, S2, and S3.
  9. S1, S2, and S3 are uploaded over Tor to separate servers, using N1, N2, and N3 as the identifiers, respectively.
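The derivation steps (2 through 5) can be sketched in a few lines of Python. The proposal specifies Argon2 tuned to take minutes of CPU time; hashlib.scrypt stands in here because it ships with the standard library, and the parameter values, salt encodings, and function name are illustrative assumptions, not part of the design:

```python
import hashlib

def derive(password: bytes, key_id: bytes, item_name: bytes):
    """Derive the encryption key K and the shard identifiers N1..N3.

    scrypt stands in for the Argon2 of the proposal; the parameters
    here are light demo values, not the tuned ten-minute settings.
    """
    def kdf(data: bytes, salt: bytes) -> bytes:
        return hashlib.scrypt(data, salt=salt, n=2**14, r=8, p=1, dklen=32)

    K = kdf(password, item_name)                     # step 2
    Ns = [kdf(key_id, str(i).encode() + item_name)   # steps 3-5
          for i in (1, 2, 3)]
    return K, Ns
```

The salts 1+item_name, 2+item_name, and 3+item_name come directly from the steps above; the byte encodings are this sketch's choice.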

Hess recommends the Argon2 algorithm because it is resistant to GPU and ASIC optimization. He suggests tuning its parameters to take ten minutes of CPU time to generate the required material. Like several of the other implementation details, though, the emphasis at this stage is not on the specific numbers, but on the design.

The Shamir sharding algorithm, in this case, would set the threshold parameter to two, which means that while the payload is split into three shards, any two of them can be combined to recover the original secret. The Shamir algorithm is not new, of course, and other implementations exist. So, while the sharding operation is essential to making keysafe work, the real benefits of the proposal come with the addition of the remote cloud servers, which can be used by client-side programs to store and retrieve secrets as needed.
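A 2-of-3 Shamir split amounts to evaluating a random degree-one polynomial over a prime field at three points; any two points determine the line, and the secret is its value at zero. This toy sketch shares a small integer rather than a full encrypted payload, and the field prime is an arbitrary demo choice:

```python
import secrets

P = 2**127 - 1  # a Mersenne prime, comfortably larger than demo secrets

def split_2_of_3(secret: int):
    """Split `secret` into three shares; any two recover it.
    Uses the degree-1 polynomial f(x) = secret + a*x over GF(P)."""
    a = secrets.randbelow(P)
    return [(x, (secret + a * x) % P) for x in (1, 2, 3)]

def recover(share_a, share_b):
    """Lagrange interpolation at x = 0 from any two shares."""
    (x1, y1), (x2, y2) = share_a, share_b
    l1 = x2 * pow(x2 - x1, -1, P) % P   # basis polynomial at 0 for x1
    l2 = x1 * pow(x1 - x2, -1, P) % P   # basis polynomial at 0 for x2
    return (y1 * l1 + y2 * l2) % P
```

A single share reveals nothing: for any candidate secret there is a slope consistent with that one point.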

It is important in the long run that the servers be run by trusted entities, but Hess has put safeguards into the design to protect users from a malicious server operator.

He outlined several conditions that would be placed on the servers. First, they would have to keep no logs and would have to sanitize the timestamps on all uploaded objects (both measures to resist correlation attacks against users). Second, whenever a client program requests a shard, the server would reply with a proof-of-work puzzle for the client to complete (thus rate limiting requests). Third, servers would not provide any method for clients to enumerate the items available in storage.
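The proposal does not pin down a specific puzzle construction, but a hashcash-style scheme is one plausible shape for the rate limiting: the server issues a random challenge, and the client must find a nonce whose hash has a prescribed number of leading zero bits. This is purely illustrative:

```python
import hashlib
import itertools

def solve(challenge: bytes, difficulty_bits: int = 12) -> int:
    """Find a nonce whose SHA-256 with the challenge has
    `difficulty_bits` leading zero bits (hashcash-style)."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty_bits: int = 12) -> bool:
    """Cheap server-side check of the client's claimed solution."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

Verification costs one hash; solving costs about 2**difficulty_bits hashes on average, which is what throttles bulk shard requests.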

Hess suggests including a server list with the client program and making the first three servers the defaults, but allowing the user to specify other servers if desired. The client would also start requesting objects from the default servers, but would keep trying other servers on the list until it recovers the necessary shards. That way, the user does not need to remember which servers were initially used.

To recover a secret, the user needs to provide the item name and password. The keysafe client then takes the following steps:

  1. Regenerate K, N1, N2, and N3 from the name and password.
  2. Contact servers, requesting N1, N2, and N3, until any two of S1, S2, and S3 are successfully retrieved.
  3. Recombine the two shards, thus recovering the stored ciphertext.
  4. Guess values for R, AES-decrypting the ciphertext with K+R until the result is a secret payload with a checksum that validates.
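Steps 7 of storage and 4 of recovery fit together as below. A real implementation would use AES; this sketch substitutes a SHA-256 keystream XOR to stay within the standard library, and R is deliberately tiny so the guessing loop finishes instantly rather than taking the intended hour of CPU time:

```python
import hashlib
import itertools
import secrets

R_BYTES = 2  # demo size only; the real R would cost ~an hour of guessing

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in stream cipher built from a SHA-256 keystream.
    The proposal calls for AES; this just avoids extra dependencies."""
    out = bytearray()
    for off in range(0, len(data), 32):
        pad = hashlib.sha256(key + off.to_bytes(4, "big")).digest()
        out += bytes(a ^ b for a, b in zip(data[off:off + 32], pad))
    return bytes(out)

def encrypt_with_random_r(secret: bytes, K: bytes) -> bytes:
    """Storage step 7: encrypt secret+checksum under K+R, then forget R."""
    R = secrets.token_bytes(R_BYTES)
    checksum = hashlib.sha256(secret).digest()[:8]
    return keystream_xor(K + R, secret + checksum)

def recover_by_guessing(ciphertext: bytes, K: bytes) -> bytes:
    """Recovery step 4: try every R until the checksum validates."""
    for guess in itertools.product(range(256), repeat=R_BYTES):
        plain = keystream_xor(K + bytes(guess), ciphertext)
        secret, checksum = plain[:-8], plain[-8:]
        if hashlib.sha256(secret).digest()[:8] == checksum:
            return secret
    raise ValueError("no value of R validated")
```

The checksum is what makes a correct guess recognizable, which is why it is encrypted along with the secret.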

Practically speaking, servers would want to enforce a size limit on uploaded shards, to prevent malicious users from abusing the system to store large files. Hess notes that this limit might be tricky to establish, since key sizes can get quite big if the key accrues a lot of signatures. But the signatures are not, strictly speaking, necessary for decryption, so they could be stripped away prior to the initial encryption and sharding stages.

Hess also discusses practical considerations for indicating the version of the keysafe protocol used in a stored object. Simply appending a version number to the shards uploaded to the servers is problematic, because there will presumably be fewer users in the early days, so those users' data would be easier for attackers to correlate. Appending the version number right before the sharding step is also bad, because that would provide an easy way for attackers to discover that they have found two shards that fit together: when combined, a valid version string would be visible as plaintext.

He suggests an alternative approach: for every new version of the system, the Argon2 parameters would be changed. Clients could then try different parameters in turn. The downside, he notes, is that this process significantly increases the time needed to recover a secret.
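In code, the try-each-parameter-set idea might look like the sketch below. The version table is entirely hypothetical, and scrypt again stands in for Argon2:

```python
import hashlib

# Hypothetical per-version KDF parameter sets; a new protocol version
# would add a new entry rather than tagging stored data explicitly.
PARAMS_BY_VERSION = [
    {"n": 2**14, "r": 8, "p": 1},   # "v1"
    {"n": 2**13, "r": 8, "p": 2},   # "v2"
]

def derive_candidates(password: bytes, salt: bytes):
    """Yield one candidate key per protocol version; the client tries
    each in turn until retrieval (or decryption) succeeds."""
    for params in PARAMS_BY_VERSION:
        yield hashlib.scrypt(password, salt=salt, dklen=32, **params)
```

The cost Hess notes is visible here: a client that guesses the wrong version pays the full KDF price before moving on to the next parameter set.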

Versioning remains open to discussion, as does at least one other facet of the original proposal: in the final step, the client spends some non-trivial amount of time trying to decrypt the de-sharded ciphertext by guessing possible values of the R component. This step is intended to thwart brute-force attackers but, as described, it is a somewhat imprecise approach. He suggests choosing the length of R such that a CPU could decrypt the ciphertext in less than an hour, and a GPU in around one minute.

An alternative mentioned by Hess is the use of tunable work puzzles, as detailed in a 2015 paper [PDF] released by Ericsson Research. That approach would encrypt R using a tunable cryptographic function that takes the user-supplied password and a "difficulty bit string" as additional inputs, and attach the function's output to each shard. Since the user knows the password, the time required to "solve" the puzzle (and recover R) would depend only on the length of the difficulty bit string, while for attackers, it would depend on the size of both parameters combined.
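A loose sketch of that idea: R is masked with a hash of the password plus a random difficulty value of d bits, which is then discarded. The legitimate user, knowing the password, searches only the 2**d difficulty values; an attacker must search passwords and difficulty values together. The construction below is illustrative, not the Ericsson scheme itself:

```python
import hashlib
import secrets

def make_puzzle(R: bytes, password: bytes, difficulty_bits: int):
    """Mask R under the password plus a random, discarded difficulty
    value. Illustrative construction only."""
    d = secrets.randbits(difficulty_bits)
    pad = hashlib.sha256(password + d.to_bytes(4, "big")).digest()[:len(R)]
    masked = bytes(a ^ b for a, b in zip(R, pad))
    check = hashlib.sha256(R).digest()[:4]  # lets the solver spot success
    return masked, check

def solve_puzzle(masked: bytes, check: bytes, password: bytes,
                 difficulty_bits: int) -> bytes:
    """With the password known, only 2**difficulty_bits guesses remain."""
    for d in range(1 << difficulty_bits):
        pad = hashlib.sha256(password + d.to_bytes(4, "big")).digest()[:len(masked)]
        R = bytes(a ^ b for a, b in zip(masked, pad))
        if hashlib.sha256(R).digest()[:4] == check:
            return R
    raise ValueError("puzzle unsolved")
```

Tuning difficulty_bits tunes the honest user's recovery time directly, which is the "tunable" property Hess finds attractive.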

Hess has been updating the keysafe page with comments sent in by others; one expects that trend to continue for at least some time. In the end, though, figuring out the cryptographic side of the proposal will only get part of the way to a solution. Equally important challenges will be developing the keysafe client and server software and getting others interested in deploying it. Perhaps finding people interested in the client side is no great difficulty—after all, the free-software community is generally interested in OpenPGP—but persuading trustworthy entities to stand up and run keysafe servers in a global pool might take more time.

Hess indicated in the original blog post that he regards simple key backup as a missing feature of git-annex (the project on which he does funded development work), so one might expect to see a working prototype emerge before too long if a consensus about the value of the system emerges. But Hess also seems keen on getting input about the design of the system from various subject-matter experts before moving forward, so it is also possible that this is just the beginning of a long discussion.

Index entries for this article
Security: Encryption/Key management


Keysafe, a cloud-based key backup proposal

Posted Aug 11, 2016 10:38 UTC (Thu) by walters (subscriber, #7396)

My solution to this is to store my master GPG key on a few LUKS-encrypted USB sticks that I have in safe places (e.g. with family in another state). Then all of my other online backups are encrypted with that key.

Keysafe, a cloud-based key backup proposal

Posted Aug 11, 2016 15:08 UTC (Thu) by joey (guest, #328)

Well, this article is a surprise! The design for keysafe is still evolving quite a bit, thanks to some security reviews I've gotten from others. I have not checked the writeup in this article against the current design. Hope this leads to some more good feedback.

Also, I have a very early implementation underway.

Keysafe, a cloud-based key backup proposal

Posted Aug 11, 2016 15:14 UTC (Thu) by joey (guest, #328)

"But the shard identifiers are initially generated from a user-supplied password and the key ID of the secret"

This was the case in my original design, but as we analyzed attack methods, it became clear that using the password to generate the shard identifier made brute-force password cracking attacks much easier to perform. So the current design doesn't do that.

Keysafe, a cloud-based key backup proposal

Posted Aug 12, 2016 10:15 UTC (Fri) by joe.bob (guest, #109632)

This is excellent work! I will most certainly be keeping an eye on your progress.


Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds