Decentralization for the web
Holger Krekel is a longtime Python developer who got his start with the language back in 2001. He is a co-founder of the PyPy project, the creator of the pytest testing tool, and a contributor to several other Python-based projects. But his keynote on the third day of EuroPython 2015 was not particularly Python-centric; it was, instead, a look at the history of centralization in communication technology and some thoughts on what might lie in the future.
Krekel began by noting that he has given lots of talks at various conferences, but that he still gets nervous and uneasy when standing up in front of an audience. Part of the problem is that humans are wired to feel uneasy when lots of people are staring at them; in archaic times, that probably meant the onlookers wanted to kill you. Since it is a natural reaction, overcoming it is difficult, but recognizing the underlying cause helps. Those giving their first talk will likely feel it even more strongly, he said.
Over the last few years, Krekel has been meeting with other communities, including Node.js, Erlang, and Haskell groups, but also other non-language-specific groups that focus on higher-level concepts. His talk was meant to relay some of what he has learned. But first, he wanted to talk about the past in order to talk about the future.
The past
"Real rocket science" took place almost 50 years ago, with the Apollo moon landing. The Apollo missions set the speed record for humans at roughly 40,000 km/hour. But after that, the rocket science advances started to slow down. From 1685 on, the number of scientific papers published doubled every fifteen years—he likened it to Moore's Law—but that leveled off in the 1970s.
Who was doing this rocket science, he asked; who was programming these rockets and spacecraft to land on the moon? He put up a slide of Margaret Hamilton standing next to a stack of printouts as tall as she was: the source code for the Apollo program. She led the programming effort for the project.
In the 1960s, more women than men were programmers. That changed as more money flowed into the computer industry, which attracted more men; research has shown that as fields attract more money, men tend to dominate them. In the early days, programming was seen as a "lowly" task that involved typing, so it didn't seem particularly important. Hamilton, though, was one of the leading rocket scientists.
He then showed a picture of an old rotary-dial phone. In 1939, those types of phones started using "pulse dialing", where each dialed digit directly controlled relays across the country, switching wires to connect to the phone at the other end. That was all run by one company (e.g. AT&T in the US), which controlled all of the hardware (phones, relays, network) to make it run reliably.
In 1974 came another "rocket science invention": modems made it possible to create an overlay network on top of the voice network. Many researchers believed that was the wrong approach, though, because an overlay could never be more effective than the underlying network. So they came up with the idea of a packet-switched network, where each packet gets a "higher-level telephone number" (the IP address) for its destination.
That idea had a big advantage that was not obvious at the time: there are no setup costs, unlike with phone calls. You can just put a packet on the wire and the router will make a decision about how to forward it toward its destination. It was envisioned as a distributed network and one that was resilient in the face of failures—packets can be rerouted around them. It turned out that decentralization "was a bit of a hippie dream", he said.
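To see the difference in code, here is a minimal sketch of connectionless delivery using Python's standard socket module; the destination address and port are placeholders for illustration. A UDP datagram is simply handed to the network with a destination address and no call setup at all:

```python
import socket

# No call setup, no connection state: just stamp a destination address
# on a packet and let the routers figure out how to forward it.
# The address and port below are placeholders for illustration.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"hello, packet-switched world", ("198.51.100.7", 9999))
sock.close()
```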
The present
What actually happened is that certain endpoints started collecting the lion's share of the traffic. The IP network is still kind of using the idea of the original telephone network, where there are endpoints that we connect to. Instead of an evenly distributed network, we have a collection of star networks, where many people connect to a single telephone number.
Why did this happen? Companies recognized that being the endpoint everyone uses means having to handle all of that traffic, but it also means getting an excellent overview of what that traffic is doing. The cost of scaling up is more than offset by the value of the information that comes with more traffic.
It comes down to economies of scale: getting the first 100 users costs more than going from 100 to 10,000. That makes the "complexity tax" regressive; companies can pay less and less to get more and more users. There is a tipping point where that process "becomes very profitable" from advertising and things like that, he said.
Krekel quoted former Facebook researcher Jeff Hammerbacher, who said: "The best minds of my generation are thinking about how to make people click ads. That sucks." Instead of spending time on "getting us into space, flying cars, or whatever", the best minds in IT are focused on how to get people to click more ads.
So, we have ended up with million-to-one (or billion-to-one) architectures on the web. Lots of startup companies are trying to become one of the mediators of that traffic, but the impetus behind the traffic is that people want to connect with other people. They want to view videos or communicate with text and pictures, but they do that through YouTube, Twitter, and the like. On a social level, people are "peer to peer", but today there are intermediaries that monetize those interactions and profit from them.
The future
Returning to the subject of space, Krekel noted that Elon Musk wants to get humanity to Mars by 2026. Do we think that 41-year-old technology like TCP/IP, or the 21-year-old HTTP, will work on Mars? Can you use Gmail as a web app on Mars? Someone in the audience suggested that it would just take "patience", which elicited widespread laughter. The joke has a real point: the one-way light delay between Earth and Mars ranges from roughly 3 to 22 minutes, so even TCP's three-way connection handshake would take tens of minutes before any data flowed. Krekel said that the protocols we have will not work on Mars.
But we already have Mars on Earth, in places where internet connectivity is not all that good. In 1981, there were 300 computers connected to the internet; now there are billions of devices in the world, most of them still using this phone-based model. It turns out that's not entirely true, he said: some are following other models. There are communication and synchronization mechanisms that some of these devices use to transfer data directly between themselves without using the internet.
For example, you can synchronize your mobile phone and laptop directly, without using some remote server. Sometimes it is more practical to use a remote server "in California somewhere" to transfer files between two local devices, but there are ways to avoid having to do that. These mechanisms don't use standard protocols, but instead use proprietary ones. It is much more efficient to transfer files locally, especially given that upload bandwidth is often much smaller than download bandwidth.
There is an organization based in Berlin called Offline First that has recognized that our endpoints have become much more powerful, with lots of local connectivity, so it doesn't make sense to make connections across the world to talk to something local. People want local applications that work even when they are not connected to the internet. At some point, the device will be able to get a connection to the net; when it does, the application can simply synchronize. As its name implies, the group is focused on an offline-first strategy.
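As a rough sketch of the pattern (the store and function names here are hypothetical, not any particular framework's API), an offline-first application writes to local storage unconditionally and reconciles with the outside world only when a connection happens to be available:

```python
import json
import pathlib

STORE = pathlib.Path("notes.json")   # hypothetical local store
PENDING = []                         # operations waiting to be synchronized

def save_note(text):
    """Always write locally first; the network is optional."""
    notes = json.loads(STORE.read_text()) if STORE.exists() else []
    notes.append(text)
    STORE.write_text(json.dumps(notes))
    PENDING.append(text)

def on_connectivity_restored(send):
    """When a connection shows up, replay whatever accumulated offline."""
    while PENDING:
        send(PENDING.pop(0))
```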
If you look at successful projects over the last ten years, many are using synchronization and replication techniques that don't work according to the client-server paradigm. Git is a good example, he said, since it stores the whole history locally and allows local changes that eventually get synced, which is offline-first thinking.
Another example of distributed networking is BitTorrent, which came out of the realization that you shouldn't have to make a phone call back to California to get a video. Others nearby already have the data; you just aren't talking to them. With BitTorrent, instead, people can register hashes of the data they have and others can fetch it locally, which is much more efficient. At one point, BitTorrent traffic was half of all internet traffic.
There are other projects that use hashes to identify data, including ZFS, Bitcoin, and Tahoe-LAFS. They are all based on Merkle trees, which are trees of hashes. We have "reasonably safe" cryptographic hashes, Krekel said, which can be used to hash data blocks; those hashes can be hashed to identify files, directories can be identified by the hash of their file hashes, and so on. He wryly pointed out that this Merkle is not the same as the Chancellor of Germany (Angela Merkel); "I totally disagree with her politics", he said to applause.
Immutability
Merkle trees are an immutable (unchangeable) data structure: if you change one of the data blocks, all of the hashes on the path to the root of the tree must change, including the root itself. That root hash uniquely identifies the whole tree, and any corruption of the data during transfer can be detected simply by verifying the hashes.
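A toy version of the scheme fits in a few lines of Python using the standard hashlib module (the block contents and helper name are invented for illustration). It hashes each block, hashes adjacent pairs of hashes up to a single root, and shows that changing one block changes that root:

```python
import hashlib

def merkle_root(blocks):
    """Hash each block, then hash adjacent pairs up to a single root."""
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the odd hash out
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

data = [b"block-0", b"block-1", b"block-2", b"block-3"]
root = merkle_root(data)

data[2] = b"block-X"                    # corrupt a single block...
assert merkle_root(data) != root        # ...and the root no longer matches
```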
Immutability of data structures is also a property of some programming languages. In nearly every language that has been created or become popular over the last ten years, immutability is a key feature. It helps with scalability by allowing parallel operations. In addition, programming with immutable data structures is safer. There is a project called Pyrsistent that provides immutable dictionaries, sets, and the like for Python, which allows experimenting with immutability.
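Pyrsistent's containers mirror the built-in ones, but every "mutation" returns a new structure and leaves the original untouched:

```python
from pyrsistent import pmap, pvector

config = pmap({"host": "localhost", "port": 8080})
patched = config.set("port", 9090)   # returns a *new* map

print(config["port"])    # 8080 -- the original is unchanged
print(patched["port"])   # 9090

v = pvector([1, 2, 3])
print(v.append(4))       # pvector([1, 2, 3, 4]); v itself is still [1, 2, 3]
```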
Krekel then turned to the last entry in "The Zen of Python": "Namespaces are one honking great idea -- let's do more of those!". He noted that he loved the introspection features of Python and that it was "namespaces all the way down". Classes are just dictionaries, as are objects and modules, and all of that can be inspected programmatically. Creating his own implementation of that was part of his motivation for co-founding PyPy.
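That introspection is available from ordinary Python code; classes, instances, and modules all expose their namespaces as dictionaries:

```python
import math

class Point:
    dims = 2
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(3, 4)

print(Point.__dict__["dims"])   # class attributes live in the class namespace
print(p.__dict__)               # {'x': 3, 'y': 4} -- the instance namespace
print(math.__dict__["pi"])      # modules are namespaces too
```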
In thinking about "more of those", he has come up with a nascent idea of "immutable namespaces" for Python. Having a reference to such a namespace would mean that nothing it referred to could ever change; it would be like a Git commit of the contained namespaces. It would be worthwhile to see how that might be beneficial; it could even be a step toward removing the global interpreter lock (GIL) from Python. It is a "vague idea", but even if it doesn't work out, thinking about immutable data, perhaps combined with namespaces, will make programs easier to reason about.
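No such feature exists in Python today, but one way to get a feel for the idea is to snapshot an object's namespace with Pyrsistent's freeze(), which recursively converts it into immutable structures. The snapshot() helper below is purely hypothetical, not anything Krekel proposed concretely:

```python
from pyrsistent import freeze

def snapshot(obj):
    """Hypothetical helper: a deeply immutable copy of an object's namespace."""
    return freeze(vars(obj))

class Config:
    def __init__(self):
        self.servers = ["alpha", "beta"]

frozen = snapshot(Config())
print(frozen["servers"])                 # pvector(['alpha', 'beta'])
new = frozen["servers"].append("gamma")  # a new vector; frozen is unchanged
```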
IPFS
A new peer-to-peer protocol, the InterPlanetary File System or IPFS, was next up. Instead of location-based addressing using names (like http://lwn.net/...), IPFS uses content-based addressing (ipfs://<hash>/...). So instead of asking to connect to a phone number, users ask for a particular piece of content, wherever it is stored. They don't need to trust the sender of the data, since they can validate the content returned against its hash.
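The core idea fits in a few lines of Python; the dictionary below is a toy stand-in for "whichever peer has the content", not the IPFS wire protocol:

```python
import hashlib

store = {}   # toy stand-in for the network of peers

def put(content):
    address = hashlib.sha256(content).hexdigest()
    store[address] = content
    return address          # the content *is* its own address

def get(address):
    content = store[address]
    # No need to trust whoever supplied the bytes: the hash proves them.
    assert hashlib.sha256(content).hexdigest() == address
    return content

addr = put(b"some immutable content")
print(addr, get(addr))
```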
But hash values are even harder to remember than domain names (or phone numbers), so there needs to be another layer that maps names to hashes. The current scheme uses mutable hashes stored as TXT records in the DNS that map to the actual immutable hash of the content. Mutable hashes are used so that the content can change without requiring an update to the DNS entry for a given domain. That scheme is called IPNS (which doesn't seem to have a web page) and is based on the naming used by the Self-certifying File System (SFS).
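Resolving such a name is then just an ordinary DNS query. Roughly, using the third-party dnspython package, and assuming the TXT record carries a dnslink=/ipfs/<hash> entry (the convention IPFS's DNS-based naming uses):

```python
import dns.resolver   # from the dnspython package

def dnslink(domain):
    """Look for a dnslink=/ipfs/<hash> entry in a domain's TXT records."""
    for rdata in dns.resolver.resolve(domain, "TXT"):
        txt = b"".join(rdata.strings).decode()
        if txt.startswith("dnslink="):
            return txt[len("dnslink="):]
    return None
```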
IPFS is a work in progress, but it can be used today. It currently uses IP and DNS, but it can operate over other protocols when they become available; for example, Namecoin might be used instead of DNS someday. Data in IPFS is exchanged using a BitTorrent-like mechanism, and routing is handled using a distributed hash table (DHT), or with multicast DNS (mDNS) for purely local transfers.
There is still an issue about how to bootstrap a list of DHT nodes. If you think about the offline-first scenario, where devices are not connected for days or weeks, there will be changes in which IP addresses are participating in the DHT. Peer-to-peer networks solve the problem by having stable nodes that are always available, but that is not a decentralized solution.
He pointed to a blog post by Adam Ierymenko that talks about the problem. In it, Ierymenko suggested the idea of a "blind idiot god" for the internet. It would be a minimal centralized resource that could be used to solve the bootstrapping issue, but the key is that it would need to do so without being able to see much of the information it was handling—provably. It is a tall order and there is an open debate on how to do it, Krekel said.
Back to rocket science
Instead of blaming Google and Facebook, which have provided great services, released open-source software, and given good jobs to many in the industry, he said, we should just replace them with something decentralized. He quoted Buckminster Fuller ("You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.") and agreed with that sentiment. We should just build something better, he said.
There is still a lot of innovation going on, but the pace seems to have slowed since the 1970s. That is borne out by the growth in scientific publications that he mentioned earlier in the talk, he said. A book by David Graeber called The Utopia of Rules collects research showing that the rate of innovation has "kind of leveled off". If you look at the changes from 1910 to 1960 or 1970, and compare them with the changes from then until now, many of the things that were expected have not arrived. We set the speed record for humans and haven't surpassed it (or done much in the way of a real space program) since.
This and other examples contradict the idea that we are innovating exponentially and making huge technical advances: doing rocket science, essentially. Progress is being made in specific areas (we have more and more ways to scale up to million-to-one architectures, for example), but it tends to be focused on monetization rather than on basic research for things like the space program.
There may be things going on today that hearken back to the 1970s, though. Back then, a few people created the internet protocols and changed the world in a fundamental way. That might be happening again with things like IPFS. While IPFS may not be successful, or even the right solution, he thinks the right solution will be something decentralized that looks similar to where IPFS is headed.