Journal tags: urls

Of Time and the Network and the Long Bet

When I went to Webstock, I prepared a new presentation called Of Time And The Network:

Our perception and measurement of time has changed as our civilisation has evolved. That change has been driven by networks, from trade routes to the internet.

I was pretty happy with how it turned out. It was a 40 minute talk that was pretty evenly split between the past and the future. The first 20 minutes spanned from 5,000 years ago to the present day. The second 20 minutes looked towards the future, first in years, then decades, and eventually in millennia. I was channeling my inner James Burke for the first half and my inner Jason Scott for the second half, when I went off on a digital preservation rant.

You can watch the video, and I had the talk transcribed so you can read the whole thing.

It’s also on Huffduffer, if you’d rather listen to it.


During the talk, I pointed to my prediction on the Long Bets site:

The original URL for this prediction (www.longbets.org/601) will no longer be available in eleven years.

I made the prediction on February 22nd last year (a terrible day for New Zealand). The prediction will reach fruition on 02022-02-22 …I quite like the alliteration of that date.

Here’s how I justified the prediction:

“Cool URIs don’t change” wrote Tim Berners-Lee in 01999, but link rot is the entropy of the web. The probability of a web document surviving in its original location decreases greatly over time. I suspect that even a relatively short time period (eleven years) is too long for a resource to survive.

Well, during his excellent Webstock talk Matt announced that he would accept the challenge. He writes:

Though much of the web is ephemeral in nature, now that we have surpassed the 20 year mark since the web was created and gone through several booms and busts, technology and strategies have matured to the point where keeping a site going with a stable URI system is within reach of anyone with moderate technological knowledge.

The prediction has now officially been added to the list of bets.

We’re playing for $1000. If I win, that money goes to the Bletchley Park Trust. If Matt wins, it goes to The Internet Archive.

The sysadmin for the Long Bets site is watching this bet with great interest. I am, of course, treating this bet in much the same way that Paul Gilster is treating this optimistic prediction about interstellar travel: I would love to be proved wrong.

The detailed terms of the bet have been set as follows:

On February 22nd, 2022 from 00:01 UTC until 23:59 UTC,
entering the characters http://www.longbets.org/601 into the address bar of a web browser or command line tool (like curl)
OR
using a web browser to follow a hyperlink that points to http://www.longbets.org/601
MUST
return an HTML document that still contains the following text:
“The original URL for this prediction (www.longbets.org/601) will no longer be available in eleven years.”
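
The terms above boil down to a single string-containment check on whatever HTML the URL returns. As a sketch (the helper name and the fetch-based runner are my own, not part of the bet's wording), the verification could look like this:

```javascript
// The exact prediction text that must still appear in the returned HTML.
const PREDICTION = 'The original URL for this prediction (www.longbets.org/601) will no longer be available in eleven years.';

// Pure check: does a fetched HTML document still contain the prediction?
function betStillStanding(html) {
  return html.includes(PREDICTION);
}

// On 2022-02-22 one might run (network call, not executed here):
// fetch('http://www.longbets.org/601')
//   .then(res => res.text())
//   .then(html => console.log(betStillStanding(html) ? 'Matt wins' : 'Jeremy wins'));
```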

The suspense is killing me!

A matter of protocol

The web is made of sugar, spice and all things nice. On closer inspection, this is what most URLs on the web are made of:

protocol://domain/path
  1. The protocol—e.g. http—followed by a colon and two slashes (for which Sir Tim apologises).
  2. The domain—e.g. adactio.com or huffduffer.com.
  3. The path—e.g. /journal/tags/nerdiness or /js/global.js.

(I’m leaving out the whole messy business of port numbers—which can be appended to the domain with a colon—because just about everything on the web is served over the default port 80.)
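
The three parts can be pulled apart with the WHATWG URL API, which is available in browsers and in Node.js:

```javascript
// Split a URL into the three parts described above.
const url = new URL('http://adactio.com/journal/tags/nerdiness');

console.log(url.protocol); // "http:"
console.log(url.hostname); // "adactio.com"
console.log(url.pathname); // "/journal/tags/nerdiness"
```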

Most URLs on the web are either written in full as absolute URLs:

<a href="http://adactio.com/journal/tags/nerdiness">…</a>
<script src="https://huffduffer.com/js/global.js"></script>

Or else they’re written out relative to the domain, like this:

<a href="/journal/tags/nerdiness">…</a>
<script src="/js/global.js"></script>

It turns out that URLs can not only be written relative to the linking document’s domain, but they can also be written relative to the linking document’s protocol:

<a href="//adactio.com/journal/tags/nerdiness">…</a>
<script src="//huffduffer.com/js/global.js"></script>

If the linking document is being served over HTTP, then those URLs will point to http://adactio.com/journal/tags/nerdiness and http://huffduffer.com/js/global.js, but if the linking document is being served over HTTP Secure, the URLs resolve to https://adactio.com/journal/tags/nerdiness and https://huffduffer.com/js/global.js.
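
That resolution behaviour can be demonstrated with the WHATWG URL API, where the second argument is the base (the linking document):

```javascript
// A protocol-relative URL inherits its scheme from the linking document.
const overHttp  = new URL('//huffduffer.com/js/global.js', 'http://adactio.com/journal/');
const overHttps = new URL('//huffduffer.com/js/global.js', 'https://adactio.com/journal/');

console.log(overHttp.href);  // "http://huffduffer.com/js/global.js"
console.log(overHttps.href); // "https://huffduffer.com/js/global.js"
```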

Writing the src attribute relative to the linking document’s protocol is something that Remy is already doing with his HTML5 shim:

<!--[if lt IE 9]>
<script src="//html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

If you have a site that is served over both HTTP and HTTPS, and you’re linking to a CDN-hosted JavaScript library—something I highly recommend—then you should probably get in the habit of writing protocol-relative URLs:

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.4/jquery.min.js">
</script>

This is something that HTML5 Boilerplate does by default. HTML5 Boilerplate really is a great collection of fantastically useful tips and tricks …all wrapped in a terrible, terrible name.

Hashcloud

Hashbangs. Yes, again. This is important, dammit!

When the topic first surfaced, prompted by Mike’s post on the subject, there was a lot of discussion. For a great impartial round-up, I highly recommend two posts by James Aylett.

There seems to be a general consensus that hashbang URLs are bad. Even those defending the practice portray them as a necessary evil. That is, once a better solution is viable—like the HTML5 History API—then there will no longer be any need for #! in URLs. I’m certain that it’s a matter of when, not if, Twitter switches over.

But even then, that won’t be the end of the story.

Dan Webb has written a superb long-zoom view on the danger that hashbangs pose to the web:

There’s no such thing as a temporary fix when it comes to URLs. If you introduce a change to your URL scheme you are stuck with it for the foreseeable future. You may internally change your links to fit your new URL scheme but you have no control over the rest of the web that links to your content.

Therein lies the rub. Even if—nay when—Twitter switch over to proper URLs, there will still be many, many blog posts and other documents linking to individual tweets …and each of those links will contain #!. That means that Twitter must make sure that their home page maintains a client-side routing mechanism for inbound hashbang links (remember, the server sees nothing after the # character—the only way to maintain these redirects is with JavaScript).
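
The shape of that client-side redirect is straightforward to sketch. This is my own illustration of the technique, not Twitter’s actual code; the function name is hypothetical:

```javascript
// The server never sees the fragment, so JavaScript on the home page
// must translate an inbound hashbang fragment into a real path.
function hashbangToPath(hash) {
  // "#!/simonw/status/25696723761" → "/simonw/status/25696723761"
  return hash.startsWith('#!') ? hash.slice(2) : null;
}

// In a browser, this would run on page load:
// const path = hashbangToPath(window.location.hash);
// if (path) window.location.replace(path);
```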

As Paul put it in such a wonderfully pictorial way, the web is agreement. Hacks like hashbang URLs—and URL shorteners—weaken that agreement.

The long prep

The secret to a good war movie is not in the depiction of battle, but in the depiction of the preparation for battle. Whether the fight will be for Agincourt, Rorke’s Drift, Helm’s Deep or Hoth, it’s the build-up that draws you in and makes you care about the outcome of the upcoming struggle.

That’s what 2011 has felt like for me so far. I’m about to embark on a series of presentations and workshops in far-flung locations, and I’ve spent the first seven weeks of the year donning my armour and sharpening my rhetorical sword (so to speak). I’ll be talking about HTML5, responsive design, cultural preservation and one web; subjects that are firmly connected in my mind.

It all kicks off in Belgium. I’ll be taking a train that will go under the sea to get me to Ghent, location of the Phare conference. There I’ll be giving a talk called All Our Yesterdays.

This will be a non-technical talk, and I’ve been given carte blanche to get as high-falutin’ and pretentious as I like …though I don’t think it’ll be on quite the same level as my magnum opus from dConstruct 2008, The System Of The World.

Having spent the past month researching and preparing this talk, I’m looking forward to delivering it to a captive audience. I submitted the talk for consideration to South by Southwest also, but it was rejected so the presentation in Ghent will be a one-off. The SXSW rejection may have been because I didn’t whore myself out on Twitter asking for votes, or it may have been because I didn’t title the talk All Our Yesterdays: Ten Ways to Market Your Social Media App Through Digital Preservation.

Talking about the digital memory hole and the fragility of URLs is a permanently-relevant topic, but it seems particularly pertinent given the recent moves by the BBC. But I don’t want to just focus on what’s happening right now—I want to offer a long-zoom perspective on the web’s potential as a long-term storage medium.

To that end, I’ve put my money where my mouth is—$50 worth so far—and placed the following prediction on the Long Bets website:

The original URL for this prediction (www.longbets.org/601) will no longer be available in eleven years.

If you have faith in the Long Now foundation’s commitment to its URLs, you can challenge my prediction. We shall then agree the terms of the bet. Then, on February 22nd 2022, the charity nominated by the winner will receive the winnings. The minimum bet is $200.

If I win, it will be a pyrrhic victory, confirming my pessimistic assessment.

If I lose, my faith in the potential longevity of URLs will be somewhat restored.

Depending on whether you see the glass as half full or half empty, this means I’m either entering a win/win or lose/lose situation.

Care to place a wager?

Going Postel

I wrote a little while back about my feelings on hash-bang URLs:

I feel so disappointed and sad when I see previously-robust URLs swapped out for the fragile #! fragment identifiers. I find it hard to articulate my sadness…

Fortunately, Mike Davies is more articulate than I. He’s written a detailed account of breaking the web with hash-bangs.

It would appear that hash-bang usage is on the rise, despite the fact that it was never intended as a long-term solution. Instead, the pattern (or anti-pattern) was intended as a last resort for crawling Ajax-obfuscated content:

So the #! URL syntax was especially geared for sites that got the fundamental web development best practices horribly wrong, and gave them a lifeline to getting their content seen by Googlebot.

Mike goes into detail on the Gawker outage that was a direct result of its “sites” being little more than single pages that require JavaScript to access anything.

I’m always surprised when I come across a site that deliberately chooses to make its content harder to access.

Though it may not seem like it at times, we’re actually in a pretty great position when it comes to front-end development on the web. As long as we use progressive enhancement, the front-end stack of HTML, CSS, and JavaScript is remarkably resilient. Remove JavaScript and some behavioural enhancements will no longer function, but everything will still be addressable and accessible. Remove CSS and your lovely visual design will evaporate, but your content will still be addressable and accessible. There aren’t many other platforms that can offer that level of fault tolerance.

This is no accident. The web stack is rooted in Postel’s Law: be liberal in what you accept. If you serve an HTML document to a browser, and that document contains some tags or attributes that the browser doesn’t understand, the browser will simply ignore them and render the document as best it can. If you supply a style sheet that contains a selector or rule that the browser doesn’t recognise, it will simply pass it over and continue rendering.

In fact, the most brittle part of the stack is JavaScript. While it’s far looser and more forgiving than many other programming languages, it’s still a programming language and that means that a simple typo could potentially cause an entire script to fail in a browser.

That’s why I’m so surprised that any front-end engineer would knowingly choose to swap out a solid declarative foundation like HTML for a more brittle scripting language. Or, as Simon put it:

Gizmodo launches redesign, is no longer a website (try visiting with JS disabled): http://gizmodo.com/

Read Mike’s article, re-read this article on URL design and listen to what John Resig has to say in this interview.

Tagdiving

Speaking of URLs…

We were having a discussion in the Clearleft office recently about that perennially-tricky navigation pivot: tags. Specifically, we were discussing how to represent the interface for combinatorial tags i.e. displaying results of items that have been tagged with tag A and tag B.

I realised that this was functionality that I wasn’t even offering on Huffduffer, so I set to work on implementing it. I decided to dodge the interface question completely by only offering this functionality through the browser address bar. As a fairly niche, power-user feature, I’m not sure it warrants valuable interface real estate—though I may revisit that challenge later.

I can’t use the + symbol as a tag separator because Huffduffer allows spaces in tags (and spaces are converted to pluses in URLs), so I’ve settled on commas instead.

For example, there are plenty of items tagged with “music” (/tags/music) and plenty of items tagged with “science” (/tags/science) but there’s only a handful of items tagged with both “music” and “science” (/tags/music,science).

This being Huffduffer, where just about every page has corresponding JSON, RSS and Atom representations, you can also subscribe to the podcast of everything tagged with both “music” and “science” (/tags/music,science/rss).

There’s an OR operator as well; the vertical pipe symbol. You can view the 60 items tagged with “html5”, the 14 items tagged with “css3”, or the 66 items tagged with either “html5” or “css3” (/tags/html5|css3).

Wait a minute …66 items? But 60 plus 14 equals 74, not 66!

The discrepancy can be explained by the 8 items tagged with both “css3” and “html5” (/tags/html5,css3).
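
That’s just inclusion-exclusion at work: the size of a union is the sum of the sizes minus the size of the intersection. A toy sketch with made-up item IDs (the real Huffduffer data is assumed, not shown):

```javascript
// Hypothetical item IDs standing in for tagged Huffduffer items.
const html5 = new Set([1, 2, 3, 4, 5]);
const css3  = new Set([4, 5, 6]);

// OR is a set union; AND is a set intersection.
const union = new Set([...html5, ...css3]);
const intersection = new Set([...html5].filter(id => css3.has(id)));

// |A ∪ B| = |A| + |B| - |A ∩ B|
console.log(union.size === html5.size + css3.size - intersection.size); // true
```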

The AND and OR operators can be combined, so you can find items tagged with either “science” or “religion” that are also tagged with “politics” (/tags/science|religion,politics).
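
A minimal sketch of how such an expression could be evaluated, assuming (this is my reading of the semantics, not Huffduffer’s actual code) that commas separate AND terms and each term may be a pipe-separated OR-group:

```javascript
// Does an item's tag list satisfy an expression like "science|religion,politics"?
function matches(expression, itemTags) {
  return expression.split(',').every(term =>
    term.split('|').some(tag => itemTags.includes(tag))
  );
}

console.log(matches('science|religion,politics', ['religion', 'politics'])); // true
console.log(matches('science|religion,politics', ['science'])); // false
```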

While it’s fun to do this in the browser’s address bar, I think the real power is in the way that the corresponding podcast allows you to subscribe to precisely-tailored content. Find just the right combination of tags, click on the RSS link, and you’re basically telling iTunes to automatically download audio whenever there’s something new that matches your chosen criteria.

I’m sure there are plenty of intriguing combinations out there. Now I can use Huffduffer’s URLs to go spelunking for audio gems at the most promising intersections of tags.

The URI is the thing

Here’s what’s on my desk at work: an iMac (with keyboard, mouse and USB cup warmer), some paper, pens, a few books and an A4-sized copy of Paul Downey’s The URI Is The Thing—an intricately-detailed Boschian map of all things RESTful. It’s released under a Creative Commons license, so feel free to download the PDF from archive.org, print it out and keep it on your own desk.

I love good URL design. I found myself nodding vigorously in agreement with just about every point in this great piece on URL design:

URLs are universal. They work in Firefox, Chrome, Safari, Internet Explorer, cURL, wget, your iPhone, Android and even written down on sticky notes. They are the one universal syntax of the web. Don’t take that for granted.

That’s why I feel so disappointed and sad when I see previously-robust URLs swapped out for the fragile #! fragment identifiers. I find it hard to articulate my sadness, but it’s related to what Ben said in his comment to Nicholas’s article on how many users have JavaScript disabled:

The truth is that if site content doesn’t load through curl it’s broken.

Or, as Simon put it:

The Web for me is still URLs and HTML. I don’t want a Web which can only be understood by running a JavaScript interpreter against it.

If I copy and paste the URL of that tweet, I get http://twitter.com/#!/simonw/status/25696723761 …which requires a JavaScript interpreter to resolve.

Fortunately, those fragile Twitter URLs will be replaced with proper robust identifiers if this demo by Twitter engineer Ben Cherry is anything to go by. It’s an illustration of saner HTML5 history management using the history.pushState method.
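
The appeal of pushState is that it updates the real path rather than the fragment, so the resulting URL is meaningful to servers and to curl alike. A sketch of the idea (the helper function is hypothetical; history.pushState itself is a browser-only API, shown here in a comment):

```javascript
// Build the robust, server-visible path for a tweet.
function tweetPath(user, id) {
  return '/' + user + '/status/' + id;
}

// In a browser, client-side navigation to a tweet could then be:
// history.pushState({ id: '25696723761' }, '', tweetPath('simonw', '25696723761'));
// ...leaving the address bar showing /simonw/status/25696723761, not #!/simonw/...
```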

Beautiful hackery

While I had Matthew in my clutches, I made him show me around the API for They Work For You. Who knew that so much fun could be derived from data about MPs?

First off, there’s Matthew’s game of MP Top Trumps, though he had to call it MP Fab Farts to avoid getting a cease and desist letter.

Then there’s a text adventure built on the API. This is so good! Enter your postcode and you find yourself playing the part of your parliamentary representative with zero experience points and one hundred hit points. You must work your way across the country, doing battle with rival MPs, as you make your way towards Sedgefield, the lair of Blair.

You can play a Web version but for some real old-school fun, try the telnet version. This reminded me of how much I used to love text adventures back in the days of 8-bit computers. I even remember trying to write my own in BASIC.

For what it’s worth, Celia Barlow, MP for Hove, has excellent pesteredness points. I made it all the way up to Sedgefield and defeated Tony Blair in battle. My prize was the source code of the adventure game in Python.

Ah, what larks!

There’s another project that Matthew works on that I find extremely useful. He has created accessible UK train timetables using the data from the National Rail site, a scrAPI if you will. This is where I go whenever I need to plan a train journey.

The latest feature is something that warms the cockles of my heart: beautiful, hackable URLs. If I want a list of trains going from Brighton to London, I can just type:

http://traintimes.org.uk/brighton/london

It handles spaces (or pluses or underscores) too:

http://traintimes.org.uk/brighton/london victoria

The URL can also be extended with a departure time:

http://traintimes.org.uk/brighton/london victoria/14:00

My address bar is my command line. This is the kind of design that makes URL fetishists like Tom very happy.