[changelog] “Find my location” button

I just added a button to the Explore Map and the pop-up map you see when geotagging your own photos from the photo page (organizer support v.soon).

Using my l33t Skitch skills I’ll attempt to highlight it …

Getting ready for a code blog post

… but WAIT! You may not see it! It’s one of those “Power-User” type things…

To get the button to show up you’ll need some form of geo-locating built-in/plug-in type thing, or maybe you’re all smarty-pants and running a cutting edge beta version of a browser with location finding built in already. Perhaps you’ve already installed Google Gears, in which case we’ll use that.

Probably the easiest way of getting the button to appear is to pop over to the Loki site and click the “Try it Now” button, install the plug-in, then pop back to Flickr. Loki is from the SKYHOOK Wireless peeps, who all the cool kids seem to be using.

You can also click over to the Mozilla Labs and read more about their Geode project, about how location stuff will soon be built into browsers and everything and install their geode plug-in from there.

Either way, it’ll check all three “things” and show the button if it finds one. As more options come along I’ll add those too.
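For the curious, that check can be sketched roughly like this. It is not the code running on Flickr: the order of the checks is a guess, `env` is a stand-in for the browser's global object so the function can be tried outside a browser, and the `LokiAPI` global is an assumption about what the Loki plug-in script defines.

```javascript
// Returns the name of the first location provider found, or null.
// `env` stands in for the browser's global object (hypothetical parameter).
function findLocationProvider(env) {
  // Browsers or plug-ins (Geode, for example) that expose navigator.geolocation.
  if (env.navigator && env.navigator.geolocation) {
    return "browser";
  }
  // Google Gears exposes a global google.gears.factory object.
  if (env.google && env.google.gears && env.google.gears.factory) {
    return "gears";
  }
  // ASSUMPTION: the Loki script defines a global called LokiAPI.
  if (typeof env.LokiAPI !== "undefined") {
    return "loki";
  }
  return null;
}

// The button only appears when at least one provider is available.
function shouldShowFindMyLocationButton(env) {
  return findLocationProvider(env) !== null;
}
```

In a browser you would pass `window` as `env`; adding another provider later is just one more `if`.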

As an aside …

This is why you shouldn’t do graphical buttons and multi-language support at the same time.


… nightmare!

I See Smart People, AKA: We Do Stuff …

Over on our sister photo arty blog you could easily imagine reading phrases like “One of the amazing things about working at Flickr is the vast amount of incredible photography it exposes you to“, or some such. Hah! Those arty types!

Over here, I’d like to post the flip side … about how one of the amazing things about working at Flickr, is the awesome people I get to work with.

Huddle

Take for example …

Ross Harmes

… people often think I’m joking when we’re sitting in a meeting, discussing how we should standardise our front-end coding conventions or some such and I say we should just “ask Ross”.

But! BUT!! but, Ross wrote a fricking book about JavaScript; Pro JavaScript Design Patterns and sits 3 desks away, it’s faster (and more amusing (to me)) to shout out a question than it is to flick through the index of the book. It’s like having the talking Kindle version, but with a much more natural voice!

You can also find out more at jsdesignpatterns.com

Anyway …

The reason why I wanted to post this, is that recently an awful lot of my co-workers have been doing stuff! So with an eye to that …

The Lovely Kellan Elliott-McCrea

Flashing quickly across the radar last couple of weeks is/was a bunch of discussion about URL shorteners, sort of starting with Josh over here: on url shorteners, and David with The Security Implications of URL Shortening Services. With other discussions popping up here and there and sorta everywhere.

And so, with this bit of philosophy from “Stinky” Willison (more on Simon later) …

Twitter / kellan: “You can build prototypes ...

… Kellan built this (maybe even during meetings); RevCanonical: url shortening that doesn’t hurt the internet. You can read more about it in his blog post URL Shortening Hinting [Note: includes mention of Flickr], where the comments are worth the price of admission alone. Also, features hamster photos.

If you want to join in, you can read more over on the official RevCanonical blog and get, fork or whatever it is people do with code on github. And I’m sure we’ll have more news about RevCanonical here soon :)

But not content with starting that wildfire, Kellan has also been doing his bit to help OpenStreetMap, by walking around with a GPS unit and, I think this part is important, drinking beer

Neocartography

… shown here featuring our good friend Mikel Maron. Remember, we use OpenStreetMap on Flickr when our own maps are a little sparse. More on maps later!

Meanwhile …

The Mighty John Allspaw

Mr Allspaw is our wonderful Ops guy, here he is …

I have a working iSight now.

… smoldering.

As well as smoldering he also gave a talk at the Web 2.0 Expo called Operational Efficiency Hacks the other week. If you’re into that type of thing and missed it, which you probably did, here are his slides

… and his follow up post adds a little further reading. If you like this kinda thing you should probably subscribe to his blog where he posts really interesting Flickr related stuff, and infuriatingly enough *not* here on this blog, /me sulk.

On the subject of Allspaw (and as we’ve already mentioned one book), I was pretty sure I’d mentioned his book before: The Art of Capacity Planning: Scaling Web Resources … if building big things on the web is your kinda thing, but apparently I haven’t, so …

According to a reviewer on Amazon “John’s examples are just like Charlie’s from the TV show Numb3rs”, having never watched Numb3rs I can only assume that’s probably a bad thing (kinda like Scully from X-Files explaining science) but gave it 4 stars anyway :) on those grounds alone you should buy it …

The Art of Capacity Planning: Scaling Web Resources

Oh and don’t forget the WebOps Visualizations Pool on flickr, that John often posts to when things suddenly get much better or worse ;) if you enjoy graphs like this …

The Obama Drop

Getting back to the front-end for a second …

Scott Schiller, Fish enthusiast!

You’ve heard of Muxtape right? That website that plays MP3s, how’d you think it does that? Or the cool javascript/audio stuff that Jacob Seidelin does over at Nihilogic; JavaScript + Canvas + SM2 + MilkDrop = JuicyDrop & Music visualization with Canvas and SoundManager2?

Screenshot shown here (warning: link to noisy thing) …

A [ Radiohead / JavaScript / Boids / Canvas / SM2 ] mashup by Jacob Seidelin

Well they both use SoundManager 2 written by our very own Scott Schiller.

It’s an extensive and easy to use javascript thingy … wait, Scott says it better … “SoundManager 2 wraps and extends the Flash Sound API, and exposes it to Javascript. (The flash portion is hidden, transparent to both developers and end users.)” … which basically means that if you like JavaScript, messing with audio but hate, I mean, dislike working with Flash, it can save you a lot of pain. Here’s one of the demos Scott put together …

And if you think that all looks awesome, remember that Scott is one of our fantastic front-end guys, bringing all that good js magic to Flickr! Apart from the music part, well unless we one day decide to add music and customisable backgrounds to flickr [1].

Aaron Straup Cope

Aaron covered this only the other day: The Only Question Left Is, but has been doing an awful lot with generating shapefiles recently. I just wanted to add my take to it, because even I have trouble keeping up.

Basically what I want is to be able to send something-somewhere a list of latitudes and longitudes and have it return me the “shape” that those points make. This could be anything: the locations of geotagged Squirrels, or even something useful, well kinda like this from Matt Biddulph (him wot of Dopplr) …

London dopplr places, filtered to only places my social network has been to, clustrd

… which maps out something of interest to him, where his “social network” go/eat etc. in London. Which may be different from mine, or could even have some overlap, thus answering the age-old question; which pub should we all go to for lunch?

It’s not quite at the point where you can do it without having to put a little effort in, but I keep prodding Aaron because I want it now! But if you’re the type that does enjoy putting the effort in then you can again do the GitHub dance here: ws-clustr and py-wsclustr (Python bindings for spinning up and using an EC2 instance running ws-clustr).

Once more, more on maps later.

Daniel Bogan – Setup Man

Bogan is virtual, and only exists in the internets, as can be seen here …

Denied

… kind of like Max Headroom, but with worse resolution. Which I think makes Flickr the first interweb company to have a real AI working on the code, not that pretend AI stuff!

Last year he had a bash at helping to put a little context around us delicate flower developers, with a quick run-down of the setup we each used with Trickr, or Humanising the Developers (Part 1) & Trickr, or Humanising the Developers (Part 2). Based on an old project of his called “The Setup” where he used to interview various Internet Famous people (when there wasn’t so damn many of them) about their Setup.

Recently he’s reprised that task, with, wait for it … The Setup, where our very own Bogan asks such leading lights as John Gruber, Steph Thirion, Jonathan Coulton & Gina Trapani. I have no idea how he finds the time!

In turn you can read a quick interview with Bogan over at indicommons.

Rev Dan Catt — errr, me!

Meanwhile, if you’re reading this blog, you’ve probably already seen this, I’ve been trying my hand at using Processing to visualize 24 hours of geotagged photos on Flickr …

… which I managed by following the instructions here Processing, JSON & The New York Times to get Processing to consume our very own Flickr API in JSON format. Which in turn started me off prodding at the Processing group

Flickr: The Processing.org Pool

and Visualization groups on Flickr …

Flickr: The Cool Data Visualization Techniques - Information Visualization Pool

Pulling it all together

So that’s what some of us are up to, and going back to the start, I’m amazed at all the stuff that goes on, brilliant minds and all that.

In my head, this is what ties it all together, hang on, here we go …

Kellan’s been walking around with a GPS unit (along with 1000s of others), adding to the OpenStreetMap (OSM) dataset, we (Flickr) sometimes use that dataset, but also … Matt Jones (also him wot of Dopplr) made this …

My first Cloudmade map style: "Lynchian_Mid"

… using Cloudmade, who in turn use OSM data to allow people to easily style up and use maps. Now, I’m sure Mr Jones won’t mind me saying that he’s not a coder, in fact here he is; Matt Jones – Design Husband …

Matt Jones - design husband

… but a non-programmer can now easily make maps, as demonstrated above and described here: My first Cloudmade map style: “Lynchian_Mid”.

Then, using our wonderfully public Shape Files API (flickr.places.getShapeHistory, which yes, you can get in JSON format for use with Processing or JavaScript), overlay boundaries. Or even, when it’s easier for the non-programmer, plot outlines and shapes, based on the code Aaron is working on, onto those maps.
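If you fancy poking at the shape data yourself, here is a rough sketch of building the request URL for flickr.places.getShapeHistory in JSON format. The API key and place ID below are placeholders, not real values, and the function name is made up for illustration; the response handling is left out because that depends on what you want to draw.

```javascript
// Build a REST URL for flickr.places.getShapeHistory that returns plain JSON.
// buildShapeHistoryUrl is a hypothetical helper, not part of any Flickr library.
function buildShapeHistoryUrl(apiKey, placeId) {
  var params = new URLSearchParams({
    method: "flickr.places.getShapeHistory",
    api_key: apiKey,       // your own Flickr API key
    place_id: placeId,     // the Flickr place you want shapes for
    format: "json",
    nojsoncallback: "1"    // raw JSON instead of a JSONP wrapper
  });
  return "https://api.flickr.com/services/rest/?" + params.toString();
}

// Placeholder values only; swap in a real key and place ID.
var exampleUrl = buildShapeHistoryUrl("YOUR_API_KEY", "EXAMPLE_PLACE_ID");
```

Fetch that URL with whatever you like (Processing, JavaScript, curl) and you get the shape history back as JSON.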

More on making maps here: maps from scratch.

But where could you get such useful data for plotting or visualising, well obviously there’s our API, which is where the senseable team at MIT got the data for their Los ojos del mundo (the world’s eyes) project, again using Processing …


(un)photographed Spain from senseablecity on Vimeo.

But also, let us recall “Stinky” Willison, one time employee of Flickr, who now works at The Guardian. They have a geocoding project, that allows you, if you so wished to place their stories on a map … http://guardian.apimaps.org/search.html … which uses Mapping from Cloudmade, map data from OpenStreetMap, location search from our very own API, and stories from their own API. Which in turn allows you to plot their stories on your own maps, phew!

More about the Guardian Open Platform.

The Guardian Open Platform | guardian.co.uk

You can also read about their Data Store, which gives you access to a load of easy to use data just ripe for visualizing…

… be that with Processing, Flash or JavaScript (following the advice in Ross’s book), and even with photos from Flickr and Audio driven with Scott’s SoundManager2, in “Shapes” powered by Aaron, and preserved with short URLs that’ll stick around, thanks Kellan :) and you can scale it if you need to by following John’s insights.

And that’s just what we do when we’re not working on Flickr.

Photos by Ross Harmes, Kellan, jspaw, jesse robbins, Matt Biddulph, waferbaby, moleitau and dan taylor.


[1] never going to happen.

The Only Question Left Is

photo by Shawn Allen

At the Emerging Technology conference this year Stamen Design’s Michal Migurski and Shawn Allen led an afternoon workshop called “Maps from Scratch: Online Maps from the Ground Up” where people made digital maps from, well… scratch.

If you’ve never heard of Stamen they’ve been doing some of the most exciting work around the idea of “custom cartography” including: Cabspotting, Oakland Crimespotting and Old Oakland Maps, work for the London Olympics, and designing custom map tiles for CloudMade. (Stamen also built the recently launched Flickr Clock :-)

All of this is interesting in its own right; proof that there is still a lot of room in which to imagine maps beyond so-called red-dot fever. All of this is extra interesting in light of Apple’s recent announcement to allow developers to define their own map tiles in the next iPhone OS release. All of this is super-duper interesting because it is work produced by a team of less than 10 people.

The tools, and increasingly the data, to build the maps we want are bubbling up and becoming easier and more accessible to more people every day. Easier, anyway.

“One of the things that made this tutorial especially interesting for us was our use of Amazon’s EC2 service, the “Elastic Compute Cloud” that provides billed-by-the-hour virtual servers with speedy internet connections and a wide variety of operating system and configuration options. Each participant received a login to a freshly-made EC2 instance (a single server) with code and lesson data already in-place. We walked through the five stages of the tutorial with the group coding along and making their own maps, starting from incomplete initial files and progressing through added layers of complexity.

“Probably the biggest hassle with open source geospatial software is getting the full stack installed and set up, so we’ve gone ahead and made the AMI (Amazon Machine Image, a template for a virtual server) available publicly for anyone to use, along with notes on the process we used to create it.”

Michal Migurski

The Maps From Scratch (MFS) AMI may not be a Leveraged Turn Key Synergistic Do-What-I-Mean Solutions Platform but, really, anything that dulls the hassle and cost of setting up specialized software is a great big step in the right direction. I mention all of this because Clustr, the command-line application we use to derive shapefiles from geotagged photos, has recently been added to the list of tools bundled with the MFS AMI.

Specifically: ami-4d769124.

We’re super excited about this because it means that Clustr is that much easier for people to use. We expressly chose to make Clustr an open-source project to share some of the tools we’ve developed with the community but it has also always had a relatively high barrier to entry. Building and configuring a Unix machine is often more than most people are interested in, let alone compiling big and complicated maths libraries from scratch. Clustr on EC2 is not a magic pony factory but hopefully it will make the application a little friendlier.

Shapes

Creating and configuring an EC2 account is too involved for this post but there are lots of good resources out there, starting with Amazon’s own documentation. When I’m stuck I usually refer back to Paul Stamatiou’s How To: Getting Started with Amazon EC2.

Assuming that you are familiar with using Unix command line tools, let’s also assume that you have gotten all your ducks in a row and are ready to fire up the MFS AMI:

your-computer> ec2-run-instances ami-4d769124 -k example-keypair

your-computer> ec2-describe-instances

At which point, you’ll see something like this:

INSTANCE i-xxxxxxxx ami-4d769124 ec2-xxxxx.amazonaws.com blah blah blah

i-xxxxxxxx is the unique identifier of your current EC2 session. You will need this to tell Amazon to shut down the server and stop billing you for its use.

ec2-xxxxx.amazonaws.com is the address of your EC2 server on the Internets.

Once you have that information, you can start using Clustr. First, log in and create a new folder where you’ll save your shapefile:

your-computer> ssh -i example-rsa-key root@ec2-xxxxx.amazonaws.com

ec2-xxxxx.amazonaws.com> mkdir /root/clustr-test

The MFS AMI comes complete with a series of sample “points” files to render. We’ll start with the list of all the geotagged photos uploaded to Flickr on March 24:

ec2-xxxxx.amazonaws.com> /usr/bin/clustr -v -a 0.001 \
   /root/clustr/start/points-2009-03-24.txt \
   /root/clustr-test/clustr-test.shp

By default Clustr generates a series of files named clustr (dot shp, dot dbf and dot shx because shapefiles are funny that way) in the current working directory. You can specify an alternate name by passing a fully qualified path as the last argument to Clustr. When run in verbose mode (that’s the -v flag) you’ll see something like this:

Reading points from input.
Got 44410 points for tag '20090324'.
799 component(s) found for alpha value 0.001.
- 23 vertices, area: 86.7491, perimeter: 71.9647
- 32 vertices, area: 1171.51, perimeter: 41.3095
- 8 vertices, area: 18.5112, perimeter: 0.529504
- 12 vertices, area: 1484.81, perimeter: 10.8544
...
Writing 505 polygons to shapefile.

Yay!

ec2-xxxxx.amazonaws.com> ls -la /root/clustr-test
total 172
drwxr-xr-x 2 root root  4096 2009-04-07 03:14 .
drwxr-xr-x 5 root root  4096 2009-04-07 02:22 ..
-rw-r--r-- 1 root root 52208 2009-04-07 03:14 clustr-test.dbf
-rw-r--r-- 1 root root 97388 2009-04-07 03:14 clustr-test.shp
-rw-r--r-- 1 root root  4140 2009-04-07 03:14 clustr-test.shx

Now copy the shapefiles back to your computer and terminate your EC2 instance (or you might be surprised when you get your next billing statement from Amazon).

ec2-xxxxx.amazonaws.com> scp -r /root/clustr-test \
   you@your-computer:/path/to/your/desktop/

ec2-xxxxx.amazonaws.com> exit

your-computer> ec2-terminate-instances i-xxxxxxxx

I created this image (using the open source QGIS application) for all those points by running Clustr multiple times with alpha numbers ranging from 0.05 to 603:

SHAPEZ (2009-03-24)

Here’s another version rendered using the nik2img application and a custom style sheet, both included with the MFS distribution:

clustr

Here’s one of all the geotagged photos tagged “route66” (with alpha numbers ranging from 0.001 to 0.5):

tag=route66, alpha=(0.001 - 0.5)

Apologies and big sloppy kisses to Stamen’s own Mappr (first released in 2005).

Or tagged “caltrain“, the commuter train that runs between San Francisco and San Jose:

tag=caltrain, alpha=0.001

Meanwhile, Matt Biddulph at Dopplr has been generating a series of visualizations depicting the shape of where to eat, stay and explore for the cities in their Places database. This is what London looks like:

Or: “London dopplr places, filtered to only places my social network has been to, clustrd“.

One of the things I like the most about Clustr is that it will generate shape(file)s for any old list of geographic coordinates. Now that most of the hassle of setting up Clustr has been (mostly) removed, the only question left is: What do you want to render?

“They do not detail locations in space but histories of movement that constitute space.”

Rob Kitchin, Chris Perkins

If you’re like me you’re probably thinking something like “Wouldn’t it be nice if I could just POST a points file to a webservice running on the AMI and have it return a compressed shapefile?” It sure would so I wrote a quick and dirty version (not included in the MFS AMI; you’ll need to do that yourself) in PHP but if there are any Apache hackers in the house who want to make a zippy C version that would be even Moar Awesome ™.

If you don’t want to use the MFS AMI and would rather just install Clustr on your own machine instance, here are the steps I went through to get it to work on a Debian 5.0 (Lenny) AMI; presumably the steps are basically the same for any Linux flavoured operating system:

$> apt-get update
$> apt-get install libcgal-dev
$> apt-get install libgdal1-dev
$> apt-get install subversion

$> svn co http://code.flickr.com/svn/trunk/clustr/
$> cd clustr
$> make
$> cp clustr /usr/bin/

$> clustr -h

clustr 0.2 - construct polygons from tagged points
written by Schuyler Erle

(c) 2007-2008 Yahoo!, Inc.

Usage: clustr [-a <n>] [-p] [-v] <input> <output>
   -h, -?      this help message
   -v          be verbose (default: off)
   -a <n>      set alpha value (default: use "optimal" value)
   -p          output points to shapefile, instead of polygons

If <input> is missing or given as "-", stdin is used.
If <output> is missing, output is written to clustr.shp.
Input file should be formatted as: <tag> <lon> <lat>\n
Tags must not contain spaces.
        

Just like that!
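If your coordinates live somewhere other than a text file, a few lines of script will get them into the layout the help text describes: one tag, longitude and latitude per line, with no spaces in the tag. This is only a sketch; the function name and the shape of the point objects are made up for illustration.

```javascript
// Format points as "<tag> <lon> <lat>" lines for use as clustr input.
// toClustrPoints and the {tag, lon, lat} object shape are hypothetical.
function toClustrPoints(points) {
  var lines = points.map(function (p) {
    if (/\s/.test(p.tag)) {
      throw new Error("Tags must not contain spaces: " + p.tag);
    }
    // Note the order: longitude first, then latitude.
    return p.tag + " " + p.lon + " " + p.lat;
  });
  // One record per line, each terminated by a newline.
  return lines.length ? lines.join("\n") + "\n" : "";
}
```

Save the result to a file (or pipe it to stdin) and hand it to clustr as the input argument.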

photo by Timo Arnall

How the contact cache was won

You say ‘cash’, I say ‘kaysh’

Flickr has a lot of users. A lot. And most of those users have contacts, family, friends; somewhere between none and a bajillion. Or tens of thousands, anyhow. That’s a lot of relationships flying hither and yon, meaning we can’t just cache this stuff on the fly whenever the need strikes us. And strike it did.

Thus, Bo Selecta.

This project was designed to grab up a person’s contacts from anywhere in Flickrspace, and it had to be usable in bits of the site we hadn’t even designed yet. But it also had to not suck, and it had to be fast.

Supafast.

Luckily for us, we have at our disposal a shipping crate in the basement full of terribly clever little robots wearing suitable, fleshy attire and having names like Ross and Paul and Cal.

Walking into the river

As Rossbot has already covered, we spent a lot of time back-and-forthing on how we’d seed this aggregated cache all over the damned place without compromising on speed or our own general sexual attractiveness. Plus, I just wanted to use big words like ‘aggregated’ and ‘seed’.

As I’ve already mentioned above, making this magic happen at request time was not an option, so we turned to our (somewhat) trusty offline tasks system. These tasks munge and purge and generally do all sorts of wonderful data manipulation on boxes separate to the main site, in a generally orderly fashion, and do it in the background.

Offline tasks do it in the background

First up, we needed to work out what data we’d actually want to cache, which ended up being a minimal chunk useful enough for Rossbot to do whatever it is he does with Javascript that makes the ladies throw their panties on stage, and not a single byte more. We ended up with something that looks like this:

You got me.

Oh, you’re a clever one. That’s actually a picture of a fish. We really ended up with something like this:

NSID\aemail@address.com\acharacter_name\areal name\aicon server\aicon farm\apath alias\ais_friend\ais_family\amagic_dust

Thus, we’re generating a bunch of contact data separated by designated control characters, and ultimately stored in a TEXT field in a database. The first time your cache is built, we actually walk your entire contact list and generate one of these chunks for each person you’re affiliated with. On subsequent updates, we use a bit of regular expression hoohah pixie dust to only change the necessary details from individuals, and write those changes back to the DB.
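To make that a bit more concrete, here is a small sketch of the idea: build one delimited chunk per contact, then later swap out a single contact's chunk with a regular expression rather than rebuilding the whole thing. It is written in JavaScript purely for illustration and is not the offline task code itself. The two control characters, the function names and the sample fields are all assumptions.

```javascript
var FIELD_SEP = "\x01";   // ASSUMPTION: delimiter between fields of one contact
var CONTACT_SEP = "\x02"; // ASSUMPTION: delimiter between contacts

// One contact becomes its fields joined by the field delimiter.
function buildRecord(fields) {
  return fields.join(FIELD_SEP);
}

// The whole cache is every contact record joined by the contact delimiter.
function buildCache(contacts) {
  return contacts.map(buildRecord).join(CONTACT_SEP);
}

// Escape text so it can be used literally inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace only the record whose first field is `nsid`, leaving the rest alone.
function updateRecord(cache, nsid, newFields) {
  var pattern = new RegExp(
    "(^|" + CONTACT_SEP + ")" + escapeRegExp(nsid) + FIELD_SEP +
    "[^" + CONTACT_SEP + "]*"
  );
  return cache.replace(pattern, function (match, lead) {
    // Keep the leading delimiter (if any) and drop in the new record.
    return lead + buildRecord(newFields);
  });
}
```

The first build walks everything; later changes only touch the one record that matches.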

Big ups to Mylesbot for his help with making these tasks as efficient and as well-oiled as he is.

Speaking of updates, clearly we have to make sure we catch any changes you or your contacts make, so we have various spots around the site that fire off these offline tasks – when you update your various profile details, when you pick a named URL on Flickr for the first time, or when you change your relationship with someone.

These updates have been carefully honed to work in the context of what’s changing – again, to squeeze out as much speed as we can. F’instance, there’s no need for us to tell all of your contacts that your relationship with SexyBabe43 has progressed to ‘Friend’. Unless that’s your sort of thing, but really, let’s leave that as an exercise for the reader.

All of this attention to detail has ultimately helped us eke out as much speed as possible. Seeing a theme here? So any time you’re sending a Flickrmail, searching for a contact or sharing a photo, think of the robots, and smile that secret little smile of yours, knowingly.

Tags in Space

A lot of you enjoyed our post (“Found in Space”) on the amazing astrometry.net project, and there have been some interesting followups.

A mysterious figure known only as “jim” paired up astronomy photos from Flickr with Google Sky. (You’re going to need the Google Earth plug-in for your browser — just follow the instructions on that page if you don’t have it.) In his technical writeup, “jim” explains how he used the Yahoo Query Language (YQL) to fetch the data. YQL is similar to the existing Flickr APIs, but it’s a query language like SQL rather than a set of REST-ish APIs. And both of those are really just ways to get data out of Flickr’s machine tag system, specifically the astro:* namespace. It’s turtles all the way down.

Who else is using astrotags? The British Royal Observatory in Greenwich is sponsoring a contest to determine the Astronomy Photographer of the Year and the whole thing is based on a Flickr group and extensive use of Flickr’s APIs. The integration is so seamless — galleries of photos and discussions are surfaced on their site as well as ours — you might as well consider Flickr to be their “backend” server. But they’ve also added much, such as great documentation about how to astrotag your photos as well as a concise explanation about how Astrometry.net identifies your photo, even among millions of known stars. (The sci-fi website io9 interviewed Fiona Romeo of the Royal Observatory about the contest; check it out.)

It’s dizzying how many services have been combined here — Astrometry.net grew out of research at the University of Toronto, web mashups use Google Sky for visualization in context, Yahoo infrastructure delivers and transforms data, the Royal Observatory at Greenwich provides leadership and expertise, and then little old Flickr acts as a data repository and social hub. And let’s not forget you, the Flickr community, and your inexhaustible creativity — which is the reason why all this can even come together.

All this was done with pretty light coordination and few people at Flickr were even aware what was going on until recently. I have no idea what the future is for APIs and a web of services loosely joined, but I hope we get to see more and more of this sort of thing.

Building Fast Client-side Searches

Yesterday we released a new people selector widget (which we’ve been calling Bo Selecta internally). This widget downloads a list of all of your contacts, in JavaScript, in under 200ms (this is true even for members with 10,000+ contacts). In order to get this level of performance, we had to completely rethink how we send data from the server to the client.

Server Side: Cache Everything

To make this data available quickly from the server, we maintain and update a per-member cache in our database, where we store each member’s contact list in a text blob — this way it’s a single quick DB query to retrieve it. We can format this blob in any way we want: XML, JSON, etc. Whenever a member updates their information, we update the cache for all of their contacts. Since a single member who changes their contact information can require updating the contacts cache for hundreds or even thousands of other members, we rely upon prioritized tasks in our offline queue system.

Testing the Performance of Different Data Formats

Despite the fact that our backend system can deliver the contact list data very quickly, we still don’t want to unnecessarily fetch it for each page load. This means that we need to defer loading until it’s needed, and that we have to be able to request, download, and parse the contact list in the amount of time it takes a member to go from hovering over a text field to typing a name.
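The deferred loading described above can be sketched as a tiny wrapper that fetches the list at most once, and only when something first asks for it. This is an illustration rather than the widget's actual code; `createLazyLoader` and `fetchContacts` are hypothetical names, and `fetchContacts` is assumed to return a Promise.

```javascript
// Wrap a fetch function so the contact list is requested at most once,
// and not until the first call.
function createLazyLoader(fetchContacts) {
  var pending = null; // holds the in-flight or completed request
  return function load() {
    if (pending === null) {
      pending = fetchContacts(); // first caller triggers the download
    }
    return pending; // everyone else shares the same result
  };
}
```

In a page you would call the returned function from the text field's hover or focus handler, so the download starts while the member is still reaching for the keyboard.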

With this goal in mind, we started testing various data formats, and recording the average amount of time it took to download and parse each one. We started with Ajax and XML; this proved to be the slowest by far, so much so that the larger test cases wouldn’t even run to completion (the tags used to create the XML structure also added a lot of weight to the filesize). It appeared that using XML was out of the question.

BoSelectaJsonGoodFunTimes: eval() is Slow

DJ Bo Selecta on the decks

Next we tried using Ajax to fetch the list in the JSON format (and having eval() parse it). This was a major improvement, both in terms of filesize across the wire and parse time.

While all of our tests ran to completion (even the 10,000 contacts case), parse time per contact was not the same for each case; it geometrically increased as we increased the number of contacts, up to the point where the 10,000 contact case took over 80 seconds to parse — 400 times slower than our goal of 200ms. It seemed that JavaScript had a problem manipulating and eval()ing very large strings, so this approach wasn’t going to work either.

| Contacts | File Size (KB) | Parse Time (ms) | File Size per Contact (KB) | Parse Time per Contact (ms) |
|---|---|---|---|---|
| 10,617 | 1536 | 81312 | 0.14 | 7.66 |
| 4,878 | 681 | 18842 | 0.14 | 3.86 |
| 2,979 | 393 | 6987 | 0.13 | 2.35 |
| 1,914 | 263 | 3381 | 0.14 | 1.77 |
| 1,363 | 177 | 1837 | 0.13 | 1.35 |
| 798 | 109 | 852 | 0.14 | 1.07 |
| 644 | 86 | 611 | 0.13 | 0.95 |
| 325 | 44 | 252 | 0.14 | 0.78 |
| 260 | 36 | 205 | 0.14 | 0.79 |
| 165 | 24 | 111 | 0.15 | 0.67 |

JSON and Dynamic Script Tags: Fast but Insecure

Working with the theory that large string manipulation was the problem with the last approach, we switched from using Ajax to instead fetching the data using a dynamically generated script tag. This means that the contact data was never treated as a string, and was instead executed as soon as it was downloaded, just like any other JavaScript file. The difference in performance was shocking: 89ms to parse 10,000 contacts (a reduction of 3 orders of magnitude), while the smallest case of 172 contacts only took 6ms. The parse time per contact actually decreased the larger the list became. This approach looked perfect, except for one thing: in order for this JSON to be executed, we had to wrap it in a callback method. Since it’s executable code, any website in the world could use the same approach to download a Flickr member’s contact list. This was a deal breaker.
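For anyone who hasn't met the script tag technique, a rough sketch follows. It is not the code we tested: the function name, the `callback` query parameter and the example URL are assumptions, and `doc` and `globalObj` are passed in (instead of using `document` and `window` directly) so the sketch can be tried outside a browser.

```javascript
// Load JSON wrapped in a callback by appending a <script> element.
// In a browser, pass `document` as doc and `window` as globalObj.
function loadViaScriptTag(doc, globalObj, url, callbackName, onData) {
  // The server's response calls this global function with the data.
  globalObj[callbackName] = function (data) {
    delete globalObj[callbackName]; // tidy up once the data has arrived
    onData(data);
  };
  var script = doc.createElement("script");
  // ASSUMPTION: the server reads the callback name from a "callback" parameter.
  script.src = url + (url.indexOf("?") === -1 ? "?" : "&") +
    "callback=" + encodeURIComponent(callbackName);
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}
```

Nothing in there cares which site the page lives on, which is exactly the problem: script tags ignore the same-origin restrictions that apply to Ajax, so any page could define the callback and pull in the same data.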

| Contacts | File Size (KB) | Parse Time (ms) | File Size per Contact (KB) | Parse Time per Contact (ms) |
|---|---|---|---|---|
| 10,709 | 1105 | 89 | 0.10 | 0.01 |
| 4,877 | 508 | 41 | 0.10 | 0.01 |
| 2,979 | 308 | 26 | 0.10 | 0.01 |
| 1,915 | 197 | 19 | 0.10 | 0.01 |
| 1,363 | 140 | 15 | 0.10 | 0.01 |
| 800 | 83 | 11 | 0.10 | 0.01 |
| 644 | 67 | 9 | 0.10 | 0.01 |
| 325 | 35 | 8 | 0.11 | 0.02 |
| 260 | 27 | 7 | 0.10 | 0.03 |
| 172 | 18 | 6 | 0.10 | 0.03 |

Going Custom

Custom Ride

Having set the performance bar pretty high with the last approach, we dove into custom data formats. The challenge would be to create a format that we could parse ourselves, using JavaScript’s String and RegExp methods, that would also match the speed of JSON executed natively. This would allow us to use Ajax again, but keep the data restricted to our domain.

Since we had already discovered that some methods of string manipulation didn’t perform well on large strings, we restricted ourselves to a method that we knew to be fast: split(). We used control characters to delimit each contact, and a different control character to delimit the fields within each contact. This allowed us to parse the string into contact objects with one split, then loop through that array and split again on each string.

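// "\c" and "\a" stand in for the two control characters used as delimiters.
// JavaScript has no such escapes (it reads them as plain "c" and "a"),
// so real code needs the actual control characters here.
that.contacts = o.responseText.split("\c");

for (var n = 0, len = that.contacts.length, contactSplit; n < len; n++) {

	// Split one contact record into its fields.
	contactSplit = that.contacts[n].split("\a");

	// Replace the raw string with an object using short property names.
	that.contacts[n] = {};
	that.contacts[n].n = contactSplit[0];
	that.contacts[n].e = contactSplit[1];
	that.contacts[n].u = contactSplit[2];
	that.contacts[n].r = contactSplit[3];
	that.contacts[n].s = contactSplit[4];
	that.contacts[n].f = contactSplit[5];
	that.contacts[n].a = contactSplit[6];
	that.contacts[n].d = contactSplit[7];
	that.contacts[n].y = contactSplit[8];
}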

Though this technique sounds like it would be slow, it actually performed on par with native JSON parsing (it was a little faster for cases containing fewer than 1,000 contacts, and a little slower for those over 1,000). It also had the smallest file size: 80% the size of the JSON data for the same number of contacts. This is the format that we ended up using.
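
For anyone who wants to experiment with the idea, here is a self-contained sketch of the format as a round trip. The two delimiters are assumptions (the post does not say which control characters were used), and serializeContacts is a hypothetical stand-in for whatever the server does. The property names are the ones from the parsing loop above.

```javascript
// Assumed delimiters: the post does not say which control characters were used.
var CONTACT_DELIMITER = "\x01"; // between contacts
var FIELD_DELIMITER = "\x02";   // between the fields of one contact

// Property names taken from the parsing loop, in field order.
var FIELDS = ["n", "e", "u", "r", "s", "f", "a", "d", "y"];

// Hypothetical server-side counterpart: flatten contact objects into one string.
function serializeContacts(contacts) {
	var rows = [];
	for (var i = 0; i < contacts.length; i++) {
		var values = [];
		for (var j = 0; j < FIELDS.length; j++) {
			var value = contacts[i][FIELDS[j]];
			values.push(value === undefined ? "" : String(value));
		}
		rows.push(values.join(FIELD_DELIMITER));
	}
	return rows.join(CONTACT_DELIMITER);
}

// Parse using split() only: once for the contacts, then once per contact.
function parseContacts(text) {
	if (text === "") {
		return []; // "".split(x) would otherwise yield one empty contact
	}
	var rows = text.split(CONTACT_DELIMITER);
	var contacts = [];
	for (var n = 0, len = rows.length; n < len; n++) {
		var parts = rows[n].split(FIELD_DELIMITER);
		var contact = {};
		for (var j = 0; j < FIELDS.length; j++) {
			contact[FIELDS[j]] = parts[j];
		}
		contacts.push(contact);
	}
	return contacts;
}
```

One appeal of control characters as delimiters is that they are unlikely to turn up in names or email addresses, so the format needs no quoting or escaping. A value that did contain one of them would break this sketch.
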

Contacts | File Size (KB) | Parse Time (ms) | File Size per Contact (KB) | Parse Time per Contact (ms)
10,741 | 818 | 173 | 0.08 | 0.02
4,877 | 375 | 50 | 0.08 | 0.01
2,979 | 208 | 34 | 0.07 | 0.01
1,916 | 144 | 21 | 0.08 | 0.01
1,363 | 93 | 16 | 0.07 | 0.01
800 | 58 | 10 | 0.07 | 0.01
644 | 46 | 8 | 0.07 | 0.01
325 | 24 | 4 | 0.07 | 0.01
260 | 14 | 3 | 0.05 | 0.01
160 | 13 | 3 | 0.08 | 0.02

Searching

Ben to the Rescue

Now that we had a giant array of contacts in JavaScript, we needed a way to search through them and select one. For this, we used YUI’s excellent AutoComplete widget. To get the data into the widget, we created a DataSource object that would execute a function to get results. This function simply looped through our contact array and matched the given query against four different properties of each contact, using a regular expression. RegExp objects turned out to be extremely well-suited for this, with the average search time for the 10,000 contacts case coming in under 38ms. After the results were collected, the AutoComplete widget took care of everything else, including caching the results.
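
As a rough illustration of that search function, here is a minimal sketch. The choice of the four properties is a guess (the post does not name them), and searchContacts and escapeRegExp are hypothetical names. The YUI wiring is left out entirely; this is only the loop that such a DataSource function might run.

```javascript
// Escape characters that have special meaning in a regular expression,
// so that whatever the user types is matched literally.
function escapeRegExp(text) {
	return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical choice of the four searched properties.
var SEARCH_FIELDS = ["n", "e", "u", "r"];

// Return every contact where at least one searched property contains the query,
// ignoring case.
function searchContacts(contacts, query) {
	if (!query) {
		return [];
	}
	// Build the pattern once and reuse it for every contact.
	var pattern = new RegExp(escapeRegExp(query), "i");
	var results = [];
	for (var i = 0, len = contacts.length; i < len; i++) {
		for (var j = 0; j < SEARCH_FIELDS.length; j++) {
			var value = contacts[i][SEARCH_FIELDS[j]];
			if (typeof value === "string" && pattern.test(value)) {
				results.push(contacts[i]);
				break; // one matching property is enough
			}
		}
	}
	return results;
}
```

Building the RegExp once per query, rather than once per contact, keeps the inner loop down to a plain test() call.
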

There was one optimization we made to our AutoComplete configuration that was particularly effective. Regardless of how much we optimized our search method, we could never get results to return in less than 200ms (even for trivially small numbers of contacts). After a lot of profiling and hair pulling, we found the queryDelay setting. This is set to 200ms by default, and artificially delays every search in order to reduce UI flicker for quick typists. After setting that to 0, we found our search times improved dramatically.

The End Result

Head over to your Contact List page and give it a whirl. We are also using the Bo Selecta with FlickrMail and the Share This widget on each photo page.

[changelog] Revision of the Places page, also Neighborhoods

A slightly overdue (and longer than it was supposed to be) post, considering this happened a while ago, but I thought I’d mention a few subtle updates to the Places page.

Even before that, though, we added neighborhood links to the Photo pages. Before, we just listed the neighborhood …

Neighborhood Link

… now it’s a link through to the Places page itself, which looks rather like this …

Neighborhoods (South Bank)

Meaning that from a photo that’s been geotagged you’ll be able to get more of a feel for the local area. Obviously this works better in large cities where the neighborhoods themselves can be as big as towns, while in the towns you’re more likely to find the one or two photographers who count each neighborhood as their stomping ground.

I guess that’s the big addition, but we also tweaked a few other things at the same time. Here’s a before and after shot; you’ll probably need to click through to the larger size if you want to see the details.

Old and Updated Places Pages

On the left side of each Places page we’ve moved different elements around, pushing the search further up, the date/time down and scrapping the weather altogether now that we’ve established that it rains in London.

The functional changes over on the right involved moving the title, attribution, Next & Previous buttons off the photo. When we launched Places we didn’t have Videos, and now we do and the old position clashed with the video controls.

The other benefit of the Next/Prev moving to above the photo is that they don’t jump around as the photos resize. We also added key controls, so now you can just press the forward and back cursor buttons on the keyboard to keep going through photos, power user tip!

The “paging” buttons no longer hover over the top of the thumbnails, as they were …

  1. Annoying.
  2. Not always obvious.

 

On a more technical level, now only geotagged photos appear on the Places page, and where possible the location is shown on the map (with Neighborhoods, due to the nature of the beast, it can sometimes be just off the edge of the map).

Big obvious arrow demonstrating the feature :) …

Geotagged

When we first launched the Places page we wanted to make sure that each location had plenty of photos, so we used a combination of geotagging and tags/description to automate the selection of them. Which led to interesting results such as the city of Reading in England featuring a lot of photos about books (tagged reading, natch). Now that we have over 100 Million geotagged photos we’ve switched to “just” them.

We also factor in the Season a photo was taken in to select the interesting ones, to give us a bit more change in the first photos you see and to keep them relatively, well, seasonal. We’ll probably tweak this again soon to get them to rotate even more often, but we’re still working through that one.

City Colours and Endless Photos

As mentioned above we moved the time down and, partly for whimsy, partly because they’re really useful, used the Dopplr colour to display it and link to their pages. Here’s our Los Angeles page and Dopplr’s Los Angeles page. Dopplr decided to use Flickr to select photos for each City they know about, so we thought we’d borrow their colour in return :)

You can read more about how Dopplr (and therefore we) calculate the colour for a place over on their blog: In rainbows and Darker city colours.

Dopplr And Single Row

Finally, because I’ve gone on enough already, the old design used to have two rows of thumbnails under the main photo and a total of 72 photos, meaning there were 6 “pages” of thumbnails. When you got to the last page, that was kinda it, you couldn’t go any further.

Instead there’s now just one row, but I bolted on the API, so it keeps trying to load more and more photos as you get close to the end of the current lot. Instead of just 72 photos for Los Angeles there are now the full (currently) 444,594 photos, nearly half a million.
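
For the curious, here is a minimal sketch of how that kind of "keep loading as you near the end" row might work. It is not the actual Places code: createPhotoStrip, fetchPage and threshold are hypothetical names, and fetchPage(pageNumber, callback) stands in for whatever wraps the real API call.

```javascript
// Hypothetical sketch: load the next page of photos whenever the viewer
// gets within `threshold` photos of the end of what is already loaded.
function createPhotoStrip(fetchPage, threshold) {
	var photos = [];
	var nextPage = 1;
	var loading = false;   // true while a request is in flight
	var exhausted = false; // true once a page comes back empty

	function loadMore() {
		if (loading || exhausted) {
			return;
		}
		loading = true;
		fetchPage(nextPage, function (pagePhotos) {
			loading = false;
			if (pagePhotos.length === 0) {
				exhausted = true; // nothing left to load
				return;
			}
			photos = photos.concat(pagePhotos);
			nextPage++;
		});
	}

	return {
		photos: function () {
			return photos;
		},
		// Call this whenever the viewer moves to the photo at `index`.
		moveTo: function (index) {
			var remaining = photos.length - 1 - index;
			if (remaining <= threshold) {
				loadMore();
			}
		}
	};
}
```

The loading flag stops a quick clicker from firing the same request several times while one is still in flight.
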

Which reminds me, we should probably add a Slideshow to that page :)

And that’s the revised Places page.

[Edit: Oh and I know we’ve just launched Stats (again), but it’s nice to give the dev a couple of days to recover before forcing them to write a changelog about it ;) ]