New browser-fingerprinting techniques
Web tracking, which is generally used by advertisers to target their ads, is not popular in some circles—particularly with privacy advocates and privacy-conscious users. But it is also fairly pervasive. Originally, tracking was done using browser cookies, but tracking techniques have expanded over the years. A recent web survey has found several new ways that advertisers and tracking companies are fingerprinting browsers so that the users sitting in front of them can be tracked across the web.
The Princeton Web Census is a study by Steven Englehardt and Arvind Narayanan that looks at both cookie-based (stateful) and browser-fingerprint-based (stateless) tracking on the top 1 million web sites. The survey was run in January by making some 90 million requests to those sites using OpenWPM, which is an open-source project to make "it easy to collect data for privacy studies on a scale of thousands to millions of site[s]". In addition, the data gathered by the study is available for others to use.
The output of the study was a 24-page paper [PDF] that covers quite a bit of ground. The study looked at cookie-based tracking, as well as cookie syncing, where advertising/tracking companies share cookie IDs either in headers (e.g. the referrer header) or behind the scenes. There are some rather interesting findings, many of which are summarized on the pages linked above, but perhaps the most interesting findings are the new ways tracking companies are trying to fingerprint browsers.
The idea behind fingerprinting is straightforward: gather enough information about the user's browser and its environment (plugins, fonts, User-Agent header, localization settings, etc.) to uniquely (or nearly uniquely) identify the user. The Panopticlick tool from the Electronic Frontier Foundation (EFF) demonstrates the uniqueness of a user's browser. The current version of the tool uses some additional techniques, including canvas fingerprinting, which draws images into a hidden <canvas> element to measure the rendering differences between different browsers.
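To make the idea concrete, here is a minimal sketch of how a handful of attributes might be folded into a single identifier. The attribute names, the sample values, and the choice of an FNV-1a-style hash are all assumptions made for illustration; nothing here is taken from the study or from any particular tracking script.

```javascript
// Illustrative only: attribute names and the hash choice are assumptions.

// 32-bit FNV-1a-style hash over the string's UTF-16 code units,
// returned as eight hex digits.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Serialize the attributes in a fixed (sorted) key order so that the
// result does not depend on the order in which they were collected.
function fingerprint(attributes) {
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${JSON.stringify(attributes[key])}`)
    .join(";");
  return fnv1a(canonical);
}

// Hypothetical visitors that differ only in their installed fonts.
const visitorA = {
  userAgent: "ExampleBrowser/1.0",
  language: "en-US",
  timezoneOffset: -60,
  fonts: ["Arial", "Courier New"],
};
const visitorB = { ...visitorA, fonts: ["Arial"] };

console.log(fingerprint(visitorA), fingerprint(visitorB));
```

The more attributes that go into the serialized string, and the more they vary between users, the closer the resulting value comes to identifying a single browser.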
In the survey, canvas fingerprinting was found on more than 14,000 sites, where the actual tracking scripts came from roughly 400 different domains. The sites that use canvas fingerprinting (and the domains where the scripts originate) are listed on the web census page, as are those that use the newer fingerprinting methods described below.
Browser developers have taken some steps to avoid revealing high-value information like font lists, so the fingerprinters have made efforts to find workarounds. One that the study found uses the JavaScript measureText() method to provide font information. By attempting to draw a specific text string in a large number of fonts and comparing the width of the result to the width obtained using the default font, the tracking script can figure out which fonts are not present (since those will be drawn in the default font, and thus have the same width). The study calls this "canvas-font fingerprinting" and found it on more than 3,200 sites. One third party (MediaMath) was responsible for most of the scripts found, but five other third parties were also found to be using canvas-font fingerprinting.
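The width comparison described above can be sketched roughly as follows. The function names, the test string, and the fallback family are assumptions for illustration and are not taken from any script in the study. The measuring step is passed in as a function so that the comparison logic can be exercised outside a browser; canvasMeasurer() shows how it might be backed by measureText() on a real page.

```javascript
// Return the candidate fonts whose measured width differs from the
// fallback font's width. Candidates that are not installed are drawn in
// the fallback font and therefore measure the same as the baseline.
function detectFonts(candidates, measureWidth, fallback = "monospace") {
  const baseline = measureWidth(fallback);
  return candidates.filter(
    (font) => measureWidth(`"${font}", ${fallback}`) !== baseline
  );
}

// Browser-only helper (assumes a DOM): builds a measuring function on
// top of CanvasRenderingContext2D.measureText(). It is not called here.
function canvasMeasurer(text = "mmmmmmmmmmlli", sizePx = 72) {
  const ctx = document.createElement("canvas").getContext("2d");
  return (fontFamily) => {
    ctx.font = `${sizePx}px ${fontFamily}`;
    return ctx.measureText(text).width;
  };
}

// In a page, usage might look like:
//   detectFonts(["Arial", "Comic Sans MS"], canvasMeasurer());
```

One limitation of this simple version is that an installed font that happens to produce the same width as the fallback for the test string would be missed; comparing against more than one fallback family is one way a script could reduce that risk.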
The WebRTC realtime communication feature is another vector for leaking private information that can be used in fingerprinting. In order to facilitate finding the best route between two peers, WebRTC nodes collect information on IP addresses of interest, including those used by local network interfaces (which may well be unroutable NAT addresses from behind a firewall). These addresses are made available to WebRTC, which leads to privacy concerns in its own right, but may also be used for fingerprinting purposes.
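As a rough illustration of where those addresses show up, the sketch below pulls "host" candidates (the ones that describe local interfaces) out of ICE candidate lines, which follow the `candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type>` layout. The function names and the sample data are hypothetical; collectInBrowser() is a browser-only helper that assumes RTCPeerConnection is available and is not called here.

```javascript
// Parse one ICE candidate line, with or without the SDP "a=" prefix.
// Returns null for lines that are not candidates.
function parseCandidate(line) {
  const fields = line.trim().replace(/^a=/, "").split(/\s+/);
  if (!fields[0].startsWith("candidate:") || fields[6] !== "typ") return null;
  return { address: fields[4], port: Number(fields[5]), type: fields[7] };
}

// True for addresses in the RFC 1918 private IPv4 ranges.
function isPrivateIPv4(address) {
  const parts = address.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) return false;
  const [a, b] = parts.map(Number);
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Collect the distinct addresses of all "host" candidates in some SDP text.
function localAddresses(sdp) {
  const found = new Set();
  for (const line of sdp.split(/\r?\n/)) {
    const candidate = parseCandidate(line);
    if (candidate && candidate.type === "host") found.add(candidate.address);
  }
  return [...found];
}

// Browser-only: gather candidates without contacting any STUN server.
function collectInBrowser() {
  return new Promise((resolve, reject) => {
    const pc = new RTCPeerConnection({ iceServers: [] });
    const lines = [];
    pc.onicecandidate = (event) => {
      if (event.candidate) {
        lines.push(event.candidate.candidate);
        return;
      }
      pc.close(); // a null candidate signals the end of gathering
      resolve(localAddresses(lines.join("\n")));
    };
    pc.createDataChannel("");
    pc.createOffer().then((offer) => pc.setLocalDescription(offer)).catch(reject);
  });
}

// Made-up sample: one local (host) and one server-reflexive candidate.
const sampleSdp = [
  "v=0",
  "a=candidate:1 1 udp 2122260223 192.168.1.23 54321 typ host",
  "a=candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.1.23 rport 54321",
].join("\r\n");

console.log(localAddresses(sampleSdp));
```

An address like the one in the sample says little on its own, but combined with the public address and other attributes it can help tell apart users who share a NAT.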
The researchers instrumented the WebRTC createDataChannel() and createOffer() API calls, then tried to determine if those calls were made for tracking purposes. In the top 1 million sites, 700 or so delivered scripts that accessed WebRTC, with more than 600 being used for tracking purposes.
Another clever "attack" (at least on privacy) uses the Web Audio API to detect differences in the hardware and browser implementation that provide some amount of information about the browser. It is unclear at this point whether there is enough information gleaned from that to provide a fingerprint, but it certainly can be used in conjunction with other techniques.
One of the tracking scripts using the Audio API simply looks for the presence of certain elements of the API (AudioContext and OscillatorNode) to add a single bit of information to a broader fingerprint. The other two take the output from the oscillator, do some calculations on it, and produce a hash. The researchers found only roughly 500 occurrences of the simplest technique; the other two together total fewer than 60. This new fingerprint method was found by analyzing known tracking scripts for the use of new APIs.
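Both flavors can be sketched in a few lines. This is a hedged illustration rather than a reconstruction of the scripts found in the study: the function names, the oscillator settings, and the particular calculation (summing absolute sample values before hashing) are all assumptions. renderOscillatorInBrowser() needs a browser with OfflineAudioContext and is not called here; the other functions work on any array of samples.

```javascript
// Simplest variant: a single bit recording whether the API is present.
// "scope" would be the window object in a browser.
function hasAudioApi(scope) {
  return (
    typeof scope.AudioContext === "function" &&
    typeof scope.OscillatorNode === "function"
  );
}

// Small 32-bit FNV-1a-style string hash, returned as eight hex digits.
function hashString(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Reduce rendered samples to one number, then hash its decimal form.
// Tiny implementation differences in the samples change the sum.
function audioFingerprint(samples) {
  let sum = 0;
  for (const sample of samples) sum += Math.abs(sample);
  return hashString(sum.toFixed(12));
}

// Browser-only: render one second of an oscillator without playing it.
async function renderOscillatorInBrowser() {
  const ctx = new OfflineAudioContext(1, 44100, 44100);
  const osc = ctx.createOscillator();
  osc.type = "triangle";
  osc.frequency.value = 10000;
  osc.connect(ctx.destination);
  osc.start(0);
  const buffer = await ctx.startRendering();
  return buffer.getChannelData(0);
}

// In a page, usage might look like:
//   audioFingerprint(await renderOscillatorInBrowser());
```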
OpenWPM is Firefox-based, which allowed the researchers to test with certain add-ons that are meant to block tracking scripts, such as Ghostery and EasyList + EasyPrivacy. For the most part, these tools blocked the majority of the more widespread, canvas-based techniques (i.e. canvas and canvas-font) and had less success with the newer fingerprinting methods (i.e. WebRTC and Audio) on sites that use them. For both of these blocking mechanisms, which are blacklist-based, the more prevalent third-party scripts were blocked. That resulted in covering the majority of the sites, but not generally a majority of the scripts, as less-popular scripts that are infrequently used do not get onto the blacklist.
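The gap between site coverage and script coverage is easy to see with a toy example. The site names, tracker domains, and blocklist below are invented for illustration and are not data from the study; the point is only that blocking one widely used script can protect most sites while leaving most distinct scripts unblocked.

```javascript
// Hypothetical observations: which third-party fingerprinting domains
// were seen on which sites.
const observed = {
  "news.example": ["popular-tracker.example"],
  "shop.example": ["popular-tracker.example"],
  "blog.example": ["popular-tracker.example"],
  "forum.example": ["rare-tracker-1.example"],
  "wiki.example": ["rare-tracker-2.example"],
};

// A blocklist that only knows about the most prevalent domain.
const blocklist = new Set(["popular-tracker.example"]);

// siteShare: fraction of sites where every observed script is blocked.
// scriptShare: fraction of distinct script domains that are blocked.
function coverage(observations, blocked) {
  const sites = Object.keys(observations);
  const scripts = new Set(sites.flatMap((site) => observations[site]));
  const coveredSites = sites.filter((site) =>
    observations[site].every((domain) => blocked.has(domain))
  );
  const blockedScripts = [...scripts].filter((domain) => blocked.has(domain));
  return {
    siteShare: coveredSites.length / sites.length,
    scriptShare: blockedScripts.length / scripts.size,
  };
}

console.log(coverage(observed, blocklist));
```

Here three of the five sites are fully covered, but only one of the three distinct tracker domains is on the list.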
Overall, the paper makes for an engaging look at the user-tracking landscape of the web. It is a reminder that web browsers today have an enormous reach that can be exploited to identify their users. It will be yet another arms race in the digital world, where browser makers and standards groups seek to close or narrow the information leaks (to the extent they can), while advertisers and tracking companies try to find more ways to gather their precious data. But closing the holes is a balancing act and—since vast sums of money are at stake—one suspects that these companies will always find a way to track.