Strange Bedfellows: Fingerprinting Phenomena…or state.gov versus facebook.com

When you browse the web, you probably expect social media giants like Facebook to gather data on your every click to fuel their lucrative advertising business. But would you expect the same sneaky tracking techniques from a U.S. government website?

It turns out, Facebook and Uncle Sam have more in common than you'd think when it comes to uniquely fingerprinting visitors. A recent analysis I conducted of over 500,000 websites revealed that state.gov, the official portal of the U.S. Department of State, employs the exact same advanced user tracking code used by facebook.com. This strange alliance between diplomatic channels and advertising moguls raises important questions about online privacy.

Web Tracking 101: Cookies, Fingerprinting and Why You Should Care

First, a quick primer on how websites track their visitors. The most well-known technique is browser cookies – small text files stored on your computer that allow sites to "remember" you on future visits. Ever notice how you stay logged into Facebook even after closing the tab? That's cookies in action.

While cookies are a mostly harmless and often useful part of the modern web, they have limitations for tracking. Users can easily clear cookies, use different devices, or browse in private/incognito mode to thwart cookie-based identification.

Enter browser fingerprinting: a more complex and surreptitious way for sites to uniquely ID visitors based on the characteristics of their device and software. Just as a physical fingerprint uniquely identifies a person among millions, a browser fingerprint assembles dozens of details about your computer – operating system, screen size, installed fonts, hardware specs, etc. – into an identifier that distinguishes you from the crowd.
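As a rough illustration of how such attributes collapse into a single identifier, the sketch below serializes a handful of properties in a stable order and hashes them. The attribute names, the sample values, and the choice of a 32-bit FNV-1a-style hash are assumptions made for this example; commercial fingerprinting scripts collect far more signals, and their internals are not shown here.

```javascript
// Minimal sketch: combine device attributes into one identifier.
// The hash is a 32-bit FNV-1a variant over UTF-16 code units, chosen only
// because it is short; it is an assumption, not any vendor's actual method.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

function fingerprintFromAttributes(attributes) {
  // Sort keys so the same attributes always serialize the same way.
  const serialized = Object.keys(attributes)
    .sort()
    .map((key) => key + "=" + String(attributes[key]))
    .join(";");
  return fnv1a(serialized);
}

// Hypothetical values; in a browser these would come from objects such as
// navigator and screen.
const sampleDevice = {
  platform: "Win32",
  screen: "1920x1080",
  timezone: "America/New_York",
  fonts: "Arial,Calibri,Verdana",
};
console.log(fingerprintFromAttributes(sampleDevice));
```

Changing any single value, such as the screen size, yields a different identifier, which is why a long list of individually unremarkable details can single out one device.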

Leading the charge on fingerprinting technology are analytics companies whose business model revolves around tracking users across the web on behalf of sites that install their code. One of these companies is ForeSee, whose "advanced" fingerprinting toolkit was found operating on both state.gov and facebook.com.

How Does Canvas Fingerprinting Work?

One of the most powerful and controversial fingerprinting techniques is canvas fingerprinting. Here's how it works from a technical perspective:

  1. A website includes a script from a fingerprinting provider like ForeSee
  2. When a user visits the site, the script instructs their browser to draw a hidden image on a "canvas", the drawing surface provided by the HTML5 Canvas API
  3. The image includes a variety of elements designed to probe the quirks of the user's graphics hardware and software – for example, rendering different shapes, texts, gradients, and emojis
  4. The rendered image is then converted to a data URL, a string containing the Base64 encoded image, using the toDataURL() function
  5. This encoded string is sent back to the fingerprinting provider's servers, where it can be hashed into a unique identifier for that user
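The steps above can be sketched in readable form as follows. This is a simplified illustration, not ForeSee's code: the function names, the drawn content, and the client-side hash are assumptions (the steps above describe hashing as happening on the provider's servers). The function takes the document object as a parameter so it can also be exercised outside a browser with a stand-in.

```javascript
// Sketch of canvas fingerprinting; names and drawing choices are illustrative.
function hashString(text) {
  // 32-bit FNV-1a-style hash over UTF-16 code units (assumed for brevity).
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

function canvasFingerprint(doc) {
  // Step 2: create an off-screen canvas; bail out if the API is missing.
  const canvas = doc.createElement("canvas");
  if (!canvas.getContext) return null;
  canvas.width = 240;
  canvas.height = 60;
  const ctx = canvas.getContext("2d");
  if (!ctx) return null;

  // Step 3: draw content whose pixels depend on fonts, anti-aliasing, and GPU.
  ctx.textBaseline = "alphabetic";
  ctx.fillStyle = "#f60";
  ctx.fillRect(100, 1, 62, 20);
  ctx.fillStyle = "#069";
  ctx.font = "11pt Arial";
  ctx.fillText("Cwm fjordbank glyphs vext quiz, \u{1F603}", 2, 15);

  // Step 4: serialize the rendered pixels as a data URL (PNG by default).
  const dataUrl = canvas.toDataURL();

  // Step 5: reduce the long string to a short identifier.
  return hashString(dataUrl);
}

// In a browser: console.log(canvasFingerprint(document));
```

Nothing here needs to be attached to the page: the canvas element is created in memory and never displayed, which is why the technique is invisible to the visitor.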

The power of canvas fingerprinting lies in the diversity of the final rendered image across different devices. Even tiny variations in graphics drivers, anti-aliasing algorithms, GPU models, installed fonts, and other system properties can produce a unique canvas signature.

Canvas fingerprinting has proven extremely effective as a tracking method. In a 2014 study by researchers at Princeton and KU Leuven, canvas fingerprinting code was found on roughly 5.5% of the top 100,000 websites, identifying users with over 90% accuracy. A 2020 paper found that over 10% of the top 100,000 sites now employ some form of canvas fingerprinting.

Fingerprinting Goes Mainstream: Facebook and ForeSee

To assess the prevalence of fingerprinting on government websites, I developed a custom web crawler to scan the top 500,000 sites for signs of canvas fingerprinting. The crawler looked for scripts that access the Canvas API and attempt to pull image data using the toDataURL() function.
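The crawler itself is not reproduced in this post. As a rough idea of what a static check along these lines could look like, the sketch below flags script source that creates a canvas 2D context, draws text, and reads the result back with toDataURL(). The function name and the specific patterns are assumptions for illustration; static matching misses heavily obfuscated code and will flag some legitimate canvas use, so a real crawler would likely also observe scripts at run time.

```javascript
// Hypothetical heuristic: all four signals must be present to flag a script.
function looksLikeCanvasFingerprinting(source) {
  const signals = {
    createsCanvas: /createElement\(\s*["']canvas["']\s*\)/.test(source),
    gets2dContext: /getContext\(\s*["']2d["']\s*\)/.test(source),
    drawsText: /\.fillText\(/.test(source),
    readsPixels: /\.toDataURL\(/.test(source),
  };
  const suspicious = Object.values(signals).every(Boolean);
  return { suspicious, signals };
}

// Usage with a made-up snippet shaped like a typical fingerprinting script.
const snippet =
  'var c = document.createElement("canvas"); var t = c.getContext("2d");' +
  't.fillText("Cwm fjordbank glyphs vext quiz", 2, 15); send(c.toDataURL());';
console.log(looksLikeCanvasFingerprinting(snippet).suspicious); // prints true
```

Requiring text drawing in addition to toDataURL() is a deliberate choice here: sites that only export charts or images rarely render a fixed sentence first.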

The results were surprising. Not only was canvas fingerprinting present on a number of .gov websites, but it was the exact same fingerprinting script served from the same analytics provider, ForeSee. Here is the obfuscated script found on both facebook.com and state.gov:

var e = function() {
    var t = document.createElement("canvas");
    return !!(t.getContext && t.getContext("2d"))
}();
if (e) {
    var r = function() {
        var e = document.createElement("canvas");
        e.width = 2e3, e.height = 200, e.style.display = "inline";
        var t = e.getContext("2d");
        return t.rect(0, 0, 10, 10), t.rect(2, 2, 6, 6), t.textBaseline = "alphabetic", t.fillStyle = "#f60", t.fillRect(125, 1, 62, 20), t.fillStyle = "#069", t.font = "11pt Arial", t.fillText("Cwm fjordbank glyphs vext quiz, 😃", 2, 15), t.fillStyle = "rgba(102, 204, 0, 0.7)", t.font = "18pt Arial", t.fillText("Cwm fjordbank glyphs vext quiz, 😃", 4, 45), t.globalCompositeOperation = "multiply", t.fillStyle = "rgb(255,0,255)", t.beginPath(), t.arc(50, 50, 50, 0, 2 * Math.PI, !0), t.closePath(), t.fill(), t.fillStyle = "rgb(0,255,255)", t.beginPath(), t.arc(100, 50, 50, 0, 2 * Math.PI, !0), t.closePath(), t.fill(), t.fillStyle = "rgb(255,255,0)", t.beginPath(), t.arc(75, 100, 50, 0, 2 * Math.PI, !0), t.closePath(), t.fill(), t.fillStyle = "rgb(255,0,255)", t.arc(75, 75, 75, 0, 2 * Math.PI, !0), t.arc(75, 75, 25, 0, 2 * Math.PI, !0), t.fill("evenodd"), e.toDataURL()
    }();
    [...] //send to ForeSee servers
}

Let's break down what this script is doing:

  1. It first checks if the user's browser supports the Canvas API by creating a dummy canvas element and testing the getContext method. The result of this check is stored in the e variable.

  2. If canvas is supported, it immediately invokes a second function that creates a 2000×200 pixel canvas element. The return value of that function, not the function itself, is stored in r.

  3. That function proceeds to draw a series of shapes, text, an emoji, and overlapping arcs on the canvas using different fill styles, fonts, and composite operations.

  4. Notably, the canvas includes a pangram, a sentence that uses every letter of the English alphabet (Cwm fjordbank glyphs vext quiz), and an emoji (😃), which are rendered differently depending on the user's installed fonts and emoji support.

  5. Finally, the toDataURL() function is called on the canvas to get its Base64 encoded image data. This string becomes the value of r and is then sent to ForeSee's servers for hash computation and storage.

The presence of this identical script on both facebook.com and state.gov suggests a deep integration between ForeSee and its clients. Government websites are allowing an analytics company specializing in cross-site tracking to capture intimate details about their visitors' devices.

Fingerprinting and Privacy Regulations

The rise of fingerprinting puts government websites on shaky ground with respect to privacy regulations like the EU's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Both laws require organizations to disclose their data collection practices. GDPR additionally requires a lawful basis such as consent for tracking, while the CCPA centers on giving users the ability to opt out.

However, many websites employ fingerprinting scripts without any mention in their privacy policies, terms of service, or cookie consent banners. This lack of transparency means users have no idea they are being fingerprinted unless they examine the site's code.

GDPR specifically prohibits the processing of personal data without a lawful basis like consent. In its guidelines on the use of cookies and tracking, the European Data Protection Board stated:

For example, the EDPB considers that fingerprinting (i.e. a technique that consists in the tracking of individuals by combining information collected from their browser or device) constitutes a processing of personal data… The use of such techniques requires the data subject's consent.

The California Attorney General has also affirmed that fingerprinting falls under the purview of the CCPA's definition of personal information and requires opt-out support.

Government websites using fingerprinting without consent or opt-out mechanisms may therefore be violating these regulations. Even if fingerprinting vendors like ForeSee claim their hashed identifiers are "anonymous", the level of detail collected likely qualifies as personal data.

Prevalence of Fingerprinting on Government Websites

So just how widespread is fingerprinting on government websites? Of the 500,000 websites I analyzed, 1,091 were on .gov domains. Within this subset, canvas fingerprinting scripts were found on 61 unique .gov sites, or about 5.6%.

While this may seem like a small percentage, it represents a troubling trend of official government sites employing invasive tracking techniques more commonly associated with advertising companies. Some of the most heavily trafficked sites in this group include:

Website      Monthly Visits (Similarweb)
irs.gov      86.3 million
usps.com     66.9 million
nasa.gov     17.3 million
nih.gov      14.8 million
epa.gov      4.9 million

In total, the .gov websites using canvas fingerprinting receive over 250 million visits per month. That's a staggering number of users being subjected to device fingerprinting without their knowledge or consent when accessing vital public services and information.

Beyond just canvas techniques, I found 279 .gov sites (25.6%) using some form of advanced fingerprinting script from companies like ForeSee, Adobe, Neustar, and Akamai. These scripts variously probe device properties like installed plugins, fonts, WebGL capabilities, audio codecs, and more to generate unique identifiers.
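The percentages quoted in this section follow directly from the counts. A quick check, using only the figures given above (the helper name is made up for the example):

```javascript
// Share of the 1,091 .gov sites in the crawl, rounded to one decimal place.
function percentOf(count, total) {
  return ((100 * count) / total).toFixed(1);
}

console.log(percentOf(61, 1091));  // prints 5.6  (canvas fingerprinting)
console.log(percentOf(279, 1091)); // prints 25.6 (any advanced fingerprinting script)
```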

The Fingerprinting Arms Race

The rising adoption of fingerprinting has sparked an arms race between tracking companies and privacy advocates. As browser vendors and plugins work to thwart fingerprinting vectors, companies develop new and more creative ways to probe devices for distinguishing features.

One of the earliest and most popular open source tools for fingerprint protection is the Tor Browser. Tor aims to make all of its users look identical by using a uniform set of fonts, screen dimensions, user agent strings, and plugin behaviors. However, the Tor Browser is known for its slow speed and incompatibility with many websites. Its uniformity also makes it easy to detect and block.

More recently, Apple has led the charge against fingerprinting at the vendor level. In 2017, Apple's WebKit team introduced Intelligent Tracking Prevention (ITP) in Safari to limit cross-site tracking, and later Safari releases added measures aimed specifically at fingerprinting. The latest versions of Safari also render canvas images in a slightly different way each page load to foil fingerprinting, though researchers have found ways to detect and adjust for this randomization.
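To make the idea of per-load randomization concrete, here is a minimal sketch of one way a browser or extension could perturb canvas output, and how a script could notice it. This is not Safari's implementation, which is not described in this post; the function names, the choice of flipping the least significant bit of color channels, and the 50% flip probability are all assumptions.

```javascript
// pixels: RGBA bytes shaped like getImageData().data (Uint8ClampedArray).
// random: function returning a number in [0, 1), for example Math.random.
function perturbPixels(pixels, random) {
  const noisy = new Uint8ClampedArray(pixels); // copy, leave the input intact
  for (let i = 0; i < noisy.length; i++) {
    const isAlpha = i % 4 === 3; // leave transparency untouched
    if (!isAlpha && random() < 0.5) {
      noisy[i] ^= 1; // flip the lowest bit: invisible to the eye, changes any hash
    }
  }
  return noisy;
}

// A fingerprinter can detect the noise by rendering the same scene twice.
function outputIsRandomized(render) {
  const first = render();
  const second = render();
  return first.some((value, i) => value !== second[i]);
}
```

The second function hints at why randomization alone is not a complete defense: once a script knows noise is present, it can render several times and try to average it out, or fall back to other signals.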

Google has similarly introduced some anti-fingerprinting measures to recent versions of Chrome, like reducing the precision of certain Web APIs and offering an option to block third-party cookies. However, critics have accused Google of catering to the needs of advertisers and dragging its feet on privacy compared to Apple. Google's own web services like YouTube make extensive use of fingerprinting.

At the browser extension level, tools like Privacy Badger, uBlock Origin, NoScript, and Canvas Defender offer varying degrees of fingerprint protection. But these require a high level of user configuration and technical know-how. No extension can entirely prevent fingerprinting without breaking many modern websites.

Fingerprinting vendors are also constantly evolving to evade detection. For example, many fingerprinters have shifted to using AudioContext API probing – analyzing how a device processes sound – as a harder-to-block identifier. Companies have also developed server-side fingerprinting techniques that don't rely on client-side scripts.
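A commonly described form of AudioContext probing renders a short tone offline and summarizes the resulting samples, which can differ slightly from one audio stack to another. The sketch below shows that idea; the oscillator settings, the sample window, and the function names are assumptions for illustration rather than any vendor's actual parameters. The context constructor is passed in so the function resolves to null where the Web Audio API is unavailable.

```javascript
// Sum of absolute sample values in [start, end): a compact numeric summary.
function summarizeSamples(samples, start, end) {
  let total = 0;
  for (let i = start; i < end && i < samples.length; i++) {
    total += Math.abs(samples[i]);
  }
  return total;
}

// OfflineCtx: the OfflineAudioContext constructor, or undefined if unsupported.
async function audioFingerprint(OfflineCtx) {
  if (typeof OfflineCtx !== "function") return null;
  const ctx = new OfflineCtx(1, 44100, 44100); // mono, 1 second at 44.1 kHz
  const oscillator = ctx.createOscillator();
  oscillator.type = "triangle";
  oscillator.frequency.value = 10000;
  const compressor = ctx.createDynamicsCompressor();
  oscillator.connect(compressor);
  compressor.connect(ctx.destination);
  oscillator.start(0);
  const buffer = await ctx.startRendering(); // rendered in memory, never played
  return summarizeSamples(buffer.getChannelData(0), 4500, 5000);
}

// In a browser: audioFingerprint(window.OfflineAudioContext).then(console.log);
```

Because the audio is rendered offline, the visitor hears nothing and no microphone or speaker permission is involved.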

As long as device fingerprinting remains profitable and unregulated, the tracking arms race is likely to continue. Even as one fingerprinting vector is closed, companies will develop new ones to fill the gap.

A Call to Action

The fingerprinting epidemic on government websites demands action from users, developers, and policymakers alike:

  • As a user, you can protect yourself by installing anti-fingerprinting extensions, using privacy-conscious browsers like Safari, Firefox, or Brave, and opting out of tracking where possible. However, be aware that no solution is foolproof.

  • As a developer, you have a responsibility to carefully vet any third-party analytics scripts you include on your websites. Advocate for minimal data collection and resist the temptation to integrate invasive trackers. Consider open source, self-hosted, or privacy-preserving analytics alternatives.

  • As a policymaker, you can push for stronger regulations and enforcement around fingerprinting and consent. The GDPR and CCPA are a good start but contain many loopholes. We need clearer rules designating fingerprinting as personal data collection, along with mandatory disclosure and opt-out requirements. Fines for non-compliant government contractors may be necessary.

At a deeper level, we need to reframe the conversation around online privacy as a fundamental right rather than a luxury or afterthought. Surreptitious tracking shouldn't be the default on the web, especially on taxpayer-funded government websites. Public pressure and greater technical literacy are key to bringing tracking out of the shadows.

The strange bedfellows of state.gov and facebook.com highlight the web's broken privacy model. By shining a light on fingerprinting and developing robust defenses, we can build a web that prioritizes user agency over unchecked surveillance. It's a lofty goal but one that's essential for the long-term health of our digital society.
