
Today's The Fast and the Curious post covers the launch of Skia's new rasterization backend, Graphite, in Chrome on Apple Silicon Macs. Graphite is instrumental in helping Chrome achieve exceptional scores on MotionMark 1.3 and is key to unlocking a ton of future improvements in Chrome Graphics.

A brief history of Skia in Chrome

In Chrome, Skia is used to render paint commands from Blink and the browser UI into pixels on your screen, a process called rasterization. Skia has powered Chrome Graphics since the very beginning. Skia eventually ran into performance issues as the web evolved and became more complex, which led Chrome and Skia to invest in a GPU accelerated rasterization backend called Ganesh.

Over the years, Ganesh matured into a solid, highly performant rasterization backend, and GPU rasterization launched on all platforms in Chrome on top of GL (via ANGLE on Windows D3D9/11). However, Ganesh always had a GL-centric design with too many specialized code paths, and the team was hitting a wall when trying to implement optimizations that took advantage of modern graphics APIs in a principled manner.

This set the stage for the team to rethink GPU rasterization from the ground up in the form of a new rasterization backend, Graphite. Graphite was developed from the start to be principled by having fewer and more comprehensible code paths. This forward-looking design is multithreaded by default and helps take advantage of modern graphics APIs like Metal, Vulkan and D3D12 and paradigms like compute-based path rasterization.

Results

With Graphite in Chrome, we increased our MotionMark 1.3 scores by almost 15% on a MacBook Pro M3. At the same time, we improved real world metrics like INP (interaction to next paint time), LCP (time to largest contentful paint), graphics smoothness (percent dropped frames), GPU process malloc memory usage, and others. This all means substantially smoother interactions, less stutter when scrolling, and less time waiting for sites to show.

Differences between Graphite and Ganesh

Modern graphics APIs

Ganesh was originally implemented on OpenGL ES, which had minimal support for multi-threading or GPU capabilities like compute shaders. Since then, modern graphics APIs like Vulkan, Metal and D3D12 have evolved to take advantage of multithreading and expose new GPU capabilities. They allow applications to have much more control over when and how expensive work such as allocating GPU resources is performed and scheduled, while utilizing both the CPU and the GPU effectively.

While we were able to adapt Ganesh to support modern graphics APIs, it had accumulated enough technical debt that it became hard to fully take advantage of the multi-threading and GPU compute capabilities of modern graphics APIs.

For Graphite in Chrome, we chose to use Chrome's WebGPU implementation, Dawn, as the abstraction layer for platform native graphics APIs like Metal, Vulkan and D3D. Dawn provides a baseline for capabilities common in modern graphics APIs and helps us reduce the long term maintenance burden by leveraging Dawn's mature, well-tested native backends instead of implementing them from scratch for Graphite.

2D depth(?!) testing

A core part of the GPU rendering pipeline is depth testing, which can reduce or eliminate overdraw by drawing opaque objects in front to back order, followed by translucent objects back to front. In graphics, "overdraw" refers to the unnecessary rendering of the same pixels multiple times, which can negatively impact performance and battery life, especially on mobile devices.

Ganesh never utilized the depth testing capabilities of graphics cards, which were admittedly intended for rendering 3D content rather than accelerating 2D graphics. As a result, Ganesh suffers from overdraw because it adheres to strict painter's order when drawing both opaque and translucent objects.

Graphite extends Skia’s GPU rendering to take advantage of the depth test by assigning each “draw” a z value defining its painter’s ordering index. While transparent effects and images must still be drawn from back to front, opaque objects in the foreground can now automatically eliminate overdraw. This means opaque draws can be re-ordered to minimize expensive GPU state changes while relying on the depth buffer to produce correct output.
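To make the idea concrete, here is a minimal JavaScript sketch of depth-tested 2D drawing on a one-dimensional strip of pixels. It is a toy model under stated assumptions, not Skia code: the `render` function, the draw objects and their fields are all hypothetical, and blending is represented by string concatenation.

```javascript
// Toy model: each draw covers pixels [x0, x1) and carries a z value equal to
// its painter's order index (a higher z is closer to the viewer).
// All names here are illustrative and do not come from Skia.
function render(draws, width, useDepthTest) {
  const color = new Array(width).fill("bg");
  const depth = new Array(width).fill(-1); // depth buffer; -1 means "empty"
  let shaded = 0; // number of pixel writes, a rough proxy for fill cost

  const paint = (d, testDepth, writeDepth) => {
    for (let x = d.x0; x < d.x1; x++) {
      if (testDepth && d.z < depth[x]) continue; // hidden by a nearer opaque draw
      color[x] = d.opaque ? d.name : `${d.name}+${color[x]}`; // "+" stands in for blending
      if (writeDepth) depth[x] = d.z;
      shaded++;
    }
  };

  if (!useDepthTest) {
    // Strict painter's order: everything back to front, nothing rejected.
    [...draws].sort((a, b) => a.z - b.z).forEach((d) => paint(d, false, false));
  } else {
    // Opaque draws go first, front to back, and write depth.
    draws.filter((d) => d.opaque).sort((a, b) => b.z - a.z)
      .forEach((d) => paint(d, true, true));
    // Translucent draws follow, back to front, depth-tested but not written.
    draws.filter((d) => !d.opaque).sort((a, b) => a.z - b.z)
      .forEach((d) => paint(d, true, false));
  }
  return { color, shaded };
}

const draws = [
  { name: "A", z: 0, x0: 0, x1: 8, opaque: true },
  { name: "B", z: 1, x0: 2, x1: 8, opaque: true },
  { name: "C", z: 2, x0: 4, x1: 8, opaque: true },
  { name: "glass", z: 3, x0: 6, x1: 8, opaque: false },
];
const painters = render(draws, 8, false);
const depthTested = render(draws, 8, true);
```

In this example both passes produce the same pixels, but the painter's order pass shades 20 pixels while the depth-tested pass shades 10. Because the depth buffer rejects hidden pixels regardless of submission order, the opaque draws could also be regrouped (for instance by pipeline state) without changing the output.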

Depth testing is also used to implement clipping in Graphite by treating clip shapes as depth only draws as opposed to maintaining a clip stack like in Ganesh. Besides reducing algorithmic complexity, a significant benefit to this approach is that the shader program required to render a “draw” does not also depend on the state of the clip stack.

Left: frame from MotionMark Suits. Right: depth buffer for the same frame.

Multithreading

Chromium is a complex multi-process application, with render processes issuing commands to a shared GPU process that is responsible for actually displaying everything in a webpage, tab, and even the browser UI. The GPU process main thread is the primary driver of all rendering work and is where all GPU commands are issued.

Due to the single-threaded nature of Ganesh and OpenGL, only a limited set of work could be moved to other threads. This made it easy to overload the main thread, causing increased jank and latency and ultimately hurting the user experience.

In contrast, Graphite's API is designed to take advantage of multithreading capabilities of modern graphics APIs. Graphite’s new core API is centered around independent Recorders that can produce Recordings on multiple threads, with minimal need to synchronize between them. Even though the Recordings are submitted to the GPU on the main thread, more expensive work is moved to other threads when producing Recordings, keeping the GPU main thread free.
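The shape of this model can be sketched in a few lines of JavaScript. This is only an illustration of the Recorder and Recording split described above: the `Context` class and the `draw`, `snap` and `submit` methods are assumed names for this sketch, it is not Skia's C++ API, and the single-threaded code merely shows that recorders share no state and could each live on their own thread.

```javascript
// A Recording is an immutable list of commands produced by one Recorder.
class Recording {
  constructor(label, commands) {
    this.label = label;
    this.commands = Object.freeze([...commands]);
  }
}

// Each Recorder owns its own command list, so recorders never need to
// synchronize with each other while recording.
class Recorder {
  constructor(label) {
    this.label = label;
    this.commands = [];
  }
  draw(command) {
    this.commands.push(command);
    return this;
  }
  // Hand off everything recorded so far and start a fresh list.
  snap() {
    const recording = new Recording(this.label, this.commands);
    this.commands = [];
    return recording;
  }
}

// Only the submission step is serialized (on the GPU main thread in Chrome).
class Context {
  constructor() {
    this.submitted = [];
  }
  submit(recording) {
    for (const command of recording.commands) {
      this.submitted.push(`${recording.label}:${command}`);
    }
  }
}

const tiles = new Recorder("tiles");
const compositor = new Recorder("compositor");
tiles.draw("tile0");
compositor.draw("layerA"); // interleaved recording, no shared state
tiles.draw("tile1");

const context = new Context();
context.submit(tiles.snap());
context.submit(compositor.snap());
```

The expensive part, building up each command list, happens inside the recorders; the context only replays finished recordings in the order they are submitted.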

Performance cliffs and pipeline compilation

When Ganesh was initially implemented, the programmable capabilities of graphics cards were quite limited, and branching in particular was expensive. To work around this, Ganesh had many specialized shader pipelines to handle common cases. These specializations are hard to predict and depend on a large number of factors related to each individual draw, leading to an explosion of different pipelines for essentially the same page content. Since these pipelines must each be compiled, it doesn't work well for modern web content which might have effects and animations trigger new pipelines at any moment, causing noticeable jank.

Graphite’s design philosophy is instead to consolidate the number of rendering pipelines as much as possible while still preserving performance. This reduces the number of pipelines that have to be compiled, and makes it possible for Chrome to ensure they are compiled at startup so they do not interrupt active browsing. Ganesh’s specialization approach also led to surprising performance cliffs. For example, while it could handle simple cases, real page content was often a complex mix. By consolidating pipelines, complex content can be rendered as effectively as simple content.

Future Plans

Multithreaded Rasterization

Currently, Graphite is integrated into Chromium using two Recorders: one handles web content tiles and Canvas2D on the main thread, while the other is for compositing. In the future, this model will open up a number of exciting possibilities to further improve Chrome’s performance. Instead of saturating the main GPU thread with the tasks from each renderer process, rasterization can be forked across multiple threads.

Current:

Future:

Reducing GPU memory for simple content

Graphite recordings can also be re-issued to the GPU with certain dynamic changes such as translation. This can be used to accelerate scrolling while eliminating the unnecessary work to re-issue rendering commands. This lets us automatically reduce the amount of GPU memory required to cache web content as tiles. If the content is simple enough, the performance difference between drawing a cached image and drawing its content can be worth skipping allocating a tile for it and just re-rendering it each frame.

GPU Compute Path Rasterization

In the landscape of 2D graphics rendering, GPU compute-based path rasterization is very much en vogue, with recent implementations like Pathfinder and vello. We would like to implement these ideas in Skia, possibly using a hybrid approach. Currently, Graphite relies on MSAA where it can, but in many cases we can't use it due to poor performance on older integrated GPUs or high memory overhead on non-tiling GPUs, and we have to fall back to CPU path rasterization using an atlas for caching. GPU compute-based path rasterization would allow us to improve over both the visual quality of MSAA, which is often limited to 4 samples per pixel, and the performance of CPU rasterization.

These are future directions the Chrome Graphics team plans to pursue, and we are excited to see how far we can push the needle.

Update (6/10/2025): This blog was updated to reflect that testing was done using the Speedometer 3.1 benchmark, and resulted in a 22% performance improvement. The previous version incorrectly noted that the performance improvement was 10% and that the benchmark was Speedometer 3.

Performance has always been one of the core pillars of Chrome and it’s something we’ve never stopped investing in. Publicly available and open benchmarks, which we create in open collaboration with other browsers, are useful tools for tracking our overall progress, understanding new areas of improvement, and validating potential optimizations. In today’s The Fast and the Curious post, we’d like to go through Chrome’s recent work that enabled it to achieve the highest score ever on the Speedometer benchmark.

For Speedometer, these optimizations have resulted in a 22% improvement since August 2024. That 22% improvement leads to better browser experiences, higher conversions for businesses, and deeper enjoyment of what the web has to offer. If each Chrome user used Chrome for just 10 minutes a day, these improvements collectively save 116 million hours or roughly 166 lifetimes worth of waiting around for websites to load and do things.

Speedometer 3.1 score measured on an Apple MacBook Pro M4 with macOS 15

Speedometer is a benchmark created in open collaboration with other browsers and measures web application responsiveness through workloads that cover a large variety of different areas of the Blink rendering engine used in Chrome:

  • HTML parsing
  • JavaScript and JSON processing
  • JavaScript and Document Object Model (DOM) interaction
  • DOM manipulations (element insertion and removal)
  • Text size computation (font shaping)
  • Cascading Style Sheet (CSS) application and layout calculation
  • Pixel rendering

In essence, Speedometer tests critical components of the entire rendering pipeline. For a deeper dive into these individual parts, we recommend the presentation Life of a Script at Chrome University.

Achieving exceptional web performance requires a multifaceted approach, and optimizing for Speedometer is a testament to overall product excellence. Over the past year, our team has focused on refining fundamental rendering paths across the entire stack. Here are some notable optimization examples.

The team heavily optimized memory layouts of many internal data structures across DOM, CSS, layout, and painting components. Blink now avoids a lot of useless churn on system memory by keeping state where it belongs with respect to access patterns, maximizing utilization of CPU caches. Where internal memory was already relying on garbage collection in Oilpan, e.g. DOM, the usage was expanded by converting types from using malloc to Oilpan. This generally speeds up the affected areas as it packs memory nicely in Oilpan’s backend.

Strings in the renderer improved quite a bit over the last year by avoiding costly representations where possible and switching hashing to rapidhash. More generally, lots of data structures were equipped with better hashes, filters, and probing algorithms.

Where rendering becomes inherently expensive, e.g., for computing CSS styles across various elements, caches are now used much more effectively with better hit rates. At the same time we cache fewer things that are not relevant. Another area where rendering becomes expensive is font shaping; the team significantly improved Apple Advanced Typography font shaping performance which is relevant everywhere text is rendered.

Today’s The Fast and the Curious post covers how Chrome achieved best-in-class Speedometer scores on mobile devices, resulting in faster and smoother web experiences for Android users.

Chrome has always been about speed. Whether it's loading pages quickly, running complex web apps smoothly, or delivering a seamless browsing experience, performance is at the heart of our browser. And we're always looking for ways to make Chrome even faster.

Over the last two years, we have been hard at work on a number of performance improvements for Android devices. We're excited to share some of the progress we've made.

Speedometer on Android

One of the key metrics we use to track Chrome's performance is the Speedometer benchmark. This benchmark is developed in collaboration with other major web browser engines and measures how quickly Chrome can complete interactions with web pages, including parsing/rendering HTML or CSS and running JavaScript.

Since the release of Chrome M112, we've seen a significant increase in Speedometer 2.1 scores on Android devices [1]. In fact, on many devices, scores more than doubled, with the newest Snapdragon® 8 Elite Mobile Platform setting new records for Speedometer performance on mobile devices! These huge accomplishments are a testament to the work not only of the Chrome and Android teams, but also our silicon and SoC partners.

Since Chrome M112, Speedometer 2.1 scores have more than doubled on many Android devices. [1]

How Did We Do It?

The improvements resulted from several changes, including:

  • Build optimizations: We've made a number of changes to the way Chrome is built, which has resulted in faster code execution tuned to modern premium Android devices and SoCs.
  • V8 and Blink improvements: Many improvements to the JavaScript engine (V8) and the rendering engine (Blink) have further boosted performance.
  • Scheduling, OS and SoCs: We worked closely with Android partners to optimize the way Chrome interacts with the operating system and its thread scheduling to make the best use of the silicon on the devices.

Let's take a closer look at each of these areas.

Build optimizations

The Android device ecosystem is very diverse. From entry-level phones to the newest premium ones, Chrome needs to run well on all devices. Up until last year, we shipped the same Chrome build to all these different Android devices. The memory and disk size constraints on entry-level devices resulted in Chrome having to prioritize a small binary size. Consequently, many modern build optimizations were out of reach, as they resulted in much larger binaries.

With M113, Chrome was finally able to ship a separate higher-performance build targeting premium Android devices via the Google Play Store. While we still ship a more binary-size-constrained build to other devices, this approach allowed us to land some of those modern optimizations into the new premium build:

  • By targeting 64-bit Arm instead of 32-bit Arm, we can make use of more efficient Arm instruction set features and larger 64-bit operations.
  • Since binary size is less relevant on premium devices with large disks and sufficient memory, we can now compile C++ code optimized for speed (-O2 / -O3) rather than size (-Oz).
  • Furthermore, we tweaked the inlining thresholds used by the compiler to enable more inlining in hot code (within and across modules), while updating the model and policy used by another compiler pass (MLGO) to reduce inlining in cold code.
  • We now also apply profile-guided optimization (PGO) techniques to the build to further improve the code layout and optimization level for hot code.
  • Finally, we improved cross-function code ordering by aligning Chrome's orderfile generation with the new 64-bit build. We also now include Speedometer 3, the latest version of the industry-standard browser speed benchmark, in the workloads used to generate the orderfile.

Together, these build optimizations account for more than half of the overall Speedometer score improvements. This progress was facilitated by our collaboration with Arm, who contributed valuable insights and improvements, including helping to identify and address inefficiencies in Chrome's PGO setup and inlining.

V8 and Blink improvements

Chrome continuously improves the performance of its JavaScript and web rendering engines, V8 and Blink. Most optimizations are small in individual impact, but stacked together they add up, contributing most of the remaining Speedometer impact! Notable ones include:

  • We now utilize an optimized fast-path HTML parser to parse innerHTML attributes.
  • V8 launched its Sparkplug compiler tier, a super fast baseline compiler that sits right above its Ignition interpreter and generates non-optimized code very quickly. Later, V8 also launched Maglev, a new mid-tier compiler that generates semi-optimized code. It takes longer to do so than Sparkplug, but much less time than TurboFan, V8's ultra-optimizing compiler tier. All together, this new tiering hierarchy allows V8 to tier up more gradually, improving both performance and power consumption.
  • We tuned our heuristics that decide when garbage collection occurs, targeting times when the rendering engine is idle or when users navigate away from pages.
  • We landed many other incremental optimizations, e.g. to V8 and our parsing, style, layout, and text rendering engines.
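The tiering hierarchy mentioned in the list above can be pictured with a small JavaScript sketch. This is a deliberate simplification: the call-count thresholds are invented for illustration, and V8's real tier-up decisions depend on more than how often a function has been called.

```javascript
// Tiers in the order described above, from cheapest to most optimizing.
// The minCalls thresholds are hypothetical values chosen for this sketch.
const TIERS = [
  { name: "Ignition", minCalls: 0 },
  { name: "Sparkplug", minCalls: 10 },
  { name: "Maglev", minCalls: 100 },
  { name: "TurboFan", minCalls: 1000 },
];

// Returns the highest tier whose threshold the function has reached.
function tierFor(callCount) {
  let current = TIERS[0];
  for (const tier of TIERS) {
    if (callCount >= tier.minCalls) current = tier;
  }
  return current.name;
}
```

With intermediate tiers in place, a function that is only moderately hot can stop at a cheaper compiler instead of paying for full optimization, which is the gradual tier-up the list describes.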

Scheduling and OS

To achieve the best possible performance, Android partners invest heavily in tuning the operating system's thread scheduling and frequency scaling policies, as well as improving the performance of the silicon itself.

We worked closely with our partners to improve their tuning for Chrome and Speedometer. In particular, our collaboration with Qualcomm was very fruitful: By combining optimized scheduling policies with improved hardware performance, their newest Snapdragon 8 Elite mobile platform realized a 60-80% improvement in Speedometer 3.0 compared to its predecessor, resulting in class-leading web performance. This collaboration also highlighted important bottlenecks in Chrome's code, such as the need for improved PGO and opportunities in V8.

Speedometer 3.0 on Snapdragon 8 Gen 3 (left) compared to Snapdragon 8 Elite (right), Chrome M131

Why do these improvements matter?

Faster Speedometer scores translate to improvements in real user interactions with web content, such as faster page loads and interactions. Back at M112, loading a Google Docs document on Pixel Tablet took more than 50% longer than it does today -- that's the effect of a doubled Speedometer score!

Chrome M112 vs. M129 on Pixel Tablet, loading a Google Doc (frame count)

[1] Speedometer 3 was released during M122, so results from Speedometer 2.1 are provided for a full picture. Measurements shown in graphs were taken on Pixel Tablet.




Today’s The Fast and the Curious post explores how Chrome achieved the highest score on the new Speedometer 3.0, an upgraded browser benchmarking tool to optimize the performance of Web applications. Try out Chrome today! 

Speedometer 3.0 is a recently published benchmark for measuring browser performance that was created as an industry collaboration between companies like Google, Apple, Mozilla, Intel, and Microsoft. This benchmark helped us identify areas in which we could optimize Chrome to deliver a faster browser experience to all our users.

Here’s a closer look at how we further optimized Chrome to achieve the highest score ever on Speedometer 3, by carefully tracking its recent performance over time as the updated benchmark was being developed. Since the inception of Speedometer 3 in May 2022, we've driven a 72% increase in Chrome’s Speedometer score, translating into performance gains for our users:



Optimizing workloads

By looking at the workloads in Speedometer and in which functions Chrome was spending the most time, we were able to make targeted optimizations to those functions that each drove an increase in Chrome’s score. For example, the SpaceSplitString function is used heavily to turn space-separated strings, such as the value in class="foo bar", into a list representation; in this function we removed some unnecessary bounds checks. When we detect that there are duplicated stylesheets, we dedupe them and reference a single stylesheet instance. We reduced the cost of drawing paths and arcs by tuning memory allocations. We found and removed some unnecessary processing that occurred when form elements were created. Within querySelector, we were able to detect which selectors were commonly used and create a hot path for them.
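For readers unfamiliar with what a function like SpaceSplitString has to do, here is a minimal JavaScript sketch of the same task. It is not Blink's C++ implementation; `splitOnSpaces` and `isAsciiWhitespace` are hypothetical names, and the whitespace set follows the HTML standard's definition of ASCII whitespace.

```javascript
// ASCII whitespace as defined by the HTML standard: tab, LF, FF, CR, space.
const isAsciiWhitespace = (ch) =>
  ch === " " || ch === "\t" || ch === "\n" || ch === "\f" || ch === "\r";

// Splits a space-separated attribute value into its tokens in one pass.
function splitOnSpaces(input) {
  const tokens = [];
  let start = -1; // index where the current token began, or -1 if none
  for (let i = 0; i <= input.length; i++) {
    // The end of the string closes any open token, just like whitespace.
    const atBoundary = i === input.length || isAsciiWhitespace(input[i]);
    if (!atBoundary && start === -1) {
      start = i;
    } else if (atBoundary && start !== -1) {
      tokens.push(input.slice(start, i));
      start = -1;
    }
  }
  return tokens;
}
```

Because a routine like this runs for every class attribute on a page, even small per-character savings, such as the removed bounds checks mentioned above, can show up in a benchmark.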

We previously shared how we optimized innerHTML using specialized fast paths for parsing, an implementation that also made its way into WebKit. Some workloads in Speedometer 3 use DOMParser so we extended the same optimization for another 1% gain.

We worked with the HarfBuzz maintainer to also optimize how Chrome renders AAT fonts such as those used by Apple's macOS system fonts. Text starts as a processed stream of Unicode characters that is then transformed into a glyph stream, which is in turn run through a state machine defined in the AAT font. The optimization allows us to determine more quickly whether glyphs actually participate in the rules for the state machine, leading to speed-ups when processing text using AAT.

Picking the right code to focus on

An important strategy for achieving high performance is tiering up code, which is picking the right code to further optimize within the engine. Intel contributed profile guided tiering to V8 that remembers tiering decisions from the past such that if a function was stably tiered up in the past, we eagerly tier it up on future runs.

Improving garbage collection

Another area of changes that drove around 3% progression on Speedometer 3 was improvements around garbage collection. V8’s garbage collector has a long history of making use of renderer idle time to avoid interfering with actual application code. The recent changes follow this spirit by extending existing mechanisms to prefer garbage collection in idle time on otherwise very active renderers where possible. Specifically, DOM finalization code that is run on reclaiming objects is now also run in idle time. Previously, such operations would compete with regular application code over CPU resources. In addition, V8 now supports a much more compact layout for objects that wrap DOM elements, i.e., all objects that are exposed to JavaScript frameworks. The compact layout reduces memory pressure and results in less time spent on garbage collection.
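The idea of deferring finalization work to idle time can be sketched in JavaScript as follows. This models only the scheduling pattern, not V8's or Blink's implementation: the `IdleFinalizer` class is hypothetical, and its `deadline` parameter is assumed to look like the `IdleDeadline` object that the web platform's `requestIdleCallback` passes to its callback.

```javascript
// Queues cleanup tasks and runs them only while idle time remains.
class IdleFinalizer {
  constructor() {
    this.pending = [];
  }
  enqueue(task) {
    this.pending.push(task);
  }
  // `deadline.timeRemaining()` returns the idle budget left in milliseconds.
  // Returns how many tasks were run in this idle period.
  runDuringIdle(deadline) {
    let ran = 0;
    while (this.pending.length > 0 && deadline.timeRemaining() > 0) {
      const task = this.pending.shift();
      task();
      ran++;
    }
    return ran;
  }
}

// In a browser this could be wired up as:
//   requestIdleCallback((deadline) => finalizer.runDuringIdle(deadline));
```

Tasks that do not fit into one idle period simply stay queued for the next one, so they never compete with application code for the CPU.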

Posted by Thomas Nattestad, Chrome Product Manager



On the Chrome team, we believe it’s not sufficient to be fast most of the time; we have to be fast all of the time. Today’s The Fast and the Curious post explores how we contributed to Core Web Vitals by surveying the field data of Chrome responding to user interactions across all websites, ultimately improving performance of the web.

As billions of people turn to the web to get things done every day, the browser becomes more responsible for hosting a multitude of apps at once, and resource contention becomes a challenge. The multi-process Chrome browser contends for multiple resources: CPU and memory of course, but also its own queues of work between its internal services (in this article, the network service).

This is why we’ve been focused on identifying and fixing slow interactions from Chrome users’ field data, which is the authoritative source when it comes to real user experiences. We gather this field data by recording anonymized Perfetto traces on Chrome Canary, and report them using a privacy-preserving filter.

When looking at field data of slow interactions, one particular cause caught our attention: recurring synchronous calls to fetch the current site’s cookies from the network service.

Let’s dive into some history.

Cookies under an evolving web

Cookies have been part of the web platform since the very beginning. They are commonly created like this:

    document.cookie = "user=Alice";
    document.cookie = "color=blue";

And later retrieved like this:

    // Assuming a `getCookie` helper method:
    getCookie("user", document.cookie)
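The `getCookie` helper is only assumed by the snippet above. One possible minimal version, written here as an illustrative sketch rather than a standard API, parses the `name=value` pairs that reading `document.cookie` returns separated by semicolons:

```javascript
// Looks up one cookie by name in a string such as "user=Alice; color=blue".
// Returns the raw value, or undefined if no cookie with that name exists.
function getCookie(name, cookieString) {
  for (const pair of cookieString.split(";")) {
    const separator = pair.indexOf("=");
    if (separator === -1) continue; // skip fragments without a value
    const key = pair.slice(0, separator).trim();
    if (key === name) return pair.slice(separator + 1).trim();
  }
  return undefined;
}
```

Every call made in the style of the snippet above performs a fresh read of `document.cookie`.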

The implementation of document.cookie was simple in single-process browsers, which kept the cookie jar in memory.

Over time, browsers became multi-process, and the process hosting the cookie jar became responsible for answering more and more queries. Because the web spec requires JavaScript to fetch cookies synchronously, however, answering each document.cookie query is a blocking operation.

The operation itself is very fast, so this approach was generally fine, but under heavy load scenarios where multiple websites are requesting cookies (and other resources) from the network service, the queue of requests could get backed up.

We discovered through field traces of slow interactions that some websites were triggering inefficient scenarios with cookies being fetched multiple times in a row. We landed additional metrics to measure how often a GetCookieString() IPC was redundant (same value returned as last time) across all navigations. We were astonished to discover that 87% of cookie accesses were redundant and that, in some cases, this could happen hundreds of times per second.

The simple design of document.cookie was backfiring as JavaScript on the web was using it like a local value when it was really a remote lookup. Was this a classic computer science case of caching?! Not so fast!

The web spec allows collaborating domains to modify each other’s cookies. Hence, a simple cache per renderer process didn’t work, as it would have prevented writes from propagating between such sites (causing stale cookies and, for example, unsynchronized carts in ecommerce applications).

A new paradigm: Shared Memory Versioning

We solved this with a new paradigm which we called Shared Memory Versioning. The idea is that each value of document.cookie is now paired with a monotonically increasing version. Each renderer caches its last read of document.cookie alongside that version. The network service hosts the version of each document.cookie in shared memory. Renderers can thus tell whether they have the latest version without having to send an inter-process query to the network service.
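The mechanism can be modeled in a few lines of JavaScript using a `SharedArrayBuffer` as the shared memory. This is a single-threaded sketch under stated assumptions, not Chrome's implementation: the class names are hypothetical, one counter stands in for the per-document versions, and a plain method call with a counter stands in for the GetCookieString() IPC.

```javascript
// Stands in for the network service, which owns the cookie value and
// publishes its version in shared memory.
class NetworkServiceCookieJar {
  constructor() {
    this.sharedVersion = new Int32Array(new SharedArrayBuffer(4));
    this.value = "";
    this.ipcCount = 0; // how many simulated IPC round trips were made
  }
  setCookieString(value) {
    this.value = value;
    Atomics.add(this.sharedVersion, 0, 1); // monotonically increasing version
  }
  // Simulated GetCookieString() IPC: the expensive path.
  getCookieStringIpc() {
    this.ipcCount++;
    return { value: this.value, version: Atomics.load(this.sharedVersion, 0) };
  }
}

// Stands in for a renderer, which caches its last read alongside the version.
class RendererCookieCache {
  constructor(jar) {
    this.jar = jar;
    this.cachedValue = null;
    this.cachedVersion = -1; // forces an IPC on the first read
  }
  read() {
    // Cheap check: a shared memory load instead of an inter-process query.
    const current = Atomics.load(this.jar.sharedVersion, 0);
    if (current !== this.cachedVersion) {
      const reply = this.jar.getCookieStringIpc();
      this.cachedValue = reply.value;
      this.cachedVersion = reply.version;
    }
    return this.cachedValue;
  }
}
```

Repeated reads with an unchanged version cost no IPC, while a write from any site bumps the shared version, so every renderer notices on its next read and refetches. That is what keeps collaborating domains from seeing stale cookies.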



This reduced cookie-related inter-process messages by 80% and made document.cookie accesses 60% faster 🥳.

Hypothesis testing

Improving an algorithm is nice, but what we ultimately care about is whether that improvement results in improving slow interactions for users. In other words, we need to test the hypothesis that stalled cookie queries were a significant cause of slow interactions.

To achieve this, we used Chrome’s A/B testing framework to study the effect and determined that it, combined with other improvements to reduce resource contention, improved the slowest interactions by approximately 5% on all platforms. This further resulted in more websites passing Core Web Vitals 🥳. All of this adds up to a more seamless web for users.



Timeline of the weighted average of the slowest interactions across the web on Chrome as this was released to 1% (Nov), 50% (Dec), and then all users (Feb).

Onward to a seamless web!

By Gabriel Charette, Olivier Li Shing Tat-Dupuis, Carlos Caballero Grolimund, and François Doray, from the Chrome engineering team



Today’s The Fast and the Curious post covers the release of Speedometer 3.0, an upgraded browser benchmarking tool to optimize the performance of Web applications.

In collaboration with major web browser engines, Blink/V8, Gecko/SpiderMonkey, and WebKit/JavaScriptCore, we’re excited to release Speedometer 3.0. Benchmarks, like Speedometer, are tools that can help browser vendors find opportunities to improve performance. Ideally, they simulate functionality that users encounter on typical websites, to ensure browsers can optimize areas that are beneficial to users.

Let’s dig into the new changes in Speedometer 3.0.

Applying a multi-stakeholder governance model

Since its initial release in 2014 by the WebKit team, browser vendors have successfully used Speedometer to optimize their engines and improve user experiences on the web. Speedometer 2.0, a result of a collaboration between Apple and Chrome, followed in 2018, and it included an updated set of workloads that were more representative of the modern web at that time.

The web has changed a lot since 2018, and so has Speedometer in its latest release, Speedometer 3. This work has been based on a joint multi-stakeholder governance model to share work and build a collaborative understanding of performance on the web, so that improvements to browser performance enhance the user experience. Together, we were able to improve how Speedometer captures and calculates scores, show more detailed results and introduce an even wider variety of workloads. This cross-browser collaboration introduced more diverse perspectives that enabled clearer insights into a broader set of web users and workflows, ensuring the newest version of Speedometer will help make the web better for everyone, regardless of which browser they use.

Why is building workloads challenging?

Building a reliable benchmark with representative tests and workloads is challenging enough. That task becomes even more challenging if it will be used as a tool to guide optimization of browser engines over multiple years. To develop the Speedometer 3 benchmark, the Chrome Aurora team, together with colleagues from other participating browser vendors, were tasked with finding new workloads that accurately reflect what users experience across the vast, diverse and eclectic web of 2024 and beyond.

A few tests and workloads can’t simulate the entire web, but while building Speedometer 3 we have established some criteria for selecting ones that are critical to users’ experience. We are now closer to a representative benchmark than ever before. Let’s take a look at how Speedometer workloads evolved.

How did the workloads change?

Since the goal is to use workloads that are representative of the web today, we needed to take a look at the previous workloads used in Speedometer and determine what changes were necessary. We needed to decide which frameworks are still relevant, which apps needed updating and what types of work we didn’t capture in previous versions. In Speedometer 2, all workloads were variations of a todo app implemented in different JS frameworks. We found that, as the web evolved over the past six years, we missed out on various JavaScript and Browser APIs that became popular, and apps tend to be much larger and more complicated than before. As a result, we made changes to the list of frameworks we included and we added a wider variety of workloads that cover a broader range of APIs and features.

Frameworks

To determine which frameworks to include, we used data from HTTP Archive and discussed inclusion with all browser vendors to ensure we cover a good range of implementations. For the initial evaluation, we took a snapshot of the HTTP Archive from March 2023 to determine the top JavaScript UI frameworks currently used to build complex web apps.



Another approach is to determine inclusion based on popularity with developers: Do we need to include frameworks that have “momentum”, where a framework's current usage in production might be low, but we anticipate growth in adoption? This is somewhat hard to determine and might not be the ideal sole indicator for inclusion. One data point to evaluate momentum might be monthly NPM downloads of frameworks.

Here are the NPM downloads for the same 15 frameworks in March 2023:



With both data points on hand, we decided on a list that we felt gives us a good representation of frameworks. We kept the list small to allow space for brand new types of workloads, instead of just todo apps. We also selected commonly used versions for each framework, based on the current usage.



In addition, we updated the previous JavaScript implementations and included a new web-component based version, implemented with vanilla JavaScript.

More Workloads

A simple Todo-list only tests a subset of functionality. For example: how well do browsers handle complicated flexbox and grid layouts? How can we capture SVG and canvas rendering and how can we include more realistic scenarios that happen on a website?

We collected and categorized areas of interest into DOM, layout, API and patterns, to be able to match them to potential workloads that would allow us to test these areas. In addition we collected user journeys that included the different categories of interest: editing text, rendering charts, navigating a site, and so on.



There are many more areas that we weren’t able to include, but the final list of workloads presents a larger variety and we hope that future versions of Speedometer will build upon the current list.

Validation

The Chrome Aurora team worked with the Chrome V8 team to validate our assumptions above. In Chrome, we can use runtime-call-stats to measure time spent in each web API (and additionally many internal components). This allows us to get an insight into how dominant certain APIs are.

If we look at Speedometer 2.1 we see that a disproportionate amount of benchmark time is spent in innerHTML.



While innerHTML is an important web API, it's overrepresented in Speedometer 2.1. Doing the same analysis on the new version 3.0 yields a slightly different picture:



We can see that innerHTML is still present, but its overall contribution shrank from roughly 14% down to 4.5%. As a result, we get a more even distribution that encourages optimizing a wider range of DOM APIs. We can also see that a few Canvas APIs have moved into this list, thanks to the new workloads in v3.0.
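The kind of share quoted above can be computed from per-API timings with simple arithmetic. The following is a minimal sketch: the function name and the dictionary layout are hypothetical (they are not the runtime-call-stats output format), and the timings are made-up numbers chosen only to reproduce the 14% and 4.5% shares.

```python
from typing import Dict


def api_share_percent(time_by_api_ms: Dict[str, float], api: str) -> float:
    """Share of the total measured time spent in one API, in percent."""
    total = sum(time_by_api_ms.values())
    if total == 0:
        raise ValueError("no time recorded")
    return 100.0 * time_by_api_ms.get(api, 0.0) / total


# Made-up timings (milliseconds), not measurements from either benchmark.
v2_sample = {"innerHTML": 140.0, "everything else": 860.0}
v3_sample = {"innerHTML": 45.0, "everything else": 955.0}

print(api_share_percent(v2_sample, "innerHTML"))  # 14.0
print(api_share_percent(v3_sample, "innerHTML"))  # 4.5
```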

While we will never be able to perfectly represent the whole web in a fast-running and stable benchmark, it is clear that Speedometer 3.0 is a giant step in the right direction.

Ultimately, we ended up with the following list of workloads presented in the next few sections.

What workloads are included?

TodoMVC

Many developers might recognize the TodoMVC app. It’s a popular resource for learning and offers a wide range of TodoMVC implementations with different frameworks.



TodoMVC is a to-do application that allows a user to keep track of tasks. The user can enter a new task, update an existing one, mark a task as completed, or delete it. In addition to the basic CRUD operations, the TodoMVC app has some added functionality: filters are available to change the view to “all”, “active” or “completed” tasks and a status text displays the number of active tasks to complete.

In Speedometer, we introduced a local data source for todo items, which we use in our tests to populate the todo apps. This gave us the opportunity to test a larger character set with different languages.

The tests for these apps are all similar and are relatable to typical user journeys with a todo app:

  1. Add a task
  2. Mark task as complete
  3. Delete task
  4. Repeat steps 1-3 a set amount of times

These tests seem simple, but they let us benchmark DOM manipulations. Having a variety of framework implementations also covers several different ways this can be done.

Complex DOM / TodoMVC

The complex DOM workloads embed various TodoMVC implementations in a static UI shell that mimics a complex web page. The idea is to capture the performance impact of executing seemingly isolated actions (e.g. adding/deleting todo items) in the context of a complex website. Small performance hits that aren’t obvious in an isolated TodoMVC workload are amplified in a larger application and therefore capture more real-world impact.

The tests are similar to the TodoMVC tests, executed in the complex DOM & CSSOM environment.

This introduces an additional layer of complexity that browsers have to be able to handle effortlessly.



Single-page-applications (News Site)

Single-page-applications (SPAs) are widely used on the web for streaming, gaming, social media and pretty much anything you can imagine. A SPA lets us capture navigating between pages and interacting with an app. We chose a news site to represent a SPA, since it allows us to capture the main areas of interest in a deterministic way. An important factor was ensuring that the app uses static local data and doesn’t rely on network requests to present this data to the user.

Two implementations are included: one built with Next.js and the other with Nuxt. This gave us the opportunity to represent applications built with meta frameworks, with the caveat that we needed to ensure that we use static outputs.



Tests for the news site mimic a typical user journey, by selecting a menu item and navigating to another section of the site.

  1. Click on ‘More’ toggle of the navigation
  2. Click on a navigation button
  3. Repeat steps 1 and 2 a set amount of times

These tests let us evaluate how well a browser can handle large DOM and CSSOM changes, by changing a large amount of data that needs to be displayed when navigating to a different page.

Charting Apps & Dashboards

Charting apps allow us to test SVG and canvas rendering by displaying charts in various workloads.

These apps represent popular sites that display financial information, stock charts or dashboards.

Both SVG rendering and the use of the canvas API weren’t represented in previous releases of Speedometer.

Observable Plot displays a stacked bar chart, as well as a dotted chart. It is based on D3, which is a JavaScript library for visualizing tabular data and outputs SVG elements. It loops through a big dataset to build the source data that D3 needs, using map, filter and flatMap methods. As a result this exercises creation and copying of objects and arrays.

Chart.js is a JavaScript charting library. The included workload displays a scatter graph with the canvas API, both with some transparency and with full opacity. This uses the same data as the previous workload, but with a different preparation phase. In this case it makes heavy use of trigonometry to compute distances between airports.
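To give a feel for the kind of trigonometry involved, here is a minimal sketch of a great-circle distance between two coordinates. It uses the haversine formula, which is an assumption: the post does not say which formula the workload uses, and the workload itself is written in JavaScript rather than Python. The function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the way around the equator: about 10007.5 km.
print(round(haversine_km(0.0, 0.0, 0.0, 90.0), 1))
```

Each distance needs several sine, cosine and square-root calls, so running it over many airport pairs exercises a browser engine's math built-ins.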

React Stockcharts displays a dashboard for stocks. It is based on D3 for all computation, but outputs SVG directly using React.

WebKit Perf-Dashboard is an application used to track various performance metrics of WebKit. The dashboard uses canvas drawing and web components for its UI.

These workloads test DOM manipulation with SVG or canvas by interacting with charts. For example, here are the interactions of the Observable Plot workload:

  1. Prepare data: compute the input datasets to output structures that D3 understands.
  2. Add stacked chart: this draws a chart using SVG elements.
  3. Change input slider to change the computation parameters.
  4. Repeat steps 1 and 2
  5. Reset: this clears the view
  6. Add dotted chart: this draws another type of graph (dots instead of bars) to exercise different drawing primitives. This also uses a power scale.



Code Editors

Editors, for example WYSIWYG text and code editors, let us focus on editing live text and capturing form interactions. Typical scenarios are writing an email, logging into a website or filling out an online form. Although there is some form interaction present in the TodoMVC apps, the editor workloads use a large data set, which lets us evaluate performance more accurately.



CodeMirror is a code editor that implements a text input field with support for many editing features. Several languages and frameworks are available and for this workload we used the JavaScript library from CodeMirror.

Tiptap Editor is a headless, framework-agnostic rich text editor that's customizable and extendable. This workload uses Tiptap as its basis and adds a simple UI to interact with.

Both apps test DOM insertion and manipulation of a large amount of data in the following way:

  1. Create an editable element.
  2. Insert a long text: CodeMirror uses the development bundle of React, while Tiptap loads an excerpt of Proust’s Du Côté de Chez Swann.
  3. Highlight text: CodeMirror turns on syntax highlighting, while Tiptap sets all the text to bold.

Parting words

Being able to collaborate with all major browser vendors and having all of us contribute to workloads has been a unique experience and we are looking forward to continuing to collaborate in the browser benchmarking space.

Don’t forget to check out the new release of Speedometer and test it out in your favorite browser, dig into the results, check out our repo and feel free to open issues with any improvements or ideas for workloads you would like to see included in the next version. We are aiming for a more frequent release schedule in the future and if you are a framework author and want to contribute, feel free to file an issue on our GitHub to start the discussion.

Posted by Thorsten Kober, Chrome Aurora



Today’s The Fast and the Curious post explores how Core Web Vitals saved Chrome users more than 10,000 years of waiting for web pages to load in 2023 (across Chrome desktop and Android) by quantifying the experience of sites and identifying opportunities to make improvements.



In 2020, we introduced Web Vitals - essential quality signals for webpages to ensure a better user experience. Since then, there has been a massive leap in web performance made possible by our work on Core Web Vitals (CWV) and its broader impact on the web. Today, over 40% of sites pass all of the CWV metrics, leading to pages that load and respond to interactions more quickly. Here’s a closer look at the journey to help improve the performance for sites and some specific work done in the browser and the ecosystem to enable this achievement. 



Chrome's Quest for Speed


The very essence of the web lies in its ability to provide information and services efficiently and rapidly. This principle is at the heart of Google's business and drives our work on Chrome. However, we noticed an issue with sites over a long time horizon. Even if slow sites improved their performance for a while, it would often decline over time. No matter how fast Google Search might be, the user experience would be subpar if the pages found were slow to load.


We could not help these sites improve their performance directly, but we wanted users to have a great experience when they moved from Google Search to the individual sites. To tackle the challenge of improving the user experience while simultaneously providing unified guidance to developers, teams from Search and Chrome collaborated to address the issue of slow web pages.



Defining the Fast Web 


We examined millions of pages to define a public standard for a fast, user-friendly web page (initially published in The Science Behind Web Vitals). We published our specifications and data to the open ecosystem and took note of the feedback we received. The introduction of CWV metrics such as LCP (Largest Contentful Paint) was groundbreaking because it allowed us to measure when the user actually sees the content. The ability to measure the actual user experience at scale has been foundational to the improvements that we will discuss in this blog post.


Next, we updated Google's search ranking algorithms in August 2021 to consider, among other factors, whether a page met the speed and usability standards established as part of CWV.  Today, it remains highly recommended for site owners to achieve good Core Web Vitals for success with Search and to ensure a great user experience generally.



Exponential Impact of Small Changes


The results we saw after these changes were significant. The average page load in Chrome is now 166 ms faster. That might seem like a minor improvement, but small changes can accumulate to create a substantial impact on the web. 
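To see how a per-load saving of this size adds up, here is a back-of-the-envelope sketch. The 166 ms figure comes from the text above; the number of page loads is a made-up round number for illustration, and the function names are hypothetical. The post does not describe how its aggregate figures were computed, so treat this as arithmetic only.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # milliseconds in an average year


def years_saved(ms_saved_per_load: float, page_loads: float) -> float:
    """Total time saved across all page loads, expressed in years."""
    return ms_saved_per_load * page_loads / MS_PER_YEAR


def loads_needed(ms_saved_per_load: float, target_years: float) -> float:
    """Page loads required for the per-load saving to add up to target_years."""
    return target_years * MS_PER_YEAR / ms_saved_per_load


# Hypothetical volume: one trillion page loads, each 166 ms faster.
print(round(years_saved(166, 1e12)))  # about 5,260 years
```

Under the same simplification, `loads_needed(166, 10_000)` comes out at roughly 1.9 trillion page loads, which shows why a saving too small to notice on one page becomes large at the scale of the web.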


So far in 2023, this project saved users over 10,000 years of waiting for web pages to load and over 1,200 years of waiting for web pages to respond to user input. And the web continues to get faster. We also tracked improvements in how many navigations meet Core Web Vitals (CWV). The current figures stand at 64.45% for mobile (up from 64%) and 68.39% for desktop (up from 67%). The Chrome Data team projects a ~69% pass rate by the end of the year.


Caption: Our savings for LCP translate into 8,000 years saved for users waiting for pages to load on Android and 2,000 years on desktop in 2023 so far. On INP, we have saved users 800 years on Android and 450 years on Windows so far in 2023.


Next, let’s look at some recent updates from both the Chrome team and the wider developer ecosystem, demonstrating how our joint efforts are speeding up the web.



Chrome’s Core Web Vitals Achievements


We’re proud to highlight numerous ways we’ve optimized performance. 


  • The back/forward cache (bfcache) is designed to improve the browsing experience by enabling instant back and forward navigation. The bfcache hit rate has improved month-over-month on both Android (3.6%) and Desktop (1.8%).


  • Another example of a particularly impactful optimization is our PreconnectOnAnchorInteraction feature, which connects to origins on pointer-down rather than pointer-up. This fully launched feature led to a median LCP improvement of 6ms (0.4%) on Android and 10ms (1%) on Desktop, and an improvement in cross-origin LCP of ~60ms on both Android and Desktop. The launch also resulted in a 0.08% Content Ad revenue increase, underlining the significant impact of performance optimizations on user engagement and ecosystem health.


  • We also introduced prerendering, which makes pages load instantly by rendering them before the user actually visits. Page loads via typing URLs directly in the omnibox get a 500-700ms (14-25%) median LCP improvement when prerendered, depending on the platform, moving global median LCP across all navigations by 6.4ms. We're currently rolling out prerendering of omnibox-initiated searches.


  • Chrome has been working hard to keep background tabs out of your way. Implementing tab throttling for background tabs running at EcoQoS on Windows 11, together with Task Role and QoS adjustments on macOS, has led to improvements in Largest Contentful Paint (LCP) and Interaction to Next Paint (INP).

  • The web’s modern ability to run all types of applications also comes with a mandate to manage the workload that this incurs. We have been optimizing Chrome under multiple active tabs and are happy to report improvements to scheduling and contention which improved INP by 5% and LCP by 2% in the last 6 months.


  • We have made targeted improvements to the page loading code in Chrome in 2022. These resulted in LCP improving by 10% on Android, and CWV pass rate improving by 1.5%.


  • Chrome's renderer has also seen some improvements. The renderer's main thread includes task queues for JavaScript, rendering, and image loading. Some changes that alter the priority of these tasks for optimal CWV include:

    • High priority image loading: Historically, image-loading had the same or lower priority than rendering. However, an experiment showed that between an image load task and a rendering task, choosing the image load task first can prevent layout shift of an intermediate frame that doesn't have the image and also improves LCP. The improvement on Android at the 75th percentile was -6.66% for CLS and -0.82% for LCP, improving the CWV pass rate on Android by +0.24%. A similar experiment that boosted the loading priority to "medium" of the first five images parsed from the HTML (for non-icon-sized images) showed an improvement on Android at the 75th percentile of -6.08% for CLS and -0.53% for LCP. A combined experiment showed the effects of both changes were largely independent. 

    • Prioritize compositing after delay: If it has been more than 100ms since the last compositing task run, elevate the priority of any queued compositing task so that it will preempt normal-priority work. This produced an improvement of -0.27% for CLS on Android and Windows at the 95th percentile.

    • SVG Raster Optimizations: Another SVG drawing optimization improved INP pass rates on desktop by -2.28% for macOS at the 75th percentile.
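The "prioritize compositing after delay" rule from the list above can be sketched as a toy task picker. This is an illustration only and not Chrome's scheduler: the class, the task labels and the FIFO default are all assumptions, and only the 100ms threshold comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

COMPOSITING_DELAY_MS = 100.0  # threshold quoted in the post


@dataclass
class Task:
    name: str
    kind: str  # "compositing" or "other" (hypothetical labels)


@dataclass
class MainThreadQueue:
    """Toy FIFO queue in which overdue compositing work may jump ahead."""
    pending: List[Task] = field(default_factory=list)
    last_compositing_ms: float = 0.0

    def pick_next(self, now_ms: float) -> Optional[Task]:
        if not self.pending:
            return None
        index = 0  # default: plain FIFO order
        if now_ms - self.last_compositing_ms > COMPOSITING_DELAY_MS:
            # Compositing is overdue: the first queued compositing task
            # preempts the normal-priority work ahead of it.
            for i, task in enumerate(self.pending):
                if task.kind == "compositing":
                    index = i
                    break
        task = self.pending.pop(index)
        if task.kind == "compositing":
            self.last_compositing_ms = now_ms
        return task


queue = MainThreadQueue(pending=[Task("run script", "other"),
                                 Task("draw frame", "compositing")])
print(queue.pick_next(now_ms=150.0).name)  # draw frame
```

With `now_ms=50.0` the same queue would have returned "run script" first, because compositing was not yet overdue.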



Caption: An example of Chrome’s new prioritized loading of the first five images parsed from the HTML. This improved LCP from 3.1s to 2.5s.
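The first-five-images boost can be sketched as a simple pass over images in parse order. Several details here are assumptions, since the post does not spell them out: the icon-size cutoff, the "low" default priority, and the choice that icon-sized images do not use up one of the five slots. The function name is hypothetical.

```python
from typing import List, Tuple

BOOSTED_IMAGE_COUNT = 5     # "first five images" per the post
ICON_MAX_AREA_PX = 32 * 32  # hypothetical cutoff; the post gives none


def assign_image_priorities(
    images: List[Tuple[str, int, int]]
) -> List[Tuple[str, str]]:
    """Map (url, width, height) tuples in parse order to (url, priority)."""
    boosted = 0
    result = []
    for url, width, height in images:
        is_icon = width * height <= ICON_MAX_AREA_PX
        if not is_icon and boosted < BOOSTED_IMAGE_COUNT:
            result.append((url, "medium"))  # boosted load priority
            boosted += 1
        else:
            result.append((url, "low"))  # assumed default priority
    return result


page = [("favicon.png", 16, 16)] + [(f"photo{i}.jpg", 800, 600) for i in range(6)]
for url, priority in assign_image_priorities(page):
    print(url, priority)
```

In this sample the icon and the sixth photo keep the default priority, while the first five photos are boosted.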


Ecosystem Core Web Vitals Achievements


The broader developer ecosystem has also achieved remarkable results by focusing on Core Web Vitals. The most significant achievement was the performance improvement on WordPress - the Content Management System that powers over a third of the web: "WordPress 6.3 loads 27% faster for block themes and 18% faster for classic themes, compared to WordPress 6.2, based on the Largest Contentful Paint (LCP) metric". 


Some parts of the WordPress ecosystem are going even further. By prerendering some links via the Speculation Rules API, NitroPack has seen an 80% LCP improvement and a 55% INP improvement on prerendered page loads compared to those without any speculative loading.



Caption: The percentage of origins passing all three Core Web Vitals (LCP, FID, CLS) with a "good" experience (Source: HTTP Archive)


The JavaScript framework community has also seen Core Web Vitals gains. Over the past few years, Chrome Aurora has collaborated with Next.js, Angular, and Nuxt to release performance-focused features like the next/script component, NgOptimizedImage, and nuxt/google-fonts. In 2022, Next.js pass rates increased from 20.4% to 27.3%, Angular pass rates increased from 7.6% to 13.2%, and Nuxt pass rates increased from 15.8% to 20.2%. Enterprise partners who tried our features have seen wins in LCP. For example, after switching to NgOptimizedImage, Lands' End saw a 40% LCP improvement on mobile in Lighthouse lab tests and a 75% improvement in LCP on desktop. In similar tests, CareerKarma's LCP dropped by 24% when switching to next/script's web worker mode.


In the business world, performance optimization has led to remarkable growth. For instance, RedBus improved INP and observed a 7% increase in conversion rates. Economic Times improved INP and saw a 42% rise in page views and a 49% reduction in bounce rate. Meesho successfully brought LCP down from 6.9s to 2.5s, resulting in a 16.6% reduction in bounce rate and a 3% increase in conversions.


Major web platforms have also seen significant improvements. Amazon has leveraged the bfcache change introduced on Chrome and saw a 22.7 percentage point (pp) improvement in bfcache hit rate with Chrome's latest version (M112). Cricbuzz experienced an even higher increase, with a 31.40 pp improvement.
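A percentage-point (pp) change is an absolute difference between two rates, which is not the same as a relative percentage change. The sketch below shows the distinction. The 22.7 pp figure is from the text above, but the 40% baseline is hypothetical, since the post does not give the starting hit rates; the function names are also made up for illustration.

```python
def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return round(after_pct - before_pct, 2)


def relative_change_percent(before_pct: float, after_pct: float) -> float:
    """Change relative to the starting rate, in percent."""
    return round(100.0 * (after_pct - before_pct) / before_pct, 2)


baseline = 40.0            # hypothetical starting hit rate, in percent
improved = baseline + 22.7  # apply the 22.7 pp improvement

print(percentage_point_change(baseline, improved))  # 22.7
print(relative_change_percent(baseline, improved))  # 56.75
```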



Partnering for a Better Web 


These performance improvements aren't just statistics – they represent real-world improvements in user experience (and hence business metrics) as well as developer experience.

 

Crucially, we have managed to achieve these speed boosts without impacting developer satisfaction, which remains high at 90% overall. Through our developer satisfaction studies, we also found that about half (~51%) of developers are monitoring CWV and are either already optimizing for them or planning to do so. Furthermore, a significant majority (78%) of developers optimizing for CWV report seeing notable improvements in their scores.


Our aim is always to create a better web experience for all users, so we're excited to see the web getting faster. But we also understand that maintaining developer satisfaction is crucial to sustaining these improvements. As developers continue to monitor and optimize for CWV, we are optimistic about the future of web performance.


On behalf of the Chrome team, we want to thank the developer community for their incredible work. By focusing on Core Web Vitals, we've made the web a significantly faster and more enjoyable place to be. We look forward to continuing this journey together, making the web better for everyone, everywhere.


Posted by Addy Osmani, Annie Sullivan and Kouhei Ueno, Software Engineers for Chrome