[go: up one dir, main page]

TraceMonkey Update

We have been busy, mostly fixing bugs for stability, but also winning a bit more performance, since TraceMonkey landed on mozilla-central, from which Firefox 3.1 alpha-stage nightly builds are built. Tonight’s builds include a fix for the bug that ilooped a SunSpider test (my apologies to those of you who suffered that bug’s bite).

But what I’m sure everyone wants to know is: how do we compare to V8?

Here are the results from head-to-head SunSpider on Windows XP on a Mac Mini and Windows Vista on a MacBook Pro, testing against last night’s Firefox automated build and yesterday’s Chrome beta:

tracemonkeyv8

We win by 1.28x and 1.19x, respectively. Maybe we should rename TraceMonkey “V10” ;-).

Ok, it’s only SunSpider, one popular yet arguably non-representative benchmark suite. We are not about to be braggy. (“Don’t be braggy” is our motto here at Mozilla ;-).)

But it’s worth digging deeper into the results. Let’s look at the ratios by test:

tmfaster

We win on the bit-banging, string, and regular expression benchmarks. We are around 4x faster at the SunSpider micro-benchmarks than V8.

This graph does show V8 cleaning our clock on a couple of recursion-heavy tests. We have a plan, to trace recursion (not just tail recursion). We simply haven’t had enough hours in the day to get to it, but it’s “next”.

This reminds me: TraceMonkey is only a few months old, excluding the Tamarin Tracing Nanojit contributed by Adobe (thanks again, Ed and co.!), which we’ve built on and enhanced with x86-64 support and other fixes. We’ve developed TraceMonkey in the open the whole way. And we’re as fast as V8 on SunSpider!

This is not a trivial feat. As we continue to trace unrecorded bytecode and operand combinations, we will only get faster. As we add recursion, trace-wise register allocation, and other optimizations, we will eliminate the losses shown above and improve our ratios linearly across the board, probably by 2 or greater.

I’ll keep updating the blog every week, as we do this work. Your comments are welcome as always.

V8 is great work, very well-engineered, with room to speed up too. (And Chrome looks good to great — the multi-process architecture is righteous, but you expected no less praise from an old Unix hacker like me.)

What spectators have to realize is that this contest is not a playoff where each contending VM is eliminated at any given hype-event point. We believe that Franz&Gal-style tracing has more “headroom” than less aggressively speculative approaches, due to its ability to specialize code, making variables constant and eliminating dead code and conditions at runtime, based on the latent types inherent in almost all JavaScript programs. If we are right, we’ll find out over the next weeks and months, and so will you all.

Anyway, we’re very much in the game and moving fast — “reports of our death are greatly exaggerated.” Stay tuned!

TraceMonkey: JavaScript Lightspeed

I’m extremely pleased to announce the launch of TraceMonkey, an evolution of Firefox’s SpiderMonkey JavaScript engine for Firefox 3.1 that uses a new kind of Just-In-Time (JIT) compiler to boost JS performance by an order of magnitude or more.

Results

Let’s cut straight to the charts. Here are the popular SunSpider macro- and micro-benchmarks average scores, plus results for an image manipulation benchmark and a test using the Sylvester 3D JS library’s matrix multiplication methods:

asbenchlg

Here are some select SunSpider micro-benchmarks, to show some near-term upper bounds on performance:

ff31wtracingsm

This chart shows speedup ratios over the SpiderMonkey interpreter, which is why “empty loop with globals” (a loop using global loop control and accumulator variables) shows a greater speedup — global variables in JavaScript, especially if undeclared by var, can be harder to optimize in an interpreter than local variables in a function.

Here are the fastest test-by-test SunSpider results, sorted from greatest speedup to least:

ff31vsff3

The lesser speedups need their own chart, or they would be dwarfed by the above results:

ff31wtracingsm

(Any slowdown is a bug we will fix; we’re in hot pursuit of the one biting binary-trees, which is heavily recursive — it will be fixed.)

With SunSpider, some of the longest-running tests are string and regular-expression monsters, and since like most JS engines, we use native (compiled C++) code for most of the work, there’s not as much speedup. Amdahl’s Law predicts that this will bound the weighted-average total Sunspider score, probably to around 2. No matter how fast we JIT the rest of the code, the total score will be . . . 2.

But this is only a start. With tracing, performance will keep going up. We have easy small linear speedup tasks remaining (better register allocation, spill reduction around built-in calls). We will trace string and regular expression code and break through the “2” barrier. We will even trace into DOM methods. The tracing JIT approach scales as you move more code into JS, or otherwise into view of the tracing machinery.

Finally, schrep created a screencast that visually demonstrates the speedup gained by TraceMonkey. These speedups are not just for micro-benchmarks. You can see and feel them.

How We Did It

We’ve been working with Andreas Gal of UC Irvine on TraceMonkey, and it has been a blast. We started a little over sixty days (and nights 😉 ago, and just yesterday, shaver pushed the results of our work into the mozilla-central Hg repository for inclusion in Firefox 3.1.

The JIT is currently pref’ed off, but you can enable it via about:config — just search for “jit” and, if you are willing to report any bugs you find, toggle the javascript.options.jit.content preference (there’s a jit.chrome pref too, for the truly adventurous).

Before TraceMonkey, for Firefox 3, we made serious performance improvements toSpiderMonkey, both to its Array code and to its interpreter. The interpreter speedups entailed two major pieces of work:

  • Making bytecode cases in the threaded interpreter even fatter, so the fast cases can stay in the interpreter function.
  • Adding a polymorphic property cache, for addressing properties found in prototype and scope objects quickly, without having to look in each object along the chain.

I will talk about the property cache and the “shape inference” it is based on in another post.

By the way, we are not letting moss grow under our interpreter’s feet. Dave Mandelin is working on a combination of inline-threading and call-threading that will take interpreter performance up another notch.

While doing this Firefox 3 work, I was reminded again of the adage:

Neurosis is doing the same thing over and over again, expecting to get a different result each time.

But this is exactly what dynamically typed language interpreters must do. Consider the + operator:

a = b + c;

Is this string concatenation, or number addition? Without static analysis (generally too costly), we can’t know ahead of time. For SpiderMonkey, we have to ask further: if number, can we keep the operands and result in machine integers of some kind?

Any interpreter will have to cope with unlikely (but allowed) overflow from int to double precision binary floating point, or even change of variable type from number to string. But this is neurotic, because for the vast majority of JS code, in spite of the freedom to mutate type of variable, types are stable. (This stability holds for other dynamic languages including Python.)

Another insight, which is key to the tracing JIT approach: if you are spending much time in JS, you are probably looping. There’s simply not enough straight line code in Firefox’s JS, or in a web app, to take that much runtime. Native code may go out to lunch, of course, but if you are spending time in JS, you’re either looping or doing recursion.

The Trace Trees approach to tracing JIT compilation that Andreas pioneered can handle loops and recursion. Everything starts in the interpreter, when TraceMonkey notices a hot loop by keeping cheap count of how often a particular backward jump (or any backward jump) has happened.

for (var i = 0; i < BIG; i++) {
    // Loop header starts here:
    if (usuallyTrue())
        commonPath();
    else
        uncommonPath();
}

Once a hot loop has been detected, TraceMonkey starts recording a trace. We use the Tamarin Tracing Nanojit to generate low-level intermediate representation instructions specialized from the SpiderMonkey bytecodes, their immediate and incoming stack operands, and the property cache “hit” case fast-lookup information.

The trace recorder completes when the loop header (see the comment in the code above) is reached by a backward jump. If the trace does not complete this way, the recorder aborts and the interpreter resumes without recording traces.

Let’s suppose the usuallyTrue() function returns true (it could return any truthy, e.g. 1 or "non-empty" — we can cope). The trace recorder emits a special guard instruction to check that the truthy condition matches, allowing native machine-code trace execution to continue if so. If the condition does not match, the guard exits (so-called “side-exits”) the trace, returning to the interpreter at the exact point in the bytecode where the guard was recorded, with all the necessary interpreter state restored.

If the interpreter sees usuallyTrue() return true, then the commonPath(); case will be traced. After that function has been traced comes the loop update part i++ (which might or might not stay in SpiderMonkey’s integer representation depending on the value of BIG — again we guard). Finally, the condition i < BIG will be recorded as a guard.

// Loop header starts here:
inlined usuallyTrue() call, with guards
guard on truthy return value
guard that the function being invoked at this point is commonPath
inlined commonPath() call, with any calls it makes inlined, guarded
i++ code, with overflow to double guard
i < BIG condition and loop-edge guard
jump back to loop header

Thus tracing is all about speculating that what the interpreter sees is what will happen next time — that the virtual machine can stop being neurotic. And as you can see, tracing JITs can inline method calls easily — just record the interpreter as it follows a JSOP_CALL instruction into an interpreted function.

One point about Trace Trees (as opposed to less structured kinds of tracing): you get function inlining without having to build interpreter frames at all, because the trace recording must reach the loop header in the outer function in order to complete. Therefore, so long as the JITted code stays “on trace”, no interpreter frames need to be built.

If the commonPath function itself contains a guard that side-exits at runtime, then (and only then) will one or more interpreter frames need to be reconstructed.

Let’s say after some number of iterations, the loop shown above side-exits at the guard for usuallyTrue() because that function returns a falsy value. We abort correctly back to the interpreter, but keep recording in case we can complete another trace back to the same loop header, and extend the first into a trace tree. This allows us to handle different paths through the control flow graph (including inlined functions) under a hot loop.

What It All Means

Pulling back from the details, a few points deserve to be called out:

  • We have, right now, x86, x86-64, and ARM support in TraceMonkey. This means we are ready for mobile and desktop target platforms out of the box.
  • As the performance keeps going up, people will write and transport code that was “too slow” to run in the browser as JS. This means the web can accommodate workloads that right now require a proprietary plugin.
  • As we trace more of the DOM and our other native code, we increase the memory-safe codebase that must be trusted not to have an exploitable bug.
  • Tracing follows only the hot paths, and builds a trace-tree cache. Cold code never gets traced or JITted, avoiding the memory bloat that whole-method JITs incur. Tracing is mobile-friendly.
  • JS-driven <canvas> rendering, with toolkits, scene graphs, game logic, etc. all in JS, are one wave of the future that is about to crest.

TraceMonkey advances us toward the Mozilla
2
future where even more Firefox code is written in JS. Firefox gets faster and safer as this process unfolds.

I believe that other browsers will follow our lead and take JS performance through current interpreter speed barriers, using just-in-time native code compilation. Beyond what TraceMonkey means for Firefox and other Mozilla projects, it heralds the JavaScript Lightspeed future we’ve all been anticipating. We are moving the goal posts and changing the game, for the benefit of all web developers.

Acknowledgments

I would like to thank Michael Franz and the rest of his group at UC Irvine, especially Michael Bebenita, Mason Chang, and Gregor Wagner; also the National Science Foundation for supporting Andreas Gal’s thesis. I’m also grateful to Ed Smith and the Tamarin Tracing team at Adobe for the TT Nanojit, which was a huge boost to developing TraceMonkey.

And of course, mad props and late night thanks to Team TraceMonkey: Andreas, Shaver, David Anderson, with valuable assists from Bob Clary, Rob Sayre, Blake Kaplan, Boris Zbarsky, and Vladimir Vukićević.

Popularity

It seems (according to one guru, but coming from this source, it’s a left-handed compliment) that JavaScript is finally popular.

To me, a nerd from a tender age, this is something between a curse and a joke. (See if you are in my camp: isn’t the green chick hotter?)

Brendan Eich convinced his pointy-haired boss at Netscape that the Navigator browser should have its own scripting language, and that only a new language would do, a new language designed and implemented in big hurry, and that no existing language should be considered for that role.

I don’t know why Doug is making up stories. He wasn’t at Netscape. He has heard my recollections about JavaScript’s birth directly, told in my keynotes at Ajax conferences. Revisionist shenanigans to advance a Microhoo C# agenda among Web developers?

Who knows, and it’s hard to care, but in this week of the tenth anniversary of mozilla.org, a project I co-founded, I mean to tell some history.

As I’ve often said, and as others at Netscape can confirm, I was recruited to Netscape with the promise of “doing Scheme” in the browser. At least client engineering management including Tom Paquin, Michael Toy, and Rick Schell, along with some guy named Marc Andreessen, were convinced that Netscape should embed a programming language, in source form, in HTML. So it was hardly a case of me selling a “pointy-haired boss” — more the reverse.

Whether that language should be Scheme was an open question, but Scheme was the bait I went for in joining Netscape. Previously, at SGI, Nick Thompson had turned me on to SICP.

What was needed was a convincing proof of concept, AKA a demo. That, I delivered, and in too-short order it was a fait accompli.

Of course, by the time I joined Netscape, and then transferred out of the server group where I had been hired based on short-term requisition scarcity games (and where I had the pleasure of working briefly with the McCool twins and Ari Luotonen; later in 1995, Ari and I would create PAC), the Oak language had been renamed Java, and Netscape was negotiating with Sun to include it in Navigator.

The big debate inside Netscape therefore became “why two languages? why not just Java?” The answer was that two languages were required to serve the two mostly-disjoint audiences in the programming ziggurat who most deserved dedicated programming languages: the component authors, who wrote in C++ or (we hoped) Java; and the “scripters”, amateur or pro, who would write code directly embedded in HTML.

Whether any existing language could be used, instead of inventing a new one, was also not something I decided. The diktat from upper engineering management was that the language must “look like Java”. That ruled out Perl, Python, and Tcl, along with Scheme. Later, in 1996, John Ousterhout came by to pitch Tk and lament the missed opportunity for Tcl.

I’m not proud, but I’m happy that I chose Scheme-ish first-class functions and Self-ish (albeit singular) prototypes as the main ingredients. The Java influences, especially y2k Date bugs but also the primitive vs. object distinction (e.g., string vs. String), were unfortunate.

Back to spring of 1995: I remember meeting Bill Joy during this period, and discussing fine points of garbage collection (card marking for efficient write barriers) with him. From the beginning, Bill grokked the idea of an easy-to-use “scripting language” as a companion to Java, analogous to VB‘s relationship to C++ in Microsoft’s platform of the mid-nineties. He was, as far as I can tell, our champion at Sun.

Kipp Hickman and I had been studying Java in April and May 1995, and Kipp had started writing his own JVM. Kipp and I wrote the first version of NSPR as a portability layer underlying his JVM, and I used it for the same purpose when prototyping “Mocha” in early-to-mid-May.

Bill convinced us to drop Kipp’s JVM because it would lack bug-for-bug compatibility with Sun’s JVM (a wise observation in those early days). By this point “Mocha” had proven itself via rapid prototyping and embedding in Netscape Navigator 2.0 , which was in its pre-alpha development phase.

The rest is perverse, merciless history. JS beat Java on the client, rivaled only by Flash, which supports an offspring of JS, ActionScript.

So back to popularity. I can take it or leave it. Nevertheless, popular Ajax libraries, often crunched and minified and link-culled into different plaintext source forms, are schlepped around the Internet constantly. Can we not share?

One idea, mooted by many folks, most recently here by Doug, entails embedding crypto-hashes in potentially very long-lived script tag attributes. Is this a good idea?

Probably not, based both on theoretical soundness concerns about crypto-hash algorithms, and on well-known poisoning attacks.

A better idea, which I heard first from Rob Sayre: support an optional “canonical URL” for the script source, via a share attribute on HTML5 <script>:

<mce:script mce_src=”https://my.edge.cached.startup.com/dojo-1.0.0.js” shared=”https://o.aolcdn.com/dojo/1.0.0/dojo/dojo.xd.js”>
</mce:script><br />

If the browser has already downloaded the shared URL, and it still is valid according to HTTP caching rules, then it can use the cached (and pre-compiled!) script instead of downloading the src URL.

This avoids hash poisoning concerns. It requires only that the content author ensure that the src attribute name a file identical to the canonical (“popular”) version of the library named by the shared attribute. And of course, it requires that we trust the DNS. (Ulp.)

This scheme also avoids embedding inscrutable hashcodes in script tag attribute values.

Your comments are welcome.

Ok, back to JavaScript popularity. We know certain Ajax libraries are popular. Is JavaScript popular? It’s hard to say. Some Ajax developers profess (and demonstrate) love for it. Yet many curse it, including me. I still think of it as a quickie love-child of C and Self. Dr. Johnson‘s words come to mind: “the part that is good is not original, and the part that is original is not good.”

Yet here we are. The web must evolve, or die. So too with JS, wherefore ES4. About which, more anon.

Firefox 3 looks like it will be popular too, based on space and time performance metrics. More on that soon, too.