Benchmarking Mobile Browsers: 90% Half Mental

Yogi Berra once opined “Baseball is 90% mental – the other half is physical.” Yogi missed a starring role in performance profiling! Batting average is a good benchmark of performance for a baseball player, but it may be more useful to know how well he bats in night games with runners in scoring position, playing on artificial turf, while facing a left-handed pitcher from Mississippi.

Benchmarking anything is a tricky business, especially when the thing being measured involves as many system components as a web browser. Comparing even a single functional component like a CPU has always generated plenty of angst among competitors, as well as a joke or two (e.g., “MIPS” stands for Meaningless Indicator of Processor Speed), and that angst has led not only to new benchmarks but also to new metrics such as mW/MHz or MIPS/mW.

Now think of all the system variability in simply downloading a web page: client platform, browser, network, server, and content. Is the browser cache cleared? Is there any concurrency in the memory and processor system? Is the system in high-performance or power-saving mode? Most mobile browser benchmarks I’ve seen at least start with a cleared cache, no concurrency, and a WiFi connection, but there’s still plenty of upstream variability. For example, is the server response for DNS lookups and content fetches consistent? Is the network performance consistent? A reasonable approach to dealing with this system variability is to make multiple “identical” runs (clearing the browser cache, hosting the content on an internal server) and to average the results, as sketched below.

And then there is the browser itself. Some systems support multiple browsers, and they can vary in JavaScript performance by an order of magnitude. Browsers may even have different definitions of “done,” which can skew results.
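To make the repeat-and-average idea concrete, here is a minimal sketch in Python. It is not our test harness: the URL and run count are placeholders, and it only times raw fetches of a page hosted on an internal server rather than a full browser render, but the methodology is the same – many “identical” runs, then report the mean and the spread.

```python
# Minimal sketch of the repeat-and-average methodology described above.
# INTERNAL_URL is a placeholder for a snapshot hosted on an internal server.

import statistics
import time
import urllib.request

INTERNAL_URL = "http://testserver.local/snapshot/index.html"  # hypothetical
RUNS = 10

def fetch_once(url: str) -> float:
    """Time a single uncached fetch of the page (no browser, no render)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read()  # pull the whole body, as a browser would
    return time.perf_counter() - start

def main() -> None:
    samples = [fetch_once(INTERNAL_URL) for _ in range(RUNS)]
    print(f"runs:    {RUNS}")
    print(f"mean:    {statistics.mean(samples):.3f} s")
    print(f"stdev:   {statistics.stdev(samples):.3f} s")
    print(f"min/max: {min(samples):.3f} s / {max(samples):.3f} s")

if __name__ == "__main__":
    main()
```

Reporting the spread alongside the mean matters: if the standard deviation is large relative to the difference between two devices, the “winner” of a single run tells you very little.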

Qualcomm Innovation Center’s (QuIC) Web Technology team has already begun making upstream contributions to WebKit, Skia, and Google’s V8 JavaScript engine. Because those contributions are quite recent, expect to see them in commercial phones later this year. Our primary near-term focus is optimizing the performance of WebKit-based mobile browsers, so we recently decided to establish a performance baseline, as well as a competitive baseline, against which to compare all of our work. To do this, we compared i-Bench scores for page download performance, SunSpider scores for JavaScript performance, and a few internal benchmarks across a wide range of products. For instance, let’s compare the iPhone3GS with the latest high-tier Android Smartphones: the Motorola Droid and the HTC Nexus One. All three have leading-edge software environments with WebKit-based browsers. All three also have application processors using the latest ARM instruction set, with the Nexus One based on Qualcomm’s Snapdragon platform.

The Nexus One has generated quite a bit of media interest. A recent search on YouTube returned over 3,000 Nexus One-related videos. This Smartphone has generated many positive and not-so-positive reviews as well. However, nothing caught my attention as much as Engadget’s review of the Nexus One, which included a video of the Nexus One, Motorola’s Droid, and Apple’s iPhone3GS simultaneously loading Engadget’s home page. That single example showed the iPhone3GS with the fastest web page download among the three Smartphones, yet in our own comparisons we never saw a case where the Nexus One was slower than any other Smartphone. In our page download benchmarks, the Nexus One was 50% faster on average, and over 2x faster in some cases.

Our i-Bench and SunSpider benchmarks for Nexus One, iPhone3GS and Droid are shown below (lower means faster).

             Nexus One   iPhone3GS   Droid
  i-Bench       221         338       537
  SunSpider     14.4        16.2      32.4
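Reading the i-Bench row as completion times, the Nexus One comes in at 221 versus 338 for the iPhone3GS (roughly a 1.5x gap) and 537 for the Droid (roughly 2.4x), which lines up with the “50% faster on average, and over 2x faster in some cases” claim above.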

Furthermore, in our standalone testing of google.com, sandiego.craigslist.org, mobile.washingtonpost.com, amazon.com, and nytimes.com, the Nexus One was fastest in every case. However, Engadget’s web page did indeed demonstrate a case where the iPhone3GS usually (but not always) loaded faster than the Nexus One. Across several runs, the iPhone3GS downloaded the page about one second faster than the Nexus One on average. My colleagues Rajiv and Zein captured the video below for reference.

[Video: Nexus One, iPhone3GS, and Droid loading Engadget’s home page side by side]

This is a perfect example of content variability and of why fair comparisons should cover more than a single download case. Some web pages download in 1-2 seconds, while others may take 20 seconds or longer. Even if you download an uncached web page many times in a row, the content of that single page can change between runs because of active content. As I said, benchmarking is a tricky business!

The mobile industry largely recognizes that traditional browser benchmarks such as i-Bench are either long in the tooth or inadequate: they are not necessarily suited to mobile device usage, and they do not keep up with content, server, network, and client platform variations. Fortunately, SunSpider is continually evolving, but that means we need to pay attention to the benchmark version number used when comparing scores. There are also newer benchmarks, such as Google’s V8 JavaScript benchmark, that tend to augment SunSpider, and emerging efforts such as Futuremark’s Peacekeeper. (FWIW, I find the name “Peacekeeper” more than a little ironic for a benchmark.) Additionally, Microsoft has published a research paper describing some new approaches to measuring the performance of JavaScript applications, called JSMeter. QuIC’s Web Technology team will continue to monitor these emerging benchmarks and improve our internal performance benchmarks and metrics as well. Our goal is simply to provide the best mobile browser end user experience, and we need accurate data to do that.
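Because scores from different benchmark versions are not comparable, one simple discipline is to record the benchmark name and version, plus the run conditions, next to every result. The sketch below shows one way to do that; the schema and field names are just placeholders, not an established format, and the sample values mirror the Nexus One SunSpider result above.

```python
# Illustrative only: tag every score with the benchmark version and run
# conditions so numbers from different setups never get compared blindly.

import json
from dataclasses import dataclass, asdict

@dataclass
class BenchmarkResult:
    benchmark: str      # e.g. "SunSpider"
    version: str        # scores across versions are not comparable
    device: str         # e.g. "Nexus One"
    browser_build: str  # browser/engine build identifier
    network: str        # e.g. "WiFi, internal server"
    cache_cleared: bool
    score: float        # lower is faster

result = BenchmarkResult(
    benchmark="SunSpider",
    version="0.9.1",                 # example version, record whatever was actually run
    device="Nexus One",
    browser_build="example-build",   # placeholder
    network="WiFi, internal server",
    cache_cleared=True,
    score=14.4,
)

print(json.dumps(asdict(result), indent=2))
```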
