When analyzing the performance of a JavaScript-based application, a stopwatch is often a convenient tool. Just like any other timing measurement in real life, it is important to ensure that the stopwatch produces a valid and trustworthy result. Thus, we need to avoid factors which may reduce its accuracy and precision.

Imagine you are running on a track and five stopwatches give wildly varying measurements of your performance. In this scenario, it is difficult to put much confidence in the numbers. This is why many JavaScript benchmarks come with a warning that the tested application should be the only one running: the goal is to minimize any random side activity which may cause variations.

In the Wikipedia article on Accuracy and precision, we find:

…the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value.

and also

The precision of a measurement system, also called reproducibility or repeatability, is the degree to which repeated measurements under unchanged conditions show the same results.

In addition, the target analogy is usually quite effective for demonstrating the concept. In the context of the JavaScript world, you can also think of a dartboard (after all, no JavaScript discussion is complete until Dart is mentioned).

[Figure: target diagrams illustrating accuracy vs precision]

In order to get high-quality benchmark results, it is important to look at several factors: accuracy, errors, and running time. This is why, if you write tests for jsPerf, each measurement is displayed as ops/sec along with its relative margin of error. Should you use Benchmark.js (which powers jsPerf) directly, this value is also easy to retrieve via Benchmark.prototype.stats.rme. This is just one of the crucial considerations to ensure that your benchmark is bulletproof.
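To make the statistic concrete, here is a minimal sketch of how ops/sec and a relative margin of error can be derived from repeated timing samples. The function name and the fixed 1.96 critical value (the large-sample 95% approximation) are illustrative assumptions; Benchmark.js itself uses Student's t critical values and more elaborate sampling.

```javascript
// Sketch: deriving ops/sec and the relative margin of error (RME)
// from timing samples, similar in spirit to what Benchmark.js
// exposes via Benchmark.prototype.stats. Names are illustrative.

function stats(samples) {
  var n = samples.length;
  var mean = samples.reduce(function (a, b) { return a + b; }, 0) / n;
  var variance = samples.reduce(function (a, x) {
    return a + (x - mean) * (x - mean);
  }, 0) / (n - 1);
  var sem = Math.sqrt(variance / n); // standard error of the mean
  var critical = 1.96;               // ~95% confidence, large-sample approximation
  var moe = sem * critical;          // margin of error
  return {
    hz: 1 / mean,                    // operations per second
    rme: (moe / mean) * 100          // relative margin of error, in percent
  };
}

// Example: five samples of a roughly 2 ms operation, in seconds.
var result = stats([0.0020, 0.0019, 0.0021, 0.0020, 0.0020]);
console.log(result.hz.toFixed(0) + ' ops/sec \xb1' + result.rme.toFixed(2) + '%');
```

A low rme means the repeated measurements agree with each other, i.e. the benchmark is precise; it says nothing yet about whether it is accurate.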

Even if we finally obtain a result with a low margin of error, we still need to ensure that it is an accurate one. In a few cases, particularly with microbenchmarks, what is being measured may not reflect reality. A modern JavaScript engine can perform various optimizations which falsify the measurement, among others loop-invariant code motion, constant propagation, and dead code elimination.

As a quick illustration, consider the following loop you want to time:

var sum = 0;
for (var i = 0; i < 100; ++i) {
  sum += Math.sqrt(2) * i;
}

If a JavaScript engine (with loop-invariant code motion support) decides that this loop should be optimized, it may see that Math.sqrt(2) can be computed once and hoisted outside the loop. In other words, the actual loop behaves as if you had written the following fragment. This may or may not be what you want, hence it is important to carefully review such a loop.

var sum = 0;
var temp = Math.sqrt(2);
for (var i = 0; i < 100; ++i) {
  sum += temp * i;
}
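The two fragments are behaviorally identical, which is exactly why the engine is free to perform the hoisting. A quick sketch confirms that they produce the same result, so a timing difference between them measures the optimization, not your code:

```javascript
// Sketch: the original loop and its hoisted form compute the same sum,
// so loop-invariant code motion preserves behavior while changing cost.

var sumA = 0;
for (var i = 0; i < 100; ++i) {
  sumA += Math.sqrt(2) * i;     // invariant call inside the loop
}

var temp = Math.sqrt(2);        // invariant hoisted out once
var sumB = 0;
for (var j = 0; j < 100; ++j) {
  sumB += temp * j;
}

console.log(sumA === sumB);     // identical results
```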

Dead code elimination is also known to skew timing analysis. Early published results of Internet Explorer 9 performance showed near-instant completion of some SunSpider tests; it turned out that this was attributable to its ability to eliminate dead code.
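One common defense is to make the benchmarked work observable. The sketch below is an illustration, not a guaranteed countermeasure (a sufficiently clever engine may still see through it): the result of the loop escapes the function and is accumulated into a value that is later read, so the engine cannot simply discard the computation as dead.

```javascript
// Sketch: a microbenchmark body whose result is never used can be
// removed entirely by dead code elimination, timing nothing. Consuming
// the result keeps the work observable. The "sink" variable is an
// illustrative convention, not a library API.

var sink = 0;

function body() {
  var x = 0;
  for (var i = 0; i < 1000; ++i) {
    x += Math.sqrt(i);
  }
  return x;          // the result escapes the function...
}

for (var run = 0; run < 10; ++run) {
  sink += body();    // ...and is accumulated, so the loop is not dead
}

console.log(sink > 0); // reading sink gives the engine a reason to keep the work
```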

Next time you throw around some benchmark numbers, think carefully about their accuracy and precision!

  • Rupert

    I am a clueless semi-literate russian peasant, and I am confused. How many potato?

  • Rupert

I am an ignorant, semi-literate Russian peasant, and I am confused. How many potatoes?

  • Крестьянин

lost

  • Matthew Kastor

This is a good article about accuracy and precision, but it doesn’t directly state the obvious conclusion: benchmarks measure the performance of a particular “machine” running the code sample; they don’t tell you about the code sample itself. Machine, in this case, being the physical hardware, interpreter, and environment. It is the same as using the exact same octane of gasoline in a race car and in some beat-up junker with no oil: you’ll get different benchmarks for the cars, and you can’t directly attribute the benchmarks to the gasoline used.

    I do like the tools you mention though. They’re really useful if you know where your code will be deployed because you can optimize for the specific machines it will be run on. :D