.

Tags:

Profile-guided optimization (PGO) is a known compiler technique to produce an optimized code. The generated code is biased towards the common data set which will be fed to the code. It is definitely possible to apply the same technique to web applications written in JavaScript.

Various compilers, from Microsoft Visual C++ to Intel Fortran, leverage the profile data to make judgements on how to hit the right compromise. For example, small functions which get executed quite often should be in the close proximity so that the locality helps the CPU cache. The execution statistics can also help deciding which functions should be inlined, which branches should be pushed to top, and many other different trade-offs.

A key to a successful profile-guided optimization is a set of representative data. It could be a sample which is obtained during the test run. However, we need to ensure that the data really resembles, to a certain extent, the actual real-world distribution. For example, a JSON parser should be tested with a set of valid (albeit synthetic) stringified objects, and not just a stream of random characters.

Let us take a look at the following example. Supported the JSON parser mentioned above (or any other kind of parsing function) needs to check whether a character represents a decimal digit (0-9) or not. A possible implementation, among a billion possible variants, is as simple as:

function isDigit(ch) {
  return '0123456789'.indexOf(ch) >= 0;
}

datasets
The next question is, how the input data distribution look like? For this discussion, let us assume that apparently the characters fed into this function are mostly whitespaces, letters, and the real digits as depicted in the pie chart.

If 65% of the time, isDigit will receive a space, should we tackle that first? We may save some time since there is no need to check for those 10 different possibilities of the digits if it turns out to be a simple whitespace. With this in mind, we can tweak the implementation to become:

function isDigit(ch) {
  return (ch !== ' ') && '0123456789'.indexOf(ch) >= 0;
}

Thanks to the short-circuit behavior of logical AND expression (see Section 11.11 Binary Logical Operators), everytime ch is a space, the function will bail out immediately. This form the special fast path for the code. If this condition is not fulfilled, we fall back again to the slow path.

Generalizing this approach, a pair of slow vs fast path is illustrated as follows:

shortcut

Given the right profile data, it should not be too difficult to estimate the performance benefit. For the given example isDigit, it likely does not matter much since the function is so simple and it gets executed in a split second. When measuring the time, make sure you pay attention to both its accuracy and precision.

In a general situation where you can avoid the slow path under certain circumstances, the extra performance should be noticeable. For example, if the generic slow path takes 2 ms to process the data and you have 100 items in the data set, the total processing will be 200 ms. Now, assume that the fast path is twice as fast, i.e. 1 ms. According to the given distribution, the fast path is executed only on 65 items of that data set. Also, because the slow path is now in the fall back and not executed immediately, there will be an extra 20% overhead. This means, the total processing will become 142 ms (65 * 1 ms + 35 * 2.2 ms). That is roughly 40% speed-up.

Last but not least, there are two things you need to consider before doing a profile-guided optimization. First, make sure that this is where the bottleneck is. Nothing is more disastrous than a premature optimization as it is not worth any engineer’s time to speed up something which does not matter much. Second, pay attention to the data set for the test runs as it must resemble the real-world data which will be faced by your application. Having a wrong profile will cause an incorrect optimization and may lead to a performance penalty. Also, keep in mind that the data set may evolve. After all, optimization is a journey and not a destination.

Like they used to say: profile responsibly.

  • Peter

    It’ll be faster if you convert the char to a number `d=c.charCodeAt(0)` and compare it by number: `return d >= 0 && d <= 9;` ;) #trollollollol

    Nah, but +1, always optimize by the profiler. So invaluable.

  • tomByrer

    Do you have any jsPerf tests to prove this optimization in ‘real-world’ tests please?

    IIRC Chrome & FIreFox (when it finds ‘hot’ code) does some branch-predicting optimizations. In theory your idea should speed things up well, but I’ve been surprised by a few hidden optimizations before:
    https://groups.google.com/forum/?hl=en#!topic/v8-users/XHyn9tWQ898

    Looking forward to more of your posts!