
Every now and then someone asks me why I bothered to create Esprima at all. Several great JavaScript parsers written in JavaScript already exist, and it is most likely easier to use an existing parser than to write a new one. This post summarizes my reasoning behind the birth of Esprima (in no particular order). These are the distinctive features I wanted in a parser; since no existing parser fulfilled all the requirements at the time, I embarked on a journey to create one. Necessity is the mother of invention.

Of course, Esprima is by no means perfect, nor complete. If you have suggestions on how to improve it, join the mailing list and let's work together. Esprima already provides the basis for various tools such as source modification (Esmorph), coverage analyzers (node-cover and coveraje), a source-to-source compiler (Harmonizr), a syntax formatter (Code Painter), and a code generator (escodegen). The exciting Mozilla project LLJS (Low-Level JavaScript) uses a modified version of Esprima's parsing routines. Eclipse Orion, the new web-focused development tool, also uses Esprima as the back-end for its smart autocompletion logic.

Let’s bring the state of JavaScript tooling to the next level.

Blazing fast

For an obvious reason, a faster parser is always better, provided that it does not sacrifice correctness. Esprima is designed from the ground up to be fast. In fact, the corpus for the benchmark suite was among the first things I planned from day one. As long as code readability is not degraded, selected optimizations (such as tuned branching, switch-case deoptimization, and object structure) were carefully applied. Even its run-time scalability is monitored closely. Speed matters!
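As a rough illustration of how such speed comparisons work, here is a minimal timing-harness sketch. The workload below is a placeholder, not Esprima's actual benchmark corpus; in a real run you would substitute something like `esprima.parse(source)`:

```javascript
// Minimal micro-benchmark sketch: time many runs of a parsing workload.
function benchmark(label, fn, runs) {
    var start = Date.now();
    for (var i = 0; i < runs; i += 1) {
        fn();
    }
    var elapsed = Date.now() - start;
    console.log(label + ': ' + elapsed + ' ms for ' + runs + ' runs');
    return elapsed;
}

// Placeholder workload standing in for a real parse call.
var elapsed = benchmark('dummy workload', function () {
    JSON.parse('{"type": "Program", "body": []}');
}, 10000);
```

Real benchmark suites do more than this (warm-up runs, multiple samples, statistical treatment), but the shape is the same: a fixed corpus, repeated parses, and wall-clock measurement.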

The above bar chart (shorter is better) shows how Esprima compares against the parser from the well-known UglifyJS project. The test machine is a dated Toshiba laptop from 2009 (preinstalled with Windows 7 Home Premium) running the online speed comparison tests.

Sensible AST Format

For the abstract syntax tree (AST), I decided to settle on the same format used by the parser reflection in Mozilla SpiderMonkey. The reason is simple: the format closely follows the actual ECMAScript language specification. This is also manifested in the source code of Esprima, where I stick to the standardized terminology used in the specification. Having the source code, the specification, and the parser output all speak the same language helps a lot. Even the syntax tree visualization will look familiar.
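As a concrete illustration (written out by hand here, not captured from Esprima), parsing `var answer = 42;` yields a tree in the SpiderMonkey Parser API format along these lines; node type names mirror the terminology of the ECMAScript specification:

```javascript
// SpiderMonkey-style AST for: var answer = 42;
var program = {
    type: 'Program',
    body: [{
        type: 'VariableDeclaration',
        kind: 'var',
        declarations: [{
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: 'answer' },
            init: { type: 'Literal', value: 42 }
        }]
    }]
};

// Navigating the tree uses plain property access.
console.log(program.body[0].declarations[0].id.name);  // answer
```

In real use, `esprima.parse()` returns a tree of this shape, optionally annotated with positional information.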

Heavily unit tested, with full code coverage

Esprima has more than 500 unit tests, and the number keeps growing. While the parser needs to handle whatever unpredictable source code users throw at it, comprehensive sanity checks are still compulsory. For reasons already described elsewhere, every bug fix and feature must be logged in the issue tracker (zero tolerance!). I tend to reject any pull request that lacks a filed issue or a test case; somehow that helps (often indirectly) to harden the parser and to ensure that it remains stable.

Code coverage is the logical next step after the above. Code that is never executed gives a false sense of accomplishment; it is an accident waiting to happen. Esprima has full code coverage (to the best the coverage tool can prove). In fact, reducing code coverage is considered a fatal issue, one that the contribution guide specifically forbids: no coverage regression. Again, zero tolerance.

Writing a parser is hard. Even with tons of unit tests, there is no guarantee that it will be bug-free. However, those tests help us sleep better at night, both these days and in the near future.

Don’t give up easily

In a JavaScript engine, the parser is not really forgiving. If the code does not follow the syntax, there is no use in going further because that code cannot be sensibly executed anyway. The parser is therefore built to follow the language specification faithfully: it bails out when it detects a fatal syntax error.

On the other hand, a parser which can deal with incomplete or broken code (up to a certain threshold) would be particularly useful for various use cases, among others static analysis and code autocompletion. In fact, this is how smart autocompletion in Eclipse Orion, also known as Content Assist, is implemented. The completion logic needs to handle invalid syntax because the user may still be in the middle of typing. By tolerating such errors (to a certain extent), it can build a good semantic overview of the broken code on a best-effort basis.
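The core idea, stripped down to a toy, is to record errors and keep going rather than throwing at the first problem. The sketch below does this for a comma-separated list of identifiers; it only illustrates the recovery principle, and Esprima's actual strategy is far richer:

```javascript
// Toy sketch of error-tolerant parsing: collect identifiers from a
// comma-separated list, recording errors instead of bailing out.
function tolerantParse(source) {
    var names = [];
    var errors = [];
    source.split(',').forEach(function (chunk, index) {
        var token = chunk.trim();
        if (/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(token)) {
            names.push(token);
        } else {
            // Record the problem and continue with the next item.
            errors.push('invalid identifier at item ' + index + ': "' + token + '"');
        }
    });
    return { names: names, errors: errors };
}

var result = tolerantParse('alpha, 2beta, gamma');
console.log(result.names);   // [ 'alpha', 'gamma' ]
console.log(result.errors);  // one recorded error, for "2beta"
```

A caller such as an autocompletion engine can then work with the valid portion (`names`) while still surfacing the problems (`errors`).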

Error-tolerant parsing is tricky, and it is still a work in progress. Expect to see continuous improvements as we try various recovery strategies and refine the implementation.

Forward looking: strict mode, ES.Next, Harmony

Strict mode, ES.Next, and Harmony are not second-class citizens in Esprima land. There is no use living in the past. With upcoming language features showing up in modern browsers, developers will start using them. This is indeed essential: if the tools (syntax checker, coverage analyzer, parser, compressor) do not understand modern language constructs, then developers won't use those constructs. For example, it would drive you crazy if the editor warned Code has no effect for that "use strict" (hello, NetBeans!).

Whether it's strict mode, lexical block scope via let, or Harmony module declarations, Esprima aims to support it. Some of the fun bleeding-edge stuff happens in the special harmony branch, where you can see support for ES.Next features such as modules, destructuring, classes, the for-of statement, and many others.
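To make that concrete, here is a small snippet exercising a few of those constructs together; it was bleeding edge at the time, though today's engines run it natively:

```javascript
// class, destructuring, and for-of: a few of the ES.Next constructs
// that the harmony branch can parse.
class Point {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }
}

const points = [new Point(1, 2), new Point(3, 4)];
let total = 0;
for (const { x, y } of points) {   // for-of, destructuring the point
    total += x + y;
}
console.log(total);  // 10
```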

Because Esprima facilitates non-destructive partial modification (only specific portions of your code are touched; everything else, including the comments and indentation, is left intact), it can be used for instrumentation purposes such as function prolog injection or application startup tracking. Combined with the latest ES.Next goodies, this can lead to an interesting transpiler that permits you to use future JavaScript syntax and run it in today's browsers. As demonstrated by the Harmonizr project, it's entirely possible to write something like:

module LinearAlgebra {
    export const CoordinateSystem = 'Cartesian';
 
    // Create 2-D point.
    export function Point(x, y) {
        return { x, y };
    }
}

and get it transpiled into the following (note how the formatting and comments are not destroyed):

var LinearAlgebra = function() {
    const CoordinateSystem = 'Cartesian';
 
    // Create 2-D point.
    function Point(x, y) {
        return { x: x, y: y };
    }
 
    return {
        CoordinateSystem: CoordinateSystem,
        Point: Point
    };
}();

Obviously, it is entirely possible to target AMD or CommonJS module syntax as well.

Shall we have a taste of the bright ES.Next future?

  • MySchizoBuddy

Since LLJS is very close to JS, they can use a modified version of Esprima instead of writing their own parser, correct?

A modified Esprima cannot be used with, say, CoffeeScript, because it's way too different from normal JS.

    • http://ariya.ofilabs.com/ Ariya Hidayat

      That’s correct. A language parser can only be reused (with modifications) for another fairly similar language.

  • MySchizoBuddy

All tutorials I see only show you how to parse a basic calculator. No one does anything complicated like a complete class definition.

Is there a book you can recommend that goes into a lot more detail and is easy to read, with examples?

    • http://ariya.ofilabs.com/ Ariya Hidayat

      The “Dragon Book” is usually the most referred book in that subject.