Tags:

blocks

Little did I know that the start of my adventure with Esprima three years ago will result in something beyond my expectation. While the syntax tree format used by Esprima is not original (see SpiderMonkey Parser API), this de-facto format gains a lot of traction since it provokes a Cambrian explosion of composable JavaScript language tooling, everything from a code coverage tool, a style checker, a delta debugger, a syntax autocompleter, a complexity visualizer, and many more. Mind you, this AST format is far from perfect and hence why some of us at Shape Security are taking a journey to figure out a better format.

Throughout the development, Esprima is also being used as a playground for a rigorious workflow. For example, performance is always important and hence why a benchmark system was implemented early on. There were numerous optimized JavaScript tricks (fixed object shape, profile-guided code shuffling, object-in-a-set) which I discovered via a few interesting investigations. Esprima also enforces a hard threshold of certain metrics, such as cyclomatic complexity and test coverage. Speaking of tests, I consider Esprima’s test suite (~ 800 unit tests) as its crown jewel. It is not uncommon to hear that this collection of tests is being utilized to assist the development of another similar parser, whether it is written in JavaScript or other languages.

After being in the wild for a while, Esprima started to attract more contributions, not only in term of adding new features but also for troubleshooting defects, solving performance challenges, and other less glamorous tasks. The growth, 600 dependent packages and 3 millions/month download on npmjs, needs to be anticipated as well. This was why after talking to Dave Methvin some time ago, I felt confident that jQuery Foundation would be a good new umbrella for the project. And that was how the adoption was initiated and finally completed a few weeks ago.

At the same time, JavaScript continues to evolve. The next edition, ECMAScript 6 (will be called ECMAScript 2015 officially) has its specification frozen, with some JavaScript engines (SpiderMonkey, V8, Chakra, JavaScriptCore) already start to support a few selected features. This has been anticipated by creating the special harmony branch in early 2012. In fact, it has served as the basis of a transpiler called (now defunct) Harmonizr, back when writing a transpiler was not considered cool yet. Meanwhile, more folks (particularly Facebook engineers and some others) continue to enhance this branch. It is being used to drive Facebook JavaScript infrastructure (see JSTransform, Recast, Regenerator, JSX), among others for its ES6 adoption. Still, this harmony branch (despite some unofficial third-party releases) is considered experimental and it should not be used in production.

This brings us to the most recent 2.0 release. Among others, this release starts to include carefully selected ES6 features (e.g. arrow function, default parameter, method definition). This is to facilitate the migration of downstream language tools, per the original plan outlined several months ago in the mailing-list:

The new master, which bears the version 2.x, will start to introduce ECMAScript 6 features. We will do it peacemeal, taking features which are known to be more or less stabilized in the most recent draft spec. In a few cases, this is a matter of bringing in the existing implementation from the experimental harmony branch.

Thanks to the wonderful community, these three years have been fantastic. Let’s continue to build amazing tools!

Tags:

Some time ago, I came up with a bar joke involving SMTP. Since I need to explain it a couple of times, I thought I just write it down as a blog post for future reference.

The joke goes like this (as a tweet):

The key thing here is the EHLO part. To explain this, let me show you a typical chatting between an SMTP server (e.g. from your mail provider) and an SMTP client (e.g. your email application). If you want to follow along, there is a nice trick. Sign up at Mailtrap for a test account (you can authenticate using your Github credential) and you will have a test server to play with.

Start by connecting to the server using telnet:

telnet mailtrap.io 2525
Trying 54.85.222.127...
Connected to mailtrap.io.
Escape character is '^]'.
220 mailtrap.io ESMTP ready

ehloAt this moment, you are supposed to greet the server (see RFC 821, Section 3.5 on Opening and Closing) using the HELO command:

HELO mailtrap.io
250 mailtrap.io

If you carefully read the above RFC 821, it is obvious that SMTP commands are 4-letter words. Thus, MAIL is for initiating a transaction, NOOP is to do nothing, HELP for showing up some instructions, and so on.

As SMTP grows in functionality, an extension mechanism is established so that the client recognizes certain extra features of the server and perhaps would like to leverage them. Rather than inventing a completely different opening command, EHLO is introduced (see RFC 5321, Section 3.2 on Client Initialization). This new command let the client and server know about each other’s privileged status. For example, running EHLO on Mailtrap gives us:

EHLO mailtrap.io
250-mailtrap.io
250-SIZE 5242880
250-PIPELINING
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-DSN
250-AUTH PLAIN LOGIN CRAM-MD5
250 STARTTLS

which basically lists some service extensions supported by Mailtrap’s SMTP server.

Practically all modern email clients prefer to use EHLO instead. It is quite widespread and hence, the bar joke and the EHLO style of greeting. That was fun, right?

QUIT

Tags:

clonetroopersIn some cases, an instance of a C++ class should not be copied at all. There are three ways to prevent such an object copy: keeping the copy constructor and assignment operator private, using a special non-copyable mixin, or deleting those special member functions.

A class that represents a wrapper stream of a file should not have its instance copied around. It will cause a confusion in the handling of the actual I/O system. In a similar spirit, if an instance holds a unique private object, copying the pointer does not make sense. A somehow related problem but not necessarily similar is the issue of object slicing.

The following illustration demonstrates a simple class Vehicle that is supposed to have a unique owner, an instance of Person.

class Car {
public:
  Car(): owner(0) {}
  void setOwner(Person *o) { owner = o; }
  Person *getOwner() const { return owner; }
  void info() const;
private:
  Person *owner;
};

For this purpose, the implementation of Person is as simple as:

struct Person {
  std::string name;
};

To show the issue, a helper function info() is implement as follows:

void Car::info() const
{
  if (owner) {
    std::cout < < "Owner is " << owner->name < < std::endl;
  } else {
    std::cout << "This car has no owner." << std::endl;
}

From this example, it is obvious that an instance of Car must not be copied. In particular, another clone of a similar car should not automatically belong to the same owner. In fact, running the subsequent code:

  Person joe;
  joe.name = "Joe Sixpack";
 
  Car sedan;
  sedan.setOwner(&joe);
  sedan.info();
  Car anotherSedan = sedan;
  anotherSedan.info();

will give the output:

Owner is Joe Sixpack
Owner is Joe Sixpack

How can we prevent this accidental object copy?

Method 1: Private copy constructor and copy assignment operator

A very common technique is to declare both the copy constructor and copy assignment operator to be private. We do not even need to implement them. The idea is so that any attempt to perform a copy or an assignment will provoke a compile error.

In the above example, Car will be modified to look like the following. Take a look closely at two additional private members of the class.

class Car {
public:
  Car(): owner(0) {}
  void setOwner(Person *o) { owner = o; }
  Person *getOwner() const { return owner; }
  void info() const;
private:
  Car(const Car&);
  Car& operator=(const Car&);
  Person *owner;
};

Now if we try again to assign an instance of Car to a new one, the compiler will complain loudly:

example.cpp:35:22: error: calling a private constructor of class 'Car'
  Car anotherSedan = sedan;
                     ^
example.cpp:22:3: note: declared private here
  Car(const Car&);
  ^
1 error generated.

If writing two additional lines containing repetitive names is too cumbersome, a macro could be utilized instead. This is the approach used by WebKit, see its WTF_MAKE_NONCOPYABLE macro from wtf/Noncopyable.h (do not be alarmed, in the context of WebKit source code, WTF here stands for Web Template Framework). Chromium code, as shown in the file base/macros.h, distinguishes between copy constructor and assignment, denoted as DISALLOW_COPY and DISALLOW_ASSIGN macros, respectively.

Method 2: Non-copyable mixin

The idea above can be extended to create a dedicated class which has the sole purpose to prevent object copying. It is often called as Noncopyable and typically used as a mixin. In our example, the Car class can then be derived from this Noncopyable.

Boost users may be already familiar with boost::noncopyable, the Boost flavor of the said mixin. A conceptual, self-contained implementation of that mixin will resemble something like the following:

class NonCopyable
{
  protected:
    NonCopyable() {}
    ~NonCopyable() {}
  private: 
    NonCopyable(const NonCopyable &);
    NonCopyable& operator=(const NonCopyable &);
};

Our lovely Car class can be written as:

class Car: private NonCopyable {
public:
  Car(): owner(0) {}
  void setOwner(Person *o) { owner = o; }
  Person *getOwner() const { return owner; }
  }
private:
  Person *owner;
};

Compared to the first method, using Noncopyable has the benefit of making the intention very clear. A quick glance at the class, right on its first line, and you know right away that its instance is not supposed to be copied.

Method 3: Deleted copy constructor and copy assignment operator

For modern applications, there is less and less reason to get stuck with the above workaround. Thanks to C++11, the solution becomes magically simple: just delete the copy constructor and assignment operator. Our class will look like this instead:

class Car {
public:
  Car(const Car&) = delete;
  void operator=(const Car&) = delete;
  Car(): owner(0) {}
  void setOwner(Person *o) { owner = o; }
  Person *getOwner() const { return owner; }
private:
  Person *owner;
};

Note that if you use boost::noncopyable mixin with a compiler supporting C++11, the implementation of boost::noncopyable also automatically deletes the said member functions.

With this approach, any accidental copy will result in a quite friendlier error message:

example.cpp:34:7: error: call to deleted constructor of 'Car'
  Car anotherSedan = sedan;
      ^              ~~~~~
example.cpp:10:3: note: 'Car' has been explicitly marked deleted here
  Car(const Car&) = delete;
  ^

So, which of the above three methods is your favorite?

Tags:

While a build system is always critical to the success of a software project, maintaining such a system is not always fun. Hence, we tend to investigate many different ways to reduce the maintenance effort. Thanks to Docker, there is a possibility to have the build agent itself very simple because it does nothing but to spin and run a Docker container.

phoenixImagine if you are a Python shop and suddenly you have an engineer trying to experiment with Go for the new REST API server. It is certainly possible to retrofit your build infrastructure to include Go development tools and dependencies. But what if another environment and other frameworks are also needed? It is not scalable (process-wise) to always bug your build/release engineers and bug them with these (continuous) requirements.

In a configuration that involves a server-agent setup (or in Jenkins lingo, master-slave), the agent is the one that does the backbreaking work. In the previous blog post, Build Agent: Template vs Provisioning, I already outlined the most common techniques to eliminate the need to babysit a build agent. I am myself is a big fan of the automatic provisioning approach. Like what Martin Fowler wrote about Phoenix Server:

A server should be like a phoenix, regularly rising from the ashes.

When a build agent misbehaves due to a configuration drift, we shall not bother to troubleshoot it. We simply terminate that troublesome phoenix and let it regenerate (thanks to the provisioning mechanism). For another rather philosophical aspect on this approach, read also my other blog post on A Maturity Model for Build Automation.

The Container is the Phoenix

If many of your build agents share the same trait, e.g. they are mostly a Linux system (often in its virtualized form, e.g. an EC2 instance) with assorted different tools (compilers, libraries, frameworks, test systems), then the scenario can be further simplified. What if the build agent is not the actual Phoenix? What if the build agent is only the realm where the phoenix lives (and dies)?

In this situation, a Docker container becomes the real phoenix. Every project will need to supply some additional information (imperative: in the form of a script, declarative: common configuration understood by the build tool) necessary for the build agent: which container to be used and how to initiate that in-container build.

Let’s take a simple project and setup a build using this Docker and Phoenix approach. For this example, we will build a CPU feature detection tool (implemented using C++). If you want to follow along, simply clone its git repository bitbucket.org/ariya/cpu-detect and pay attention to the phoenix subdirectory.

There are two shell scripts inside the phoenix subdirectory: init.sh and build.sh.

The first one, init.sh, is the one to be executed by the build agent. It pulls the container used to execute the actual build step. Since this is a C++ project, we will leverage the gcc container. After that, it runs the container with a volume mapping so that /source inside the container is mapped to the git checkout directory. When the container is launched, it also executes the other script build.sh (referred as /source/phoenix/build.sh since we are now inside the container).

If we simplify it, the whole content of init.sh can be summarized as:

docker run -v $SOURCE_PATH:/source gcc:4.9 sh - c "/source/phoenix/build.sh"

The second script, build.sh, is not executed by the build agent directly. It will run inside the specified container, as described above. The main part of build.sh is to run the actual build step. For this project, it only needs to invoke make (in a real-world project, a battery of tests must be part of this). Before that, the script needs to prepare a build directory and copy the original source (remember, /source inside the container corresponds to the git checkout). Once the build is completed, the build artifact has to be transferred back. In this case, we just copy the generated cpu-detect executable.

If any step during this process fails, including make itself, then the whole process will be marked as a failure. This automatic propagation of status eliminates the need for a custom error handling.

To test this setup, have a box with Docker ready to use and then launch phoenix/init.sh. If everything works correctly, you will see an output like the following screenshot.

incontainer

If you experience some Inception moment trying to follow the steps, please use the following diagram. It is also a useful exercise to adopt those two phoenix scripts to your own personal project.

diagram

Agent of Democracy

In the above example, we pull and run a ready-to-use gcc container. In practice, you may want to come up with a set of customized containers to suit your need. Hence, it is highly recommended that you setup your own Docker registry to be used internally. This becomes a private registry and it should not be accessible by anyone outside your organization. Here is how your init.sh might look like incorporating the technique:

REGISTRY="docker.mycompany.com"
IMAGE="golang"
TAG="1.4"
CONTAINER="${REGISTRY}/${IMAGE}:${TAG}"
 
echo "Container to be used: $CONTAINER."
docker pull $CONTAINER
echo

Now that the build process only happens inside the container, you can trim down the build agent. For example, it does not need to have packages for all development environment, from Perl to Haskell. All it needs is Docker (and of course the client software to run as a build agent) and thereby massively reducing the provisioning and maintenance effort.

Let’s go back to the illustrative use case mentioned earlier. If an engineer in your team is inspired to evaluate Go, you do not need to modify your build infrastructure. Just ask them to provide a suitable Go development container (or reuse an existing once such as google/golang) and prepare that phoenix-like bootstrapper scripts. The same goes for the new intern who prefers to tinker with Rust instead. No change in the build agent is necessary! Everyone, regardless the project requirements, can utilize the same infrastructure.

In fact, if you think through this carefully, you will realize that all those Linux build agents are not unique at all. They all have the same installed packages and no agent is better or worse than the others. There is no second-class citizen. This is democracy at its best.

Parametrization and Resilience

Knowing the build number and other related build information is often essential to the build process. Fortunately, many continuous integration systems (Bamboo, TeamCity, Jenkins, etc) can pass that information via environment variables. This is quite powerful since all we need to do is to continue to pass that to Docker. For example, if you use Bamboo, then the invocation of docker needs to be modified to look like (notice the use of -e option to denote an environment variable).

docker run -v $SOURCE_PATH:/source \
  -e bamboo_buildNumber=${bamboo_buildNumber}\
  $CONTAINER sh - c "/source/phoenix/build.sh"

medikitAnother side effect of this Docker-based build is the built-in error recovery. In many cases, a build may fail or it gets stuck in some process. Ideally, you want to terminate the build in this situation since it warrants a more thorough investigation. Armed with the useful Unix timeout command, we just need to modify our Docker invocation:

TIMEOUT=2m
echo "Triggering the build (with ${TIMEOUT} timeout)..."
timeout --signal=SIGKILL ${TIMEOUT} \
  docker run -v $SOURCE_PATH:/source \
  $CONTAINER sh - c "/source/phoenix/build.sh"

By the way, this is the reason why there is an explicit docker pull in init.sh. Technically it’s not needed, but we use it a mechanism to warm up the container cache. This way, the time it takes to initially pull the container will not be included in that 2-minute timeout.

With the use of timeout, if the Docker process would not complete in 2 minutes, it will be terminated with SIGKILL, effectively aborting the whole step at once. Since the offending application is isolated inside a container, this kind of clean-up also results in a really clean termination. There is no more server hanging out doing nothing because it was not killed properly. There is no stray zombie process eating the resources in the background.

Summary: Use Docker to modify the build agent to be a realm where your phoenix lives and dies. After that, turn every build process into a short-lived phoenix.

Tags:

The most recent Shellshock, a vulnerability in the popular shell bash, got me to evaluate again the unique setup on Ubuntu/Debian. In this setup, script execution is not handled by bash, this job is carred out by dash, the Debian Almquist Shell. Meanwhile, bash is still used for the interactive shell since dash does not have autocomplete and history support.

One advantage of this setup is that you start to write your script taking into account that it will not be executed by bash only. This makes sense, it requires just a little effort to avoid certain bashisms and stay compatible to the POSIX syntax. I myself was pretty ignorant of this, assuming that bash is ubiquitous. After using dash for a while, I learn a couple of new tricks and I am happier than my scripts follow this standard best practice. While dash is more efficient in its execution, in most cases the difference is negligible and that was not my primary concern.

Linux users can get dash easily, it is one apt-get or yum install away. For OS X, it is easy enough to build it from its source, e.g.:

DASH_VERSION=0.5.7
DASH_FULLNAME=dash-${DASH_VERSION}
DASH_TARBALL=${DASH_FULLNAME}.tar.gz
DASH_DOWNLOAD=http://gondor.apana.org.au/~herbert/dash/files/${DASH_TARBALL}
rm -rf ${DASH_TARBALL} ${DASH_FULLNAME}
curl -L ${DASH_DOWNLOAD} -o ${DASH_TARBALL}
tar -xzf ${DASH_TARBALL}
cd ${DASH_FULLNAME}
./configure && make
sudo make install
fish

As for the interactive shell, my favorite these days is fish. It does not support every single feature of bash but it works very well. If you are desperate, you can still workaround what you miss from bash. And make sure you check out Oh My Fish! as well.

Whether you prefer bash or dash or fish, Unix shells are always fun to explore!

Tags:

Last week I was in Chicago for the most recent jQuery Conference, part of my autumn tour. It was a fantastic opportunity to have some face-to-face conversations as well as to get to know different folks in the jQuery community. Most importantly, I feel the urge to recall the revolution of the web, in which the state Illinois played an extremely important role.

The talk I must deliver for jQuery Conference is on a 10,000-foot overview of JavaScript execution in a web browser (check the slide deck, expect the video in a few weeks. Update: watch the video). Since the browser is an essential topic in this case, I could not resist to interject a story, a short detour on the browser history. You might have read a different variant of this, when I wrote about the back-story of various names of popular browsers out there. In particular, pay attention to the exploratory theme (in bold) in the following diagram.

revo

The said revolution was started by Mosaic, well-known as the first popular web browser which dramatically boosted the popularity of world wide web. Mosaic was developed at NCSA, part of University of Illinois. Some Mosaic team members later went to create Netscape and develop Netscape Navigator. One of them is Marc Andreessen (@pmarca), a graduate of University of Illinois, who is known these days as a prominent investor at Andreessen Horowitz (a16z) VC firm.

Mosaic was later born again in the form of Internet Explorer. Spyglass, an official licensee of NCSA Mosaic project, licensed their browser technology (unrelated to the original Mosaic code) to Microsoft and it became the genesis of a new browser contender to Netscape Navigator. The rise of Internet Explorer also led to the first Browser War.

During this time, Netscape was in trouble and the browser code was released as an open-source project Mozilla. After a few years, Firefox emerged and this time became the browser that challenged Internet Explorer. One of the co-creator of Firefox is Dave Hyatt, he was involved with Netscape and it is hardly a surprise that he also finished his graduate from University of Illinois. Later on, Dave Hyatt was one of the influential figures behind WebKit, the rendering engine that powers Safari.

Something that is not mentioned in the above browser diagram is of course JavaScript. Are you going to be surprised if I now mention that Brendan Eich (@BrendanEich), the father of JavaScript, finished his master also at the University of Illinois?

Beside being close to the seed of such a revolution, I did really enjoy my first trip to Chicago. It seems that Chicago easily makes it into the list of future travel I want to plan. Meanwhile, my next stop will be San Francisco for the upcoming HTML5 Developer Conference.