
There are various hosted continuous integration services out there that you can use for your Node.js projects, from Travis CI to drone.io and many others. If you feel adventurous, or you are simply fascinated by DIY solutions (for whatever reason), it is quite easy to set up your own CI system quickly using Docker and TeamCity.

As an easy-to-use continuous integration system, TeamCity offers two free options: a Professional Server license for up to 20 build configurations, or an Open Source license for your open-source projects. This is usually sufficient to get you started. Also, per the usual server-agent architecture, we will run the TeamCity server and agent in two separate containers. This is very similar to my previous blog post on TeamCity installation using Docker, with a minor tweak.

First, you need a machine for the server. This could be a physical machine, a virtual machine, or even a VPS. For a hassle-free setup, sign up for either Vultr or Digital Ocean (note: my affiliate links). Make sure you evaluate the system requirements to run the server (e.g. 2 cores and 2 GB RAM will be ideal).

On this machine, Docker must be installed properly. A useful quick test:

sudo docker run -it ariya/centos7-oracle-jre7 cat /etc/redhat-release

should show something like:

CentOS Linux release 7.0.1406 (Core)

Once Docker is there, starting TeamCity server is as easy as:

sudo docker run -dt --name teamcity_server -p 8111:8111 \
  ariya/centos7-teamcity-server

This uses a prepared container I have created called ariya/centos7-teamcity-server. Note that the container supports volume mapping of /data/teamcity. You definitely need to use this if you want to persist your TeamCity projects and other settings. Here is a fancier way to invoke the server, with the data stored on the host system under /var/data/teamcity and an automatic restart in case the server dies:

sudo docker run -dt --name teamcity_server --restart=always -p 8111:8111 \
  -v /var/data/teamcity:/data/teamcity \
  ariya/centos7-teamcity-server

Also, if you are using a firewall, make sure to accept connections on port 8111. With iptables:

sudo iptables -A INPUT -p tcp --dport 8111 -j ACCEPT
sudo service iptables save

Once the server is running, visit the site (on port 8111) using your web browser. This allows you to initialize and configure the TeamCity server. In a minute or two, it should be ready to use.

[Screenshot: TeamCity server starting up]

You can start creating your CI project; refer to the excellent TeamCity documentation for the details. For the build process itself, it is quite common to invoke npm twice: first to install the dependencies and then to run the tests. This is illustrated in the following screenshot.

[Screenshot: TeamCity build steps for a Node.js project]
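In plain shell terms, those two invocations boil down to the following (in TeamCity, each line would typically be its own Command Line build step):

npm install
npm test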

While it is sufficient to use the command-line runner to invoke e.g. npm test, if you want to be a bit more sophisticated, you can use a customized runner such as TeamCity.Node.

Of course, the project cannot be executed right now because the server does not have any connected build agents yet. Starting an agent is also extremely straightforward since I have already prepared another container for that, ariya/centos7-teamcity-agent-nodejs. This container is already equipped with Node.js 0.10 and npm 1.3.

sudo docker run -e TEAMCITY_SERVER=http://$TEAMCITY_HOST:8111 -dt -p 9090:9090 \
  ariya/centos7-teamcity-agent-nodejs

In the above example, you need to supply the IP address of your server via the TEAMCITY_HOST variable. Again, the firewall needs to accept connections on port 9090.

[Screenshot: TeamCity agent page]

It is of course possible to run this agent on the same host as the server, particularly if you have a beefy machine. In this case, you need to use the Docker-assigned IP address of the server container:

export TEAMCITY_HOST=$(sudo docker inspect --format \
  '{{ .NetworkSettings.IPAddress }}' teamcity_server)

It takes a while for the agent to register itself with the server. Even then, the agent is not immediately available: first, you need to authorize it so that the server trusts the agent and starts dispatching build tasks to it. After that, you can start running your project.

[Screenshot: TeamCity build log]

Thanks to Docker, everything can be done in 10 minutes or less. Have fun with all the tests!



Little did I know that the start of my adventure with Esprima three years ago would result in something beyond my expectations. While the syntax tree format used by Esprima is not original (see the SpiderMonkey Parser API), this de facto format has gained a lot of traction since it provoked a Cambrian explosion of composable JavaScript language tooling: everything from a code coverage tool, a style checker, a delta debugger, a syntax autocompleter, a complexity visualizer, and many more. Mind you, this AST format is far from perfect, which is why some of us at Shape Security are on a journey to figure out a better format.

Throughout its development, Esprima has also been used as a playground for a rigorous workflow. For example, performance has always been important, which is why a benchmark system was implemented early on. Numerous JavaScript optimization tricks (fixed object shape, profile-guided code shuffling, object-in-a-set) were discovered via a few interesting investigations. Esprima also enforces a hard threshold on certain metrics, such as cyclomatic complexity and test coverage. Speaking of tests, I consider Esprima’s test suite (~800 unit tests) its crown jewel. It is not uncommon to hear that this collection of tests is being used to assist the development of other similar parsers, whether written in JavaScript or other languages.

After being in the wild for a while, Esprima started to attract more contributions, not only in terms of new features but also in troubleshooting defects, solving performance challenges, and other less glamorous tasks. The growth (600 dependent packages and 3 million downloads per month on npmjs) needed to be anticipated as well. This is why, after talking to Dave Methvin some time ago, I felt confident that the jQuery Foundation would be a good new umbrella for the project. And that was how the adoption was initiated and finally completed a few weeks ago.

At the same time, JavaScript continues to evolve. The next edition, ECMAScript 6 (to be officially called ECMAScript 2015), has its specification frozen, and some JavaScript engines (SpiderMonkey, V8, Chakra, JavaScriptCore) have already started to support a few selected features. This was anticipated by creating the special harmony branch in early 2012. In fact, it served as the basis of a (now defunct) transpiler called Harmonizr, back when writing a transpiler was not considered cool yet. Meanwhile, more folks (particularly Facebook engineers, among others) continue to enhance this branch. It is being used to drive Facebook’s JavaScript infrastructure (see JSTransform, Recast, Regenerator, JSX), among other things for its ES6 adoption. Still, this harmony branch (despite some unofficial third-party releases) is considered experimental and should not be used in production.

This brings us to the most recent 2.0 release. Among other things, this release starts to include carefully selected ES6 features (e.g. arrow functions, default parameters, method definitions). This is to facilitate the migration of downstream language tools, per the original plan outlined several months ago on the mailing list:

The new master, which bears the version 2.x, will start to introduce ECMAScript 6 features. We will do it piecemeal, taking features which are known to be more or less stabilized in the most recent draft spec. In a few cases, this is a matter of bringing in the existing implementation from the experimental harmony branch.

Thanks to the wonderful community, these three years have been fantastic. Let’s continue to build amazing tools!


Some time ago, I came up with a bar joke involving SMTP. Since I have had to explain it a couple of times, I thought I would just write it down as a blog post for future reference.

The joke goes like this (as a tweet):

The key thing here is the EHLO part. To explain it, let me show you a typical exchange between an SMTP server (e.g. from your mail provider) and an SMTP client (e.g. your email application). If you want to follow along, there is a nice trick: sign up at Mailtrap for a test account (you can authenticate using your GitHub credentials) and you will have a test server to play with.

Start by connecting to the server using telnet:

telnet mailtrap.io 2525
Trying 54.85.222.127...
Connected to mailtrap.io.
Escape character is '^]'.
220 mailtrap.io ESMTP ready

At this moment, you are supposed to greet the server (see RFC 821, Section 3.5 on Opening and Closing) using the HELO command:

HELO mailtrap.io
250 mailtrap.io

If you read RFC 821 carefully, you will see that SMTP commands are 4-letter words. Thus, MAIL initiates a transaction, NOOP does nothing, HELP shows some instructions, and so on.
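To see a few of these commands in action, here is a sketch of a minimal mail transaction. The responses shown are illustrative only; the exact replies depend on the server, and Mailtrap in particular expects you to authenticate (AUTH) before it accepts mail.

MAIL FROM:<sender@example.com>
250 2.1.0 Ok
RCPT TO:<recipient@example.com>
250 2.1.5 Ok
DATA
354 Go ahead
Subject: Greetings

Hello from a telnet session!
.
250 2.0.0 Ok: queued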

As SMTP grew in functionality, an extension mechanism was established so that the client can recognize certain extra features of the server and perhaps leverage them. Rather than inventing a completely different opening command, EHLO was introduced (see RFC 5321, Section 3.2 on Client Initialization). This new command lets the client and server learn about each other’s capabilities. For example, running EHLO on Mailtrap gives us:

EHLO mailtrap.io
250-mailtrap.io
250-SIZE 5242880
250-PIPELINING
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-DSN
250-AUTH PLAIN LOGIN CRAM-MD5
250 STARTTLS

which basically lists some service extensions supported by Mailtrap’s SMTP server.

Practically all modern email clients prefer to use EHLO instead. It is quite widespread; hence the bar joke and the EHLO style of greeting. That was fun, right?

QUIT


In some cases, an instance of a C++ class should not be copied at all. There are three ways to prevent such an object copy: keeping the copy constructor and assignment operator private, using a special non-copyable mixin, or deleting those special member functions.

A class that represents a wrapper stream over a file should not have its instances copied around; that would cause confusion in the handling of the actual I/O system. In a similar spirit, if an instance holds a unique private object, copying the pointer does not make sense. A somewhat related, though not identical, problem is the issue of object slicing.

The following illustration demonstrates a simple class Car that is supposed to have a unique owner, an instance of Person.

class Car {
public:
  Car(): owner(0) {}
  void setOwner(Person *o) { owner = o; }
  Person *getOwner() const { return owner; }
  void info() const;
private:
  Person *owner;
};

For this purpose, the implementation of Person is as simple as:

struct Person {
  std::string name;
};

To show the issue, a helper function info() is implemented as follows:

void Car::info() const
{
  if (owner) {
    std::cout << "Owner is " << owner->name << std::endl;
  } else {
    std::cout << "This car has no owner." << std::endl;
  }
}

From this example, it is obvious that an instance of Car must not be copied. In particular, another clone of a similar car should not automatically belong to the same owner. In fact, running the following code:

  Person joe;
  joe.name = "Joe Sixpack";
 
  Car sedan;
  sedan.setOwner(&joe);
  sedan.info();
  Car anotherSedan = sedan;
  anotherSedan.info();

will give the output:

Owner is Joe Sixpack
Owner is Joe Sixpack

How can we prevent this accidental object copy?

Method 1: Private copy constructor and copy assignment operator

A very common technique is to declare both the copy constructor and the copy assignment operator as private. We do not even need to implement them. The idea is that any attempt to perform a copy or an assignment will provoke a compile error.

In the above example, Car will be modified to look like the following. Take a close look at the two additional private members of the class.

class Car {
public:
  Car(): owner(0) {}
  void setOwner(Person *o) { owner = o; }
  Person *getOwner() const { return owner; }
  void info() const;
private:
  Car(const Car&);
  Car& operator=(const Car&);
  Person *owner;
};

Now if we try again to assign an instance of Car to a new one, the compiler will complain loudly:

example.cpp:35:22: error: calling a private constructor of class 'Car'
  Car anotherSedan = sedan;
                     ^
example.cpp:22:3: note: declared private here
  Car(const Car&);
  ^
1 error generated.

If writing two additional lines containing repetitive names is too cumbersome, a macro can be utilized instead. This is the approach used by WebKit; see its WTF_MAKE_NONCOPYABLE macro in wtf/Noncopyable.h (do not be alarmed: in the context of the WebKit source code, WTF stands for Web Template Framework). Chromium code, as shown in the file base/macros.h, distinguishes between the copy constructor and the assignment operator with the DISALLOW_COPY and DISALLOW_ASSIGN macros, respectively.
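As an illustration only (this is a conceptual sketch, not the actual WebKit or Chromium code), such a macro could look like the following; it is meant to be placed in the private section of the class:

// Sketch of a combined macro in the spirit of those projects.
#define DISALLOW_COPY_AND_ASSIGN(TypeName) \
  TypeName(const TypeName&); \
  TypeName& operator=(const TypeName&)

class Car {
public:
  Car(): owner(0) {}
private:
  DISALLOW_COPY_AND_ASSIGN(Car);
  Person *owner;
};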

Method 2: Non-copyable mixin

The idea above can be extended to create a dedicated class whose sole purpose is to prevent object copying. It is often called Noncopyable and typically used as a mixin. In our example, the Car class can then be derived from this Noncopyable.

Boost users may already be familiar with boost::noncopyable, the Boost flavor of the said mixin. A conceptual, self-contained implementation of that mixin resembles the following:

class NonCopyable
{
  protected:
    NonCopyable() {}
    ~NonCopyable() {}
  private: 
    NonCopyable(const NonCopyable &);
    NonCopyable& operator=(const NonCopyable &);
};

Our lovely Car class can be written as:

class Car: private NonCopyable {
public:
  Car(): owner(0) {}
  void setOwner(Person *o) { owner = o; }
  Person *getOwner() const { return owner; }
  void info() const;
private:
  Person *owner;
};

Compared to the first method, using Noncopyable has the benefit of making the intention very clear. A quick glance at the class, right at its first line, tells you right away that its instances are not supposed to be copied.

Method 3: Deleted copy constructor and copy assignment operator

For modern applications, there is less and less reason to get stuck with the above workarounds. Thanks to C++11, the solution becomes magically simple: just delete the copy constructor and the assignment operator. Our class will look like this instead:

class Car {
public:
  Car(const Car&) = delete;
  void operator=(const Car&) = delete;
  Car(): owner(0) {}
  void setOwner(Person *o) { owner = o; }
  Person *getOwner() const { return owner; }
private:
  Person *owner;
};

Note that if you use the boost::noncopyable mixin with a compiler supporting C++11, the implementation of boost::noncopyable also automatically deletes the said member functions.
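For completeness, a conceptual C++11-style version of that mixin (a sketch, not the actual Boost source) could look like this:

class NonCopyable
{
  protected:
    NonCopyable() = default;
    ~NonCopyable() = default;

    // deleting the copy operations is enough to block copies of any derived class
    NonCopyable(const NonCopyable &) = delete;
    NonCopyable& operator=(const NonCopyable &) = delete;
};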

With this approach, any accidental copy will result in a friendlier error message:

example.cpp:34:7: error: call to deleted constructor of 'Car'
  Car anotherSedan = sedan;
      ^              ~~~~~
example.cpp:10:3: note: 'Car' has been explicitly marked deleted here
  Car(const Car&) = delete;
  ^

So, which of the above three methods is your favorite?


While a build system is always critical to the success of a software project, maintaining such a system is not always fun. Hence, we tend to investigate many different ways to reduce the maintenance effort. Thanks to Docker, it is possible to keep the build agent itself very simple, because it does nothing but spin up and run a Docker container.

Imagine that you are a Python shop and suddenly one of your engineers wants to experiment with Go for the new REST API server. It is certainly possible to retrofit your build infrastructure to include Go development tools and dependencies. But what if another environment and other frameworks are also needed? It is not scalable (process-wise) to keep bugging your build/release engineers with these (continuous) requirements.

In a configuration that involves a server-agent setup (or, in Jenkins lingo, master-slave), the agent is the one that does the backbreaking work. In a previous blog post, Build Agent: Template vs Provisioning, I outlined the most common techniques for eliminating the need to babysit a build agent. I am myself a big fan of the automatic provisioning approach, much like what Martin Fowler wrote about the Phoenix Server:

A server should be like a phoenix, regularly rising from the ashes.

When a build agent misbehaves due to configuration drift, we should not bother to troubleshoot it. We simply terminate that troublesome phoenix and let it regenerate (thanks to the provisioning mechanism). For another, rather philosophical, take on this approach, read also my other blog post on A Maturity Model for Build Automation.

The Container is the Phoenix

If many of your build agents share the same traits, e.g. they are mostly Linux systems (often in virtualized form, e.g. an EC2 instance) with an assortment of different tools (compilers, libraries, frameworks, test systems), then the scenario can be simplified further. What if the build agent is not the actual phoenix? What if the build agent is only the realm where the phoenix lives (and dies)?

In this situation, a Docker container becomes the real phoenix. Every project needs to supply some additional information (imperative: in the form of a script; declarative: a common configuration understood by the build tool) for the build agent: which container to use and how to initiate the in-container build.

Let’s take a simple project and set up a build using this Docker-and-phoenix approach. For this example, we will build a CPU feature detection tool (implemented in C++). If you want to follow along, simply clone its git repository bitbucket.org/ariya/cpu-detect and pay attention to the phoenix subdirectory.

There are two shell scripts inside the phoenix subdirectory: init.sh and build.sh.

The first one, init.sh, is the script executed by the build agent. It pulls the container used to execute the actual build steps. Since this is a C++ project, we will leverage the gcc container. After that, it runs the container with a volume mapping so that /source inside the container is mapped to the git checkout directory. When the container is launched, it also executes the other script, build.sh (referred to as /source/phoenix/build.sh since we are now inside the container).

If we simplify it, the whole content of init.sh can be summarized as:

docker run -v $SOURCE_PATH:/source gcc:4.9 sh -c "/source/phoenix/build.sh"
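Spelled out a little more fully, a sketch of init.sh could look like this (assuming it is launched from the root of the git checkout; consult the actual script in the repository for the definitive version):

#!/bin/sh
# the git checkout directory on the build agent
SOURCE_PATH=$(pwd)

# pull first to warm up the container cache
docker pull gcc:4.9

# map the checkout to /source and hand control to the in-container script
docker run -v "$SOURCE_PATH":/source gcc:4.9 sh -c "/source/phoenix/build.sh"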

The second script, build.sh, is not executed by the build agent directly. It runs inside the specified container, as described above. The main part of build.sh is to run the actual build step. For this project, it only needs to invoke make (in a real-world project, a battery of tests would be part of this). Before that, the script needs to prepare a build directory and copy the original source into it (remember, /source inside the container corresponds to the git checkout). Once the build is completed, the build artifact has to be transferred back. In this case, we just copy the generated cpu-detect executable.
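A sketch of build.sh along those lines might look like the following (the paths are illustrative; the actual script in the repository is authoritative):

#!/bin/sh
set -e

# work in a scratch directory inside the container
mkdir -p /build
cd /build

# grab a copy of the source (/source is the volume-mapped checkout)
cp -r /source/* .

# the actual build step
make

# transfer the artifact back to the host via the volume mapping
cp cpu-detect /source/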

If any step during this process fails, including make itself, then the whole process is marked as a failure. This automatic propagation of status eliminates the need for custom error handling.

To test this setup, have a box with Docker ready to use and then launch phoenix/init.sh. If everything works correctly, you will see an output like the following screenshot.

[Screenshot: build output running inside the container]

If you experience an Inception moment trying to follow the steps, please refer to the following diagram. It is also a useful exercise to adapt these two phoenix scripts to your own personal project.

[Diagram: the build agent, the container, and the phoenix scripts]

Agent of Democracy

In the above example, we pull and run a ready-to-use gcc container. In practice, you may want to come up with a set of customized containers to suit your needs. Hence, it is highly recommended that you set up your own Docker registry to be used internally. This becomes a private registry and it should not be accessible by anyone outside your organization. Here is how your init.sh might look once it incorporates this technique:

REGISTRY="docker.mycompany.com"
IMAGE="golang"
TAG="1.4"
CONTAINER="${REGISTRY}/${IMAGE}:${TAG}"
 
echo "Container to be used: $CONTAINER."
docker pull $CONTAINER
echo

Now that the build process happens only inside the container, you can trim down the build agent. For example, it does not need packages for every development environment, from Perl to Haskell. All it needs is Docker (and of course the client software to run as a build agent), thereby massively reducing the provisioning and maintenance effort.

Let’s go back to the illustrative use case mentioned earlier. If an engineer on your team is inspired to evaluate Go, you do not need to modify your build infrastructure. Just ask them to provide a suitable Go development container (or reuse an existing one such as google/golang) and prepare the phoenix-like bootstrapper scripts. The same goes for the new intern who prefers to tinker with Rust instead. No change to the build agent is necessary! Everyone, regardless of the project requirements, can utilize the same infrastructure.

In fact, if you think this through carefully, you will realize that all those Linux build agents are not unique at all. They all have the same installed packages, and no agent is better or worse than the others. There are no second-class citizens. This is democracy at its best.

Parametrization and Resilience

Knowing the build number and other related build information is often essential to the build process. Fortunately, many continuous integration systems (Bamboo, TeamCity, Jenkins, etc.) can pass that information via environment variables. This is quite powerful since all we need to do is pass it along to Docker. For example, if you use Bamboo, then the invocation of docker needs to be modified to look like the following (notice the use of the -e option to set an environment variable).

docker run -v $SOURCE_PATH:/source \
  -e bamboo_buildNumber=${bamboo_buildNumber} \
  $CONTAINER sh -c "/source/phoenix/build.sh"

Another side effect of this Docker-based build is built-in error recovery. In many cases, a build may fail or get stuck in some process. Ideally, you want to terminate the build in this situation since it warrants a more thorough investigation. Armed with the useful Unix timeout command, we just need to modify our Docker invocation:

TIMEOUT=2m
echo "Triggering the build (with ${TIMEOUT} timeout)..."
timeout --signal=SIGKILL ${TIMEOUT} \
  docker run -v $SOURCE_PATH:/source \
  $CONTAINER sh -c "/source/phoenix/build.sh"

By the way, this is the reason there is an explicit docker pull in init.sh. Technically it is not needed, but we use it as a mechanism to warm up the container cache. This way, the time it takes to initially pull the container is not included in that 2-minute timeout.

With the use of timeout, if the Docker process does not complete in 2 minutes, it will be terminated with SIGKILL, effectively aborting the whole step at once. Since the offending application is isolated inside a container, this kind of clean-up also results in a really clean termination. There is no more server hanging around doing nothing because it was not killed properly. There are no stray zombie processes eating resources in the background.

Summary: Use Docker to modify the build agent to be a realm where your phoenix lives and dies. After that, turn every build process into a short-lived phoenix.


The most recent Shellshock, a vulnerability in the popular shell bash, got me to evaluate again the unique setup on Ubuntu/Debian. In this setup, script execution is not handled by bash; that job is carried out by dash, the Debian Almquist Shell. Meanwhile, bash is still used as the interactive shell, since dash does not have autocomplete and history support.

One advantage of this setup is that you start writing your scripts with the assumption that they will not be executed only by bash. This makes sense: it requires just a little effort to avoid certain bashisms and stay compatible with the POSIX syntax. I myself was pretty ignorant of this, assuming that bash is ubiquitous. After using dash for a while, I have learned a couple of new tricks and I am happier that my scripts follow this standard best practice. While dash is more efficient in its execution, in most cases the difference is negligible, and that was not my primary concern.
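As a small illustration (the exact constructs you encounter will vary), compare a bash-only conditional with its portable POSIX equivalent:

# bashism: double brackets and == work in bash but not in dash
if [[ "$answer" == "yes" ]]; then echo "confirmed"; fi

# POSIX: single brackets and = work in dash, bash, and other shells
if [ "$answer" = "yes" ]; then echo "confirmed"; fi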

Linux users can get dash easily; it is one apt-get or yum install away. For OS X, it is easy enough to build it from source, e.g.:

DASH_VERSION=0.5.7
DASH_FULLNAME=dash-${DASH_VERSION}
DASH_TARBALL=${DASH_FULLNAME}.tar.gz
DASH_DOWNLOAD=http://gondor.apana.org.au/~herbert/dash/files/${DASH_TARBALL}
rm -rf ${DASH_TARBALL} ${DASH_FULLNAME}
curl -L ${DASH_DOWNLOAD} -o ${DASH_TARBALL}
tar -xzf ${DASH_TARBALL}
cd ${DASH_FULLNAME}
./configure && make
sudo make install

As for the interactive shell, my favorite these days is fish. It does not support every single feature of bash, but it works very well. If you are desperate, you can still work around what you miss from bash. And make sure you check out Oh My Fish! as well.

Whether you prefer bash or dash or fish, Unix shells are always fun to explore!