There was a lot of discussion about the recent rejection of the GitHub pull request workflow by Linus Torvalds, the father of Git (and Linux). Note that Linus does like GitHub, calling it “great at hosting”. Unfortunately, submission via the pull request mechanism does not meet the standard for kernel contributions.

I’m all for lowering the barrier to contributing something useful to a software project. Sadly, these days I also witness a lot of “shoot first, ask later” attitude. Just because a project is hosted on GitHub does not mean you can fire up your editor, register a pull request, and be done with it. Often, the first thing I have to write when commenting on a pull request is (with the suitable hyperlink):

Please read the contribution guide.

Below I highlight a few good practices which should serve as a laundry list, should you want to send an improvement or a bug fix to the project maintainer. Some of them are just common sense, but then we live in a generation where common sense is becoming a rarity. You may observe that some maintainers are very picky about this. That is because they need to live with what you have produced for the lifetime of the project, while you can perfectly well disappear the next day and not care about it.

Check first for a contribution process. For legal reasons, some organizations mandate a special arrangement, e.g. a Contributor License Agreement (CLA). In other cases, a few maintainers want the assurance that your work is really yours, i.e. not owned by your current employer. Licensing is also a tricky matter, so make sure you are aware of the implications of whatever you share with an external project.

Next, observe how past contributions were made. Is there an official guide which gives step-by-step instructions? Or maybe it is not that formalized, so you have to consult the maintainer or the mailing list? Does everything need to be filed in the issue tracker? Do you need to format your patch in a certain way? Is there a ChangeLog file? By adopting the way the members of the project discuss and carry out the development, you will quickly become familiar with the process.

Write a decent commit log, e.g. one that follows a model Git commit message. A short message like “Fix stuff” won’t help anyone. Write in proper English and avoid e.g. txt lingo. Nowadays, many projects also require that a link to the bug/review be included somewhere. Look at the past commits and find the patterns there.

Follow the testing procedure. Sending a pull request which provokes tons of regressions will not impress anyone. Watch for all kinds of checks: unit tests, functional tests, code coverage, performance stress tests, and even coding style. Being careful with your contribution shows how much you value the project and its future, and you will earn some respect in due time.

Ask for a review as early as possible. Writing software is about iteration; it’s all continuous improvement and refactoring. Don’t wait until your stuff is fully ready before screaming for some feedback. Those who have worked on the project for a while might have a big-picture understanding which you don’t have. Showing a half-baked patch which can be polished as you learn is better than pushing a perfect improvement which later gets rejected by the maintainer.

Last but not least, remember that sharing is beautiful and it is great that you are willing to give something back to the community. However, do not forget that project maintenance is hard (that’s why you find a lot of rotten projects out there). Engage in a discussion, share responsibly, and surely world domination will be in our hands someday!

But I have no doubt, one day the sun will come out.

.

After the fun distribution charts of statements and keywords in popular JavaScript libraries, it is time for another metrics analysis. For a while, I have been wondering how JavaScript developers come up with a variable name, function name, and other identifiers. Is it just a few characters? Is it not that short? Is it always descriptive? The following script idlen.js (to be executed with Node.js) uses the parser from Esprima to dump the length of every identifier, excluding the duplicates, of each file in its corpus of libraries (for the benchmark suite).

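var fs = require('fs'),
    esprima = require('esprima'),
    files = process.argv.slice(2);  // skip "node" and the script name
 
files.forEach(function (filename) {
    // No prototype, so names such as "constructor" or "toString"
    // are not mistaken for identifiers that were already seen.
    var identifiers = Object.create(null),
        content = fs.readFileSync(filename, 'utf-8'),
        syntax = esprima.parse(content);
 
    // The JSON.stringify replacer doubles as a cheap visitor for every property in the tree.
    JSON.stringify(syntax, function (key, value) {
        if (key === 'name' && typeof value === 'string' && !(value in identifiers)) {
            identifiers[value] = value.length;
        }
        return value;
    });
 
    // Print one length per unique identifier.
    Object.keys(identifiers).forEach(function (name) {
        console.log(identifiers[name]);
    });
});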

With the help of Unix tools:

node idlen.js /path/to/some/*.js | sort -n | uniq -c

the distribution will look like the following diagram:

There is a long tail from 15 characters and above, which makes sense since identifiers that long are likely to be special cases (excluding this long tail region, the data roughly follows the expected normal distribution). The actual mean of the identifier length is 8.27 characters.
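If you would rather stay inside Node.js than pipe through sort and uniq, the histogram and the mean can be computed directly from the list of lengths. The following is only a minimal sketch: the summarize function and the sample lengths are made up for illustration and are not part of idlen.js or of the measured corpus.

```javascript
// Hypothetical helper (not part of idlen.js): build a histogram of
// identifier lengths and compute their arithmetic mean.
function summarize(lengths) {
    var histogram = Object.create(null),
        total = 0;

    lengths.forEach(function (len) {
        histogram[len] = (histogram[len] || 0) + 1;
        total += len;
    });

    return {
        histogram: histogram,
        mean: lengths.length > 0 ? total / lengths.length : 0
    };
}

// Made-up sample lengths, standing in for the output of idlen.js.
var summary = summarize([1, 3, 3, 8, 8, 8, 12]);

Object.keys(summary.histogram).forEach(function (len) {
    console.log(len + ' characters: ' + summary.histogram[len]);
});
console.log('mean: ' + summary.mean.toFixed(2));
```

To use it with real data, the lengths printed by idlen.js would be collected into an array instead of being written to the console.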

For the fun of it, the top 5 longest identifiers found among the libraries, each 34 characters or longer, are:

prototype-1.7.0.0.js   SCRIPT_ELEMENT_REJECTS_TEXTNODE_APPENDING
prototype-1.7.0.0.js   MOUSEENTER_MOUSELEAVE_EVENTS_SUPPORTED
     jquery-1.7.1.js   subtractsBorderForOverflowNotVisible
jquery.mobile-1.0.js   getClosestElementWithVirtualBinding
prototype-1.7.0.0.js   HAS_EXTENDED_CREATE_ELEMENT_SYNTAX

What kind of distribution do you get for your own JavaScript project?

.

According to Albus Dumbledore:

“You see, we have not been able to keep a Defence Against the Dark Arts professor for more than a year since I refused the post to Lord Voldemort.”

There have been various recent events which showed how dangerous it is to hand control to a proprietary binary, especially one with a rather disastrous security track record. A zero-day Flash exploit was used to attack the big security firm RSA. At the last Pwn2Own, Chrome was exploited, likely through the included Flash plugin, even with Chrome having its plugin sandboxed. A faux billing email from Vodafone was circulating, mostly targeting Germans, with an attached malicious PDF which leverages an Adobe Reader exploit to automatically download the real trojan payload. A huge number of Mac systems were lately infected by the Flashback botnet, which originally started as a fake Flash installer and now takes advantage of Java vulnerabilities. While this was still hot, the SabPub malware surfaced, this time using a Word security hole to open a backdoor.

Security in browsers needs to be hardened, otherwise users will be left in the open. It’s no wonder that a future version of Firefox may have built-in support for opt-in plugins, also known as click-to-play. For the current version of Firefox, a solution is to use the Flashlock add-on: Flash content in the web page will be blocked and not played immediately; rather, an explicit click from the user is needed to activate it. For those with Safari, there are ClickToPlugin and ClickToFlash, which have similar functionality.

As for Google Chrome, the opt-in feature is built in. Go to the Wrench menu, then Settings. From the settings interface, choose Under the Hood, scroll to the Plug-Ins section, and simply choose Click to play instead of Run automatically. From now on, Flash and other plugins will not start on their own. A plug-in will run only if you think it’s legitimate and click on it.

As a bonus, using this opt-in feature somehow improves the browsing experience because all those annoying Flash ads cease to disrupt the actual business of information consumption.

.

After you’ve seen the chart of JavaScript keyword distribution, it’s time to go a bit deeper, to the syntax level. This time let’s find out the most popular JavaScript statements. The specification lists 15 different types of statements, with the iteration statements having four possible variants. Like the previous attempt, using Esprima and its corpus of libraries (for the benchmark suite), I got the following chart:

For completeness, I also threw VariableDeclaration and FunctionDeclaration into the analysis. For the latter, the difference between a properly hoisted declaration and a declaration inside e.g. ForStatement is not taken into account, which is likely just fine.
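As a reminder of why a declaration inside a ForStatement is treated like any other, here is a small sketch of hoisting with var. The function name hoistingDemo is made up for this illustration.

```javascript
// Hypothetical demo: a `var` declared in the head of a for loop is
// hoisted to the top of the enclosing function.
function hoistingDemo() {
    // No ReferenceError here: the declaration of `i` is already known,
    // only the assignment has not happened yet.
    var before = i;

    for (var i = 0; i < 3; i++) {
        // loop body intentionally left empty
    }

    // `i` is still visible after the loop because it is function-scoped.
    return { before: before, after: i };
}

console.log(hoistingDemo());
```

Both forms show up in the syntax tree as a VariableDeclaration node, so counting them together does not distort the chart much.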

Again, real-world applications can show a different chart. If you are interested in running the analysis on your own code, use the following quick tool statement.js (using the Esprima package):

var fs = require('fs'),
    esprima = require('esprima'),
    files = process.argv.slice(2);  // skip "node" and the script name
 
files.forEach(function (filename) {
    var content = fs.readFileSync(filename, 'utf-8'),
        syntax = esprima.parse(content);
 
    // The replacer visits every property in the tree; print the type
    // of each node that is a statement or a declaration.
    JSON.stringify(syntax, function (key, value) {
        if (key === 'type' && /(Declaration|Statement)$/.test(value)) {
            console.log(value);
        }
        return value;
    });
});

and run it with Node.js as follows:

node statement.js myapp.js mylib.js others/*.js | sort | uniq -c | sort -nr
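On a system without sort and uniq, the same tally can be done in JavaScript. This is a rough sketch under some assumptions: tallyStatements is a made-up helper, and the sample tree is written by hand to stand in for what esprima.parse() would return, so the snippet runs without Esprima installed.

```javascript
// Hypothetical helper: count statement and declaration node types in
// any syntax tree shaped like the one Esprima produces.
function tallyStatements(syntax) {
    var counts = Object.create(null);

    JSON.stringify(syntax, function (key, value) {
        if (key === 'type' && typeof value === 'string' &&
                /(Declaration|Statement)$/.test(value)) {
            counts[value] = (counts[value] || 0) + 1;
        }
        return value;
    });

    // Most frequent first, similar to `sort | uniq -c | sort -nr`.
    return Object.keys(counts).map(function (type) {
        return { type: type, count: counts[type] };
    }).sort(function (a, b) {
        return b.count - a.count;
    });
}

// Hand-written tree standing in for the result of esprima.parse().
var sample = {
    type: 'Program',
    body: [
        { type: 'VariableDeclaration', declarations: [], kind: 'var' },
        { type: 'ExpressionStatement', expression: { type: 'Literal', value: 1 } },
        { type: 'ExpressionStatement', expression: { type: 'Literal', value: 2 } }
    ]
};

tallyStatements(sample).forEach(function (entry) {
    console.log(entry.count + ' ' + entry.type);
});
```

In statement.js, the call would take the syntax variable instead of the hand-written sample.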

How’s the statement distribution in your application?

.

For the next few weeks, don’t expect many updates on this blog. Within a few hours I am scheduled to take a flight SFO-HKG-SIN-CGK. Unless something goes terribly wrong, I’ll be in Indonesia to handle some important matters. I will not maintain strict radio silence, but since I plan to use the opportunity to keep my focus, I may not be super responsive at times.

Just like the last time, bracing for the impact of tasty food (putu, martabak, sponge cake, sate kelapa, …).