In the spirit of making our web browsing more secure, we often need to be aware of all the resources and assets loaded by the web sites that matter to us (Internet banking, HR portal, and the like). Fortunately, it is rather easy to programmatically inspect which JavaScript libraries those sites use.

Using PhantomJS, the headless browser based on WebKit, we just need a simple script which loads a web page and then checks its content. PhantomJS can execute any JavaScript code within the web page context, which permits a simple check for the existence of a particular library. Here is a script of about 30 lines which demonstrates this approach:

var page = require('webpage').create(),
    system = require('system'),
    address;
 
if (system.args.length === 1) {
    console.log('Usage: libdetect.js url');
    phantom.exit(1);
}
 
page.settings.loadImages = false;
address = system.args[1];
console.log('Loading', address, '...');
 
page.open(address, function (status) {
  if (status !== 'success') {
    console.log('ERROR: Unable to load', address);
    phantom.exit(1);
  } else {
    setTimeout(function () {
      var jQueryVersion;
      jQueryVersion = page.evaluate(function () {
        return (typeof jQuery === 'function') ? jQuery.fn.jquery : undefined;
      });
      if (jQueryVersion) {
        console.log('jQuery', jQueryVersion);
      } else {
        console.log('This site does not use jQuery.');
      }
      phantom.exit();
    }, 2000);
  }
});

The principle is quite straightforward: set up the page object, load the requested URL, wait two seconds so that the page's own scripts have time to run, and then evaluate a piece of code which extracts the jQuery version via jQuery.fn.jquery. For more details, the PhantomJS documentation can be quite handy for understanding what the script does at every step.

If you run the script on a site like CNN, the outcome will be:

phantomjs libdetect.js http://www.cnn.com
Loading http://www.cnn.com ...
jQuery 1.7.2

For sites which do not use jQuery, the output looks like:

phantomjs libdetect.js http://news.bbc.co.uk
Loading http://news.bbc.co.uk ...
This site does not use jQuery.

To keep it simple, the script only checks for the popular jQuery, but it should be easy to extend it to detect other well-known libraries and frameworks. Now imagine using the script to crawl thousands of popular web sites: we could come up with a distribution chart of the various JavaScript libraries and versions. Any volunteers?
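
As a rough sketch of such an extension, the jQuery check could be replaced by a function that probes several globals at once. The names detectLibraries and tally are hypothetical helpers invented for this illustration, and the list of libraries is only a small sample. The version properties used (jQuery.fn.jquery, _.VERSION, Backbone.VERSION, angular.version.full, Prototype.Version, MooTools.version) are where these libraries conventionally expose their version, but a site may bundle or rename them, so treat any result as a hint rather than proof.

```javascript
// Sketch only: probes a handful of well-known globals for a version string.
// The library list is a small sample, not an exhaustive catalogue.
function detectLibraries(root) {
  // page.evaluate() calls the function without arguments, so fall back to
  // the page's window. Passing an object explicitly makes it easy to test.
  root = root || window;
  var found = {};

  if (typeof root.jQuery === 'function' && root.jQuery.fn) {
    found.jQuery = root.jQuery.fn.jquery;
  }
  // Underscore and Lo-Dash both live on "_" and expose VERSION.
  if (root._ && typeof root._.VERSION === 'string') {
    found['Underscore/Lo-Dash'] = root._.VERSION;
  }
  if (root.Backbone && root.Backbone.VERSION) {
    found.Backbone = root.Backbone.VERSION;
  }
  if (root.angular && root.angular.version) {
    found.AngularJS = root.angular.version.full;
  }
  if (root.Prototype && root.Prototype.Version) {
    found.Prototype = root.Prototype.Version;
  }
  if (root.MooTools && root.MooTools.version) {
    found.MooTools = root.MooTools.version;
  }
  return found;
}

// Hypothetical helper: fold the per-site results of a crawl into a count
// per "library version" pair, the raw data for a distribution chart.
function tally(results) {
  var counts = {};
  results.forEach(function (libs) {
    Object.keys(libs).forEach(function (name) {
      var key = name + ' ' + libs[name];
      counts[key] = (counts[key] || 0) + 1;
    });
  });
  return counts;
}
```

In the script above, page.evaluate(detectLibraries) would take the place of the inline jQuery check. Since detectLibraries does not refer to anything outside its own body, it survives being handed over to the page context, and the plain object it returns can be passed back to the PhantomJS side.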

  • Duncan Wong

    I had an idea to create something similar, but I wasn’t sure how feasible it was. The core of the idea was to use your other library, Esprima, to analyze the AST to produce a signature of the libraries (down to the version).

    I built a prototype, but couldn’t get around how to normalize for minification :(

  • tomByrer

    > Now imagine using the script to crawl thousands popular web sites out
    there, we can come up with a distribution chart of various JavaScript
    libraries and versions.

    Someone already did this for jQuery 4 months ago, likely with more sites than you & I could crawl together: https://github.com/h5bp/html5-boilerplate/pull/1327#issuecomment-14857390

    But I still think this is a good tip…

  • Matthew Kastor

    Excellent article. :D