On Script Loaders

Two recent projects have come out that attempt to address the “dynamic script loader” use case: HeadJS and ControlJS. Since I’m the creator of LABjs, a general, all-purpose, performance-oriented dynamic script loader that’s been around for about a year and a half now, and is well-known enough to be in use on several major sites, including Twitter, Vimeo, and Zappos, many people ask my opinions when new entries into this space arise.

I’ve been hesitant to rush to negative judgement on either one, because I believe it’s important to encourage experimentation and progress in this area, for the sake of the greater web. But I do think it’s important to shed a little bit of light onto both projects and explain some concerns I have with their approach. I respect the authors of both libraries, so I hope they will take my ramblings here as constructive criticism rather than an attack.

ControlJS: backstory

ControlJS comes from the immensely experienced and talented Steve Souders, who leads performance efforts for Google. For readers who aren’t aware, when I first was building LABjs back in 2009, Steve stepped in and offered some very helpful and timely advice and collaboration, and I credit much of LABjs’ success since to Steve’s wisdom and experience.

Recently, though, Steve has moved in a different direction from the one we shared back then. His focus is now on delaying all script loading until after the rest of the page content has loaded, whereas LABjs’ goal was simply to allow easy parallel loading of scripts (instead of blocking behavior) right alongside the rest of the page content. In my opinion, some scripts are less important, like Google Analytics, social sharing buttons, etc., and I wholeheartedly agree with “deferring” that code’s loading until “later”, so it doesn’t take up precious bandwidth/connections/CPU.

But there’s also a lot of JavaScript that is just as important as the content it decorates, and to me, the idea of delaying that code by a noticeable amount will lead, in general, to proliferation of a distasteful visual effect I call “FUBC” (Flash of Un-Behaviored Content). In other words, we’ll see pages that flash up raw text content, only to swap out a moment or two later into the widgetized and JavaScript stylized version of the page, where the content is in fancy tab-sets, modal dialogs, etc.

To be clear, any dynamic script loader can create this effect unintentionally, including LABjs. But what ControlJS appears to be intentionally doing is delaying scripts even longer past when content is visible/useable, which will exacerbate this FUBC problem quite a bit.

There are of course ways to mitigate this, using default/inline CSS rules and <noscript> tags (a technique I talk about on the LABjs site), but if you hide all the raw content with CSS and don’t re-display it until the JavaScript is present, you lose ALL of the benefit of deferring that JavaScript logic to let the content load quicker. The goal is admirable, but I think it will end up being neutral or worse UX for most sites.

This is a delicate and difficult UX balance that sites need to consider carefully. I do not endorse Steve’s suggestion that basically all sites should move to this model. It makes sense for some of them, if they intentionally design their UX that way. But it’s far from optimal for a lot of sites, and following his suggestions without caution and reserve will buy performance at the cost of a really sub-optimal user experience during page-load.

So, that is the context under which Steve presents us ControlJS. It’s an attempt to create a script loader that more closely models his view of how page-loads should work (that scripts should all defer loading and execution until all content is finished). If you agree with him, and aren’t worried about FUBC (or have already carefully thought about and designed around it), ControlJS is something to at least consider. But if you’re just planning to drop in ControlJS to an existing site, I think this is possibly a big mistake and the wrong direction for the web to head in.

So, I simply chose to disagree with Steve on this point, and we’re now focusing on different goals.

ControlJS: approach

In addition to my UX concerns with Steve’s approach to page-load optimization that’s embodied in ControlJS, I have some problems with the functional approach he’s taken as well.

User-agent

First, ControlJS relies on browser user-agent sniffing to choose different loading techniques for different browsers. I have been a very vocal critic of user-agent sniffing, even to the opposition of brilliant guys like Nicholas Zakas.

I’m not going to rehash the whole debate of user-agent sniffing. I simply won’t use it, and I think most people agree that it’s a bad choice. Feature-detection is far preferable. In between the two, but I still think better than user-agent sniffing by an important amount, is browser-inferences. LABjs uses a couple of browser-inferences (basically, testing for a feature known to be characteristic of only one or a family of browsers), very reluctantly as a temporary stop-gap until such a time as the browsers support dynamic script loading use-cases with feature-testable functionality. ControlJS makes no such attempt to be robust or future-thinking on this topic, simply relying on basic user-agent sniffing. This definitely worries me.
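To make the distinction concrete, here’s a rough sketch of the three approaches side by side. It is illustrative only: `nav` and `doc` are stand-ins for `navigator` and `document`, the function names are made up for this example, and the `MozAppearance` check is just one example of a property characteristic of a single browser family.

```javascript
// Sketch only: `nav` and `doc` stand in for the browser's `navigator` and
// `document`, so the three approaches can be compared side by side.

// 1. User-agent sniffing: trusts a string that browsers can change or spoof.
function isGeckoBySniffing(nav) {
    return /Gecko\/\d/.test(nav.userAgent);
}

// 2. Browser inference: tests for a property known to be characteristic of
//    one browser family (here, the Gecko-specific `MozAppearance` style).
function isGeckoByInference(doc) {
    return "MozAppearance" in doc.documentElement.style;
}

// 3. Feature detection: asks directly whether the needed capability exists.
function hasReadyState(doc) {
    return typeof doc.readyState === "string";
}
```

Only the third function keeps working, unchanged, when a new browser ships the capability or an old one drops it.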

Moreover, I’ve been heavily engaged in trying to petition browsers and the W3C to support native functionality that supports the dynamic script loading use-cases, in a feature-testable way. If you’re unfamiliar with that proposal/effort, take a look at this WHATWG Wiki Page: Dynamic Script Execution Order.

Mozilla (starting with FF 4b8, due out in a few days) and Webkit (very shortly in their Nightlies) have implemented the `async=false` proposal made there, and done so in a way that’s feature testable. LABjs 1.0.4 was released a few weeks ago with the new feature-test in it to take advantage of that new functionality when browsers implement it. Eventually, the goal is to deprecate and remove the old hacky browser inferences (and the old hacky browser behavior it activated) in favor of this new (and hopefully soon standardized) behavior.
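As a rough sketch of what “feature testable” means here: as I understand the proposal, a browser that implements it reports `async === true` on a script element created from JavaScript, and setting `async = false` on such an element asks for in-order execution. The function names below are mine, and `doc` is a stand-in for `document`; this is not the actual LABjs code.

```javascript
// Sketch only: `doc` stands in for `document`.

// A browser implementing the proposal should report `async === true` on a
// script element created from script; others leave it undefined or false.
function supportsOrderedAsync(doc) {
    return doc.createElement("script").async === true;
}

// Request every URL right away, but ask for execution in insertion order.
function loadInOrder(doc, urls) {
    var head = doc.getElementsByTagName("head")[0];
    var elements = [];
    for (var i = 0; i < urls.length; i++) {
        var script = doc.createElement("script");
        script.async = false; // parallel download, ordered execution
        script.src = urls[i];
        head.appendChild(script);
        elements.push(script);
    }
    return elements;
}
```

A loader would call `loadInOrder` only when `supportsOrderedAsync` returns true, and fall back to its older techniques otherwise.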

I asked Steve several times to join in the efforts to advocate for this new feature-testable native functionality to be standardized and adopted by browsers, which I believe will DRASTICALLY improve script-loaders’ ability to performantly load scripts into pages. He politely declined to participate, and suggested my efforts were misguided. And instead, he released ControlJS using old and sub-optimal user-agent sniffing in its place. This is obviously not how I hoped things would progress.

Preloading

During the development of the original 1.0 release of LABjs back in 2009, I consulted several times with Steve on various trade-offs that had to be made to get a generalized script loader to address all the various use-cases.

The biggest problem I faced was that some browsers (namely IE and Webkit) offered no direct/reliable way to load scripts in parallel but have them execute in order, which is important if you have dependencies (like jQuery, jQuery-UI, and plugins, etc). This is especially difficult to do if you are loading any or all of those scripts from a remote domain (like a CDN), which is of course quite prevalent on the web these days.

I developed and tested a trick I call “cache-preloading” (something which is now getting a lot of attention from various script loaders), which was basically a way to handle this. It was openly admitted to be a hack, sub-optimal, and hopefully something that would be eventually removed from LABjs.

There are various ways to approach this trick, but the fundamental characteristic is that you “preload” a bunch of scripts into the browser cache without execution, and then you make a second round of requests for those scripts, pulling them from cache, to then execute them. The reason this trick is important to consider is that it has a rather large (and potentially fatal) assumption attached to it: that all the scripts being requested are served with proper and future expiration headers, so that they can actually be cached.

Important to this post, relying on this assumption was very worrisome to Steve in our discussions back then. He asserted that as much as 70% (Update: maybe 38%, 51%, …) of script resources on the internet are sent without proper caching headers (or with none at all), and so for all of those scripts, if they are loaded with a script loader using a “preloading” trick, the first load won’t successfully cache, and the second request will cause a second full load. Not only is that terrible for performance, but depending on the script loader’s internal logic, a failure of this assumption can also create hazardous race conditions.
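For a sense of what “proper caching headers” means in practice, here’s a deliberately simplified sketch of the check. Real HTTP caching rules are far more involved (heuristic freshness, validators, and so on). The `looksCacheable` name, the lower-cased `headers` object, and the `now` parameter are all assumptions of this example.

```javascript
// Sketch only: `headers` is a plain object of lower-cased response header
// names to values, and `now` is a Date; both are assumptions of this example.
function looksCacheable(headers, now) {
    var cacheControl = (headers["cache-control"] || "").toLowerCase();
    if (/no-store|no-cache/.test(cacheControl)) {
        return false;
    }
    var maxAge = /max-age=(\d+)/.exec(cacheControl);
    if (maxAge) {
        return parseInt(maxAge[1], 10) > 0;
    }
    // Without max-age, fall back to an Expires date in the future.
    var expires = Date.parse(headers["expires"] || "");
    return !isNaN(expires) && expires > now.getTime();
}
```

Any script for which a check like this fails is one where the “preload” request and the “execute” request would each hit the network.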

In addition to the possible double-load, Steve was also worried that even if a script was properly cached, the delay for the second cache hit could be noticeable and bad for large scripts. Again, large script files are very common these days, with the proliferation of advice from people like Steve who suggest concatenating all files into one file (to reduce HTTP requests).

After much discussion and back-and-forth, I reluctantly decided that Steve’s concerns were valid, and so I added more complexity to LABjs (increasing its size and introducing several ugly hacks) to try to abate this problem. We tenuously agreed that the likelihood was that most of the scripts that are being served with improper headers are being self-hosted (thus on local domains), and that remote domains (like CDNs) would be much more likely to be correctly configured.

I devised a solution where LABjs uses XHR to “preload” local scripts, and only falls back to the “cache-preload” trick for remote scripts. In addition, I gave people the ability, through config values in the LABjs API, to easily (with a single boolean value) turn off either or both of those preloading tricks, so that developers would be much less likely to accidentally trip over this problem if they indeed had to load a script that had improper caching behavior.
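The decision described above can be sketched roughly as follows. This is not the LABjs source: the function name, the returned labels, and the option names `useLocalXHR` and `useCachePreload` are hypothetical stand-ins, and the example leans on the modern `URL` constructor for the same-origin check purely for brevity.

```javascript
// Sketch only. The option names `useLocalXHR` and `useCachePreload` are
// hypothetical stand-ins, not necessarily the real LABjs config names.
function choosePreloadStrategy(scriptUrl, pageUrl, options) {
    var script = new URL(scriptUrl, pageUrl);
    var page = new URL(pageUrl);
    var isLocal = script.origin === page.origin;

    if (isLocal && options.useLocalXHR) {
        return "xhr"; // same-origin: fetch the text, execute it later in order
    }
    if (!isLocal && options.useCachePreload) {
        return "cache-preload"; // remote: rely on the browser cache
    }
    return "none"; // neither trick applies, or it has been switched off
}
```

The point of the two booleans is that a developer who knows a script is served with bad caching headers can switch off the risky path with a single setting.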

I’m sure you can imagine my surprise when, this morning, I cracked open Steve’s code and saw that he’s now using the “cache-preloading” trick as the only method, without XHR. In other words, he was really concerned about this problem when he helped me design (and complicate) LABjs, but now it’s not a concern of his. He mentions the cacheability assumption only in passing in his blog post, buried in a lot of other text.

I haven’t seen any research to suggest a radical shift since late 2009 that has most or all script resources now being properly cacheable. So I still consider the concerns that Steve voiced to be quite real and important.

Again, I always considered the “cache-preload” to be a hacky last-ditch fallback, and always have intended to remove it as soon as browsers provided a better solution (that’s happening now, slowly but surely). So, I’m quite concerned to say the least that ControlJS (and HeadJS and others) are latching onto the “cache-preload” as their primary (or only) means of handling parallel-load-serial-execute.

Not only are they white-washing over the cacheability assumption, they’re also completely ignoring the recent movement by Mozilla and Webkit to implement the far-preferable `async=false` functionality. I consider this quite unfortunate.

Brittle Cache-Preloading

One last note on the “cache-preload” tricks: LABjs’ version of the “cache-preload” trick was to use a non-standard and undocumented behavior of certain browsers (IE and Webkit) that would fetch a script resource into cache, but not execute it, if the script element’s `type` value was something fake like “script/cache” (instead of “text/javascript”).
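In rough outline, the two passes of that trick look something like the sketch below. It is a simplification rather than the actual LABjs code: `doc` stands in for `document`, the function names are mine, and the load detection is reduced to a bare `onload` (older IE would need `onreadystatechange` instead).

```javascript
// Sketch only: `doc` stands in for `document`, and the `onload` wiring is
// simplified (older IE would need `onreadystatechange` instead).

// First pass: fake type, so some browsers fetch the file but never run it.
function cachePreload(doc, url, onCached) {
    var head = doc.getElementsByTagName("head")[0];
    var preloader = doc.createElement("script");
    preloader.type = "script/cache";
    preloader.onload = function () {
        head.removeChild(preloader);
        onCached(url);
    };
    preloader.src = url;
    head.appendChild(preloader);
    return preloader;
}

// Second pass: a real script element, expected to be served from cache.
function executeFromCache(doc, url) {
    var script = doc.createElement("script");
    script.type = "text/javascript";
    script.src = url;
    doc.getElementsByTagName("head")[0].appendChild(script);
    return script;
}
```

Everything hinges on the first pass actually fetching and caching the file, which is exactly the behavior that is not guaranteed.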

However, the HTML spec now says that such resources should NOT be fetched. So, Webkit (about a month or two ago) dutifully obeyed the spec and patched their browser to stop fetching them. This means that LABjs’ “cache-preload” trick broke entirely in Webkit nightlies. Now, thankfully, Webkit is moving rapidly to adopt the more preferred `async=false` in its place, so hopefully LABjs won’t be broken in any major Webkit-based browser release.

But there’s a SUPER IMPORTANT lesson that must be learned here. LABjs got by with non-standard behavior for a while. But browsers are quickly catching up to the spec/standards. Had `async=false` not gotten the attention it did, LABjs could easily have died a public death because it relied on hacky non-standard behavior.

But ControlJS, HeadJS, and many other script loaders like them are doing the same thing. They aren’t necessarily using the exact same trick as LABjs used, but they are pinning their entire loading functionality on hacky, non-standard behavior. ControlJS uses the <object> preloading hack for some browsers and the `new Image()` hack for other browsers.

Essentially, they’re relying on the fact that these browsers will fetch the script resource content into cache but not execute it because the container used to do the fetching doesn’t understand JavaScript. I don’t know about you, but this sounds to me dangerously like the spec statement of telling browsers not to fetch content with a declared MIME-type the browser can’t interpret. How long do you think it’ll be before Webkit, Mozilla, Opera, or IE patch to stop fetching/caching content via <object> or `Image()` that they can’t interpret?
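For readers who haven’t seen these hacks, the general shape is something like the sketch below. It is not ControlJS’s actual code: the function name is mine, `doc` and `ImageCtor` stand in for `document` and the browser’s `Image` constructor, and the choice of which browsers get which branch is left to the caller through the `useImage` flag.

```javascript
// Sketch only: `doc` stands in for `document` and `ImageCtor` for the
// browser's `Image` constructor. This is not ControlJS's actual code.
function preloadWithoutExecuting(doc, ImageCtor, url, useImage) {
    if (useImage) {
        var img = new ImageCtor();
        img.src = url; // requested as if it were an image, never run as script
        return img;
    }
    var obj = doc.createElement("object");
    obj.data = url;
    obj.width = 0;
    obj.height = 0;
    doc.body.appendChild(obj); // requested as generic object content
    return obj;
}
```

In both branches the container has no idea what to do with JavaScript, and the hack works only as long as the browser caches the response anyway.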

Some have argued that <object> and `Image()` will always have to fetch such content, because the browser can’t know the URL won’t return valid image or object content until after it receives the response. This may be true; the browser may always have to fetch it. But the browser might choose to discard the contents (and not cache them) if it sees a content-type that it won’t consider valid for the requesting container. If I were on a browser dev team, and were coding with the intent of following the spirit of the spec, that’s exactly the logic I’d implement.

And I’d especially do that, even though it may break script loaders, because the spec process (and browsers) are already starting to implement a more direct and reliable approach: `async=false`.

DOM-ready

In Steve’s words: “I think it’s ironic that JavaScript modules for loading scripts asynchronously have to be loaded in a blocking manner. From the beginning I wanted to make sure that ControlJS itself could be loaded asynchronously.”

I understand and empathize completely with Steve’s sentiment here. He and I both agreed from day one of LABjs that it’s unfortunate that you have to “load some JavaScript so that you can load more JavaScript”. But at its heart, this is how bootstrapping works, and it’s a reality.

However, I believe Steve has completely glossed over a really important point (and it kind of ties back to my earlier section on the UX of FUBC): if you asynchronously and dynamically load the loader, and don’t take special precautions, the loader has no way of knowing (in some browsers, including FF3.5 and before) if the page’s DOMContentLoaded (aka “DOM-ready”) has passed or not.

While the script loader doesn’t really need to care too much about DOMContentLoaded, some of the scripts that you may be loading do very much care. As I wrote about last year regarding jQuery, DOM-ready, and dynamic script loading, it’s very common that people write jQuery code that looks like this:

$(document).ready(function(){
   // I know the DOMContentLoaded/DOM-ready event has passed! wee!
});

The problem is, jQuery can’t reliably detect DOMContentLoaded in FF3.5 and before (and a few other obscure browsers) if jQuery itself is not loaded statically/synchronously (that is, in a blocking way, so that it’s loaded before DOMContentLoaded can pass). So, if you just blindly load jQuery dynamically, and it happens to finish loading after DOMContentLoaded/DOM-ready has passed, any code you have in your page that looks like that snippet above will just sit forever waiting, and never fire!

So, LABjs takes advantage of the fact that you are almost certainly going to load the “LAB.js” file with a normal blocking script tag (I know, counter-intuitive, right!?). Inside of LAB.js, at the very end, is a small snippet that detects if the page doesn’t properly have a `document.readyState` property on it (what jQuery uses for detecting DOMContentLoaded in those browsers), and if so, it patches the page.

This has the effect that a few hundred milliseconds later, when jQuery finishes loading (if you’re using jQuery of course!), even if DOMContentLoaded has already passed, jQuery will properly see the right state, and your code will fire immediately. The page-level hack looks like this:


// required: shim for FF <= 3.5 not having document.readyState
if (document.readyState == null && document.addEventListener) {
    document.readyState = "loading";
    // named function expression, so no implicit global `handler` is created
    document.addEventListener("DOMContentLoaded", function handler() {
        document.removeEventListener("DOMContentLoaded", handler, false);
        // flip the flag so late-loading code can see DOM-ready has passed
        document.readyState = "complete";
    }, false);
}

This is an unfortunate page-level hack that's necessary for FF3.5 and before, for jQuery to be properly loaded dynamically and still have code like that snippet above operate as expected. You can see the code is pretty small and compact, so including it in LABjs doesn't hurt the size/complexity too much. But it works because LABjs typically is loaded statically.

So, does that mean LABjs can't be loaded dynamically and still have jQuery and DOM-ready code work? No! It just means that the page-level hack in that snippet needs to be part of whatever little "bootstrapper" code you use to dynamically load LABjs to a page. And it means that THAT code that you use as a bootstrapper for LABjs must itself load statically (like in an inline script-block, etc).

In fact, I advocated a little while back exactly how to dynamically load LABjs and still preserve this whole DOM-ready business with this snippet.
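For illustration, a bootstrapper along these lines might look like the sketch below. It is not the exact snippet referenced above, just the same idea: apply the `document.readyState` shim first, then request the loader dynamically. The `bootstrapLoader` name, its parameters, and the bare `onload` wiring are assumptions of this example, and `doc` stands in for `document`.

```javascript
// Sketch only: meant to run from an inline (blocking) script block. `doc`
// stands in for `document`; `loaderUrl` and `onLoaded` are assumptions.
function bootstrapLoader(doc, loaderUrl, onLoaded) {
    // Same page-level shim as shown earlier, applied before anything
    // is loaded dynamically.
    if (doc.readyState == null && doc.addEventListener) {
        doc.readyState = "loading";
        var handler = function () {
            doc.removeEventListener("DOMContentLoaded", handler, false);
            doc.readyState = "complete";
        };
        doc.addEventListener("DOMContentLoaded", handler, false);
    }

    // Now the loader itself can be requested without blocking the page.
    var script = doc.createElement("script");
    script.src = loaderUrl;
    script.onload = onLoaded;
    doc.getElementsByTagName("head")[0].appendChild(script);
    return script;
}
```

In a page, this would be called as something like `bootstrapLoader(document, "LAB.js", startLoading)`, where `startLoading` is your own function that kicks off the `$LAB` chain.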

What concerns me about ControlJS, HeadJS, and almost every other loader out there is that they completely ignore this important point about DOMContentLoaded detection for FF3.5 and below. I know for me, I still regularly use LABjs to dynamically load jQuery, and I still use `$(document).ready(...)` to safely wrap code, and I still support FF3.5. So for me, this DOM-ready protection code is important. Whether it appears in the loader code itself, or in the snippet of bootstrapper code, I think those loaders are really missing something important if they don't include that snippet (or something like it).

Invalid Markup

This post is already getting really long (again! I can't be short-spoken even if I try!). So I'm going to at least keep this final section brief, and try to wrap up quickly.

ControlJS is able to achieve its "deferral" of script loading and execution (until after window.onload in fact!) by suggesting that you must change all your script elements in your code to be unrecognizable by the browser loading mechanism.

You change a script tag's `src` attribute to be `cjssrc` instead. This is an invalid attribute name, and will be ignored by the browser. But it also invalidates your markup. I consider this to be bad practice. At the very least, the attribute could have been `data-src` or something similar, so that it fits with valid spec attribute naming.

Secondly, ControlJS requires that you change a script block's `type` value to "text/cjs" (from "text/javascript"). I have two problems with this approach. The first is that, as we discussed above, the spec and standards bodies are moving toward explicitly dropping support for invalid types. What if, sometime soon, the spec says that any element with an unrecognized type should not even be added to the DOM but should be entirely ignored?

Or what if some browser just interprets what the spec currently says to mean that? If I were a browser developer, I could easily argue that the ignoring of such an element (not adding it to the actual DOM) would help improve the page's performance by taking up less memory and having fewer DOM nodes to inspect during DOM traversal/manipulation.

Also, he chose "text/cjs". This assumes that no future MIME-type will ever take the name "text/cjs". Notice that LABjs at least chose a very different value, "script/cache", which is far less likely to have future collisions. But Steve chose a "text/xxxx" format for the value, which fits closely with how such values currently are assigned, but also makes it more likely that there will be a conflict someday.

Moreover, in HTML5, the type value is now optional, and for performance reasons, I (and most people) now omit that from our markup. ControlJS will require us to go back and add `type` attributes to all our script tags, making the markup slightly larger. That's pretty minor, but it bugs me nonetheless.
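To make the markup convention concrete, here's a rough sketch of how a loader might discover elements marked up this way. It is not ControlJS's code: the function name is mine, `scripts` is an array of element-like objects, and the `data-src` fallback is my suggested alternative rather than something ControlJS supports.

```javascript
// Sketch only: `scripts` is an array of element-like objects exposing
// `type` and `getAttribute`. The `data-src` fallback is my suggested
// alternative, not something ControlJS supports.
function collectDeferredScripts(scripts) {
    var urls = [];
    for (var i = 0; i < scripts.length; i++) {
        var el = scripts[i];
        if (el.type !== "text/cjs") {
            continue; // left to the browser's native loading
        }
        var url = el.getAttribute("cjssrc") || el.getAttribute("data-src");
        if (url) {
            urls.push(url);
        }
    }
    return urls;
}
```

Note that the whole scheme depends on those elements still being present in the DOM, with their unrecognized `type` intact, when the loader goes looking for them.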

Lastly, it is unclear from the code or the documentation what the expected/suggested behavior should be if I have some script tag elements in my markup that are marked-up to have CJS bindings, and other script tags that I leave alone. How will CJS loaded scripts interact (before or after) scripts that are loaded using the browser's native mechanism? If there's some reason I need to have a mixture of CJS and non-CJS script tags in my page, it will make the interpretation of my markup (specifically, the implied execution order of the script elements) a lot more confusing.

Summary

This is part 1 of this post. I'm going to make another post soon (part 2) where I shift my focus from ControlJS to HeadJS (and possibly other script loaders).

Again, I apologize to Steve (and to anyone else) if the tone of this post seems to be overly harsh. But I think it's important that the other side of the coin be put out there for developers to consider as they compare how ControlJS differs from something like LABjs.

Moreover, I reiterate what I've said a dozen times already: instead of creating more and different script loaders, I think a better use of our time would be to consolidate efforts in getting the browsers and the spec/W3C to give us a reliable native mechanism for managing dynamic script loading. Since `async=false` is already moving along nicely, I encourage and invite anyone who's interested to join that discussion.

I hope very soon that proper and simple handling of dynamic script loading will be very easy and straightforward, and well-conforming script loaders (which I can say LABjs will certainly be) will eventually be free from a lot of hacky, legacy junk that weighs them down. I call on all other script loaders to join that effort for a better script loading future. (ok, yeah, that was lame. :))