You’ve got your orange thing, I’ve got my square

If you haven’t noticed yet, almost every one of my blog posts starts out as a Twitter conversation. There must be a pattern there. :)

So, today, I’m going to address the bigger issues around the question I first asked, which started off the conversation,

seems inefficient: emulating a DOM (built up w/ server-side JS)…which then has to serialize into a html string, transfer, then be rebuilt.

though it turns out I had to rephrase it, because I got answers to a different question than what I asked.

rephrase: i get why server-side js is awesome (code reuse, prog enhnc, etc). why a DOM approach instead of shared/reusable templating?

Tale of Two Questions

For the sake of sanity and clarity, let’s break down the conversation into the two fundamental questions, and address them separately:

  1. Why is server-side JavaScript helpful for front-end/UI tasks?
  2. Why would you use the DOM approach to the UI in server-side JavaScript as opposed to a templating approach?

Server-side UI

There are some really exciting things you can do with the UI of your web application with JavaScript. This is probably not news to anyone who hasn’t had their head under a rock for the last 5-8 years.

But what role (if any) can server-side JavaScript have in the question of how to use JavaScript in the front-end of your application? It certainly seems like an oxymoron to run JavaScript in your “back-end” primarily for the benefit and use of your “front-end”.

In fact, this is exactly why I’ve been banging the loudest drums I can find to try to define the “middle-end” (that is, what sits in between the front-end and back-end… the middle-end!) of the web application stack.

Specifically, there are a number of UI-related tasks which frequently occur in both places — the browser and the server. For instance: presentation “construction”, data/form validation (or data binding), etc. These tasks, along with others, I have defined as the “middle-end”.

Moreover, concepts such as “progressive enhancement” are heavily related — that is, the idea of progressively enhancing the presentation layer capabilities depending on the capabilities of the client making the request.

Where server-side JavaScript offers some really exciting answers is:

  1. Code Reuse: If I can write my data validation code, or my UI construction code, etc, once, in JavaScript, and then reuse that code in both the browser and the server, I’ve significantly improved the maintainability of my application by achieving DRY (don’t repeat yourself).
  2. Progressive Enhancement: If I can take advantage of JavaScript’s ability to build out my page’s UI dynamically, I can achieve some really awesome things in the browser. But what about browsers which don’t have JavaScript (enabled)? The typical approach has been to serve a more basic version of the content to such less-capable browsers.

    But what if I could get most of the awesome benefits of JavaScript-powered presentation and not be affected by the browser’s inability to do JavaScript? What if I could run all my awesome JavaScript to build up my page (or even to dynamically alter it) on the server, and just send the HTML result of such operations over to a less-capable browser? To the user of such a browser, they might see things updating slower and a little less gracefully (with full page-refreshes), but they’d get most of the benefits of a dynamically capable page experience without having any JavaScript at all.

    This could be a huge win for progressive-enhancement advocates, because it significantly reduces the amount of work that a site must do to provide some sort of alternate content experience for JavaScript-less browser users.
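To make the code-reuse point concrete, here is a minimal sketch of a validation routine that contains nothing browser-specific or server-specific, so the same file could be loaded in both places. The function name, field names, and rules are all hypothetical, chosen only for illustration.

```javascript
// Hypothetical shared validation routine; field names and rules are illustrative.
// It touches neither the DOM nor any server API, so either side can call it.
function validateSignup(fields) {
  const errors = [];
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(fields.email || "")) {
    errors.push("email looks invalid");
  }
  if ((fields.password || "").length < 8) {
    errors.push("password must be at least 8 characters");
  }
  return errors; // an empty array means the input passed
}

// The browser would call this before submitting the form; the server would
// call the very same function on the submitted data.
const signupErrors = validateSignup({ email: "user@example", password: "short" });
console.log(signupErrors);
```

How the file actually gets shared (a module loader, a build step, a plain script include) depends on your stack; the point is only that the rules are written once.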

These are not the only interesting ways that server-side JavaScript could be used to really revolutionize the web application process, but they should be more than enough to answer the question: “Why is server-side JavaScript helpful for front-end/UI tasks?”

To DOM or not to DOM

The meatier question in play here is, if we agree that server-side JavaScript should be used to improve the web application stack, should we choose a DOM-based approach to building out the UI on the server, or should we use a string-templated approach?

I’ll cut to the chase. The short answer is… BOTH!

But the nuances of that assertion are what we’ll now focus on for the rest of this post.

Oranges & Squares

Let’s first define more specifically what we mean by each distinct approach:

  1. DOM approach: By this, I mean that you build up your page by explicitly creating each element as a node in a DOM tree. For instance, if I want to add this <ol></ol> ordered-list to my page, I first create an “ol” node, then I create an “li” node, then I append some text to the “li” node (as a “text” node of course), then I add the “li” node to the “ol” node, etc. Once my “ol” node is ready, I then append my “ol” node as a child element of some part of my page, like the current section I’m in.

    Such an approach might look like this (using jQuery):

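    // Create the detached list and item nodes.
    var $ol = $("<ol></ol>");
    var $li = $("<li></li>");
    // Set the item's text (inserted as a text node, not parsed as markup).
    $li.text("Some text");
    // Assemble the sub-tree, then attach it to the live page.
    $ol.append($li);
    $("body").append($ol);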
    

    With this code, our JavaScript executes and programmatically builds up our DOM elements, eventually adding our “ol” element to the end of the “body” element.

  2. Template approach: By contrast, here I mean that to build up the “ol” list as described above, you instead create a string like “<ol>”, followed by a “<li>”, then some text, then “</li>”, etc. Once the entire string of markup is done, then you insert that entire string into the page (or part of the page), either as the full HTML or as the .innerHTML of some element.

    Of course, I’m not suggesting that you literally have to manually add strings together with variables and such. That’s why we have template engines that do that for us. So, instead, you have a template like this:

    <ol>
       <li>{some_text}</li>
    </ol>
    

    With this “code”, our templating engine grabs the {some_text} part and replaces it with the value of that variable, and returns us the full constructed string, so we don’t have to mess with all the string operation details.
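For a sense of how little machinery that substitution needs, here is a minimal sketch of such an engine. It is not any particular library: the `render` and `escapeHtml` names are made up for this example, and it assumes placeholders are single-brace `{name}` tokens as in the template above.

```javascript
// Escape values so that data cannot inject markup into the page.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replace each {name} with the escaped value of data[name].
// Placeholders with no matching data are left untouched.
function render(template, data) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(data, name) ? escapeHtml(data[name]) : match
  );
}

const listTemplate = "<ol>\n   <li>{some_text}</li>\n</ol>";
const listHtml = render(listTemplate, { some_text: "Some text" });
console.log(listHtml);
```

Nothing in it depends on a browser, so the same function runs unchanged on the server.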

Zoom In

So, let’s examine the differences between these approaches.

The first, most obvious difference is what the “code” that generates something visible in our presentation layer looks like. In the DOM approach, large chunks (if not all) of your presentation layer are written in JavaScript code, usually tucked in with all the other front-end JavaScript. In the templating approach, almost all of your presentation layer is in HTML markup (as a string), usually in separate template files.

The result of building your UI in the DOM approach is an actual DOM tree (a physical nested object hierarchy in memory). The result of building your UI in the template approach is just a string of HTML markup.

Let’s be clear about something: a DOM is required to render something the user can see in the browser. So, the string of HTML markup that you get from the templating approach must of course eventually get parsed into a DOM tree.

That presents an obvious question: if a DOM is eventually required to actually show the presentation layer, isn’t it more efficient to just start out with (and stay in) the DOM tree, rather than going with string manipulation as a first stage of the process?

This is a fine and fair question. I hope I can quickly distill a few thoughts to suggest why I’m not so sure that the obvious is entirely correct — there’s more here than meets the eye.

Designers

Firstly, I’d suggest that there’s a pretty strong precedent for building HTML pages in… HTML. Designers, IDEs, and most web developers have learned and in many cases mastered the process of writing HTML markup. To my knowledge, almost none of the web development IDEs and typical processes are centered around a “(DOM) object-oriented” building of pages. In other words, I’ve never personally seen an IDE where, when I look at the source view of a page, I’m messing around in a nested tree structure and adding and removing nodes.

Moreover, the very concept of a string of HTML markup actually conceptually representing a hierarchy of parent and child nodes is a very programmer-centric view. It’s like the difference between saying “The Quick Brown Fox” is a sentence phrase versus saying that, as a programmer, that’s actually an array of individual characters. Both are true, but one view is more semantic and the other is more programmatic.

So, stylistically, you can make a strong case (although it is ultimately preference-driven) that building out HTML pages is more natural and well-established as HTML markup as opposed to JavaScript code. From purely a role-separation standpoint, it’s perhaps more natural for UI designers (and the markup/CSS role) to be able to work in simple HTML markup than in code (whether that be JavaScript code, or some other back-end language code like PHP, Java, .NET, etc).

Just because we layer on a simple templating engine on top of the HTML markup to represent the concept of dropping in a dynamic data value in a certain location, doesn’t mean we’ve changed the paradigm from working with HTML markup to working with DOM nodes. Of course, some “templating engines” are far more complex, including all kinds of “programming-like” stuff, like conditionals, loops, function calls, arithmetic, boolean logic, etc. The more of that junk you cram into the “templating language” of your choice, the more like programming your presentation layer construction becomes.

As should probably be obvious, my strong preference is at the far simple end of the templating spectrum, where a bare minimum of anything that looks like “programming” is going on inside my HTML markup. I think that’s backed up by lots of precedent in “separation of concerns”, and I believe a lot of people agree. But there are others who prefer to write code to build up their presentation layer. To each his own, on that, I guess.

Down To The Wire

The other major difference, when we factor in such tasks as presentation layer building occurring on the server, is how the stuff that gets sent over the wire is prepared for the trip from the server to the browser.

In either the DOM or template approach, if the browser doesn’t have JavaScript enabled, the entire process of building the presentation layer occurs on the server, and what gets sent over the wire is always and only a string of HTML markup. The exact same page built with the DOM or template approach will end up being exactly identical when it squeezes down to fit through the pipe from server to browser — a string.

If we accept that in that base case, what transfers over the wire must be a string, then the question above of what’s more efficient (DOM or string) gets flipped. If the in-memory representation of the DOM (on the server) must first be down-converted to a string to be sent over the wire, this will tend to be less efficient than if the presentation layer is already (and only) just a string, ready to send.
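Here is a rough sketch of that extra step. The node objects below are a deliberately tiny, hypothetical stand-in for an emulated server-side DOM (real emulations carry far more state per node); the names `element`, `text`, and `serialize` are invented for the example.

```javascript
// Hypothetical, stripped-down nodes standing in for an emulated server-side DOM.
function element(tag, children) {
  return { tag: tag, children: children || [] };
}
function text(value) {
  return { text: String(value) };
}

function escapeText(value) {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// The "down-conversion": walk every node in the tree and emit markup for it.
function serialize(node) {
  if (node.text !== undefined) return escapeText(node.text);
  return "<" + node.tag + ">" + node.children.map(serialize).join("") + "</" + node.tag + ">";
}

// DOM path: build the tree first, then convert it to a string for the wire.
const tree = element("ol", [element("li", [text("Some text")])]);
const fromDom = serialize(tree);

// Template path: the markup never leaves string form.
const fromTemplate = "<ol><li>{some_text}</li></ol>".replace("{some_text}", "Some text");

console.log(fromDom === fromTemplate);
```

Both paths end at the same string; the DOM path simply builds and then walks an object tree to get there.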

Now, what about the case where JavaScript is available on the browser? Assuming then that we’re in the part of a task where we want to off-load our presentation construction to the server, what gets sent over the wire is still going to be pretty much equivalent (though not the same). For the DOM approach, we’re going to send JavaScript, probably in the form of .js files, and in the template approach, we’re going to send HTML templates (strings), probably in the form of template text files. In each case, the files will typically be cacheable to about the same extent.

Run Time

Both approaches are going to need JavaScript though, not just the DOM approach. The template approach is going to need a “template engine” that can do the same tasks that it did in the on-server scenario — do string operations and combine the string content with dynamic data. The DOM approach is going to need JavaScript logic to create, insert, append, and modify the in-memory DOM tree.

I don’t have conclusive statistics to suggest that either of those two JavaScript code paths is going to be substantially faster or more efficient than the other. String operations have, especially in recent JavaScript engines, become exceedingly optimized and fast (compared to really bad string performance in previous generations). Moreover, an intelligent template engine is going to “cache” the compilation of a template (that is, all the string parsing) into repeatedly-executable code, so the costly regular-expression-like parsing of the template string is going to be drastically reduced over the whole lifetime of the page view.
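As a sketch of what that caching might look like (an assumption about how such an engine could work, not a description of any specific one), the template string is parsed once into literal chunks and placeholder names, and the resulting function is stored for reuse. The names `compile`, `compiledCache`, and `compileCount` are hypothetical, and HTML escaping is left out to keep the example short.

```javascript
const compiledCache = new Map();
let compileCount = 0; // only here to make the caching observable

function compile(template) {
  const cached = compiledCache.get(template);
  if (cached) return cached;

  compileCount += 1;
  // Parse once. With a capture group, split() returns alternating entries:
  // literal, placeholder name, literal, placeholder name, ...
  const parts = template.split(/\{(\w+)\}/);
  const renderFn = function (data) {
    let out = "";
    for (let i = 0; i < parts.length; i += 1) {
      // Odd indexes hold placeholder names; even indexes hold literal markup.
      out += i % 2 === 1 ? String(data[parts[i]]) : parts[i];
    }
    return out;
  };
  compiledCache.set(template, renderFn);
  return renderFn;
}

const itemTemplate = "<li>{label}</li>";
const firstItem = compile(itemTemplate)({ label: "one" });
const secondItem = compile(itemTemplate)({ label: "two" });
console.log(firstItem + secondItem, compileCount);
```

After the first call, rendering is just a loop of string concatenations, with no regular-expression work at all.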

But let’s examine what happens at the moment when the presentation layer is about to cross the bridge and actually be added to the live DOM of the page. In other words, let’s compare whether it’s more efficient to inject a single ready string of markup via .innerHTML or to append a sub-tree of nodes to the main DOM tree. Fortunately, a jsPerf test exists to help answer this question: DOM vs. innerHTML.

I ran this test a few times in a few different browsers, and got conflicting results. In WebKit (Chrome/Safari), the DOM insert was quicker (by 30-50%) than the .innerHTML insert. But in Firefox (and to some extent, IE), I got the opposite. Opera was 40% faster at the .innerHTML insert than at the DOM insert. So, those results are mixed to say the least. I do however think it’s fair to say that as we go back in time to older generations of browsers, the .innerHTML insert would tend to be a little faster, since most of the attention and focus in recent DOM optimizations has been on the DOM append operations. This is a conjecture at best on my part, but I think it’s not terribly inaccurate.

Ideal vs. Realistic

And let’s look closer at what’s going on in this test. The DOM insertion test case uses a DocumentFragment, basically a temporary non-rendered DOM sub-tree, to append the child nodes to. Why go to this effort? Because it’s common knowledge that the absolute slowest part of anything you can do in JavaScript is the actual DOM manipulation against the live DOM of the page.

The reasons for this are varied, but include things like how DOM manipulations require updating internal live NodeList collections and force the rendering engine to do reflow and repaint. So, if we can do all our DOM manipulation in a disconnected sub-tree without such side effects, we’ll drastically reduce the cost of DOM insertion. This is a well known and cited optimization technique that has very tangible effects. If that same jsPerf test were run without the use of DocumentFragment and were just inserting the children directly into the live DOM one at a time, I’d expect the DOM insertion test case to have been hundreds or thousands of times slower across all browsers.
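The fragment technique looks roughly like this. It is a sketch, not the code from the jsPerf test: `appendItems` is a made-up helper, and it takes the document as a parameter (`doc`) only so the same function can be pointed at a browser `document` or at any other object exposing the same standard DOM methods.

```javascript
// `doc` is assumed to be a browser `document` or a compatible DOM implementation.
function appendItems(doc, listEl, items) {
  const fragment = doc.createDocumentFragment();
  for (const label of items) {
    const li = doc.createElement("li");
    li.appendChild(doc.createTextNode(label));
    fragment.appendChild(li); // off-document work: nothing is rendered yet
  }
  listEl.appendChild(fragment); // the only operation against the live tree
  return listEl;
}

// In a browser this might be called as:
// appendItems(document, document.querySelector("ol"), ["one", "two", "three"]);
```

However many items there are, the live list is touched exactly once.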

So, are the DOM approach and template approach tied or inconclusive in terms of run-time performance? Perhaps. But I’d say maybe not. The test above represents the absolute best-case optimization, where you have everything you need for the DOM ready in a fragment, and can do it in one insert. In real-world pages, you may have cases where you need to do 2, 3 or even 10 inserts all around a page. Because the overhead of each live DOM insert is, in general, much higher, the more of them you do, the faster that cost adds up, until the DOM approach ends up slower than the equivalent .innerHTML inserts.

DOM > string

There are a few cases where some valid arguments can be made that modifying the DOM dynamically is easier or better than dealing with HTML strings.

For one, whenever we’re dealing with event handlers (which get bound directly to DOM nodes), if you start re-assigning the innerHTML of nodes (or the entire page), you’re likely to screw up your event handlers. This is a common and costly mistake. It can of course be mitigated by the established best practices of using event bubbling/delegation, where you have event handlers bound only to top level DOM elements (like the body, for instance), which listen for bubbled-up events from any child elements — when you do it this way, you can freely re-create your elements using either DOM manipulation or innerHTML, and not mess up the event handling.
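A minimal sketch of that delegation pattern, using only standard DOM methods (`addEventListener`, `Element.closest`, `Node.contains`); the `delegate` helper itself is hypothetical, not a library function.

```javascript
// Bind one listener on a long-lived ancestor instead of one per child element.
function delegate(root, eventType, selector, handler) {
  root.addEventListener(eventType, function (event) {
    // Walk up from the element that fired the event to the nearest match.
    const match = event.target.closest(selector);
    if (match && root.contains(match)) {
      handler.call(match, event);
    }
  });
}

// In a browser this might be used as:
// delegate(document.body, "click", "ol li", function () {
//   console.log(this.textContent);
// });
```

Because the listener lives on the ancestor, the list items can be rebuilt with either DOM calls or innerHTML and clicks on the new items are still handled.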

However, let’s keep in context what this post was about. If we’re talking about doing DOM building of a page on the server, we don’t have browser event handlers in place, so this problem is moot.

Another argument for the DOM approach is the ability to use CSS Selector Engine searching to find elements in the DOM tree. There’s no doubt that Selector Engines are extremely powerful, and using them to traverse through a DOM is equally so. But, when dealing with server-side DOM building, it’s less likely that you’re going to be layering on modifications to your DOM where you need to search the entire DOM repeatedly. For instance, you may be using the DOM approach, and you may “attach” a widget to the server DOM representation, where it creates a new set of DOM nodes to represent itself. But it probably won’t need significant CSS Selector Engine functionality to parse through the entire DOM when doing so — it’ll probably just operate locally in one area of the DOM.

And even if you did need to push different content into different places of your “page” (attaching different widgets), regular expressions and plain string searching should be equally capable for most tasks. Your template engine can use a layered approach, where you have sub-template placeholders for each widget, and you can tell the template engine to build out each sub-template separately.
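The layered idea can be sketched like this, under the assumption of a very simple `{name}` placeholder syntax. The `render` helper, the widget names, and the templates are all invented for the example, and escaping is omitted for brevity.

```javascript
// Replace {name} placeholders with values from data; leave unknown ones alone.
function render(template, data) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in data ? String(data[name]) : match
  );
}

// One sub-template per widget, plus a page template with a slot for each.
const widgetTemplates = {
  greeting: "<p>Hello, {name}</p>",
  clock: "<div class=\"clock\">{time}</div>"
};
const pageTemplate = "<body>{greeting}{clock}</body>";

// Build each widget on its own, then drop the results into the page slots.
const slots = {
  greeting: render(widgetTemplates.greeting, { name: "world" }),
  clock: render(widgetTemplates.clock, { time: "12:00" })
};
const page = render(pageTemplate, slots);
console.log(page);
```

Each widget only ever sees its own sub-template and data, which plays the same role as a widget operating locally in one area of the DOM.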

In short, there are tasks (ones we normally do in the browser) that we may be more used to doing with the DOM approach. But there’s not much, if anything, that can only be done with the DOM and is impossible or impractical with string templating. It may take a slight mental shift to do so, but I believe templating can be every bit as useful as (and perhaps even more so than) the DOM approach.

Oranges AND Squares

Let me wrap up this post with some final thoughts.

Firstly, as you’ve seen, there are some awesome benefits to using server-side JavaScript to build your presentation layer UI. But more importantly, there’s more than one way to use server-side JavaScript to build it. You can use the DOM, you can use string templating, and you can do both in the same page as you see fit.

I’d just encourage you to realize that the DOM is merely a concept that the browser forced upon UI developers — it’s not actually the most natural way we deal with HTML for the presentation layer. The DOM API in the browser has some really ugly and non-performant warts, and yet some people are going to great efforts to emulate the DOM on the server. I wonder why we’re trying to shoe-horn the DOM into the server to accomplish our tasks, rather than using a string-templating approach that can work pretty naturally and capably in both places without any extra emulation layer?

But even if you’re unconvinced that the DOM is sort of an abnormal approach for the server-side presentation layer building, at least accept that both the DOM and templating approaches will serve you well. Pick which one (or both) is best for each task, and for each part of your development team.

Don’t just assume that because YUI or jQuery are DOM-centric, that this is the only way to achieve server-side JavaScript presentation layer logic. A good, intelligent templating engine will give you all the same tools, and may just be a more natural and efficient fit in the long run.