<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>getiblog &#187; javascript</title>
	<atom:link href="http://blog.getify.com/tag/javascript/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.getify.com</link>
	<description>javascript, performance, and ui musings</description>
	<lastBuildDate>Thu, 15 Dec 2011 17:37:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>&#8220;(pre)Maturely Optimize&#8230;&#8221; revisited</title>
		<link>http://blog.getify.com/pre-maturely-optimize-revisited/</link>
		<comments>http://blog.getify.com/pre-maturely-optimize-revisited/#comments</comments>
		<pubDate>Thu, 17 Feb 2011 06:40:05 +0000</pubDate>
		<dc:creator>getify</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Performance Optimization]]></category>
		<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[script junkie]]></category>

		<guid isPermaLink="false">http://blog.getify.com/?p=761</guid>
		<description><![CDATA[I wrote an article for Script Junkie (Microsoft) awhile back, and it was just published this week: (pre)Maturely Optimize Your JavaScript. In it, I make several against-the-grain assertions, which not surprisingly have ruffled quite a few feathers. To start out with, I&#8217;m attacking head-on the prevailing &#8220;fear&#8221; around doing anything in your code that even [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote an article for <a href="http://scriptjunkie.com">Script Junkie</a> (Microsoft) awhile back, and it was just published this week: <a href="http://msdn.microsoft.com/en-us/scriptjunkie/gg622887.aspx">(pre)Maturely Optimize Your JavaScript</a>. In it, I make several against-the-grain assertions, which not surprisingly have ruffled quite a few feathers.</p>
<p>To start out with, I&#8217;m attacking head-on the prevailing &#8220;fear&#8221; around doing anything in your code that even remotely <em>looks</em> like optimization, because the magical &#8220;Premature Optimization is the root of all evil&#8221; fairy will come and slap you silly. I argue that in fact &#8220;<strong><em>Immature</em></strong> Optimization is the root of all evil&#8221;, and what we should as devs be most concerned about is learning how to optimize <strong><em>maturely</em></strong>. Mature optimization requires engineer-like thinking, critical reasoning skills, pragmatic wisdom, and above all, the sense of how to properly achieve balance.</p>
<p>Contrary to some of the negative outcry from a few vocal members of the community, I do not think I&#8217;m advocating for anything that would rightly be considered &#8220;premature&#8221;. Some people say that any optimization is &#8220;premature&#8221; if you haven&#8217;t first seen that code fail under load in real conditions. To me, this is like learning to drive by driving first, having accidents, and correcting your driving as you go along.</p>
<p>That doesn&#8217;t mean data and benchmarking are not important &#8211; of course they are.</p>
<h3>Maturity</h3>
<p>But it <em>equally</em> means that the mature and right thing to do is to <em>ALSO</em> be aware of common pitfalls before you hit them, and strive to write code that avoids those bad patterns. Many developers shy away from such approaches because they (I believe wrongly) assume that all &#8220;code optimization&#8221; leads to uglier or harder-to-maintain code. While this may sometimes be true, to an extent, it&#8217;s not a hard objective fact, and there are lots of optimizations that can become <em>default</em> in your coding style that do not significantly detract from the style of your code.</p>
<p>In fact, I believe the 7 comparison snippets (before and after) that I show in the article illustrate that, with a few very minor changes, changes which do not cause the code to become completely unmaintainable, you can improve the performance of your code, sometimes by just a tiny amount, or sometimes by a huge amount.</p>
<p><strong>In NO case would I advocate <em><u>significantly</u></em> uglier or harder to maintain code for only a 3-10% increase in speed, in code that is not very critical.</strong> But a balanced, performance-savvy perspective on every line of code you write will tend to help you avoid the pitfalls, and let you focus your valuable energies on much more important tasks.</p>
<p>Also, notice: I am <strong>not</strong> focusing on esoteric and minute micro-performance details (as some do, and many assumed I was). I made no mention of things like <code>array.push()</code> vs. <code>array[array.length]</code>, or <code>str += "..."</code> vs. <code>str = arr.join()</code>, or <code>++i</code> vs. <code>i++</code>, or <code>for (i=0; i &lt; arr.length; i++) {...}</code> vs. <code>for (i=0, len=arr.length; i&lt;len; i++) {...}</code>, etc etc etc.</p>
<p>The optimizations I focused on were broader in nature, dealing with patterns and algorithms, rather than comparing native operators and functions to each other &#8212; a fruitless task these days, with the browser speed wars still raging strong.</p>
<h3>Profiling, &#8220;profiling&#8221;</h3>
<p>If you already have existing code that&#8217;s a potential candidate for some optimization TLC, as Paul Irish pointed out to me, the prudent thing to do is of course to employ a script profiler. There are several tools that can profile run-time JavaScript performance &#8212; for instance, IE9&#8217;s Developer Tools includes a built-in profiler. Just yesterday, I used it to help my co-workers track down a critical performance bug in one of our client projects.</p>
<p>I wish I had mentioned the process of profiling specifically in the Script Junkie article. It wasn&#8217;t really the point of the article &#8212; the point was, once you&#8217;ve identified a block of code to attack, here are some easy ways to do it &#8212; but it would have been helpful to talk about briefly. Paul was right to point out that omission.</p>
<p>On the other hand, the other implied goal of my article was to say that sometimes, our own brains and critical thinking skills can be effective at &#8220;profiling&#8221; the code we&#8217;re writing, as we write it. I regularly find myself halfway through an implementation, and I realize that something I&#8217;ve done is likely to lead to some performance hit later. I find that it&#8217;s usually not too difficult to adjust my approach right as I&#8217;m coding.</p>
<p><strong>How do I so easily recognize a performance anti-pattern as I&#8217;m writing it?</strong> I understand and code with a performance-savvy mentality as my default. I also have plenty of previous experience to help me avoid repeating mistakes. And most of all, I subscribe to the belief that I should always be trying to learn from others who are knowledgeable and passionate about performance, to gain from their experience and wisdom as well.</p>
<p>This is the same mentality I&#8217;m advocating for other developers to adopt. I call <em>that</em> &#8220;Mature Optimization&#8221;.</p>
<h3>Refactoring</h3>
<p>I&#8217;ve made the claim a number of times, and I stand by it: <strong>The TCO (Total Cost of Ownership) for non-performant code is higher than for performant code.</strong> In other words, pay a little bit now, by way of taking the effort to write performant code (where practical), the first time, or pay more later, if/when you have to refactor that code to address performance concerns.</p>
<p>First of all, in the corporate world (where I&#8217;ve held many jobs), you often don&#8217;t ever get the second chance to come back and fix that poorly written code that you had to just throw together to get it out on deadline. We tell ourselves we&#8217;ll always come back and fix the code later, but in reality, we often never get to. And if we <em>do</em> get to come back to it, it&#8217;s usually under extreme pressure because something fell apart in production under real load, and now your boss is pissed at you.</p>
<p><strong>Not only does performance refactoring cost, in general, &#8220;double&#8221; the time in coding (coding it once, then coding it again), but it also costs a lot more time when unfortunate but often inevitable regressions are introduced and must also be corrected.</strong> Performance &#8220;refactoring&#8221; is more risky.</p>
<p>And why do I make that assertion?</p>
<p>Because unit-testing is <em>not</em> the de facto standard in the corporate world like it should be. In fact, of the dozen or so jobs I&#8217;ve held in the corporate web-application industry in my career, maybe 1 or 2 of them actually took unit-tests seriously. Most of the time, full integration tests (and not even comprehensive ones) were the best tests we could ever get our boss to give us time to write.</p>
<p>I know we&#8217;d all like to sit on happy island where unit-testing (or TDD) is a reality and we all have 100% test coverage. But it isn&#8217;t that way in the real world, by and large.</p>
<p><strong>When you don&#8217;t have proper unit-tests, performance &#8220;refactoring&#8221; is very risky.</strong> The conditions under which you are forced to do it conspire against you, and you&#8217;re just bound to mess something up, usually in a subtle way you don&#8217;t realize just then.</p>
<p>And, btw, <em>even if</em> you had a full regression test-suite that ran, and immediately notified you that your &#8220;refactor&#8221; just broke something, you still have to spend more time (often costly time) tracking down the cause of that regression, and the regression <em>that fix</em> causes, and so on&#8230; down the rabbit hole we go. </p>
<p>It&#8217;s crazy to think that internally refactoring a function&#8217;s implementation details would cause other side effects, right? Sure, because in the corporate world, we get plenty of time to write 100% high quality code with no shortcuts or assumptions. We never stoop to using a quick global variable instead of figuring out the proper closure-scoping approach, when our boss is breathing heavy down our neck. Our code is always perfectly self-contained and well-patterned, right? </p>
<p>Except, no it isn&#8217;t, because that&#8217;s why we have this performance bug cropping up in the first place &#8212; because we had to cut corners to get code out the door.</p>
<h3>Benchmarks</h3>
<p>Some vocal critics have dismissed the code snippets in my article as &#8220;premature optimization&#8221; because the improvements I suggest are &#8220;micro&#8221; or won&#8217;t have any noticeable impact on performance. I disagree with that assertion. And I created some <a href="http://jsperf.com">jsPerf.com</a> tests to illustrate.</p>
<p>The first versions of my test were a little rough and ill-formed, and Rick Waldron pointed out those flaws. I believe I&#8217;ve adjusted these tests to more accurately depict the point that each snippet comparison in the article was trying to make.</p>
<p><strong>NOTE:</strong> these tests are not going to look like your typical tests where you are comparing two native operators or something low-level like that. In several cases, the algorithms in the two test cases are intentionally quite different. It <em>should</em> be obvious that one will run much quicker than the other, because it&#8217;s doing a lot less work (or a very different kind of task). The whole basis for these examples in the article is to show how a common, natural code pattern that&#8217;s perfectly semantic and self-descriptive or &#8220;object-oriented&#8221; or whatever, can suffer from (sometimes subtle) performance degradation. And <em>sometimes</em>, an effective way to improve performance is to think about a slightly different pattern or algorithmic approach. I&#8217;m trying to help you &#8220;see the forest above the trees&#8221;.</p>
<ol>
<li><a href="http://jsperf.com/scriptjunkie-premature-1/3">test case: function call, inner loop</a></li>
<li><a href="http://jsperf.com/scriptjunkie-premature-2">test case: bulk operations or iterators</a></li>
<li><a href="http://jsperf.com/scriptjunkie-premature-3">test case: $(this) or collection-faking</a></li>
<li><a href="http://jsperf.com/scriptjunkie-premature-4">test case: scope lookup, cached or not</a></li>
<li><a href="http://jsperf.com/scriptjunkie-premature-5">test case: prototype chain lookup, cached or not</a></li>
<li><a href="http://jsperf.com/scriptjunkie-premature-6">test case: nested property lookup, cached or not</a></li>
<li><a href="http://jsperf.com/scriptjunkie-premature-7">test case: dom append or dom-fragment append</a></li>
</ol>
<p>Each of those tests has a table below it for the captured/averaged <a href="http://browserscope.org">Browserscope</a> test-run results, per browser. For instance, consider the test results (so far) for the first test-case (&#8220;function call, inner loop&#8221;):</p>
<p><img src="http://getiblog.2static.it/wp-content/uploads/2011/02/test1-results.png" alt="" title="test1-results" width="600" height="101" class="aligncenter size-full wp-image-762" /></p>
<p>Consider the Chrome 9 results: 14,101 operations/sec vs. 15,607 operations/sec. That&#8217;s a difference of 1,506 operations/sec, which is 10.7% faster. What did I do between those two tests? I simply inlined the <code>isOdd()</code> implementation into the loop, instead of as a function call. Is 10.7% life shattering or huge? No, probably not. But it&#8217;s an example of a very common coding pattern which <em>is</em> slower. With slightly more thought, in the case where it was possible/appropriate to do so, we can speed things up by 10.7% and not cost much by way of development time, nor is the code appreciably harder to maintain.</p>
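<p>The jsPerf page holds the actual test code. Purely as an illustration of the kind of change being compared, here is a rough sketch in which the function bodies, the names <code>countOddsWithCall</code> and <code>countOddsInlined</code>, and the sample array are all assumptions rather than the code from the test:</p>
<pre class="code">
// Hypothetical reconstruction; NOT the exact code from the jsPerf test.
// Assumes non-negative integers.
function isOdd(n) {
   return n % 2 === 1;
}

// Variant 1: calls isOdd() on every iteration
function countOddsWithCall(nums) {
   var count = 0, i, len = nums.length;
   for (i = 0; i !== len; i++) {
      if (isOdd(nums[i])) count++;
   }
   return count;
}

// Variant 2: the same check, inlined into the loop body
function countOddsInlined(nums) {
   var count = 0, i, len = nums.length;
   for (i = 0; i !== len; i++) {
      // inlined isOdd(): avoids one function call per iteration
      if (nums[i] % 2 === 1) count++;
   }
   return count;
}

console.log(countOddsWithCall([1, 2, 3, 4, 5])); // 3
console.log(countOddsInlined([1, 2, 3, 4, 5]));  // 3
</pre>
<p>Both variants return the same count. The only difference is whether each iteration pays for a function call, and how much that matters will vary by engine.</p>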
<p>And, if you find that you are writing code that is a little uglier, or maybe is a little less self-explanatory, and you&#8217;re worried that this code may cause you (or some other dev) a maintenance problem later, there&#8217;s a sure-fire easy solution: <strong>WRITE A FRIGGIN&#8217; COMMENT TO EXPLAIN WHY THE CODE NEEDS TO BE THAT WAY.</strong></p>
<p>In fact, let me just say this: if you follow my advice and code with performance patterns in mind from the start, and you find yourself writing a piece of code that is just horribly uglier and no amount of comments can adequately explain why it is that way&#8230;. </p>
<p><strong>STOP and back up. You have my permission to code NON-performantly in that situation, if you simply can&#8217;t do so in a reasonably clean or comment-explainable way.</strong> Sheesh, was that so hard? Am I really a crazy loon?</p>
<h3>Summary</h3>
<p>Quit listening to all the negative hype about &#8220;premature optimization&#8221;. It&#8217;s mostly hot air with no substance. It&#8217;s a bunch of developers following what other developers say, mostly blindly. And the few developers who actually have a reason for railing against &#8220;premature optimization&#8221; are probably mad about it not because the optimization was <em>premature</em>, but because it was <em><strong>immature</strong></em>.</p>
<p>So, let&#8217;s all just try to grow up and mature a little bit with how we approach performance-savvy coding. We&#8217;ll all get a lot more quality code written that way.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.getify.com/pre-maturely-optimize-revisited/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Native JavaScript: sync and async</title>
		<link>http://blog.getify.com/native-javascript-sync-async/</link>
		<comments>http://blog.getify.com/native-javascript-sync-async/#comments</comments>
		<pubDate>Tue, 07 Dec 2010 20:47:23 +0000</pubDate>
		<dc:creator>getify</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[async]]></category>
		<category><![CDATA[defer]]></category>
		<category><![CDATA[es5]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[native]]></category>
		<category><![CDATA[operator]]></category>
		<category><![CDATA[promise]]></category>

		<guid isPermaLink="false">http://blog.getify.com/?p=701</guid>
		<description><![CDATA[One of the most powerful parts of JavaScript is that it allows you to code synchronous (serially executing) code alongside asynchronous code. But what is powerful also leads to more complication for a variety of common tasks. I&#8217;m going to explore and propose an addition to native JavaScript that I believe will help developers [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most powerful parts of JavaScript is that it allows you to code synchronous (serially executing) code alongside asynchronous code. But what is powerful also leads to more complication for a variety of common tasks. I&#8217;m going to explore and propose an addition to native JavaScript that I believe will help developers in negotiating async code tasks within a sync code base.</p>
<p><strong>(note: If you already understand the problems with nesting callbacks for handling async code, and want to just get to the main point and skip over all the boring back-story, <a href="#proposal">read my proposal</a>)</strong></p>
<p>What I&#8217;m going to be talking about is a complex and emerging topic with some specific descriptors attached, like &#8220;promises&#8221; and &#8220;defer&#8221;. It&#8217;s been pushed heavily recently by the server-side JavaScript community (CommonJS), but I think the usefulness of these concepts is just as valid in browser JavaScript. In fact, one major motivator for this exploration is the desire to design a pattern for sync/async coding that is useful in both contexts, so that more code can be written once and re-used.</p>
<p>However, I want to leave aside some of the formalities of the promise/defer conversation (mostly because I am far from an expert on it) &#8212; I know just enough about them to be dangerous but probably not enough to be useful. I also want to try to set opinions/preferences about syntax aside (although part of the goal is shorter/cleaner syntax) and talk mostly about the structural/functional needs. It&#8217;s impossible to talk only about the theory, so there will be a syntax I suggest, but I hope that the conversation that develops around my proposal isn&#8217;t sidetracked by the &#8220;weeds&#8221; of syntax before we&#8217;ve thoroughly talked about the underlying behaviors.</p>
<p>There&#8217;s a rather long <a href="https://gist.github.com/727232">gist about my explorations of the idea</a> if you&#8217;re interested in seeing the back-story to this post. I think especially interesting is the comment thread with some really good feedback from several people, including the venerable <a href="http://twitter.com/BrendanEich">Brendan Eich</a> himself (hint: he invented JavaScript).</p>
<h3>A callback in your callback so you can callback</h3>
<p>Almost universally, the common pattern in JavaScript for dealing with asynchronous coding is the idea of the callback. That is, taking advantage of the first-class status of functions, passing a reference to a function into an async mechanism, and asking for the function to be &#8220;called back&#8221; when the asynchronous process finishes. But most anyone who&#8217;s ever written mid-complexity (or more) code with async functionality has run into several common problems with the callback pattern for async handling. At its most basic definition, what I&#8217;m trying to do here is come up with a better way to handle such tasks. Of course, it goes much deeper than that, but that&#8217;s probably the best mental place to start.</p>
<p>Let&#8217;s first take a look at some example code written in the traditional callback style to illustrate some of these issues:</p>
<pre class="code">
function xhr(url, callback) {
   var x = new XMLHttpRequest();
   x.onreadystatechange = function(){
      if (x.readyState == 4) {
         callback(x.responseText);
      }
   };
   x.open("GET",url);
   x.send();
}

setTimeout(function(){
   xhr("/something.ajax?greeting", function(greeting) {
      xhr("/else.ajax?who&#038;greeting="+greeting, function(who) {
         console.log(greeting+" "+who);
      });
   });
}, 1000);
</pre>
<p>OK, so in this code, we want to wait 1 second (why? who cares!) and then we want to fire off an Ajax request to get some content. Next, we use that content to make a second Ajax request. Finally, once we have both pieces of content fetched, now we print them out to the console.</p>
<p>Firstly, the syntax ugliness of nesting callbacks 3 levels deep illustrates the inefficiency of this pattern. We accept that this is status quo, but in reality, I think this is an example of how the language is not properly (natively) supporting the common programming patterns found in modern usage of the language. We&#8217;re having to hack around with ugly, complicated, and thus harder-to-maintain syntax because that&#8217;s basically our only direct option.</p>
<p>Now, let&#8217;s consider what happens if I want to take that functionality and make it into some sort of general, reusable piece of code. There are a few different ways this can be done:</p>
<pre class="code">
function doSomethingCool(url_1, url_2, callback, delay) {
   setTimeout(function(){
      xhr(url_1, function(greeting) {
         xhr(url_2+greeting, function(who){ callback(greeting, who); });
      });
   }, delay);
}

doSomethingCool("/something.ajax?greeting",
   "/else.ajax?who&#038;greeting=",
   function(greeting, who) {
      console.log(greeting+" "+who);
   },
   1000
);
</pre>
<p>Look at the brittleness we&#8217;ve now had to introduce. First, because we&#8217;ve made the second URL (the &#8220;/else.ajax&#8221; one) a parameter, we&#8217;ve now separated the URL that is passed in from the concatenation of the content received from the previous Ajax request. We&#8217;ve created brittleness between code hidden away inside that general function and the calling code. If either of those changes, both pieces of code will have to change. In our example, the two pieces of code are right next to each other, so the change is not too hard. But what if this code is a method in an object abstracted away in a third-party library? This brittleness is a pattern that will lead to hard-to-maintain code and eventually failure.</p>
<p>Next, notice that we had to change the signature of the callback we pass in, so that it accepts both the `greeting` and `who` parameters, because now our calling code is separate from the context scope (closure) that gave us access to `greeting` in the previous code snippet without it being passed explicitly. And, we have had to use yet another anonymous function inside of `doSomethingCool` to compose the parameters `greeting` and `who` into a single call to our callback. This code will work, but it&#8217;s brittle and it fails the &#8220;sniff&#8221; test &#8212; if it seems to be too complicated, it&#8217;s probably wrong. Is there anything better we can do?</p>
<pre class="code">
function doSomethingSortaCool(url, callback, delay) {
   setTimeout(function(){
      xhr(url, callback);
   }, delay);
}

doSomethingSortaCool("/something.ajax?greeting", function(greeting) {
   xhr("/else.ajax?who&#038;greeting="+greeting, function(who) {
      console.log(greeting+" "+who);
   });
}, 1000);
</pre>
<p>Meh. This is not getting much better. We removed one layer of the nesting abstraction. But now our calling code is having to take care of the complication instead of our utility. `doSomethingSortaCool()` is a little more general now, but we&#8217;ve just shifted the problem and complexity to the usage code. It seems our nesting is just going to be difficult to deal with.</p>
<p>So, I&#8217;ve been talking thus far about difficulties not only with syntax but also with interaction among components (aka, steps in our asynchronous logic). But let&#8217;s also point out another issue that such a pattern has. This is an issue that is clearly more a concern for server-side JavaScript folks, but it&#8217;s inherent to core JavaScript and thus is something I&#8217;d like to see addressed, regardless of how effective it is in the browser context.</p>
<p>The concern is this: what if `doSomethingCool()` is a function we did not write and do not control? What if we are linking to a third-party library that we can&#8217;t fully &#8220;trust&#8221; to always behave as it should? The misbehavior could be intentional/malicious, or it could simply be poorly implemented. But in either case, by calling it, and specifically passing to it a reference to one of our functions, we are <em>trusting</em> that function to execute our function for us at the appropriate time. Misbehavior could be calling our callback too early, or too late, or more than once, or never, or passing not enough parameters, or passing the wrong data, etc.</p>
<p>Not only are we &#8220;trusting&#8221; that function to behave properly, but we also have basically very little way to directly assert and enforce that proper behavior. We can go to great lengths in our callback&#8217;s scope to keep track of when and how it is called, making sure it&#8217;s only ever called once, and that it&#8217;s passed proper data. But that&#8217;s a crazy amount of work to do for <strong>every single callback function we ever define</strong>. Clearly, this code is begging for a more robust system to negotiate these problems for us.</p>
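<p>To give a sense of what that bookkeeping involves, here is a minimal sketch of just one such guard: a wrapper that ignores every call after the first. The `once` name is an assumption for illustration, not part of any library discussed here. It does nothing about calls that come too early, too late, never, or with the wrong data:</p>
<pre class="code">
// Hypothetical helper: wraps `fn` so that only the first call gets through.
function once(fn) {
   var called = false;
   return function() {
      if (called) return;   // silently ignore repeat calls
      called = true;
      return fn.apply(this, arguments);
   };
}

var calls = [];
var callback = once(function(who) { calls.push(who); });

// a misbehaving utility might invoke our callback twice
callback("world");
callback("world, again");

console.log(calls.length); // 1
</pre>
<p>Even this small guard has to be applied by hand to each callback we pass out, and it covers only one of the failure modes listed above.</p>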
<p>Let&#8217;s step back a little more general now:</p>
<pre class="code">
X(..., Y);
</pre>
<p>`X` is possibly an asynchronous function, meaning it may not finish right away (but it might!). But we don&#8217;t want `Y` to run until after `X` is done. Moreover, we want `Y` to be notified of the eventual result of `X` when it is complete.</p>
<p>We pass `Y` to `X` and expect that `X` will execute it for us. Not only that, but we expect that `X` will make sure to pass the right parameters to `Y`. What if we actually need to specify some of the parameters for `Y`? We can pass all those parameters to `X` and ask it to add them to the call correctly. Or we can &#8220;curry&#8221; `Y` into a partially applied function, and pass that intermediate function to `X` instead. Currying is a standard pattern, but it&#8217;s a little advanced, and it involves some complexities that may be prohibitive. It also uses implicit anonymous function wrapper(s), which hampers our code&#8217;s efficiency the more layers of wrapping we have to involve.</p>
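<p>As a small sketch of that second option, here is one way the partial application could look. The `partial` helper and the bodies of `X` and `Y` are assumptions made up for this example, and `X` calls back immediately only to keep the sketch short:</p>
<pre class="code">
// Hypothetical helper: pre-fills the leading arguments of `fn`.
function partial(fn) {
   var preset = Array.prototype.slice.call(arguments, 1);
   // this returned function is the extra wrapper layer
   return function() {
      var rest = Array.prototype.slice.call(arguments);
      return fn.apply(this, preset.concat(rest));
   };
}

// stand-in for `Y`: we supply `greeting`, `X` supplies `who`
function Y(greeting, who) {
   return greeting + " " + who;
}

// stand-in for `X`: hands its result to whatever callback it was given
function X(callback) {
   return callback("world");
}

console.log(X(partial(Y, "Hello"))); // "Hello world"
</pre>
<p>ES5&#8217;s `Function.prototype.bind` can do the same pre-filling, as in `Y.bind(null, "Hello")`. Either way, `X` still ends up calling a wrapper around `Y` rather than `Y` itself.</p>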
<pre class="code">
X(..., function(){ Y(..., Z); });
</pre>
<p>Here, we&#8217;re doing the same thing, but now we&#8217;re passing in a hard-coded function that will call `Y` for us. However, we&#8217;re passing <em>that</em> function to `X`, so we&#8217;re still trusting `X` to behave properly. Moreover, we&#8217;ve created brittleness because this callback must call `Y`, which must pass in `Z`. We saw above the complexities if we want to generalize those connections. Lastly, we&#8217;re <em>also</em> now trusting that `Y` will properly behave with respect to `Z`. If `X` and `Y` are third-party, and only our `Z` we control, we&#8217;ve doubled the ways that our code may assume proper behavior and fail because we&#8217;re not certain that it will indeed do what we want.</p>
<p>One final wrinkle: we often need to define both what happens if the asynchronous code `X` finishes successfully, and what happens if it fails in some way. So, with the callback paradigm, now we&#8217;re passing twice as many functions to every layer of abstraction, and again, we&#8217;re relying even more so on the potentially untrusted code to make sure one or the other (but not both!) of our callbacks are called under the appropriate circumstances.</p>
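<p>To make the &#8220;one or the other, but not both&#8221; requirement concrete, here is a rough sketch of a guard we could write for a success/failure pair. The `eitherOnce` name and its shape are assumptions for illustration only:</p>
<pre class="code">
// Hypothetical helper: returns guarded versions of a success/failure pair.
// Whichever one is called first wins; every later call is ignored.
function eitherOnce(onSuccess, onFailure) {
   var settled = false;
   function guard(fn) {
      return function() {
         if (settled) return;
         settled = true;
         return fn.apply(this, arguments);
      };
   }
   return { success: guard(onSuccess), failure: guard(onFailure) };
}

var log = [];
var handlers = eitherOnce(
   function(value) { log.push("ok: " + value); },
   function(err) { log.push("failed: " + err); }
);

// a misbehaving `X` might call both of our callbacks
handlers.success("Hello");
handlers.failure("timeout");

console.log(log); // ["ok: Hello"]
</pre>
<p>Note that this is again something our own code has to set up for every pair of callbacks it hands to code it does not control.</p>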
<p><strong>The problem is that we can&#8217;t &#8220;trust&#8221; (in the sense we&#8217;ve been discussing) code we don&#8217;t control</strong>, so <em>we</em> (that is, our direct code) need a way to be in control of all of the execution, not the third-party code. Also, messaging between the async steps can get really complex quickly when nesting is involved. Lastly, parameter passing to subsequent steps must either be hard-coded or we must jump through lots of hoops to get them passed properly.</p>
<pre class="code">
X(...);
Y(...);
Z(...);
</pre>
<p>It would be ideal if our code could be written like this, and `X`, `Y`, and `Z` could all be asynchronous and the order would be preserved. As it stands, this is not possible. If any of those functions defers its completion by calling an async call, processing will continue, and thus order of completion will not be maintained.</p>
<p>&#8220;Continuations&#8221; are a programming concept where the code execution can suspend itself after `X(...)` is called, and the rest of the program can be &#8220;continued&#8221; later when `X` finishes, etc. One such syntax might look like:</p>
<pre class="code">
X(...); yield;
Y(...); yield;
Z(...); yield;
</pre>
<p>I&#8217;m not suggesting this is how we should deal with our issue. But it at least is informative to look at the other ways these problems have been tackled in the past.</p>
<p>Put simply, we want/need a linear sequence of function calls. We need those functions to execute in order, even if one or all of them are asynchronous in nature. We need those functions to be able to cascade/propagate their results down the sequence, so that step 2 knows what the result of step 1 was (in case it needs it), etc. (btw, the &#8220;message passing&#8221; should not rely on shared state side-effects.) And we need a way to express what happens if any step in our sequence of function calls fails to fulfill its success path.</p>
<h3>APIs</h3>
<p>I hope I&#8217;ve now illustrated how first-class function reference callbacks fall short of adequately addressing those needs. I&#8217;m not by any means the first person to articulate such issues or try to find solutions to these shortcomings. There are a number of proposals in the CommonJS community for handling &#8220;promises&#8221;. Dojo has a &#8220;deferred&#8221; utility. And there are many others.</p>
<p>I&#8217;ve even experimented with a chainable API syntax shim for handling them. For instance:</p>
<pre class="code">
function delay(time) {
   var p = new Promise();
   setTimeout(function(){ p.fulfill(); }, time);
   return p.deferred;
}
function xhr(url) {
   var p = new Promise(),
         x = new XMLHttpRequest()
   ;
   x.onreadystatechange = function(){
      if (x.readyState == 4) {
         p.fulfill(x.responseText);
      }
   };
   x.open("GET",url);
   x.send();
   return p.deferred;
}

delay(1000)
.then(function(){
   return xhr("/something.ajax?greeting");
})
.then(function(P){
   return xhr("/else.ajax?who&#038;greeting="+P.value)
   .then(function(P2){ return [P.value, P2.value]; });
})
.then(function(P){
   console.log(P.value[0]+" "+P.value[1]);
});
</pre>
<p>So, notice how this (really naive and simple) experiment into promises that I did is trying to address some of the concerns. First and foremost, the `Promise` system (sort of a neutral trustable party) acts as a &#8220;proxy&#8221; of sorts between the various steps of this chain. I pass functions to the `Promise` mechanism by calling the `.then(&#8230;)` chained method of the object (aka &#8220;deferred&#8221;) that each of my functions returns. I&#8217;m not passing my functions to some untrusted third-party system, I&#8217;m passing them to a trusted neutral party. The `Promise` system only has one way that a function can signal that it&#8217;s finished: `fulfill(&#8230;)`. </p>
<p>This is guaranteed (by the `Promise` mechanism) to be callable once and only once. And `fulfill(&#8230;)` can take a &#8220;value&#8221; (aka, &#8220;message&#8221;) that it will pass along to the next step in the chain. Lastly, this syntax (while a little clunky itself) does address at least some of the awkwardness of nesting callbacks, parameter hard-coding, etc.</p>
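<p>To make the &#8220;once and only once&#8221; guarantee concrete, here is a minimal sketch of how such a neutral party could be built. It is not the actual experiment: the name `SimplePromise` and the helper `isDeferred` are made up for illustration, and for brevity it runs callbacks synchronously as soon as `fulfill(&#8230;)` is called rather than waiting for a later turn.</p>

```javascript
// Hypothetical sketch only; `SimplePromise` is an illustrative name.
function isDeferred(x) {
   return x != null ? typeof x.then == "function" : false;
}

function SimplePromise() {
   var fulfilled = false,   // flips to true on the first fulfill() call
       value,               // the "message" handed to the next step
       waiting = [];        // steps registered before fulfillment

   function notify(fn) { fn({ value: value }); }

   this.fulfill = function(v) {
      if (fulfilled) return;   // callable once and only once
      fulfilled = true;
      value = v;
      waiting.forEach(notify);
      waiting = [];
   };

   this.deferred = {
      then: function(fn) {
         var next = new SimplePromise();
         function step(P) {
            var result = fn(P);
            // A returned deferred is waited on; anything else is passed along as-is.
            if (isDeferred(result)) {
               result.then(function(P2){ next.fulfill(P2.value); });
            }
            else next.fulfill(result);
         }
         if (fulfilled) notify(step);
         else waiting.push(step);
         return next.deferred;
      }
   };
}

var p = new SimplePromise();
p.deferred
.then(function(P){ return P.value + " world"; })
.then(function(P){ console.log(P.value); });   // logs "hello world"
p.fulfill("hello");
p.fulfill("ignored");   // no effect: the promise was already fulfilled
```

<p>The steps never call each other directly. Each one only hands a value back to the mechanism, which decides when (and whether) the next step runs.</p>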
<p>The point of me bringing this up is not to suggest that it&#8217;s the right way to deal with these problems. Most people seem to favor an alternate (not really chainable) syntax often called `when` syntax. It might look like this:</p>
<pre class="code">
when(X, Y, Z);
</pre>
<p>That means, execute `X`. If successful, do `Y`, otherwise do `Z`. While chainability of this particular syntax is not straightforward (and is likely achieved with&#8230; nesting when&#8217;s), one great thing it does is address the concern that X shouldn&#8217;t have to execute Y or Z (or even <strong>know</strong> about them, for that matter). This `when(&#8230;)` is similar to my `Promise` experiment &#8212; a neutral party mechanism to negotiate promise deferral.<br />
<a name="proposal"></a></p>
<h3>A modest (native) proposal</h3>
<p>While there are many proposals for handling promises, and some of them are even eventually hoping to be a candidate for inclusion in the language, I&#8217;d say they are all mostly API focused. I believe my new proposal (while it has some API to it) is more focused at a lower level than API, and I hope is easier to integrate natively into the language.</p>
<p>Revisiting the generic example from above:</p>
<pre class="code">
X(...) ,
Y(...) ,
Z(...);
</pre>
<p>Notice, I&#8217;ve used a &#8220;,&#8221; between `X` and `Y` and between `Y` and `Z`. This should not be interpreted to mean I&#8217;m suggesting that syntax. I just want to illustrate the general concept that I want to be able to easily specify a &#8220;chain&#8221; of 3 expressions (in this case, async function calls), and I want for those 3 to evaluate in order, even if the processing needs to suspend because of async deferral. I could just as easily list the example like this, and mean basically the same thing:</p>
<pre class="code">
????  X(...)  ????  Y(...)  ????  Z(...)  ????
</pre>
<h4>Execution/Processing Model</h4>
<p>Further, let me state, in contrast to the formal &#8220;yield/continuation&#8221; model discussed above, I don&#8217;t think that `X` deferring itself means that the entire rest of the program needs to be yielded. In this case, I&#8217;ve arranged these 3 function calls into a single statement. <strong>So, the &#8220;yield/continuation&#8221; behavior would be localized only to the single statement</strong>. The statement will always be executed part-by-part, from left-to-right.</p>
<p>If the statement at any point suspends itself (defers completion), program execution would then continue immediately at the next statement. When the &#8220;suspended&#8221; statement is ready to resume, it will wait for the next &#8220;turn&#8221; available (in the exact same way in JS as any other async code waiting to execute), and will then continue on with execution of the next part of the statement. A statement can be &#8220;suspended&#8221; and &#8220;resumed&#8221; back and forth as many times as necessary.</p>
<p>The &#8220;suspension&#8221; is always controlled by the function calls in the statement (ie, yielding, not pre-emptive interruption).</p>
<p>The statement doesn&#8217;t get any special behavior with respect to references to variables in the surrounding scope. Each part of the statement will be executed at its own point in time, taking into account whatever the current scope state is. There&#8217;s no suggestion of any type of state insulation, where parts of the statement would be protected from the effects of other parts of the statement or of the rest of the program. In this respect, the async completing code behaves in pretty much the same way as a callback function that is executed later &#8212; it knows about references to variables, but it will use their current state when it executes that part of the statement.</p>
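<p>The callback comparison can be shown in a few lines of ordinary JavaScript. This is only an illustration of the scoping rule, with a plain function call standing in for a part of the statement that resumes later; the names `greeting`, `seen` and `later` are made up.</p>

```javascript
var greeting = "hello";
var seen = [];

function later() {
   // Reads `greeting` when this function runs, not when it was defined.
   seen.push(greeting);
}

greeting = "goodbye";   // the rest of the program keeps running and changing state
later();                // stands in for the deferred part resuming on a later turn
console.log(seen[0]);   // logs "goodbye"
```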
<p>When a part of the statement is a function call that suspends (aka &#8220;defers&#8221;) its own completion, it will be able, once it resumes/finishes, to signal a &#8220;message&#8221; to the next part of the statement as the final result of the function call. Each part of the statement that is a function call which immediately follows an async part will be able to receive the &#8220;message&#8221; from the previous part, separate from and in addition to any explicit parameters to the function call itself. This &#8220;message passing&#8221; mechanism must not rely on shared state variable side-effects, but must instead be intrinsic to the statement handling mechanism.</p>
<p>A statement of this nature has 2 or more parts to it, but no part actually has to be a function call, and even if a part is a function call, that function call doesn&#8217;t have to suspend itself but can instead finish completely synchronously. The parts of a statement that are not function calls can be any kind of expression, but not other statements (like control structures, return, var, etc). These non-function-call parts (general expressions) cannot suspend themselves, and must therefore always be evaluated/executed synchronously and immediately. Only a function call (even an executing function expression) may suspend itself for the purposes of async.</p>
<pre class="code">
???? X()  ????  (Y = 10)  ????  Z() ????
            // `Y = 10` would be an immediately fulfilled promise expression
            // that moved execution onto the next expression in the statement.
</pre>
<p>As is true for the rest of JavaScript (which does not yet have { } block level scope), the only way to contain multiple statements inside a single part of the main statement is to wrap them in an inline function and execute that function. There are no special semantics or rules for each individual part of the main statement, except that each of them must be a valid expression.</p>
<p>Each time a function is called that may suspend itself, there must be a way to specify in the statement both a success path (if the function finishes as expected) and a failure path (if the function completes but not with expected results).</p>
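<p>One way to get a feel for this processing model without any new language features is to emulate it with a small helper. The sketch below is an approximation under stated assumptions, not the native mechanism: `sequence`, its `onFail` argument, and the `promise` object handed to each step (with `messages`, `fulfill` and `fail`) are all made up for illustration.</p>

```javascript
// Hypothetical helper: runs `steps` in order, resuming only when a step fulfills.
function sequence(steps, onFail) {
   var index = 0;

   function next(messages) {
      if (index >= steps.length) return;
      var step = steps[index++],
          settled = false;   // each step may settle once and only once

      step({
         messages: messages,   // the "message" from the previous step
         fulfill: function() {
            if (settled) return;
            settled = true;
            next(Array.prototype.slice.call(arguments));
         },
         fail: function(reason) {
            if (settled) return;
            settled = true;
            if (onFail) onFail(reason);   // failure path: the rest of the chain is skipped
         }
      });
   }

   next([]);
}

sequence([
   // async step: suspends, then resumes on a later turn
   function(promise) { setTimeout(function(){ promise.fulfill("Hello"); }, 10); },
   // sync step: fulfills immediately and passes two messages along
   function(promise) { promise.fulfill(promise.messages[0], "World"); },
   function(promise) { console.log(promise.messages.join(" ")); }   // logs "Hello World"
]);
```

<p>Only the chain itself is suspended while the first step waits. Any code that follows the `sequence(&#8230;)` call keeps running immediately, and the messages travel through the helper rather than through shared variables.</p>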
<h4>Syntax</h4>
<p>Let me get specific about my idea for syntax based on the context described above. But I won&#8217;t spend too much time on it, because the syntax is less important than the execution/processing model.</p>
<p>I propose that the parts of the statement in question be separated by a new infix operator: @. The @ operator is like the , operator in that it evaluates multiple expressions one at a time, but it has the special behavior that it understands when a function suspends/defers itself, and suspends evaluation of the rest of the statement until that part completes.</p>
<p>As stated above, each operand of the @ operator can be a function call, but may just be a general expression. If an operand is a function call, and it suspends/defers its completion, the @ operator will wait to continue. Otherwise, immediate resolution will proceed through the statement, expression by expression, left to right.</p>
<p>The @ operator also has a syntax similar to the ?: ternary operator which expresses the success/failure conditional paths. Only one of the two expression clauses will be executed, depending on how the function chooses to resolve itself (fulfillment or failure).</p>
<p>Lastly, the message passing described above is implicitly handled by the @ operator mechanism, exposed only to function calls in the statement.</p>
<p>Let&#8217;s revisit an earlier example and see how the @ operator would be used:</p>
<pre class="code">
function delay(time) {
   var p = promise;  // `promise` is now a special auto variable
                     // like `arguments` or `this`
   setTimeout(function(){ p.fulfill(); }, time);
   p.defer(); // used to flag this function as deferring/suspending
              // its completion to be async
}
function xhr(url) {
   var p = promise,
         args = p.messages || [],
         x = new XMLHttpRequest()
   ;
   x.onreadystatechange = function(){
      if (x.readyState == 4) {
         args.push(x.responseText);
         p.fulfill.apply(null,args);
      }
   };
   if (p.messages) url += p.messages[0];
   x.open("GET",url);
   x.send();

   p.defer();
}

delay(1000) @
xhr("/something.ajax?greeting") @
xhr("/else.ajax?who&#038;greeting=") @
(function() {
   console.log(promise.messages[0]+" "+promise.messages[1]);
})();
</pre>
<p>As you can see, I chain 4 function calls together with @ in between, the first 3 of which are async and deferred. As discussed in the processing model above, and also in the required/desired behaviors for an async/promise/defer system, this syntax accomplishes all those basic goals.</p>
<p>The simple &#8220;API&#8221; portion of my proposal is how you interact with the &#8220;promise&#8221; mechanism from inside of any function. There is a native auto variable `promise` (similar to `arguments` or `this`) which represents that function call&#8217;s promise. The `promise` object has a `fulfill(&#8230;)` method and a `fail(&#8230;)` method, for (not surprisingly) fulfilling the promise or failing it. Lastly, you opt into deferring a function&#8217;s completion by calling `defer()` on its promise object.</p>
<p>Both the API and the operator are my basic first pass proposal, and are open to input/adjustment.</p>
<h3>Summary</h3>
<p>I&#8217;m not trying to solve all of the issues that a language like JavaScript has with dealing with complex async/chained logic through promises/defers. I&#8217;m not suggesting that no one will ever have to use a promise/defer &#8220;lib&#8221; of some sort for the more complex negotiation tasks. I&#8217;m not suggesting that @ will be the cure-all.</p>
<p>What I am suggesting is that the best first step to take in helping deal with sync/async negotiation in JavaScript is for there to be a small, narrowly defined, natively implemented mechanism (in my idea, as a special operator), which will serve as a building block for more complex usage, and will simultaneously provide very streamlined and simple syntax for a wide array of common sync/async tasks (like event binding, as seen in the <a href="https://gist.github.com/727232#file_ex4:click_event_handlers_as_promises">original gist exploration</a>).</p>
<p>I genuinely want to avoid some of the syntactic disagreements that have held back promise/defer API implementations by stripping down a significant part of it to a simple chainable infix operator. That doesn&#8217;t mean there&#8217;s no API, nor does it mean the function API I&#8217;ve proposed is correct. It just means that we need something efficient and native in JavaScript sooner, rather than later, to begin dealing with sync vs. async issues. I think my proposal has some merit in the respect of being a productive step forward (and one of many necessary steps we must take as a language).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.getify.com/native-javascript-sync-async/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>You&#8217;ve got your orange thing, I&#8217;ve got my square</title>
		<link>http://blog.getify.com/orange-vs-square/</link>
		<comments>http://blog.getify.com/orange-vs-square/#comments</comments>
		<pubDate>Thu, 11 Nov 2010 22:21:37 +0000</pubDate>
		<dc:creator>getify</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Performance Optimization]]></category>
		<category><![CDATA[UI Architecture]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[middle-end]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[server-side]]></category>
		<category><![CDATA[templating]]></category>
		<category><![CDATA[ui architecture]]></category>

		<guid isPermaLink="false">http://blog.getify.com/?p=666</guid>
		<description><![CDATA[If you haven&#8217;t noticed yet, almost every one of my blog posts starts out as a twitter conversation. There must be a pattern there. So, today, I&#8217;m going to address the bigger issues around the question I first asked, which started off the conversation, seems inefficient: emulating a DOM (built up w/ server-side JS)&#8230;which then [...]]]></description>
			<content:encoded><![CDATA[<p>If you haven&#8217;t noticed yet, almost every one of my blog posts starts out as a <a href="http://twitter.com/#!/getify/status/2766997037252608">twitter</a> <a href="http://twitter.com/#!/getify/status/2768079054446592">conversation</a>. There must be a pattern there. <img src='http://getiblog.2static.it/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>So, today, I&#8217;m going to address the bigger issues around the question I <a href="http://twitter.com/#!/getify/status/2760342438412289">first asked</a>, which started off the conversation, </p>
<blockquote><p>
seems inefficient: emulating a DOM (built up w/ server-side JS)&#8230;which then has to serialize into a html string, transfer, then be rebuilt.
</p></blockquote>
<p>though it turns out I had to <a href="http://twitter.com/#!/getify/status/2772140415778816">rephrase it</a>, because I got <a href="http://twitter.com/#!/getify/status/2770918904434689">answers to a different question</a> than what I asked.</p>
<blockquote><p>
rephrase: i get why server-side js is awesome (code reuse, prog enhnc, etc). why a DOM approach instead of shared/reusable templating?
</p></blockquote>
<h3>Tale of Two Questions</h3>
<p>For the sake of sanity and clarity, let&#8217;s break down the conversation into the two fundamental questions, and address them separately:</p>
<ol>
<li>Why is server-side JavaScript helpful for front-end/UI tasks?</li>
<li>Why would you use the DOM approach to the UI in server-side JavaScript as opposed to a templating approach?</li>
</ol>
<h3>Server-side UI</h3>
<p>There are some really exciting things you can do with the UI of your web application with JavaScript. This is probably not news to anyone who hasn&#8217;t had their head under a rock for the last 5-8 years.</p>
<p>But what role (if any) can server-side JavaScript have in the question of how to use JavaScript in the front-end of your application? It certainly seems like an oxymoron to be doing JavaScript in your &#8220;back-end&#8221; primarily for the benefit and use of your &#8220;front-end&#8221;.</p>
<p>In fact, this is <em>exactly</em> why I&#8217;ve been banging the loudest drums I can find to try to define the <a href="http://middleend.com">&#8220;middle-end&#8221;</a> (that is, what sits in between the front-end and back-end&#8230; the middle-end!) of the web application stack. </p>
<p>Specifically, there are a number of UI-related tasks which frequently occur in both places &#8212; the browser and the server. For instance: presentation &#8220;construction&#8221;, data/form validation (or data binding), etc. These tasks, along with others, I have defined as the &#8220;middle-end&#8221;.</p>
<p>Moreover, concepts such as &#8220;progressive enhancement&#8221; are heavily related &#8212; that is, the idea of progressively enhancing the presentation layer capabilities depending on the capabilities of the client making the request.</p>
<p>Where server-side JavaScript offers some really exciting answers is:</p>
<ol>
<li><strong>Code Reuse</strong>: If I can write my data validation code, or my UI construction code, etc, <em>once</em>, <strong>in JavaScript</strong>, and then reuse that code in both the browser and the server, I&#8217;ve significantly improved the maintainability of my application by achieving <a href="http://en.wikipedia.org/wiki/Don't_repeat_yourself">DRY</a> (don&#8217;t repeat yourself).</li>
<li><strong>Progressive Enhancement</strong>: If I can take advantage of JavaScript&#8217;s ability to build out my page&#8217;s UI dynamically, I can achieve some really awesome things in the browser. But what about browsers which don&#8217;t have JavaScript (enabled)? The typical approach has been to serve a more basic version of the content to such less-capable browsers.
<p>But what if I could get most of the awesome benefits of JavaScript-powered presentation and not be affected by the browser&#8217;s inability to do JavaScript? What if I could run all my awesome JavaScript to build up my page (or even to dynamically alter it) on the server, and just send the HTML result of such operations over to a less-capable browser? To the user of such a browser, they might see things updating slower and a little less gracefully (with full page-refreshes), but they&#8217;d get most of the benefits of a dynamically capable page experience without having any JavaScript at all. </p>
<p>This could be a <strong>huge</strong> win for progressive-enhancement advocates, because it significantly reduces the amount of work that a site must do to provide some sort of alternate content experience for JavaScript-less browser users.</li>
</ol>
<p>These are not the only interesting ways that server-side JavaScript could be used to really revolutionize the web application process, but they should be more than enough to answer the question: &#8220;Why is server-side JavaScript helpful for front-end/UI tasks?&#8221;</p>
<h3>To DOM or not to DOM</h3>
<p>The meatier question in play here is, if we agree that server-side JavaScript should be used to improve the web application stack, should we choose a DOM-based approach to building out the UI on the server, or should we use a string-templated approach?</p>
<p>I&#8217;ll cut to the chase. <strong>The short answer is&#8230; BOTH!</strong></p>
<p>But the nuances of that assertion are what we&#8217;ll now focus on for the rest of this post.</p>
<h3>Oranges &#038; Squares</h3>
<p>Let&#8217;s first define more specifically what we mean by each distinct approach:</p>
<ol>
<li><strong>DOM approach</strong>: By this, I mean that you build up your page by explicitly creating each element as a node in a DOM tree. For instance, if I want to add this &lt;ol>&lt;/ol> ordered-list to my page, I first create an &#8220;ol&#8221; node, then I create an &#8220;li&#8221; node, then I append some text to the &#8220;li&#8221; node (as a &#8220;text&#8221; node of course), then I add the &#8220;li&#8221; node to the &#8220;ol&#8221; node, etc. Once my &#8220;ol&#8221; node is ready, I then append my &#8220;ol&#8221; node as a child element of some part of my page, like the current section I&#8217;m in.
<p>Such an approach might look like this (using jQuery):</p>
<pre class="code">
var $ol = $("&lt;ol>&lt;/ol>");
var $li = $("&lt;li>&lt;/li>");
$li.text("Some text");
$ol.append($li);
$("body").append($ol);
</pre>
<p>With this code, our JavaScript executes and programmatically builds up our DOM elements, eventually adding our &#8220;ol&#8221; element to the end of the &#8220;body&#8221; element.
</li>
<li><strong>Template approach</strong>: By contrast, here I mean that to build up the &#8220;ol&#8221; list as described above, you instead create a string like &#8220;&lt;ol>&#8221;, followed by a &#8220;&lt;li>&#8221;, then some text, then &#8220;&lt;/li>&#8221;, etc. Once the entire string of markup is done, then you insert that entire string into the page (or part of the page), either as the full HTML or as the <em>.innerHTML</em> of some element.
<p>Of course, I&#8217;m not suggesting that you literally have to manually add strings together with variables and such. That&#8217;s why we have template engines that do that for us. So, instead, you have a template like this:</p>
<pre class="code">
&lt;ol>
   &lt;li>{some_text}&lt;/li>
&lt;/ol>
</pre>
<p>With this &#8220;code&#8221;, our templating engine grabs the <code>{some_text}</code> part and replaces it with the value of that variable, and returns us the full constructed string, so we don&#8217;t have to mess with all the string operation details.</li>
</ol>
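<p>To make the template approach concrete, here is a minimal sketch of what such an engine does with the template above. The function name <code>render</code> and the choice to leave unknown placeholders untouched are assumptions for illustration, not a description of any particular template engine.</p>

```javascript
// Hypothetical minimal engine: replaces {name} placeholders with values from `data`.
function render(template, data) {
   return template.replace(/\{(\w+)\}/g, function(match, name) {
      // Leave unknown placeholders as they are rather than guessing a value.
      return Object.prototype.hasOwnProperty.call(data, name) ? String(data[name]) : match;
   });
}

var html = render("<ol><li>{some_text}</li></ol>", { some_text: "Some text" });
console.log(html);   // logs "<ol><li>Some text</li></ol>"
```

<p>The result is just a string, ready to be sent over the wire or assigned to <em>.innerHTML</em>. A real engine would typically also escape the values it drops into the markup.</p>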
<h3>Zoom In</h3>
<p>So, let&#8217;s examine the differences between these approaches.</p>
<p>The first, most obvious difference is what the &#8220;code&#8221; looks like that generates something visible in our presentation layer. In the DOM approach, large chunks (if not all) of your presentation layer are written in JavaScript <em>code</em>, usually tucked in with all the other front-end JavaScript. In the templating approach, almost all of your presentation layer is in HTML markup (as a string), usually in separate template files.</p>
<p>The result of building your UI in the DOM approach is an actual DOM tree (a physical nested object hierarchy in memory). The result of building your UI in the template approach is just a string of HTML markup.</p>
<p>Let&#8217;s be clear about something: <strong>a DOM is required to render something the user can see</strong> in the browser. So, the string of HTML markup that you get from the templating approach must of course eventually get parsed into a DOM tree.</p>
<p>That presents an obvious question: if a DOM is eventually required to actually <em>show</em> the presentation layer, isn&#8217;t it more efficient to just start out with (and stay in) the DOM tree, rather than going with string manipulation as a first stage of the process?</p>
<p>This is a fine and fair question. I hope I can quickly distill a few thoughts to suggest why I&#8217;m not so sure that the <em>obvious</em> is entirely correct &#8212; there&#8217;s more here than meets the eye.</p>
<h3>Designers</h3>
<p>Firstly, I&#8217;d suggest that there&#8217;s a pretty strong precedent for building HTML pages in&#8230; HTML. Designers, IDEs, and most web developers have learned and in many cases mastered the process of writing HTML markup. To my knowledge, almost none of the web development IDEs and typical processes are centered around an &#8220;(DOM)object-oriented&#8221; building of pages &#8212; in other words, I&#8217;ve never personally seen an IDE where when I look at source-view of a page, I&#8217;m messing around in a nested tree structure and adding and removing nodes.</p>
<p>Moreover, the very concept of a string of HTML markup actually conceptually representing a hierarchy of parent and child nodes is a very programmer-centric view. It&#8217;s like the difference between saying &#8220;The Quick Brown Fox&#8221; is a sentence phrase versus saying that, as a programmer, that&#8217;s actually an array of individual characters. Both are true, but one view is more semantic and the other is more programmatic.</p>
<p>So, stylistically, you can make a strong case (although it is ultimately preference-driven) that building out HTML pages is more natural and well-established as HTML markup as opposed to JavaScript code. From purely a role-separation standpoint, it&#8217;s perhaps more natural for UI designers (and the markup/CSS role) to work in simple HTML markup than in code (whether that be JavaScript code, or some other back-end language code like PHP, Java, .NET, etc).</p>
<p>Just because we layer on a simple templating engine on top of the HTML markup to represent the concept of dropping in a dynamic data value in a certain location, doesn&#8217;t mean we&#8217;ve changed the paradigm from working with HTML markup to working with DOM nodes. Of course, some &#8220;templating engines&#8221; are far more complex, including all kinds of &#8220;programming-like&#8221; stuff, like conditionals, loops, function calls, arithmetic, boolean logic, etc. The more of that junk you cram into the &#8220;templating language&#8221; of your choice, the more like programming your presentation layer construction becomes.</p>
<p>As should probably be obvious, my strong preference is at the far simple end of the templating spectrum, where a bare minimum of anything that looks like &#8220;programming&#8221; is going on inside my HTML markup. I think that&#8217;s backed up by lots of precedent in &#8220;separation of concerns&#8221;, and I believe a lot of people agree. But there are others who prefer to write <em>code</em> to build up their presentation layer. To each his own, on that, I guess.</p>
<h3>Down To The Wire</h3>
<p>The other major difference, when we factor in such tasks as presentation layer building occurring on the server, is how the stuff that gets sent over the wire is prepared for the trip from the server to the browser.</p>
<p>In either the DOM or template approach, if the browser doesn&#8217;t have JavaScript enabled, the entire process of building the presentation layer occurs on the server, and what gets sent over the wire is always and only a string of HTML markup. The exact same page built with the DOM or template approach will end up being exactly identical when it squeezes down to fit through the pipe from server to browser &#8212; a string. </p>
<p>If we accept that in that base case, what transfers over the wire must be a string, then the question above of what&#8217;s more efficient (DOM or string) gets flipped. If the in-memory representation of the DOM (on the server) must first be down-converted to a string to be sent over the wire, <strong>this will tend to be less efficient than if the presentation layer is already (and only) just a string, ready to send.</strong></p>
<p>Now, what about the case where JavaScript <em>is</em> available on the browser? Assuming then that we&#8217;re in the part of a task where we want to off-load our presentation construction to the server, what gets sent over the wire is still going to be pretty much equivalent (though not the same). For the DOM approach, we&#8217;re going to send JavaScript, probably in the form of .js files, and in the template approach, we&#8217;re going to send HTML templates (strings), probably in the form of template text files. In each case, the files will normally be cacheable to about the same extent.</p>
<h3>Run Time</h3>
<p>Both approaches are going to need JavaScript though, not just the DOM approach. The template approach is going to need a &#8220;template engine&#8221; that can do the same tasks that it did in the on-server scenario &#8212; do string operations and combine the string content with dynamic data. The DOM approach is going to need JavaScript logic to create, insert, append, and modify the in-memory DOM tree.</p>
<p>I don&#8217;t have conclusive statistics to suggest that either of those two JavaScript code logic paths are going to be substantially faster or more efficient than the other. String operations have, especially in recent JavaScript engines, become exceedingly optimized and fast (compared to really bad string performance in previous generations). Moreover, an intelligent template engine is going to &#8220;cache&#8221; the compilation of a template (that is, all the string parsing) into repeatedly-executable code, so the costly regular-expression-like parsing of the template string approach is going to be drastically reduced, considering the whole life time of the page view.</p>
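<p>The &#8220;cache the compilation&#8221; idea can be sketched in a few lines. This is a simplified illustration, not how any specific engine works: the names <code>compile</code>, <code>renderCached</code> and <code>compiledCache</code> are made up, and the <code>{name}</code> placeholder syntax is assumed from the template shown earlier.</p>

```javascript
var compiledCache = Object.create(null);   // template string -> compiled render function

// Hypothetical compile step: parse the template once into literal and placeholder parts.
function compile(template) {
   var parts = template.split(/\{(\w+)\}/);   // odd indexes hold the placeholder names
   return function(data) {
      var out = "";
      for (var i = 0; i < parts.length; i++) {
         out += (i % 2) ? String(data[parts[i]]) : parts[i];
      }
      return out;
   };
}

function renderCached(template, data) {
   if (!(template in compiledCache)) {
      compiledCache[template] = compile(template);   // regex parsing happens only on first use
   }
   return compiledCache[template](data);
}

console.log(renderCached("<li>{some_text}</li>", { some_text: "first" }));    // logs "<li>first</li>"
console.log(renderCached("<li>{some_text}</li>", { some_text: "second" }));   // logs "<li>second</li>"
```

<p>The second call skips the parsing entirely and only does string concatenation, which is the part of the work that recent engines have made fast.</p>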
<p>But let&#8217;s examine what happens at the moment when the presentation layer is about to cross the bridge and actually be added to the live DOM of the page. In other words, let&#8217;s compare whether it&#8217;s more efficient to inject a single ready string of markup via .innerHTML or to append a sub-tree of nodes to the main DOM tree. Fortunately, a <a href="http://jsperf.com">jsPerf</a> test exists to help answer this question: <a href="http://jsperf.com/dom-vs-innerhtml/10">DOM vs. innerHTML</a>.</p>
<p>I ran this test a few times in a few different browsers, and got conflicting results. In Webkit (Chrome/Safari), I got that the DOM insert was quicker (30-50% so) than the .innerHTML insert. But in FF (and to some extent, IE), I got the opposite. Opera was 40% faster at the .innerHTML than at the DOM insert. So, those results are mixed to say the least. I do however think it&#8217;s fair to say that as we would go back in time to older generations of browsers, the .innerHTML insert would tend to be a little faster, since most of the attention and focus in recent optimizations of the DOM has been in the DOM append operations. This is a conjecture at best on my part, but I think it&#8217;s not terribly inaccurate.</p>
<h3>Ideal vs. Realistic</h3>
<p>And let&#8217;s look closer at what&#8217;s going on in this test. The DOM insertion test case uses a <a href="https://developer.mozilla.org/En/DOM/DocumentFragment">DocumentFragment</a>, basically a temporary non-rendered DOM sub-tree, to append the children nodes to. Why go to this effort? Because it&#8217;s common knowledge that the absolute slowest part of anything you can do in JavaScript is the actual DOM manipulation against the live DOM of the page.</p>
<p>The reasons for this are varied, but include even things like how DOM manipulations require updating of internal live nodeList collections, force the rendering engine to do reflow and repaint, etc. So, if we can do all our DOM manipulation in a disconnected sub-tree without such side effects, we&#8217;ll drastically reduce the cost of DOM insertion. This is a well known and cited optimization technique that has very tangible effects. If that same jsPerf test were run without the use of DocumentFragment and were just inserting the children directly into the live DOM one at a time, <strong>the DOM insertion test case would have been hundreds or thousands of times slower across all browsers.</strong></p>
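<p>For reference, the fragment technique looks roughly like the sketch below. The function name <code>appendItems</code> and its parameters are assumptions for illustration; in a browser you would pass the global <code>document</code> and an existing list element.</p>

```javascript
// Sketch: `doc` is a DOM Document, `list` is an element that is already in the page.
function appendItems(doc, list, texts) {
   var fragment = doc.createDocumentFragment();   // detached sub-tree, not rendered
   texts.forEach(function(text) {
      var li = doc.createElement("li");
      li.appendChild(doc.createTextNode(text));
      fragment.appendChild(li);   // no live-DOM side effects yet
   });
   list.appendChild(fragment);    // a single insert into the live DOM
}

// In a browser (hypothetical element id):
//   appendItems(document, document.getElementById("my-list"), ["one", "two", "three"]);
```

<p>However many items are built, the live page is touched exactly once, which is why the jsPerf test case is a best-case scenario for the DOM approach.</p>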
<p>So, are the DOM approach and template approach tied or inconclusive in terms of run-time performance? Perhaps. But I&#8217;d say maybe not. The test above represents the absolute best-case optimization, where you have everything you need for the DOM ready in a fragment, and can do it in one insert. In real-world complexity, you may have cases where you need to do 2, 3 or even 10 inserts all around a page. Because the overhead of a live DOM insert is, in general, much higher, the more of them you do, the quicker that cost adds up, and the slower the result will be compared to .innerHTML inserts.</p>
<h3>DOM > string</h3>
<p>There are a few cases where some valid arguments can be made that modifying the DOM dynamically is easier or better than dealing with HTML strings. </p>
<p>For one, whenever we&#8217;re dealing with event handlers (which get bound directly to DOM nodes), if you start re-assigning the innerHTML of nodes (or the entire page), you&#8217;re likely to screw up your event handlers. This is a common and costly mistake. It can of course be mitigated by the established best practices of using event bubbling/delegation, where you have event handlers bound only to top level DOM elements (like the body, for instance), which listen for bubbled-up events from any child elements &#8212; when you do it this way, you can freely re-create your elements using either DOM manipulation or innerHTML, and not mess up the event handling.</p>
<p>However, let&#8217;s keep in context what this post was about. If we&#8217;re talking about doing DOM building of a page on the server, we don&#8217;t have browser event handlers in place, so this problem is moot. </p>
<p>Another argument for the DOM approach is the ability to use CSS Selector Engine searching to find elements in the DOM tree. There&#8217;s no doubt that Selector Engines are extremely powerful, and using them to traverse through a DOM is equally so. But, when dealing with server-side DOM building, it&#8217;s less likely that you&#8217;re going to be layering on modifications to your DOM where you need to search the entire DOM repeatedly. For instance, you may be using the DOM approach, and you may &#8220;attach&#8221; a widget to the server DOM representation, where it creates a new set of DOM nodes to represent itself. But it probably won&#8217;t need significant CSS Selector Engine functionality to parse through the entire DOM when doing so &#8212; it&#8217;ll probably just operate locally in one area of the DOM.</p>
<p>And even if you did need to be able to push in different content into different places of your &#8220;page&#8221; (attaching different widgets), the regular expression&#8217;s power over string searching should be equally capable for most tasks. Your template engine can use a layered approach, where you have sub-template place holders for each widget, and you can tell the template-engine to build out each sub-template separately.</p>
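<p>As a rough sketch of that layered idea (not any particular template engine), assume a made-up {{name}} placeholder syntax and a hypothetical render() helper. Each sub-template is built separately as a string and dropped into its place holder with a single regular expression replace:</p>

```javascript
// Hypothetical page template; {{name}} marks a sub-template place holder.
var pageTemplate = "<div id=\"page\">{{header}}<div id=\"main\">{{widget}}</div></div>";

// Hypothetical sub-templates, each built out separately as a plain string.
var subTemplates = {
	header: "<h1>Hello</h1>",
	widget: "<ul><li>one</li><li>two</li></ul>"
};

// Replace every {{name}} with its sub-template; unknown names are left untouched.
function render(template, parts) {
	return template.replace(/\{\{(\w+)\}\}/g, function(match, name) {
		return Object.prototype.hasOwnProperty.call(parts, name) ? parts[name] : match;
	});
}

var html = render(pageTemplate, subTemplates);
```

<p>No DOM is involved at any point, so the same code can run on the server or in the browser.</p>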
<p>In short, there are cases where we may be more <em>used</em> to tasks (which we normally do in the browser) by using the DOM approach. But there&#8217;s not much if anything that can only be done with DOM and is impossible or impractical with string templating. It may take a slightly different mental shift to do so, but I believe templating can be every bit as useful (and perhaps even more so) as the DOM approach.</p>
<h3>Oranges AND Squares</h3>
<p>Let me wrap up this post with some final thoughts.</p>
<p>Firstly, as you&#8217;ve seen, there are some awesome benefits to using server-side JavaScript to build your presentation layer UI. But more importantly, there&#8217;s more than one way to use server-side JavaScript to build it. You can use the DOM, you can use string templating, and you can do both in the same page as you see fit.</p>
<p>I&#8217;d just encourage you to realize that the DOM is merely a concept that the browser forced upon UI developers &#8212; it&#8217;s not actually the most natural way we deal with HTML for the presentation layer. The DOM API in the browser has some really ugly and non-performant warts, and yet some people are going to great efforts to emulate the DOM on the server. I wonder why we&#8217;re trying to shoe-horn the DOM into the server to accomplish our tasks, rather than using a string-templating approach that can work pretty naturally and capably in both places <strong>without any extra emulation layer</strong>?</p>
<p>But even if you&#8217;re unconvinced that the DOM is sort of an abnormal approach for the server-side presentation layer building, at least accept that both the DOM and templating approaches will serve you well. Pick which one (or both) is best for each task, and for each part of your development team. </p>
<p><strong>Don&#8217;t just assume that because YUI or jQuery are DOM-centric, that this is the only way to achieve server-side JavaScript presentation layer logic.</strong> A good, intelligent templating engine will give you all the same tools, and may just be a more natural and efficient fit in the long run.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.getify.com/orange-vs-square/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>To capture or not to capture</title>
		<link>http://blog.getify.com/to-capture-or-not/</link>
		<comments>http://blog.getify.com/to-capture-or-not/#comments</comments>
		<pubDate>Thu, 04 Nov 2010 06:32:01 +0000</pubDate>
		<dc:creator>getify</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[back-references]]></category>
		<category><![CDATA[capture]]></category>
		<category><![CDATA[capturing]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://blog.getify.com/?p=644</guid>
		<description><![CDATA[As is quite common for this blog, I&#8217;m gonna write this little rant post as a result of a twitter argument (&#8220;twargument&#8221;?). The topic is &#8220;capturing&#8221; in regular expressions, and how that behavior affects string.split(RE), specifically in JavaScript. As usual, move on if you already disagree with me and don&#8217;t care what I have to [...]]]></description>
			<content:encoded><![CDATA[<p>As is quite common for this blog, I&#8217;m gonna write this little rant post as a result of a twitter argument (&#8220;twargument&#8221;?). The topic is &#8220;capturing&#8221; in regular expressions, and how that behavior affects string.split(RE), specifically in JavaScript. As usual, move on if you already disagree with me and don&#8217;t care what I have to say. Blah blah.</p>
<p>What follows is an approximation of how I&#8217;ve learned about regular expressions and capturing across my varied programming career of 10+ years. If you want to skip over the boring &#8220;history&#8221; lesson and jump to my conclusions, <a href="#lookclosely">take a look</a>.</p>
<h3>^</h3>
<p>I would venture to guess that almost all programmers first learn a programming language (or several), and THEN as that knowledge matures, eventually they&#8217;re exposed to and learn regular expressions. In fact, I bet most programmers never formally learn regular expressions &#8212; they just kinda pick up on the syntax and fumble their way through it. Wash. rinse. repeat.</p>
<p>For my first 5 years of development, this was true for me. I read some bits and pieces of documentation on regular expressions, but for the most part, I was just trying to figure out what worked through trial and error. Gradually, over time, I learned various different things that helped make regular expressions seem clearer. Even this far into my career, I&#8217;d say I probably am only a 6 out of 10 in terms of understanding them &#8212; but that&#8217;s enough to make good use of them.</p>
<p>Before I ever wrote my first regular expression, I&#8217;d programmed (at one time or another) in several languages: C, Basic/QBasic, Pascal, PHP, and JavaScript.</p>
<p>And you know what one of the first syntactic things I learned in those languages was? In writing arithmetic and boolean logic expressions, you have to use ( ) to group operand expressions. In fact, I learned that in math class long before I ever wrote my first line of code. If I want to express 3 * 2 + 5 and I expect 21 instead of 11, I have to write 3 * (2 + 5).</p>
<p>The use of ( ) to group operand expressions is fundamental to mathematical syntax and to almost every major programming language I&#8217;ve been exposed to.</p>
<h3>( ) overloaded</h3>
<p>Then I started learning regular expressions, and I wanted to do something like find in a longer string occurrences of &#8220;ab&#8221; or &#8220;abab&#8221; or &#8220;abababababab&#8221;, etc. In more formal terms, I wanted to &#8220;match against a string and see if there&#8217;s one or more occurrences of &#8220;ab&#8221; in a sequence anywhere in the string&#8221;. </p>
<p>And so how did I write my regular expression? First I tried /ab+/, but that didn&#8217;t work, because it gave me things like &#8220;abbbbbb&#8221; which is not what I wanted. Then I remembered my mathematical and programming roots, and realized the problem was operator binding (precedence), and that I needed to group &#8220;a&#8221; and &#8220;b&#8221; together and apply the + operator to the group. So, my regular expression became /(ab)+/, and voila &#8212; it worked!</p>
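<p>To make the precedence difference concrete, here is a small sketch (the sample strings are made up for illustration):</p>

```javascript
// Without grouping, + binds only to "b": one "a" followed by one or more "b".
var loose = "xx abbbb yy".match(/ab+/);

// The same pattern against repeated "ab" pairs stops after the first pair.
var tooShort = "xx abababab yy".match(/ab+/);

// With ( ), the + applies to the whole "ab" group.
var grouped = "xx abababab yy".match(/(ab)+/);

var looseMatch = loose[0];       // "abbbb"
var shortMatch = tooShort[0];    // "ab"
var groupedMatch = grouped[0];   // "abababab"
```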
<p>It never occurred to me that the return value from <a href="https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String">.match()</a> has other elements in it. All I cared about is the &#8220;ababababab&#8221; part at result[0]. And my regular expression faithfully does that.</p>
<p>Repeat that process hundreds of times over the years, in gradually more and more complex scenarios, and I&#8217;ve learned, informally through trial-and-error, how to write regular expressions that do what I want them to do (well, sorta) &#8212; when I need to group things, I use ( ).</p>
<p>Then, somewhere along the way, I read about this interesting notion of &#8220;capturing&#8221;. The first exposure I have to capturing groups is related to getting at the captured group data in the result array from a .match() call. For instance, result[1] has the first captured group match, result[2] has the second matched group, etc.</p>
<p>It&#8217;s a very powerful concept in regular expressions, and with it I realize I can do some really cool stuff. For instance, I write this regular expression: /foo(bar|(\d+))/. What I&#8217;m looking for is if &#8220;foo&#8221; happens, and is either followed by &#8220;bar&#8221;, or if not, if it&#8217;s followed by some digits, and maybe I specifically want to know what those matched digits are.</p>
<p>I read documentation on accessing capture groups from a call to .match(), and it says to count (1-based) the left parentheses to find out the ordinal index of the group, and then I can find my capture match in the result array at that location.</p>
<p>So, if I want to know the numbers found after foo, I look for results[2] (2 being the ordinal index of the second capture group, meaning the second left parenthesis in my regular expression). If results[2] is populated, then I know it must be the numbers that were found immediately after foo. And the overall match is at results[0], faithfully, just like it always was. And I&#8217;m happy. And I&#8217;ve solved my task (or so I think). I pay no attention to what may or may not be at results[1], because for my task I frankly don&#8217;t care.</p>
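<p>Here is roughly what that results array looks like for the two kinds of input (the sample strings are my own):</p>

```javascript
var re = /foo(bar|(\d+))/;

// "foo" followed by digits: both groups participate in the match.
var withDigits = "xx foo123 yy".match(re);
var whole = withDigits[0];    // "foo123" (the overall match)
var outer = withDigits[1];    // "123"    (first left parenthesis)
var digits = withDigits[2];   // "123"    (second left parenthesis)

// "foo" followed by "bar": the inner (\d+) group never matched.
var withBar = "xx foobar yy".match(re);
var barOuter = withBar[1];    // "bar"
var barDigits = withBar[2];   // undefined
```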
<h3>\2</h3>
<p>Of course, this documentation and testing has now misled me a little bit to think that the intention of &#8220;capture&#8221; groups is to be able to extract the matched group values. In fact (and I didn&#8217;t learn this until a lot later, as it&#8217;s often not expressed well if at all), <strong>the main purpose of capture groups is for &#8220;back-references&#8221;</strong>. Whaaa?</p>
<p>Back-references are extremely powerful: they allow you to reference the value of an earlier match later in the same regular expression. For instance, you might have a group like (["']) which can match either a " or a ', and then you can later reference the value from that match to make sure that you are checking for the same value on the other end of the match (eg: either "&#8230;" or '&#8230;'). BTW, you accomplish the back-reference by writing \x, where x is the ordinal index of the capture group you want to grab the value from.</p>
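<p>A minimal sketch of that quote-matching idea, with made-up sample strings. The \1 means &#8220;whatever group 1 matched&#8221;, so the closing quote has to be the same character as the opening one:</p>

```javascript
// Group 1 captures the opening quote; \1 requires the same quote to close.
var quoted = /(["']).*?\1/;

var single = "say 'hi' now".match(quoted)[0];        // 'hi' (with its single quotes)
var dbl = "say \"it's\" now".match(quoted)[0];       // "it's" (the lone ' inside is ignored)
var mismatched = "'oops\"".match(quoted);            // null: opens with ' but ends with "
```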
<p>The moral here is that capture groups are primarily about serving back-references. The fact that .match() and other API methods return the capture groups in the results array is not directly related to the processing behavior of the regular expression &#8212; it&#8217;s an intentional choice of the designer of that API. </p>
<p>They chose to assume that if the regular expression matches some capture groups, then that means the author must want those capture groups returned. Sometimes that&#8217;s true, but it&#8217;s certainly not universally true. <strong>Unfortunately, there&#8217;s no way to use capture groups in the regular expression without having them returned in the results.</strong></p>
<h3>(?: )</h3>
<p>Several years down the road, still never having taken any formal class or read (and understood) any formal documentation on regular expressions, I feel decently confident that I know regular expressions enough to get what I need from them. When I need to group, I use ( ). And when I need to get capture groups, I also use ( ). The overloading of those two behaviors into the same operator hasn&#8217;t really bothered me too much, or even really directly occurred to me.</p>
<p>Then I ran across someone else&#8217;s complex regular expression, and I saw something like: /foo(?:bar|(\d+))/. That regular expression looks familiar, except for the &#8220;?:&#8221; in there. So I go searching through online documentation to try and find out what the heck that means. And I find some resource somewhere that says that (?: ) is officially a &#8220;non-capturing group&#8221;. Hmmm, I think. &#8220;Non-capturing&#8221;. Why do I care to make something &#8220;non-capturing&#8221;. I brush this off as not really making much sense.</p>
<p>6 months later, I saw such a non-capturing group again, this time in a regular expression one of my coworkers wrote. I asked them about it casually, and they said, &#8220;well, if I don&#8217;t need something to capture, it&#8217;s faster in the processing to tell the regex parser to not worry about capturing it.&#8221; Hmm&#8230; makes sense, I guess. Tell the regex engine if I don&#8217;t need it to capture. But I&#8217;ll be darned if I am gonna be very likely to remember that extra (ugly and non-semantic) &#8220;?:&#8221; in all my regular expression groupings. Besides, I&#8217;ve never been bitten by capturing hurting anything, including any performance hits that mattered to me.</p>
<h3>Experiments</h3>
<p>But I probably started to play around with &#8220;?:&#8221; a little bit anyway, out of curiosity. I went back to my /foo(bar|(\d+))/ regular expression, and I added in the ?:, and I tried to run my code and see if it went faster. I was dismayed to see that not only does it not go faster, <strong>but it breaks completely!</strong> What!?</p>
<p>So I go searching again, not really knowing what google keywords to use, and eventually I stumble back over that same document I read a long time ago about capture groups, and I re-read it, and I see in there that they say to count left parentheses <strong>of captured groups</strong> to find my matched group in the results array. It clicks! When I made one of my groups non-capturing, now I have to go change all my code that looks for the match of the digits at results[2], and change it to look at results[1].</p>
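<p>The index shift is easy to see side by side (again, the sample string is just for illustration):</p>

```javascript
var capturing = /foo(bar|(\d+))/;
var nonCapturing = /foo(?:bar|(\d+))/;

var a = "foo42".match(capturing);
var b = "foo42".match(nonCapturing);

var digitsBefore = a[2];   // "42": (\d+) is the second capturing group
var digitsAfter = b[1];    // "42": (\d+) is now the only capturing group
var oldIndex = b[2];       // undefined: code still reading results[2] breaks
```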
<p>Ugh. All I wanted to do was experiment with non-capturing groups for improving performance. After all, that&#8217;s why we do non-capturing right &#8212; to improve performance?</p>
<p>I don&#8217;t want to have to refactor a bunch of code and then re-validate it all. This &#8220;non-capturing&#8221; thing is starting to seem a little bogus to me. It seems more work than it&#8217;s worth. And it is definitely uglier. And so, not seeing any clear benefit, I doubt I&#8217;m gonna remember in the future to use non-capturing on most new regular expressions I write.</p>
<h3>.*?</h3>
<p>Some more years go by, and now I&#8217;m enrolled in computer science in college. I go through and take several computer science courses, and in a few of them, we get introduced briefly to the topic of regular expressions. They never go much deeper than the basics though. They perhaps mention in passing the idea of capturing (and maybe even non-capturing), but they never give us any practical reasons why that concept matters much.</p>
<p>As I advance through more and more computer science classes, I have the occasion from time to time to write regular expressions for various tasks. I do so based on my somewhat grass-roots previous learning (my classes certainly never rigorously go through much of it beyond surface intro stuff). And I get by just fine. </p>
<p>Not once is my code (or tests) ever graded down because I improperly &#8220;captured&#8221; a group I didn&#8217;t need to. I suppose the professor overlooked such details (maybe he doesn&#8217;t get why it matters, either), or perhaps he noticed it but it wasn&#8217;t what was being tested so it didn&#8217;t make sense to confuse me or grade me down. In any case, I sail on through these classes, only barely aware of the concept of &#8220;non-capturing&#8221; as it applies to regular expressions and groups, and blissfully unaware of any &#8220;bad practice&#8221; that&#8217;s now taken deep foothold on my programming style.</p>
<p>Then I take a class called &#8220;Theory of Abstract Languages&#8221; &#8212; really deep theoretical stuff. I&#8217;m humored to see that soon into the curriculum, we start talking about &#8220;regular expressions&#8221; &#8212; and I think to myself, &#8220;oh, I&#8217;ve written them for years, I&#8217;m good on this.&#8221; We actually go really deep into the theory behind regular expressions, and transformations, and all kinds of other academic stuff.</p>
<p>But again, we sort of gloss over the concept of &#8220;capturing&#8221; or &#8220;non-capturing&#8221;, as that kind of thing isn&#8217;t really germane to a theoretical discussion on regular expressions, it&#8217;s more the stuff of practical programming. I&#8217;m completely unaware of the fact that my &#8220;Theory&#8221; professor assumed that my &#8220;Programming 101&#8243; professor would cover regular expressions in detail, and that vice versa my &#8220;Programming 101&#8243; professor skipped over the nitty gritty of regular expressions, assuming my &#8220;Theory&#8221; professor would deal with it in detail.</p>
<p>And so, I graduate from college, with the entire curriculum of an engineering computer science degree in my head, and I&#8217;ve still never really had anyone explain to me why this notion of auto-capturing of groups is important, and moreover why it&#8217;s important to explicitly &#8220;non-capture&#8221; when I don&#8217;t need capturing. I&#8217;ve been exposed to this topic a dozen times now, but no person or course has ever treated the topic with the rigor it deserves and requires, and I&#8217;m none-the-wiser in my ignorance.</p>
<h3>{3,}</h3>
<p>Fast forward another several years beyond getting my degree, and I still am writing code, and I&#8217;m still using regular expressions from time to time. And I&#8217;m pretty comfortable with them now. I&#8217;m rarely in a position where they don&#8217;t do what I want them to do, sometimes even the first time!</p>
<p>And I&#8217;ve seen the peculiar &#8220;?:&#8221; enough times that it doesn&#8217;t really catch my eye too often when I see others&#8217; code using it. I still don&#8217;t use it myself, because it&#8217;s never occurred to me why it&#8217;s really all that important to do so. It still seems like more trouble than it&#8217;s worth.</p>
<p>To this point, I&#8217;ve had several occasions to write fairly complex algorithms using regular expressions. I&#8217;ve written code parsers/compilers, where I had to tokenize a string, construct an AST from the tokens, etc. In even those advanced algorithms, I got by just fine with my incomplete knowledge of regular expressions, because I knew enough to get done what I needed, and the big picture or truly in-depth stuff is something I get away with not needing.</p>
<p>To tokenize a string, I could use <a href="https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/split">str.split(&#8230;)</a>, but the hundred or so times I&#8217;ve used it in the past, across various languages, I&#8217;ve always known that .split() returns an array of the chunks of the string split up, but with the delimiter(s) removed. </p>
<p>This is fine, because there&#8217;s actually lots of times when I need that behavior. For instance, if I want an array of words, I get it by doing &#8220;one,two,three,four,five&#8221;.split(&#8220;,&#8221;). The .split(STRING) form works great for that task. Occasionally, I need something a little more sophisticated, and so I do something like &#8220;one,two|three;four,five&#8221;.split(/[,|;]/), and I still get the result I want. </p>
<p>In hindsight, almost never in those cases do I need to use .split(RE) with a regular expression that needs grouping or capturing/back-references. So, everything seems to work fine to me. I&#8217;m happily unaware of there being a silent gotcha waiting to trap me.</p>
<p>Since .split() doesn&#8217;t preserve my delimiters (or so I think), to tokenize a string, I devise a manual split process, doing something like this:</p>
<pre class="code">
// character class: match any ONE of the operators + - * /
var op_token_regex = /[+\-*\/]/g,
	lc = "", rc = str, tmp, tmp2, captured = 0,
	tokens = []
;
op_token_regex.lastIndex = 0;
while (tmp = op_token_regex.exec(str)) {
	lc = RegExp.leftContext;
	rc = RegExp.rightContext;
	tmp2 = str.substring(captured,op_token_regex.lastIndex-1);
	captured = op_token_regex.lastIndex;

	// something found before operator?
	if (tmp2 &#038;&#038; tmp2.length > 0) {
		tokens.push(tmp2);
	}
	tokens.push(tmp[0]);
}
if (rc) {
	tokens.push(rc);
}
</pre>
<p>I use <a href="https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp">RE.exec(str)</a> with a global regular expression, then using a while loop, I step through each match one at a time, and &#8220;capture&#8221; what I need from the original string input.</p>
<p>Admittedly, this code may not be super pretty, but it certainly gets the job done. The better part is that it gives me the code structure control to do more sophisticated things, like discarding certain tokens (like comments) during processing, based on certain conditions.</p>
<h3>Stumble</h3>
<p>Up to this point, I&#8217;ve been able to do some pretty complicated things with regular expressions, and I&#8217;ve felt like I&#8217;ve been pretty competent at them.</p>
<p>So it came as a big surprise to me yesterday on Twitter when I was asked about this regular expression: /(\s+|\-+)/ and why when used with .split() on a string like &#8220;a b-c&#8221;, it didn&#8217;t produce just ["a", "b","c"] like it seems it should. And in the ensuing discussions and testing, all that I&#8217;ve talked about so far in this blog post came crashing down together.</p>
<p>Instead, at least in most modern browsers, it produces ["a", " ", "b", "-", "c"]. Well, it&#8217;s obvious from looking at that what is happening. The delimiters I&#8217;m splitting on are now suddenly showing up in the results array along with the split chunks. Previously in all of my usage of .split() it&#8217;s never retained the delimiters in the results array, but now it seems to be. What&#8217;s the difference?</p>
<p>My initial reaction was that the ( ) is unnecessary in that regular expression (you don&#8217;t need the grouping), and indeed, if you remove them, the results array is as expected. It&#8217;s then that I realized what was happening: it&#8217;s not the grouping that&#8217;s changing the results array, it&#8217;s the capturing. Just like with .match(), .split() responds to capture group data by returning that data interleaved into the results array. </p>
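<p>Here are the three variations side by side. This sketch assumes an engine that follows the spec&#8217;s .split() behavior (the modern browsers mentioned above):</p>

```javascript
var str = "a b-c";

// Capturing group: the matched delimiters are interleaved into the result.
var withCapture = str.split(/(\s+|\-+)/);      // ["a", " ", "b", "-", "c"]

// No group at all: delimiters are dropped, as .split() usually behaves.
var noGroup = str.split(/\s+|\-+/);            // ["a", "b", "c"]

// Non-capturing group: grouping is kept, delimiters are still dropped.
var nonCapture = str.split(/(?:\s+|\-+)/);     // ["a", "b", "c"]
```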
<p>And there it is: 10+ years programming in half a dozen different languages, and I&#8217;ve never once seen anyone talk about or explain that the regular expression group capturing I knew about from .match() actually had some behavioral side effect for .split() &#8212; using the ( ) &#8220;capture&#8221; operator in the regular expression in this case is taken to mean &#8220;include captured data in the final result set&#8221;. This we knew was true of .match(), but it was far less obvious that the same was true of .split(), until now.</p>
<p>Except, the fly in the ointment is that .split() being used in this way is not reliable in JavaScript cross-browser. Though it may be specified in the &#8220;standard&#8221;, the behavior was not properly implemented in previous versions of IE. </p>
<p>So, we can&#8217;t exactly rely on this behavior. And more importantly, if we&#8217;re not diligent to avoid capture groups with .split(RE), we&#8217;ll introduce cross-browser bugs.<br />
<a name="lookclosely"></a></p>
<h3>Look closely</h3>
<p>Let&#8217;s examine what&#8217;s going on here.</p>
<p>First, the original designers of regular expressions decided that ( ) groups should, by default, capture. They decided, for some reason, that even though ( ) are almost universally used primarily for grouping in mathematics and all modern computer languages, for regular expressions they would break with that precedent and instead make ( ) primarily be about capturing, with grouping just being an offshoot behavior.</p>
<p>Why? Perhaps in their usage, back-references were much more common than just using ( ) for expression operand grouping. Perhaps they didn&#8217;t often use regular expression patterns that needed grouping. Or perhaps there&#8217;s some other motivation for this decision that is escaping me.</p>
<p>I&#8217;ve definitely used back-references, but I&#8217;d have to say that by far this is the less common usage &#8212; overwhelmingly, I use ( ) to group elements more for operator binding than for capturing/back-references.</p>
<p>Even if we accept that ( ) should have capturing behavior (I&#8217;m not convinced of that), <strong>since capturing/back-references seem like they are by nature less performant, it seems quite strange to me that this is the default behavior of the regular expression engine.</strong> </p>
<p>Wouldn&#8217;t it have made more sense to have capturing be an opt-in (rather than opt-out) behavior? For instance, the &#8220;?:&#8221; could be used on a group to capture it. Or, there could have been a modifier flag for the whole regular expression which turns on capturing, like &#8220;c&#8221;.</p>
<p>Moreover, even if you disagree with me about which should be the default behavior, I would still see a lot of benefit if there was a modifier that could turn off capturing across the whole regular expression, instead of forcing the author to turn it off for each group with &#8220;?:&#8221;. I for one would probably just use that modifier all the time, and only not use it in those rare cases where capturing is important to me.</p>
<p><strong>Why isn&#8217;t there an easy way to turn off capturing for the whole regular expression?</strong></p>
<h3>APIs</h3>
<p>Now, let&#8217;s examine the APIs for .match(), .exec(), and .split(). The designers of these functions decided that they would piggyback on the capturing behavior from the regular expression to trigger whether the capture data should be returned in the results array. <strong>This was a separate and intentional decision on top of the decision made by the regular expression engine to default to capturing with ( ).</strong></p>
<p>The assumption is that if I use a ( ) capture group in my regular expression (perhaps for its most direct purpose: back-references), that I <strong>also</strong> <em>want</em> that captured group data returned in my results array. In other words, the &#8220;capture&#8221; directive in a regular expression is interpreted by the API to mean &#8220;include captured data into results&#8221;. This is not an entirely self-obvious or semantic leap to have been made, and it&#8217;s certainly not directly explicit, but rather just an implicit side-effect behavior.</p>
<p><a name="backref"></a><br />
<strong>What if I want to use capture groups for back-references, but I <em>don&#8217;t</em> want that captured data in my results array?</strong></p>
<p>Imagine this:</p>
<pre class="code">
var str = "some 'g' good \"s\" stuff going on 'h' here";
var results = str.split(/(["']).\1/);
       // want: ["some ", " good ", " stuff going on ", " here"]
       // get: ["some ", "'", " good ", "\"", " stuff going on ", "'", " here"]
</pre>
<p>You see, in this case, I only have a ( ) capture group for the back-reference (\1) sake, and <strong>I don&#8217;t need or want that captured group to be in my returned results</strong> (in fact it makes my job harder to filter out that noise). </p>
<p>Because of the faulty assumptions made by the API functions, I can&#8217;t use back-reference capture groups in my regexes without automatically getting those groups in my result set. <strong>This is a failure of design on the API&#8217;s part.</strong> And a bunch of languages, not just JavaScript, made the same mistake (and/or copied each other&#8217;s mistakes).</p>
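<p>One possible workaround, sketched under two assumptions: the regular expression has exactly one capture group, and the engine follows the spec&#8217;s .split() behavior. In that case the split chunks land at the even indexes and the captured quotes at the odd ones, so the noise can be filtered out by position:</p>

```javascript
var str = "some 'g' good \"s\" stuff going on 'h' here";

// Chunks and captured quote characters come back interleaved.
var raw = str.split(/(["']).\1/);

// Keep only the even indexes (the chunks); skip the odd ones (the captures).
var chunks = [];
for (var i = 0; i < raw.length; i += 2) {
	chunks.push(raw[i]);
}
// chunks: ["some ", " good ", " stuff going on ", " here"]
```

<p>It works, but it&#8217;s exactly the kind of extra filtering step the API shouldn&#8217;t be forcing on me.</p>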
<p>There&#8217;s some inherent inconsistency, too. What happens to the &#8220;capture data&#8221; in a regular expression used by .test()? That data apparently gets silently discarded, because .test() only returns a Boolean. If I use capture groups inside a regular expression executed for .test(), those capture groups have no effect on how .test() returns its value. This is strange and inconsistent.</p>
<p><strong>Wouldn&#8217;t it have made more sense for there to be an explicit instruction to the API that you want any capture data collected to be returned in the results?</strong> For instance, why couldn&#8217;t we have had str.split(RE, count, includeInResults{=false})? That way we could explicitly declare whether we want the captured data in our results or not.</p>
<h3>Lesson Learned&#8230;</h3>
<p>&#8230;the hard way, I guess. Now I have to try and re-train myself to think of ( ) as primarily a capturing mechanism and not a grouping mechanism. And I&#8217;m betting I&#8217;m not the only developer that has been astray on this topic for a long time.</p>
<h3>$</h3>
]]></content:encoded>
			<wfw:commentRss>http://blog.getify.com/to-capture-or-not/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Some better than none: sync vs. async in ssjs</title>
		<link>http://blog.getify.com/some-better-than-none-sync-async-ssjs/</link>
		<comments>http://blog.getify.com/some-better-than-none-sync-async-ssjs/#comments</comments>
		<pubDate>Mon, 25 Oct 2010 20:53:05 +0000</pubDate>
		<dc:creator>getify</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Performance Optimization]]></category>
		<category><![CDATA[UI Architecture]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[middle-end]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[ui]]></category>

		<guid isPermaLink="false">http://blog.getify.com/?p=622</guid>
		<description><![CDATA[This past week, I had a fantastic time travelling over to Warsaw, Poland for Front Trends 2010 conference. I was very honored to be invited to speak, and I gave the most recent incarnation of my talk on middle-end UI architecture, Rise of the Middle End (slides). Overall, the reaction in discussions over beer, feedback/ratings, [...]]]></description>
			<content:encoded><![CDATA[<p>This past week, I had a fantastic time travelling over to Warsaw, Poland for the <a href="http://front-trends.com/">Front Trends 2010</a> conference. I was very honored to be invited to speak, and I gave the most recent incarnation of my talk on <a href="http://middleend.com">middle-end</a> UI architecture, <a href="http://front-trends.com/speakers#kyle-simpson">Rise of the Middle End</a> (<a href="http://www.slideshare.net/shadedecho/rise-of-the-middle-end">slides</a>).</p>
<p>Overall, the reaction in discussions over beer, <a href="http://speakerrate.com/talks/4871-rise-of-the-middle-end">feedback/ratings</a>, and tweets has been <a href="http://twitter.com/#!/bluecherry/status/28421536006">pretty</a> <a href="http://twitter.com/#!/matas_petrikas/status/28016533146">positive</a>. Even for those who seem to strongly <a href="http://twitter.com/#!/ls_n/status/28715874411">disagree</a> with my thoughts on this topic, I&#8217;m pleased that people are talking about it, which means they&#8217;re thinking about it, which means I accomplished my real goal: get people to start asking the right questions.</p>
<p>One of the most fascinating things to me was that immediately after I gave my talk, <a href="http://front-trends.com/speakers#douglas-crockford">Doug Crockford&#8217;s talk on server-side JavaScript</a> in some respects presented a very different view on things than my approach. So, back-to-back, two talks focused heavily on server-side JavaScript, and yet had a very different take on a key issue: synchronous vs. asynchronous.</p>
<p>After a <a href="http://twitter.com/#!/Mekk/status/28695990312">few twitter conversations</a> on the distinct differences between those two messages, I felt like it would be a good idea for me to write up a quick post to clearly state what I think about the sync/async debate, as I&#8217;m not sure it&#8217;s been well understood thus far.</p>
<h3>Crockford is right</h3>
<p>That&#8217;s correct, you did just read that. I think Doug is absolutely correct in his assertion that non-blocking I/O and asynchronous evented architecture are by far superior to synchronous, blocking I/O. There&#8217;s no question that JavaScript shines as an asynchronous language (despite the pains of us figuring out the right way to approach patterns like Promises), and so it&#8217;s a completely natural and right direction for Node.js to be taking server-side JavaScript into the asynchronous paradigm.</p>
<p>If I wasn&#8217;t clear enough, let me say it this way: <strong>async is better than sync.</strong></p>
<h3>Synchronous Middle-End?</h3>
<p>So wait, if I agree that the async Node.js approach is the best way, why do all my talks deal with synchronous JavaScript on the server via <a href="http://bikechainjs.com">BikechainJS</a>?</p>
<p style="background-color:lightblue;">
What you really need to understand is that just because you <em>run</em> code synchronously doesn&#8217;t mean that such code will <em>only</em> run in a synchronous environment. If you write the code correctly, which I&#8217;m trying to do, then synchronous vs. asynchronous execution concerns become rather moot. More on that later below.
</p>
<p>Again, let me be clear: <strong>BikechainJS is <em>not</em> intended to replace or compete with Node.js.</strong></p>
<p>BikechainJS and my synchronous per-request POC code/demos for the middle-end are merely side shows to the main attraction: the importance of middle-end architecture in the web stack.</p>
<p>Hear me on this: <strong>you should be architecting the middle-end into your web applications regardless of whether you use JavaScript on the server or not, and regardless of if you want/need to do things synchronously or asynchronously.</strong></p>
<h3>The back story</h3>
<p>So now, let me explain why I still am showing middle-end code demos using synchronous approaches.</p>
<p>The web application stacks I&#8217;ve worked in over the last several years have ranged from Java to Grails to .NET to Python to PHP. The code bases have spanned from decent to horrible. Web performance optimization is always a tuning afterthought rather than a key business-driver feature. And the applications have all been built from the back-end up rather than the front-end down.</p>
<p>In all cases (and indeed for most of my 10-year web dev career thus far), some things have been true across the board:</p>
<ol>
<li>I&#8217;m a front-end/JavaScript person, and I prefer <em>not</em> to have to dig deep into the bowels of back-end platforms to accomplish my front-end tasks.</li>
<li>Universally, when I attempt to re-factor the front-end for web performance optimization, I run into the hassles of not having adequate control over the middle-end pieces (buried deep in the platform/framework guts), and moreover, I often have to deal unnecessarily directly with how the back-end works just to make changes to the front-end.</li>
<li>And most importantly, they have <strong>all</strong> been 100% in the synchronous, per-request paradigm of the web stack. They all used synchronous, single-threaded (process-oriented) web servers like Apache or IIS.</li>
</ol>
<p>That reality may not be your reality where you work. And if not, great. But having had a dozen jobs so far and seeing the same patterns repeated consistently, I&#8217;m willing to believe it&#8217;s more the norm than the exception that synchronous per-request is the overwhelmingly common paradigm, and that no matter how hard platforms/frameworks try to keep front-end and back-end separate, they all fail to do so well, when the rubber meets the road.</p>
<p>So, mentally walk in my shoes for a moment, in the types of jobs and environments I&#8217;ve been in for the majority of my career. The web application code bases are well established, and for the most part working <em>ok</em>. The boss(es), and 3-8 other devs on the team, are all pretty familiar with that paradigm. The IT support staff are all very familiar with the quirks and behaviors of that stack, and are in some cases quite experienced/certified in how to manage it and ensure production up-time.</p>
<p>More than anything, stability and status-quo rule the pack.</p>
<h3>The plot thickens</h3>
<p>For my last several jobs over the last 2+ years (aka, long before Node.js came around), I&#8217;ve been trying to find a pragmatic way to introduce server-side JavaScript into the architectures of the web application stacks at my jobs. My goal has been to find a realistic way to address the middle-end failings of all these different web application stacks, especially in a repeatable and patterned way, and I feel that server-side JavaScript provides a very compelling answer to many of those questions.</p>
<p>Reality check: It is not particularly easy to introduce asynchronous code into a synchronous environment. It requires some very careful planning and oftentimes some complicated and risky changes to existing architecture. In short: <strong>it messes with stability and the status quo.</strong></p>
<p>I&#8217;ve tried to pitch how server-side JavaScript can be of benefit, but whenever my boss or team starts to evaluate how to make that happen, major sticking points like that are always the death of the idea. For my jobs, their pragmatic (and defensible) position is that they would not be willing to devote the time to such radical rearchitecture (like replacing Apache with Node.js) and/or re-writes of significant portions (or all) of the code base, just to get JavaScript in the server part of the stack.</p>
<p>The biggest argument for server-side JavaScript at this point is massively scalable performance. But most of the jobs I work at never see enough performance bottlenecks for us to ever prove the speedups in any measurable way.</p>
<p>All in all, I&#8217;ve had to face this reality in my jobs: <strong>if I want to do server-side JavaScript, I&#8217;m going to have to find a way to do it that fits more nicely into existing architecture and doesn&#8217;t cause quite so many waves.</strong> Otherwise, server-side JavaScript will only ever be something I get to toy around with on fun experiments like chat servers (web sockets) and games, and never see any real production usage.</p>
<h3>Swim with the current</h3>
<p>Instead of trying to simultaneously fight the battle of changing to asynchronous architecture <em>and</em> to switch to middle-end architecture using server-side JavaScript, I&#8217;ve chosen to separate those two battles and focus just on one of them for now.</p>
<p>What do I mean? There&#8217;s still an uphill battle to get server-side JavaScript into the stack in my real-world job place, but if it&#8217;s ever going to happen, it&#8217;s almost certainly going to need to play easily and nicely with the existing synchronous per-request paradigm.</p>
<p>Is it possible to use Node.js in this way? Theoretically, yes. You can run Node.js as a separate server instance, and from your synchronous code (PHP, Java, etc), make a blocking I/O call over to Node.js to ask it to do something, and block the process waiting for the response. You can run Node.js like a proxy server in tandem with Apache, etc.</p>
<p>But notice what we&#8217;ve done: we&#8217;ve taken something whose almost entire value-proposition is performance from asynchronous eventing, and handcuffed it in an awkward and inefficient way to make it fit into our synchronous world.</p>
<p>In that respect, running synchronously against Node.js over an HTTP connection is not really conceptually much different than just running the JavaScript directly in a sub-process. It is however probably going to lose any possible performance benefits when considering the on-server overhead of making such HTTP requests as opposed to direct process execution/communication. Moreover, Node.js is phenomenally powerful and complicated, but restricting it in this way makes it very much overkill for the simple things we want it to do.</p>
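<p>To make the Node.js arrangement described above a bit more concrete, here is a minimal sketch of what the Node.js side might look like. The <code>resolveRoute</code> task, the <code>startService</code> helper, and the port number are all hypothetical, made up purely for illustration; the synchronous stack (PHP, Java, etc) would make a blocking HTTP request to this service and wait for the JSON answer.</p>

```javascript
var http = require("http");

// Hypothetical middle-end task: map a request path to a route name.
function resolveRoute(path) {
  return path === "/" ? "home" : path.replace(/^\/+/, "");
}

// Node.js side: a tiny HTTP service that answers routing questions as JSON.
var server = http.createServer(function (req, res) {
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ route: resolveRoute(req.url) }));
});

// Hypothetical helper; nothing listens until it is called,
// e.g. startService(8124). The synchronous stack would then block
// on a request to http://127.0.0.1:8124/some/path for every hand-off.
function startService(port) {
  server.listen(port, "127.0.0.1");
  return server;
}
```

<p>Every hand-off in this setup pays for constructing, sending, and parsing an HTTP request, even though both sides live on the same machine, which is the overhead the paragraph above is concerned about.</p>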
<h3>Switch into an easier gear</h3>
<p>After pushing about and thinking on these complications for quite a while, that&#8217;s where BikechainJS was born. It&#8217;s an incredibly simple, stripped-down, highly-focused, synchronous, single-thread/single-process approach to running server-side JavaScript. It has only the bare minimum necessary to accomplish middle-end tasks, like file I/O, process execution, etc. It has NONE of the bells and whistles that make Node.js so sexy, but it also isn&#8217;t hampered by any of them getting in the way of its very specific task: run the middle-end.</p>
<p>BikechainJS is <strong>not</strong> what you want if you&#8217;re planning to write a significant amount (or all) of your server code in JavaScript. BikechainJS cannot do many of the most awesome things that Node.js can do.</p>
<p>But BikechainJS can easily be fired up to execute some simple JavaScript logic for middle-end tasks without creating too much overhead in your existing web stack and code. It has absolutely no dependencies except V8, and it&#8217;s just a single executable binary file. Its footprint is extremely small, and it touches almost nothing that you don&#8217;t tell it to, which greatly reduces the risks involved in introducing it into your existing code.</p>
<p><strong>BikechainJS is not designed to replace Node.js, it&#8217;s designed to run in all the other places where Node.js is too awkward, risky, or difficult to implement.</strong></p>
<h3>Sync vs. Async</h3>
<p>It&#8217;s <strong>very important</strong> to note that just because I&#8217;m <em>currently</em> executing JavaScript in a synchronous way doesn&#8217;t mean the code itself is purely/only synchronous.</p>
<p>In most of the code I&#8217;m writing, I&#8217;m using Promises as a way to cleanly express logical steps which are reliant on each other, even if one of the steps is not immediate/synchronous. Using Promises, code can easily be written that will <strong>either work synchronously or asynchronously</strong> with little or no code changes at all.</p>
<p>You see, I obviously don&#8217;t want to needlessly tie my code to <em>having</em> to run in a synchronous way. I just want to execute my code synchronously, <strong>for now</strong>, so that it&#8217;s easier to integrate with the rest of the synchronous stack. That&#8217;s not so evil, is it?</p>
<p>You can very easily take a lot of this code and drop it into an asynchronous JavaScript environment like Node.js, and without much effort to adjust, it should run just fine. This is important in terms of future-proofing the code to be more useful as paradigms shift. I have the flexibility to run synchronous or asynchronous as the situation dictates. This is a very powerful realization.</p>
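<p>As a rough illustration of the Promise-based approach described above, here is a minimal sketch. The <code>makePromise</code> helper is a hypothetical, stripped-down stand-in written just for this example (it is not the BikechainJS or Node.js API, and it ignores error handling). The point is only that <code>buildGreeting</code> does not care whether its data source answers immediately or later.</p>

```javascript
// Hypothetical minimal promise helper, for illustration only.
// Callbacks run right away if the value is already there, or later
// when fulfill() is called, so calling code looks the same either way.
function makePromise() {
  var done = false, value, callbacks = [];
  return {
    fulfill: function (v) {
      if (done) return;
      done = true;
      value = v;
      callbacks.forEach(function (cb) { cb(value); });
      callbacks = [];
    },
    then: function (cb) {
      var next = makePromise();
      function run(v) {
        var result = cb(v);
        // If a step returns another promise, wait for it; otherwise pass the value on.
        var isPromise = result ? typeof result.then === "function" : false;
        if (isPromise) result.then(next.fulfill);
        else next.fulfill(result);
      }
      if (done) run(value);
      else callbacks.push(run);
      return next;
    }
  };
}

// Synchronous source: the value is ready before anyone asks for it.
function loadNameSync() {
  var p = makePromise();
  p.fulfill("world");
  return p;
}

// Asynchronous source: the value arrives on a later tick.
function loadNameAsync() {
  var p = makePromise();
  setTimeout(function () { p.fulfill("world"); }, 10);
  return p;
}

// The dependent steps are written once and work with either source.
function buildGreeting(loadName) {
  return loadName()
    .then(function (name) { return "Hello, " + name + "!"; })
    .then(function (text) { return text.toUpperCase(); });
}

buildGreeting(loadNameSync).then(function (text) { console.log("sync:", text); });
buildGreeting(loadNameAsync).then(function (text) { console.log("async:", text); });
```

<p>Swapping <code>loadNameSync</code> for <code>loadNameAsync</code> is the only change needed to move between the two execution styles; the chain of steps stays untouched.</p>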
<h3>Putting it into practice</h3>
<p>The way I am <em>currently</em> using this approach (and middle-end concepts) at my real-world job is to take tiny pieces of the existing code, such as routing, headers, etc, and little by little, re-write them in JavaScript. In my existing web stack, when I want to hand off the execution context to my JavaScript code, I simply execute the BikechainJS environment, hand it some data, and wait for the response.</p>
<p>Since the tasks it&#8217;s doing are very small and focused, it takes practically no time/overhead at all to do so (certainly way less than constructing and transmitting an HTTP request, even on-server, to a socket server instance).</p>
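<p>The hand-off described above (start a process, give it some data, block until it answers) can be sketched roughly like this. To keep the example self-contained and in JavaScript, the host side is written with Node&#8217;s <code>child_process.spawnSync</code> and a second Node process stands in for the BikechainJS executable; in a real stack the host would be PHP, Java, etc, and the BikechainJS command line is not shown here because this sketch does not assume anything about it. The <code>runMiddleEnd</code> helper and the routing logic are hypothetical.</p>

```javascript
var spawnSync = require("child_process").spawnSync;

// Stand-in for a middle-end script: read JSON from stdin, write JSON to stdout.
// (Hypothetical routing logic, run here by a child Node process.)
var middleEndScript = [
  "var input = '';",
  "process.stdin.setEncoding('utf8');",
  "process.stdin.on('data', function (chunk) { input += chunk; });",
  "process.stdin.on('end', function () {",
  "  var req = JSON.parse(input);",
  "  var route = req.path === '/' ? 'home' : req.path.slice(1);",
  "  process.stdout.write(JSON.stringify({ route: route }));",
  "});"
].join("\n");

// Host side: execute the JavaScript environment, hand it the request data,
// and block until the process exits with its response.
function runMiddleEnd(requestData) {
  var result = spawnSync(process.execPath, ["-e", middleEndScript], {
    input: JSON.stringify(requestData),
    encoding: "utf8"
  });
  if (result.status !== 0) {
    throw new Error("middle-end process failed: " + result.stderr);
  }
  return JSON.parse(result.stdout);
}

console.log(runMiddleEnd({ path: "/about" }));
```

<p>The calling code never sees anything asynchronous: it gets a plain return value, which is what makes this style easy to drop into a per-request synchronous stack.</p>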
<p>Eventually, I hope that all of the middle-end logic in our app will be in BikechainJS driven middle-end JavaScript code. But I have the freedom to make the switch in very small chunks without major refactoring or rearchitecture. I&#8217;ve found a more plausible approach that my boss doesn&#8217;t get as worried about when I propose it.</p>
<p>Bottom-line: <strong>I&#8217;ve found a way to introduce server-side JavaScript to an environment that otherwise flatly rejected the notion of a major sync-to-async rearchitecture.</strong> I believe that some server-side JavaScript is better than none at all.</p>
<p>I secretly hope that this move will be just a &#8220;gateway drug&#8221;, and that eventually, we can move to a much more powerful and asynchronous approach to our middle-end <em>and</em> back-end, probably using Node.js. Because I&#8217;ve written most of my code to be agnostic of such concerns, such a switch down the road shouldn&#8217;t be as painful as it otherwise might have been. We&#8217;ll already be more than halfway there with server-side JavaScript in the stack, and introducing asynchronous architecture patterns (while challenging) won&#8217;t necessarily be impossible to sell to the decision makers.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.getify.com/some-better-than-none-sync-async-ssjs/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

