Tale of Two -ends

TL;DR

If this post is too long for you to read, then you may not fully understand my motivations for this discussion or my proposal. However, if you’re already familiar with SPA architecture and just want to skip to my concerns about SPAs and modern session IDs, please do so! :)


This post is unlike many of mine, where I’m just trying to spout off an opinion and convince you that I’m right and you… well, aren’t. Here, I’m just exploring something about app architecture and UX that frustrates me.

I’m much more interested in what you have to say than what I have to say, so please speak up in the comments section below.

The web is full of back-end-driven single-page apps, and plenty of front-end-driven single-page apps, too. But what about in between? What about hybrid-driven apps (see below)? Specifically, how do architecture and UX concerns intermix when rendering occurs on both the server and the client?

This post is not a rant. It’s an invitation for useful and constructive feedback and exploration. Please don’t troll it. Please don’t just fawn over your favorite framework as if that’s a magic bullet. Let’s actually work through the UX and architecture pros/cons of these issues.

First, let me explain what I mean by these terms:

Single-page-app Background

SPA = single-page-app, where predominantly there’s one major page request from the client to “open” the app, and all the rest of the content and interaction happens inside that page (via Ajax, WebSockets, etc.).

But not all SPAs get their page content the same way. Some apps have famously tried both sides of the fence. Twitter used to be back-end-driven, then went front-end-driven, and is now primarily back to a back-end-driven architecture.

Back-end-driven SPAs

Some SPAs choose to render page markup only on the server, and ship that markup to the client. There’s little to no client-side templating/rendering. Whenever the page needs to update some content, it makes a request to the server, retrieves a chunk of HTML in response, and stuffs that markup into the live page somewhere.

The extent of client-side rendering that occurs is usually just the odd update of some form value or the insertion of new markup from the server.

Facebook is a decent example of this kind of architecture. Whenever you write a comment on a Facebook post, the front-end submits the comment to the back-end, and receives back a chunk of HTML to display the comment. Yeah, I know their site is more complicated than that, but it’s close enough for our purposes.
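Here’s a minimal TypeScript sketch of that back-end-driven round trip, minus the network request itself. `PostComment`, `renderCommentFragment`, and `injectFragment` are hypothetical names for illustration, not Facebook’s actual code; the point is that the server produces the markup and the client merely inserts it.

```typescript
// Server side: a hypothetical route handler renders the new comment as an HTML fragment.
interface PostComment {
  author: string;
  text: string;
}

// Escape user-supplied text so it can't inject markup into the fragment.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderCommentFragment(c: PostComment): string {
  return `<li class="comment"><b>${escapeHtml(c.author)}</b> ${escapeHtml(c.text)}</li>`;
}

// Client side: the only "rendering" is stuffing the server's markup into the live page.
// Any DOM Element satisfies this interface (insertAdjacentHTML is a standard method).
interface FragmentTarget {
  insertAdjacentHTML(position: "beforeend", html: string): void;
}

function injectFragment(target: FragmentTarget, html: string): void {
  target.insertAdjacentHTML("beforeend", html);
}
```

In a browser, the target would be something like the comment list element, and the fragment would arrive as the text of an Ajax response.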

Front-end-driven SPAs

Other SPAs ship a templating engine (of various sorts) to the client, and choose to render most if not all of the markup in the client. In this model, the server usually only serves up data, not markup.

Twitter used to be a good example of this approach. Their website would request JSON data for a list of tweets from their API, and then use a templating engine to render each tweet directly in the client.
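For contrast, here’s a front-end-driven sketch in TypeScript: the client receives JSON and turns it into markup itself. The `Tweet` shape and the function names are hypothetical, and a plain template-literal function stands in for a real templating engine.

```typescript
// Hypothetical shape of the API's JSON: the server sends data, not markup.
interface Tweet {
  user: string;
  text: string;
}

const ESCAPES: Record<string, string> = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

function escapeHtml(s: string): string {
  return s.replace(/[&<>"']/g, (ch) => ESCAPES[ch] ?? ch);
}

// A tiny stand-in for a client-side template: a function from data to markup.
function renderTweet(t: Tweet): string {
  return `<article class="tweet"><h4>@${escapeHtml(t.user)}</h4><p>${escapeHtml(t.text)}</p></article>`;
}

// The client parses the API response and renders every tweet itself.
function renderTimeline(json: string): string {
  const tweets = JSON.parse(json) as Tweet[];
  return tweets.map(renderTweet).join("");
}
```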

It’s important to note that this style of app has a bootstrapping concern inherent to it. On the initial page request from browser to server, the response has to be a bare-bones minimum of HTML, at least enough to load the JS for the app (including the templates and template engine). Different sites/apps approach this slightly differently, but it’s pretty common for the returned markup to include at least the <!DOCTYPE> and <html> declarations, and often also the <head> and <body>. Usually these pieces aren’t directly overwritten/replaced by client rendering, though their contents can be modified with DOM methods (such as adding more scripts to the <head>).

Common to SPAs

After the individual page content for the SPA is rendered (either on the server or in the client), the typical approach is to replace the contents of the <body>, or perhaps some <div> element inside the <body>, with the new rendered markup.

Besides markup injection, front-end-driven and back-end-driven apps have several other things (roughly) in common.

URLs

The URL is very commonly meant as a definitive marker for the current location and/or current state of a user’s path through an application. For instance, imagine a user is in a webmail client, and they navigate to some folder of saved emails. The URL in the address bar will often indicate this particular location, for example: https://mycool.email/folders/saved-stuff.

By giving the location/state of the application to the user in the URL, the user can copy-n-paste that URL, save it in a bookmark, navigate forward/backward in their browser history, or even share and click on links to parts of the application, and have a pretty reasonable expectation of getting directly to where they want to be instead of having to navigate in-app to do so.

Of course, the default behavior for changing URLs in the address bar (however that may occur) is that the browser will make a new request to the server for that URL. This breaks the whole “single-page-app” paradigm, so workarounds were needed to update the URL without causing a page request.

The old-school way of doing so was to persist the client-side state in the #hash portion of a URL, because the #hash can be modified by JavaScript without causing a new page request. For instance: https://mycool.email/#location=/folders/saved-stuff. Each change to the #hash helpfully creates a new entry in the user’s history, so forward/backward navigation works “as expected”. These URLs are also bookmarkable and shareable.

One major caveat, though, is that the #hash portion of the URL is never sent to the server. This means the server has no chance to know what location/state the user is requesting, so there must be some front-end-driven functionality where the JavaScript in the page inspects the address bar URL upon page load and renders the proper location/state for the user.
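A minimal sketch of the #hash technique, using the `location=` format from the example URL above (the two function names are hypothetical). Note that both the writing side and the on-load reading side live entirely in the client.

```typescript
// Store the in-app location in the #hash; in a browser, assigning location.hash
// adds a history entry without a page request.
function hashForLocation(path: string): string {
  return "#" + new URLSearchParams({ location: path }).toString();
}

// On page load, the page's JavaScript must read the hash itself,
// because the browser never sends it to the server.
function locationFromUrl(href: string, fallback = "/"): string {
  const hash = new URL(href).hash; // e.g. "#location=/folders/saved-stuff"
  return new URLSearchParams(hash.slice(1)).get("location") ?? fallback;
}
```

In a browser, you’d assign `window.location.hash = hashForLocation(...)` when navigating, and call `locationFromUrl(window.location.href)` on load and from a `hashchange` listener.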

HTML5 provides a much more graceful solution to this caveat, by way of the History API. The History API lets JavaScript code update the URL in the address bar, even the non-#hash parts, without causing a new page request. Each update can also attach a “state” object to the history entry, and you can either pushState(..) to add a new entry to the navigation history, or replaceState(..) to replace the current history entry.
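Here’s a small TypeScript sketch of that flow. `HistoryLike` just describes the two real `History` methods it calls, `pushState(state, unused, url)` and `replaceState(state, unused, url)`, so `window.history` can be passed in; `AppState` and `navigate` are hypothetical, and the `/folders/...` URL scheme follows the webmail example.

```typescript
// The subset of the browser's History interface used here; window.history satisfies it.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string | null): void;
  replaceState(data: unknown, unused: string, url?: string | null): void;
}

// Hypothetical app state for the webmail example.
interface AppState {
  folder: string;
}

// Change the address bar to a canonical URL without a page request.
// pushState adds a history entry; replaceState overwrites the current one.
function navigate(history: HistoryLike, state: AppState, replace = false): string {
  const url = `/folders/${encodeURIComponent(state.folder)}`;
  if (replace) {
    history.replaceState(state, "", url);
  } else {
    history.pushState(state, "", url);
  }
  return url;
}
```

When the user goes Back or Forward, the browser fires a `popstate` event whose `state` property holds the object that was passed in, so the app can re-render that location without asking the server. The URL passed must be same-origin with the current page.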

Many modern single-page web applications now rely heavily on the History API rather than #hash-based URL management.

Hybrid-driven?

One benefit of these more canonical URLs, beyond user readability and friendlier “UX”, is that when they are shared, bookmarked, etc. and later requested, the server sees the full URL of the request, and has a chance to respond directly with the user’s requested content, rather than having to wait for the front-end-driven SPA mechanism to kick in.

Of course, most front-end-driven SPAs don’t actually have the server respond with URL-specific content, and instead simply wait for the client-side mechanisms to take over. In this way, no matter what URL you request of the server, you’re likely to get the same front-end-driven bootstrapping response.

If the server did respond with URL-specific content, that’d be more of a hybrid-driven app architecture. This is the architecture I want to examine in this post, but first we have to discuss some other concerns in SPA architecture.
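To make the distinction concrete, here’s a hedged TypeScript sketch of two server handlers for the webmail example. The folder data, the `/app.js` path, and the function names are all hypothetical. The first returns the same minimal bootstrapping shell for every URL; the second uses the path to pre-render that location’s content into the same shell.

```typescript
// Hypothetical data for the webmail example; a real server would query a database.
const FOLDERS: Record<string, string[]> = {
  "saved-stuff": ["Flight receipt", "Lease agreement"],
};

// Hypothetical bundle path; the same script boots the client-side app in both cases.
const HEAD = `<head><script src="/app.js" defer></script></head>`;

// Front-end-driven: whatever URL is requested, the server returns the same shell.
function bootstrapResponse(_path: string): string {
  return `<!DOCTYPE html><html>${HEAD}<body><div id="app"></div></body></html>`;
}

// Hybrid-driven: the server sees the full path and pre-renders that location,
// then the client takes over rendering for subsequent navigation.
function hybridResponse(path: string): string {
  const folder = /^\/folders\/([^/]+)$/.exec(path)?.[1];
  const items = folder === undefined ? undefined : FOLDERS[decodeURIComponent(folder)];
  // Items are trusted constants here; real code must HTML-escape them.
  const content = items ? `<ul>${items.map((item) => `<li>${item}</li>`).join("")}</ul>` : "";
  return `<!DOCTYPE html><html>${HEAD}<body><div id="app">${content}</div></body></html>`;
}
```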

State Management

Most applications and non-trivial sites are not just “content only” but have some component of interactive and persistent “state” associated with the user’s actions. For instance, if a user logs in to make comments on a blog, there must be some state that is maintained to “remember” that the user is logged in, such that page refreshes and even navigation away and back will keep the user in the expected state.

By far the most common place to persist state is on the server side. Even in a lot of front-end-driven apps, the server is relied upon for the actual business logic and state management, because the server centralizes in the database all the authoritative information about all users, etc.

Far less commonly, a front-end-driven application could manage state entirely on the client side in the browser, referring to data on the server in a completely stateless, per-request manner. Especially with the advent of peer-to-peer technology, it’s likely we’ll see more and more apps where state is managed in-client and shared directly with others (peers) only as needed.

If there is a server involved, and there usually is, the front-end will have to send some sort of unique identifying “token” along with its requests so the server recognizes who the request comes from. The token is usually a session ID, often stored in a session cookie (more on that in a minute!), or a unique user ID, or some other identifying mark.
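A minimal sketch of the server half of this arrangement, in TypeScript for Node.js: the server keeps the authoritative state keyed by an unguessable token, and the client only carries the token. `SessionStore` and its methods are hypothetical; `randomBytes` is Node’s real `node:crypto` API. How the token travels (URL, cookie, or something else) is the question the following sections take up.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical server-side session state; the client only ever holds the token.
interface SessionState {
  userId: string | null;
}

class SessionStore {
  private sessions = new Map<string, SessionState>();

  // Issue an unguessable token: 128 random bits, hex-encoded (32 characters).
  create(): string {
    const token = randomBytes(16).toString("hex");
    this.sessions.set(token, { userId: null });
    return token;
  }

  // Identify who a request comes from, given the token it carried (if any).
  lookup(token: string | undefined): SessionState | undefined {
    return token === undefined ? undefined : this.sessions.get(token);
  }

  // E.g. when the user logs out.
  destroy(token: string): void {
    this.sessions.delete(token);
  }
}
```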

URL-based Session IDs

Even if the server stores the full contents of the user’s session state, the client must still have a way to “persist” the token that uniquely identifies the user and/or their session.

One way that clients can persist session IDs is to include them as part of the URL, such that every URL has a parameter on it with the ID, like: https://mycool.email/folders/saved-stuff/?sessid=78de7823d2hhfdj2r299uf4484fj434

Every link, every button, every action you can take inside the application must know about this session ID and include it in the next URL you navigate to.
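Here’s what that bookkeeping looks like as a TypeScript sketch (the function names are hypothetical; the `sessid` parameter and base URL come from the example above). Every link the app renders has to pass through something like `withSessionId`, which is also exactly why a copied or bookmarked URL carries the session along with it.

```typescript
// Every URL the app emits must carry the session ID in a "sessid" parameter,
// as in the example URL above. The default base URL is from that example too.
function withSessionId(href: string, sessionId: string, base = "https://mycool.email/"): string {
  const url = new URL(href, base);
  url.searchParams.set("sessid", sessionId);
  return url.toString();
}

// ...and the server must pull it back out of every incoming URL.
function sessionIdFromUrl(href: string): string | null {
  return new URL(href).searchParams.get("sessid");
}
```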

URL session IDs are most commonly used when an application detects it cannot set cookies (due to privacy settings, etc). They provide a workaround (ungraceful as it may be) to the lack of cookies, but they also significantly limit the re-usability of URLs.

If a user bookmarks a URL with a session ID in it, or (worse) shares it with someone else, now the URL is susceptible either to leaking access to a user’s session, or denying the legitimate user access at a later time because the saved session ID is no longer valid (expired, deleted, etc).

Session IDs in URLs are generally frowned upon as the worst-case fallback rather than the intentional architectural design for client-side session ID persistence.

Cookies

The more common way to persist the session ID on the client is through a cookie.

Cookies are bits of data stored on the client which are sent along automatically with every single request the client makes to the host that set them (strictly speaking, cookies are scoped by domain and path, not by the full protocol + host-name + port). Typically cookies are set by a previous response from the server, via a Set-Cookie response header, though they can be set by JavaScript in the page as well. Cookies typically have an explicit expiration (an Expires timestamp or a Max-Age duration), after which the browser automatically deletes the cookie data.

In addition to expiration, cookies have a size limit (browsers typically allow around 4 KB per cookie), and they’re often deleted when a user clears their browsing data. As such, their reliability as client-side persistence isn’t very strong. But for many years, they were the only form of client-side persistence we had access to.

Session Cookies

Session cookies are a special subset of cookies where the expiration isn’t set in terms of a timestamp but in terms of an event, specifically the closing of the browser. Once set, a session cookie for a particular host lives as long as the browser lives (even across multiple windows), unless the application specifically deletes the session cookie earlier (like when a user explicitly “logs out”).
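The difference between a persistent cookie and a session cookie comes down to which attributes the server puts in its Set-Cookie header. Here’s a small TypeScript sketch that builds the header value; the helper and the cookie names (`sid`, `theme`) are hypothetical, while `Expires`, `Max-Age`, `Path`, `Secure`, and `HttpOnly` are standard cookie attributes.

```typescript
// Attributes this sketch supports (a subset of what Set-Cookie allows).
interface CookieAttrs {
  expires?: Date; // explicit expiration timestamp
  maxAge?: number; // lifetime in seconds; per RFC 6265, wins over Expires if both are set
  path?: string;
  secure?: boolean;
  httpOnly?: boolean;
}

// Build a Set-Cookie header value. Omitting both expires and maxAge yields a
// session cookie, which the browser discards when it is closed.
function setCookieHeader(name: string, value: string, attrs: CookieAttrs = {}): string {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (attrs.expires) parts.push(`Expires=${attrs.expires.toUTCString()}`);
  if (attrs.maxAge !== undefined) parts.push(`Max-Age=${attrs.maxAge}`);
  if (attrs.path) parts.push(`Path=${attrs.path}`);
  if (attrs.secure) parts.push("Secure");
  if (attrs.httpOnly) parts.push("HttpOnly");
  return parts.join("; ");
}
```

For example, `setCookieHeader("sid", "abc123", { path: "/", httpOnly: true })` produces a session cookie, while adding an `expires` date turns it into a persistent one.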

The most helpful part about cookies is the automatic transmission: if a user calls up the site from the server with a URL (from history, a bookmark, clicking some external link, etc.), the cookie(s) (session or not) associated with the site are automatically transmitted with that request, so even the initial request to the server is session-aware and can return content appropriate to the user’s session.
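On the server side, that automatic transmission means the very first page request can be answered for the right user. A minimal TypeScript sketch follows; the `sid` cookie name, the `userBySession` map (standing in for a real session store), and both function names are hypothetical.

```typescript
// Parse a request's Cookie header ("a=1; b=2") into a map of name -> value.
function parseCookieHeader(header: string | undefined): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const pair of (header ?? "").split(";")) {
    const eq = pair.indexOf("=");
    if (eq < 0) continue;
    const cookieName = pair.slice(0, eq).trim();
    const raw = pair.slice(eq + 1).trim();
    let value = raw;
    try {
      value = decodeURIComponent(raw);
    } catch {
      // Leave malformed percent-encoding as-is.
    }
    // Keep the first occurrence; browsers list cookies with more specific paths first.
    if (cookieName && !cookies.has(cookieName)) cookies.set(cookieName, value);
  }
  return cookies;
}

// Hypothetical page handler: because the browser attached the cookie itself, even a
// first request from a bookmark or external link can be rendered for this user.
function renderGreeting(cookieHeader: string | undefined, userBySession: Map<string, string>): string {
  const sid = parseCookieHeader(cookieHeader).get("sid");
  const user = sid === undefined ? undefined : userBySession.get(sid);
  return user ? `Welcome back, ${user}` : "Please log in";
}
```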

Of course, the automatic transmission of cookies is both a blessing and (mild) curse. Because the browser really can’t know which requests to a host need the cookie and which don’t, it sends them on all requests. Images, stylesheets, videos, JS files, you name it — they all get the cookies attached. When there’s just a small cookie with a session ID in it, this “curse” is not as big a deal, but the more content you store in cookies (session or not), the more you bloat every single request, which leads to slower web performance and more bandwidth usage (for both user and server!).

Another concern many people have with cookies comes from “third-party cookies”, which is where a request to load a resource from another host (such as the CDN of some third-party widget) results in a cookie being set on that secondary host, even though a main-page-request wasn’t made to that host. Third-party cookies have all the same benefits of normal first-party cookies (such as tracking third-party widget sessions, etc), but they also imply privacy concerns because advertising networks use them to track your behavior across many sites you visit. As such, many modern browsers are now disabling third-party cookies by default, and some users intentionally disable all cookies just for good measure.

Sharing Session Cookies

The deeper “problem” with session cookies, at least for some use-cases, is that they are shared among all tabs of a browser, even multiple instances of the browser, and live for as long as the browser lives. That means if a user opens up 3 different tabs in their browser, even 3 separate browser windows, all to the same page, all 3 tabs share the same session cookie, which means they cannot have separate distinguished sessions.

Sometimes that’s desirable, but other times it’s quite frustrating. It’s nice UX if you open up another tab and you’re automatically still logged in. But it can be annoying if you log out of one tab but didn’t mean to log out of the other. Session cookies are an all-or-nothing mechanism.

Imagine a user is searching through some set of database results, and they want to perform another search on some tangential topic. They will open another tab to the application URL (thereby sharing the same session), and try to perform the secondary task without affecting the task in the previous tab.

Airline reservation sites are a prime example of this problem. Your flight search in the first tab will often be affected by the actions you take in another tab. You may perform your secondary search, then come back to the first tab only to find that the search you were conducting is now invalidated and you have to start over.

One sure-fire way to get separate session cookies is to open a “private browsing” window (“incognito mode” in some browsers), which generally guarantees a separate sandbox for your cookie data (none of which is kept after the window closes). Beware, though: on some browsers, multiple private-browsing tabs and windows share session cookies with each other, even though they’re sandboxed from the non-private browser data.

Of course, applications can (and do!) break these use-cases even without shared sessions, simply by allowing each user only one in-progress task in the server-side state. Likewise, applications can be designed to work around this problem if they so choose. Some do help the user out, but others simply pave over the session with whatever the last action in any tab was, leaving the user out of luck.

Session Storage

HTML5 again offers (what seems like) a solution to these various problems with session cookies. One of the mechanisms introduced to the web platform in the last few years is sessionStorage, a JavaScript API that allows key/value pairs of data to be stored in the client, not in cookies (and therefore not polluting every request), and that is cleared when the browser tab that owns it is closed.

sessionStorage is also a modern and powerful JavaScript API, in contrast to the rather rudimentary and limited mechanisms we use to read/write cookies. sessionStorage has a cousin, localStorage, which has the same API, but which is shared across all tabs, and keeps data around “forever” (until either the application or the user manually deletes it). Changes to either store fire a storage event (a StorageEvent) in the other same-origin documents that share that store, which means your application can listen for changes made elsewhere (even from another tab, in the case of localStorage) and keep application state in sync.

The biggest difference is that sessionStorage data is tied specifically to each browser tab, meaning two tabs on the same site do not share session data the way they share cookies.

This one key difference enables the use-cases discussed previously around users having multiple sessions open to the same site: each tab can be issued a separate session ID (from the server, or from itself!), so actions in one tab will not affect the state of another.
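A minimal sketch of a tab issuing its own session ID, in TypeScript. `StorageLike` describes the two real Storage methods used (`getItem` and `setItem`), so `window.sessionStorage` can be passed in a browser; `MemoryStorage`, `tabSessionId`, the `sessionId` key, and the demo ID generator are hypothetical stand-ins that let the sketch run anywhere.

```typescript
// The subset of the Web Storage API used here; window.sessionStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in so the sketch runs outside a browser; one instance per "tab".
class MemoryStorage implements StorageLike {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Hypothetical ID generator for the demo; real code would use a server-issued ID
// or a cryptographically random one (e.g. crypto.randomUUID() in browsers).
let issued = 0;
function demoId(): string {
  issued += 1;
  return `tab-session-${issued}`;
}

// Reuse this tab's session ID if it already has one (e.g. after a reload);
// otherwise mint a new one and remember it for this tab only.
function tabSessionId(storage: StorageLike, key = "sessionId", generate: () => string = demoId): string {
  const existing = storage.getItem(key);
  if (existing !== null) return existing;
  const id = generate();
  storage.setItem(key, id);
  return id;
}
```

Because each tab has its own sessionStorage, two tabs calling `tabSessionId(sessionStorage)` end up with different IDs, while a reload in the same tab keeps its ID.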

Session Storage Use-cases

There are a variety of use-cases where sessionStorage comes to the rescue, including temporary caching of client-rendered information, “remembering” form data in case of accidental browser refresh, saving draft progress on emails or blog posts, etc.

But the one use-case that seems most obvious is persistence of the session ID. However, the big caveat is that none of the data in sessionStorage is automatically sent to the server with each request, as is the case with session cookies.

If you want to use sessionStorage for your session ID persistence, the main browser-initiated page requests made to the server will be session-unaware, which means that you will almost certainly need some or all of your rendering to be done via front-end-driven mechanisms. Of course, any JavaScript-initiated Ajax/WebSockets requests can manually retrieve a session ID from sessionStorage and include it in the request to the server, so those responses can be session-aware.

The application can still be primarily back-end-driven in terms of rendering content, but the initial response will always have to be a session-unaware bootstrapping response that then calls back to the server with the session ID, receives the session-aware content, and displays it.
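Here’s a TypeScript sketch of that second, session-aware request as the bootstrapped page would build it. The `X-Session-ID` header name, the `sessionId` storage key, and the function name are hypothetical; the app and server would have to agree on their own.

```typescript
// The one Storage method used here; window.sessionStorage satisfies this interface.
interface ReadableStorage {
  getItem(key: string): string | null;
}

// Hypothetical header name; the client and server just have to agree on one.
const SESSION_HEADER = "X-Session-ID";

interface SessionRequest {
  url: string;
  headers: Record<string, string>;
}

// Nothing in sessionStorage travels on its own (unlike a cookie), so the
// bootstrapped page must attach the ID to the follow-up request itself.
function sessionAwareRequest(storage: ReadableStorage, url: string, key = "sessionId"): SessionRequest {
  const headers: Record<string, string> = { Accept: "text/html" };
  const id = storage.getItem(key);
  if (id !== null) headers[SESSION_HEADER] = id;
  return { url, headers };
}
```

In a browser, the page would then do something like `const { url, headers } = sessionAwareRequest(sessionStorage, location.pathname); const html = await (await fetch(url, { headers })).text();` and insert the result. That is the extra round trip the bootstrapping approach can’t avoid.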

Not only does that complicate the architecture somewhat, it also degrades (even slightly) perceived performance in the UX, because the user can’t be shown anything specific and useful (other than perhaps site boilerplate wrapping) until after the second request/response completes.

Hybrid Redux

One way you might try to address the degraded UX of this double-request performance hit would be the hybrid-driven approach mentioned briefly earlier, where the initial response to a browser-initiated page request includes specific session-aware content, and then, after the front-end is bootstrapped, subsequent rendering is controlled by the client.

Another powerful pattern for hybrid-driven apps is what I like to call “reactive design”, where the application monitors the environment it’s running in, and makes decisions about which side of the connection should handle various tasks, like view rendering, based on ever-changing conditions like bandwidth, device performance, user interaction, etc.

A hybrid-driven app could start out rendering on the server, then test if either the available bandwidth is too low, or the client device performance is high enough to be able to render efficiently, and if either is true, transparently transition rendering over to the front-end. Later, if the conditions change again, it could shift rendering back. The application could even split rendering duties between client and server, again entirely transparent to the user.
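The decision rule in that paragraph is simple enough to sketch in TypeScript. Everything here is hypothetical: the `Conditions` fields, the thresholds, and the function name. An app would re-run the check as conditions change and hand rendering back and forth accordingly.

```typescript
// Hypothetical measurements; how an app gathers them is outside this sketch.
interface Conditions {
  bandwidthKbps: number;
  devicePerfScore: number; // hypothetical 0-100 benchmark score
}

type RenderSide = "server" | "client";

// Hypothetical thresholds; a real app would tune these empirically.
const LOW_BANDWIDTH_KBPS = 500;
const FAST_DEVICE_SCORE = 70;

// The rule described above: render in the client if bandwidth is too low
// or the device is fast enough; otherwise keep rendering on the server.
function chooseRenderSide(c: Conditions): RenderSide {
  const lowBandwidth = c.bandwidthKbps < LOW_BANDWIDTH_KBPS;
  const fastDevice = c.devicePerfScore >= FAST_DEVICE_SCORE;
  return lowBandwidth || fastDevice ? "client" : "server";
}
```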

Unfortunately, hybrid-driven apps (at least for the initial page response piece) are not possible if you use sessionStorage for session tracking, since the session ID is not transmitted automatically by the browser on the initial page request, as it is with cookies.

Unfortunate Choice

We finally arrive at the point of this long blog post. Thank you for hanging in there so far.

Hybrid-driven apps are more powerful than their counterparts — front-end-driven and back-end-driven — because they offer the promise of the best of both worlds. Front-end-driven architecture offers many great advantages, but back-end-driven architecture is better in other ways. Why should we have to choose?

Why can’t we have hybrid-driven apps on the modern web?

One major tradeoff is this whole business of what to do about session ID persistence. As discussed in detail previously, session IDs being stored in session cookies are far from ideal — I might say almost completely undesirable. But storing session IDs in the more preferable HTML5 sessionStorage cuts us off at the knees and prevents us from getting the most benefit out of hybrid-driven apps.

Is there any solution that solves both sets of problems without this tradeoff? Have I missed something?

I ask that question, open-ended, as the whole point of discussion I want to generate from this post. I invite you to share your thoughts on this unfortunate choice. Does the benefit of one side really outweigh the other?

Should we give up on the idea of hybrid-driven apps if we want or need the benefits of modern sessionStorage-persisted session IDs? Or should we just ignore the many benefits of sessionStorage as it relates to session ID persistence and fall back to session cookies so we can achieve hybrid utopia?

Proposal: Hope

I recently proposed a possible solution to this frustrating trade-off. The discussion on the WHATWG list didn’t really (yet) go anywhere beyond initial skepticism, so this blog post is intended to revive the discussion, and either push the proposal forward, or come up with some other better solution.

My proposal was that a server can send back a response header like Register-Session-ID: xyz which specifies the name (“xyz” in this case) of the session ID that will be stored in the sessionStorage store. Optionally, it can also provide a value for the session ID (Register-Session-ID: xyz=...), in which case the browser will automatically insert the data into the sessionStorage at that keyname. Or, your JavaScript code can do that manually, later.

Either way, once the session ID name has been registered (on the user’s first visit to the site), the browser will, on every subsequent request to the page’s host, look for a session ID by that name in sessionStorage, and if it finds one, send the value along as a request header Session-ID: .... And of course an app can change or delete this session ID name registration at any time by issuing another Register-Session-ID response header.
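To make the proposal’s intended browser behavior concrete, here’s a TypeScript simulation of it. This illustrates the idea only; it is not a real browser API. `SessionIdRegistry` and its methods are hypothetical, and treating an empty header value as “unregister” is an assumption of this sketch, since the deletion syntax isn’t spelled out above.

```typescript
// The two Storage methods the simulated browser uses on a tab's sessionStorage.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Simulates how a browser *might* implement the proposal for one host. Register-Session-ID
// and Session-ID are proposed headers only, not part of any standard. Treating an empty
// Register-Session-ID value as "unregister" is an assumption of this sketch.
class SessionIdRegistry {
  private registeredName: string | null = null;

  // Run for each response from the host; header names are expected lowercased here.
  onResponse(headers: Record<string, string | undefined>, storage: StorageLike): void {
    const raw = headers["register-session-id"];
    if (raw === undefined) return;
    const value = raw.trim();
    const eq = value.indexOf("=");
    if (value === "") {
      this.registeredName = null;
    } else if (eq < 0) {
      this.registeredName = value; // name only: page JS stores the value later
    } else {
      this.registeredName = value.slice(0, eq);
      storage.setItem(this.registeredName, value.slice(eq + 1));
    }
  }

  // Run before each request to the host, including plain page navigations.
  requestHeaders(storage: StorageLike): Record<string, string> {
    if (this.registeredName === null) return {};
    const id = storage.getItem(this.registeredName);
    return id === null ? {} : { "Session-ID": id };
  }
}
```

With one registry per host and one storage object per tab, two tabs would send different Session-ID values even on their initial page requests, which is exactly the combination the proposal is after.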

This proposal resolves the ugly tradeoff between modern session IDs and hybrid-driven apps, because session IDs can helpfully be stored in sessionStorage (instead of cookies), but will automatically be transmitted to the server to enable hybrid-driven architecture for first-page-request UX.

I’m not content to just live with the tradeoff between the power and potential of hybrid-driven apps and modern tab-centric session IDs. I think it can and should be a fixable gap in functionality.

What do you think? Really, I want this to be an open and productive discussion. Share your thoughts, but leave your trolls and fanboisms out of it. :)
