Monday, June 15, 2015

Amdahl's Law, Gustafson's Law, and user experience stretching to optimize parallel web design

There are two widely known laws in parallel computing that are counterpoints to each other, but with thought they can be applied together to achieve dramatic improvements in parallel computing. They two laws are Amdahl's Law and Gustafson's Law. Roughly speaking, Amdahl's says you cannot speed a system up by parallelization to be faster than the the slowest portion. Using a simple metaphor, if you have a car, a bicycle, and a turtle, when traveling in a pack, you can only move as fast as the turtle. The counterpoint to this is Gustafson's law, which roughly states that for parallel problems, adding more resources can increase the net work completed. Using are above metaphor, if we add more turtles (or cars or bicyclists) we are still increase the amount distance travelled per unit time. I do apologize to Computer Scientists as this is a very rough approximation, but this way of thinking about the situation can help us understand how to use some trickery and slight of hand to make things "impossibly" faster/better.

Using these concepts, let's talk about a real problem: Suppose we have a web application that needs to do 15 seconds of computing in order to present a visual representation of some data to a user. We happen to know that the slowest portion takes 5 seconds and must be performed serially. furthermore, let's suppose that the we have a system, that based on a user's name, can compute the following:

  • Compute user's favorite number - 1 second
  • Compute user's favorite food - 2 seconds
  • Compute user's favorite pet - 3 seconds
  • Compute user's Birthday - 4 seconds
  • Compute user's favorite color - 5 seconds

Applying Amdahl's law, we can conclude that the fastest parallel implementation should take 5 seconds to finish computing (and require 3 processors). That is, if we started all 5 computations at the same time, we would not be fully complete until 5 seconds had passed. Optimally, with three processors: 1 processor could compute favorite color, one processor could compute Birthday + favorite number, and one processor could compute favorite food + favorite pet. This would shorted the entire transaction time to 5 seconds.

Thinking about the problem in terms of Gustavson's Law, however, we also discover that adding another 2 processors gives us an opportunity to do more work (and/or support more capacity). Put another way, suppose we also know we can also do the following:

  • Compute user's favorite song - 1 second
  • Compute user's shoe size - 2 seconds
  • Compute user's favorite flavor ice cream - 3 seconds
  • Compute user's favorite drink - 4 seconds

Knowing this, by simply adding two more processors, we can return almost double the data for our users. Furthermore, applying some web trickery, pipelining, and user experience magic, we can potentially fool the user into thinking that every operation takes 1 second. For example, we know that a user will typically only look at two data items at a time and, they will typically look at each screen for 2 seconds, we can display data is up to 15 seconds old, caching the data on the client is possible, and there is no strong preference for which data item they may want to look at first (yes, these are quite a few suppositions, but not unusual), we could do the following:

  • break the display into 5 sections (in order of display):
    1. favorite song + favorite number: Reachable after 1 second
    2. shoe size + favorite food: Reachable after 3 seconds
    3. favorite flavor ice cream + favorite pet: Reachable after 5 seconds
    4. Birthday + favorite drink: Reachable after 7 seconds
    5. favorite color: Reachable after 9 seconds
  • Add more processors and schedule the work as follows (there are other combinations that also work):
    • Processor 1: favorite number, then favorite food (number is done after 1 second, food after 3 seconds...processor is idle for 4 seconds)
    • Processor 2: favorite song, then shoe size (song is done after 1 second, shoe size after 3 seconds...processor is idle for 4 seconds)
    • Processor 3: favorite pet, then Birthday (pet is done after 3 seconds, birthday after 7 seconds...processor is fully utilized
    • Processor 4: favorite ice cream, then favorite drink (ice cream is done after 3 seconds, favorite drink after 7 seconds, processor is fully utilized)
    • Processor 5: favorite color (color is done after 5 seconds, processor is idle for 2 seconds)

By fiddling with the user experience, we've stretched our window of time and amount of compute resources available for the longer running serial operations. This means we could even optimize a bit further if we wanted by moving things around in the experience. What we've effectively done is given ourselves an opportunity to give the appearance that a 15 second operation actually takes no longer than 1 second. Note, this DOES require some compromise, but the general approach is one that can yield practical benefits even when traditional logic (and thinking purely from a sound theoretical perspective) might say it isn't possible.

Friday, June 12, 2015

The Programmers Code

  1. Prior to writing code, I will search the internet, I will ask intelligent questions, and I will realize that many of the answers I get may not be correct. I will test the answers and validate them before repeating it to someone else. I will not reinvent prior art without adding value
  2. Revision control tools are my staple, I will live and die by version control tools. Distributed, Centralized, Lock and Release, or Update and Commit...they all work and they are the foundation upon which I build everything. I will strive to learn how to use these tools to manage branches, tags, and how to properly share my code with my fellow programmers which whichever tool my team happens to use
  3. On all fronts I will respect the code that already exists and seek to understand why it is the way it is
  4. Giving credit when due, I will respect my peers, those who are less skilled, and those who are more skilled. I will not assume I'm smarter than everyone else, nor will I assume that esoteric and confusing work was the work of someone more intelligent than me. If I stumble across code that looks like a rabid squirrel ran across someone's keyboard I will seek out the author and understand the reason behind the incoherence
  5. Remembering that all programming languages and cultures are different, I will learn the idioms of whichever programming language I happen to be using. I will not code Ruby in Java, FORTRAN in Assembler, nor any other "language 1" in "language 2"
  6. At no point will I name classes, frameworks, files, or other artifacts for mythical creatures, movie characters, or other seemingly clever things. I will name things appropriate for the domain I'm working in (note, if you're writing a system working with the aforementioned have a special dispensation)
  7. My comments and aspirations will be obtainable. I will never insert comments like //TODO or //FIXME unless I personally take responsibility for these actions and have the ability to rectify them in the very near future
  8. Mannual testing is a minimum, I will think about how to test my code before I write it. If possible with my tools, I will write automated and repeatable tests before writing implementations. A compiler is NOT a testing tool
  9. Every line of code will be tested...every time I change it...if it's difficult see previous item
  10. Rrealizing that code is complex, I will use an IDE, vi and emacs are for gunslingers or people on vt100 terminals. If I don't know what a vt100 terminal is, I will only use vi when ssh-ing into a remote server...and then only when necessary.
  11. Sed, grep, awk, less, vi, regular expressions, bash are things I will know and understand. Even if I have no need in my job, I will know them and love them (even when I hate them)
  12. Creative code styles are forbidden, I will respect the format. Nobody wins whitespace and bracing arguments, whatever is currently in place is what I'll use. Leave these debates for internet trolls and ivory tower architects, they're the only ones who should care
  13. One language or framework is the not always the best for everything. I will refrain from trying to solve every problem with whatever my favorite tool happens to be. I will politely and strongly argue logical and rational points about merits and shortfalls of frameworks and languages. I will not wage Jihad against languages and frameworks I do not understand
  14. Daily builds and checkins are too infrequent. I will constantly commit code and modularize my changes to illustrate progress at all times
  15. Everything I write will be obsolete in four years. I will not cling to code, coding paradigms or system metaphors beyond their useful lifespan

Friday, June 5, 2015

Simple thoughts versus simple solutions

Often we are hampered because we think of a "simple solution" and it ends up being "simple to think about" but very complicated in practice. Something as simple as "Dig up the rock poking out of the front yard" seems really simple. All you need is a shovel and some ambition right? ...until you realize the rock is a 5 ton monster that, in fact, the utility company drilled a hole through to route your gas line.

Sometimes, as in this case, it might be better to do the more "complicated solution" i.e. go to the store, get some dirt, and cover the rock up, and plant new grass... because the "first/simplest" thing you thought about has some unknown complexities. Put another way..."simple to think about" doesn't equate to "simple to execute".