Monday, December 7, 2015

Lies, Damn Lies, and Virtualization

Having used virtualization in a variety of scenarios over the last 10-15 years, I still run into misconceptions about its value proposition and how to use it. At best these are just marketing misconceptions, but at worst they can lead to counterproductive activities that HURT your solution. So, in no particular order, here are three things I hear people say that are normally just not true. Yes, I understand there are scenarios where they hold, but for the most part, in my experience, these ideas have been taken too far and now create problems.

Virtualization helps me scale my solution

I hear this so much I really just get tired of re-explaining how it isn't true. While based on a grain of truth, in the general sense at the datacenter level, it's completely false. For a single application it is easier to add cores to a virtual machine than it is to buy new hardware...but...people forget that the hypervisor is running on real, honest-to-goodness hardware, and adding a new virtual machine on existing hardware actually REDUCES the capacity available to the other virtual machines. Couple this with overprovisioning and you could end up where 4 virtual cores give you the capacity of a single core (or less): at a 4:1 vCPU-to-physical-core overcommit ratio, each virtual core is worth roughly a quarter of a physical core when everyone is busy. Worse yet, if you're just the "customer" you may need 4 virtual machines to get the capacity of a much smaller piece of hardware.

Hardware virtualization doesn't have any overhead

This is just silly, of course it does... Sure, hardware virtualization is going to have much lower overhead than software virtualization, but scheduling virtual machines on and off of shared CPU, memory, and cache is overhead... Again, if overprovisioning against the underlying hardware is your "enterprise scaling strategy", be prepared for a performance impact. Virtualization has overhead.

vMotions are undetectable and have no performance cost

I normally don't use profanity, but there's one word for this: "Bullshit" (OK, maybe that's two words). For some reason, this lie propagates to the point (I blame VMware marketing for doing too good a job) that people have an almost religious belief in the "all holy magic" of VMware's ability to magically move the state of a virtual machine with "zero impact on performance". Just stop believing. While it's transferring and maintaining the state of a virtual machine from one piece of physical hardware to another...things slow down. End of story. Don't believe me? Drive a virtual machine to some nontrivial load, measure the application performance (i.e. how LONG do my non-trivial business transactions take? ...don't forget that...many people do), and then vMotion it. If you're lucky it will only be a minute or two of "holy crap! what happened?"

Tuesday, December 1, 2015

Sideways customer service

After reviewing my credit card statement, I realized I was paying every month for efax (a service I used twice in 2006, one more time in 2013, and subsequently just missed on my billing statements every month for the last...yes...9 years...). The following is the somewhat surreal interaction I had when I cancelled...


 Greg Curtis: Thank you for providing the information. Please allow me a moment while I pull up your account. 
In the meantime, please type the number corresponding to your reason for cancellation:
1) Moving to another provider
2) Bought a fax machine
3) Business or role changed
4) Short term project completed
5) Financial reasons
6) Problems with faxing or billing
7) Dissatisfied with quality of service
8) Too costly
(me, adding my own answer)
 Michael Mainguy: 9) only needed it once 2 years ago and forgot to cancel :) 
 Greg Curtis: As we'd like to keep your business, I can offer you a discount and also waive your subscription fee for 1 month. The discounted monthly fee would be $12.95 per month. This new plan includes 150 free inbound and 150 free outbound pages monthly. 
 Greg Curtis: There is no contract and you may cancel anytime. Shall I switch you to this 1 month free $12.95 monthly plan? 
 Michael Mainguy: No thank you. 
 Michael Mainguy: I literally need to fax once every 3 or 4 years 
 Greg Curtis: I completely understand your wish to discontinue. Conversely, I can offer you a discount and also waive your subscription fee for 2 months so that you can re-evaluate your needs. After the free period, pay just $12.95 per month. This new plan includes 150 free inbound and 150 free outbound pages monthly. 
 Greg Curtis: There is no contract and you may cancel anytime. Shall I switch you to this 2 months free $12.95 monthly plan? 
 Michael Mainguy: No thank you 
 Greg Curtis: OK, I will go ahead and cancel your account.
An email confirmation will be sent at your registered email address.

Is there anything else, I may assist you with? 
 Michael Mainguy: No, thank you very much 
 Greg Curtis: Thank you for contacting online Fax support. I hope you found our session helpful. Goodbye and take care. 
 Greg Curtis has disconnected. 

My evaluation...they do not understand the needs of an occasional user who would be willing to pay a premium to send one or two pages of faxes every couple of years (as only ancient banking institutions and outdated consulting organizations still require this...but I digress).

Tuesday, September 22, 2015

The (slightly tongue in cheek) role of the database administrator

As a former DBA, I see a disturbing trend among a recent crop of database administrators: the value proposition they offer is almost nonexistent. Maybe having some background and/or working with other stellar DBAs in the past has spoiled me, but here's the workflow I find more and more common.

Scenario: the production application has slowed down for a few transaction types, and Dynatrace shows a critical SQL statement has slowed down. None of the development team has access to run explains, and we can't "afford" hardware to load the production dataset into another environment (because we're using a DBMS that costs 13.6 bajillion dollars per CPU nanosecond, with an additional upcharge of 1 million pounds sterling every time we execute a query that uses DML). Explains indicate everything is optimal in the lower environments. The decision to use this particular platform was made after the salesman for the product took the DBA team to Vegas for a "conference" and after tough negotiations (forever documented by the excellent 'based on a real life story' movie: "The Hangover").

Step 1 (optional): DBA team notices the query is slow too...email from DBA team to application team: "Dear application team, while eating donuts and sipping single malt whiskey this morning, I accidentally hit a button on this funny looking device sitting on my desk and the monitor in front of me popped up a report indicating this SQL is pretty slow, thought I'd let you know. Please fix ASAP as I believe we're wearing out our disk spindles and they're VERY expensive. Also, do you know how to close the window with this report...it overlaid the Rugby World Cup streaming live in HD and I really want to see the 'All Blacks' win!"

Step 2 (also optional): email from application team to DBA team: "Dear DBA team, we've (also) noticed this query is very slow. In every other environment it runs fine (less than 5 milliseconds) and it uses the primary key on every table; we're unclear why it takes 10 minutes to complete in production. Can you investigate?"

Step 3: email from DBA team to application team: "Dear Application Team - as I mentioned before, this query is slow and my children's college fund is contingent on our database using less than 20 IOPS under peak load, fix this IMMEDIATELY! Upon investigation, I think if you removed that query the system would run much better...I will note that your application is causing our database to use a lot more CPU and IO than when we initially powered the system up. Obviously you don't know how to write software, as prior to your application coming online the 3 'hello world' applications we used to test the database platform didn't cause any problems like this. Just because you collect data from sensors around the globe and provide real time data to thousands of users simultaneously doesn't mean you can just slap crappy SQL in and expect it to run well. Please let us know if you need any assistance writing application software as we're clearly much more intelligent than you."

Step 4: follow-up email from DBA team to application team: "Dear Application Team - we further noticed you're heavily using the database during our backup window from 1am to 8am eastern. It's critical the system not be used during this time window as we have a team of interns backing up the entire database on 3.5" floppies. Please tell your users not to use the system during this window or remove these SQL statements ASAP. Also, do you know a good torrent client? I really need to catch up on the last season of 'Game of Thrones'. PS did you see the latest news on CNN? Evidently a lot of users of your system are complaining about performance during peak online shopping hours in Europe...Hope you figure out what you did wrong when writing your application software."

Step 5: application team changes the system to use BDB running on an old Android device connected to the internet via a 2G wifi hotspot...sets up an elaborate VPN solution to enable the application servers to use this database from the production data center...problem goes away.

Step 6: entire DBA team is promoted for "saving the company millions of dollars by optimizing key SQL queries."

Evidently, this is the new normal. Apologies to any DBAs who might, in fact, have been more helpful or proactive in helping to solve the problem.

Tuesday, September 15, 2015

How to design a useful javascript framework

Based on my highly scientific analysis, there are currently 13.98 javascript frameworks per javascript developer. I've personally been on two projects where a framework was built, completely scrapped, and rewritten BEFORE THE PROJECT WAS DELIVERED! Based on this observation and the veritable wasteland of half-baked frameworks available, I'm sharing some insight on how to design a useful framework. Follow these rules and you'll have a higher probability of building something useful; ignore them and, well...I guess that's your choice, I won't hate you for it (OK, maybe I will a little).

Step One: Pick an existing framework

No, this is not a tongue-in-cheek joke, this is reality. Unless you initially demonstrated your framework at Bar Camp back in 2006, you should start with something that already exists and first deliver your project with that.

Step Two: Find things that the chosen framework doesn't do well

Now look at your product that is "code complete" and do an analysis of where your most common bugs happen, where new developers trip up and make mistakes, or where the code is repetitive. If nothing stands out...STOP, you're done. If there are rough edges, analyze approaches to the rough edges, and see how other EXISTING frameworks solve the problem. If nobody's solved it or if you think you've found a better way, refactor your code and enhance your chosen existing framework.

Step Three: After doing this for 4 to 5 years, if you've found better patterns or something novel, write a framework

Note, this step seems to be the one everyone skips (even authors of currently popular frameworks). Experience is important; attempting to write a framework after doing a "TODO" list app because you discovered something you don't like is a recipe for disaster. Moreover, cross-posting your new framework announcement across the internet to "make a name for yourself" is irritating and counterproductive.

Step Four: Write some useful applications using your framework

If you're starting a new javascript framework and haven't USED it, your chances of building something that is generally better than what already exists are vanishingly small, and you're likely expending energy on something that will not advance the state of the art and is much more likely to be a step backward. I don't mean to discourage creativity, and I still encourage folks to experiment and try things out...but temper your enthusiasm with reality until you can clearly illustrate how your new framework is better. In addition, be sure to look at your baby objectively...warts and all.

Thursday, September 10, 2015

Why I'm pushing your buttons

Interesting Story (to me)

I found myself in another interesting "ex post facto" software design review session, and both sides of the table were getting increasingly frustrated. My frustration centered around my engineer's inability to explain "why" he did it that way or "how" it worked, and his frustration was a mystery to me. I suspect it had to do with the perception that I was telling him his baby was ugly.

I think what leads to this situation is the realization that I'm an almost negligent delegator. Yes, most who interact with me or know me professionally might not think it's true (as I DO like to get my hands dirty). I tend to give my tech leads and developers lots of rope, which unfortunately means it will fit quite easily around all of our necks. This is, however, deliberate, because historically this has yielded the most innovative results and "generally" produces really good or really bad software. It averages out (IMHO) to "better than average" software, and when taken with the notion that I then end up with a pool of REALLY good engineers that I can throw at fixing the "really difficult" stuff...I feel I usually end up with "above average" solutions (for long term engagements).

That having been said, there is a particular problem this approach produces. When someone takes an innovative or creative approach that isn't well thought through, we end up in a situation where design guidance wasn't given early enough, and things I could have helped forestall early on are then "too late to fix". In general I'm OK with this; yes, it's frustrating to everyone involved, but frankly I have to be honest and say that it's deliberate. Developing software engineers by letting them make and recognize their own mistakes is an approach that has proven to work, both from a professional development perspective and from an overall software quality perspective.

If you're ever in some sort of design review and I'm asking stupid or (better/worse yet) super challenging questions and questioning your every tiny detail...trust that I'm doing it not because I don't believe in your solution, or don't believe your solution is "good enough", or don't believe you did a "good job", but because I want us all to do even better next time. I'm not being a dick or trying to be "Mr. Smarty Pants", I'm simply trying to help us both get better. When I say "I don't understand how this works" or "I don't think that's a good idea" I'm not saying you did a bad job, I truly just want a better understanding. I WILL say that if you've been doing this less than 20 or 30 years I may have some experience with problems you might not yet have run across, and I hope you will allow for some reasonable doubt about your own capacity and experience. But I will also reserve judgement on your approach or its quality until I have as complete an understanding as I can get in the time available to me to review it.

In short, never assume I'm challenging your software because I think you (or your code) are inferior or not well thought out, but think of my challenges as an opportunity to challenge your own preconceptions and as an opportunity to grow yourself. At the end of the day, I hope to become smarter...but can only do that by accepting the notion that I might be wrong...you should do the same.

Tuesday, September 8, 2015

Real world Internet of Things

Having spent some time working with devices connected via GSM, I'd like to share some observations that seem obvious to cellular network engineers but get lost in the breathlessly overhyped echo chamber of marketing. The short assessment is that depending on the mobile nature of your connected device, the allowable delays, and the amount of data you intend to transmit and receive, you will need to choose your protocol and state management very carefully.

To begin, there are two major types of connected "things": #1 mobile "things", such as trains, planes, and automobiles, and #2 immobile "things", such as thermostats, refrigerators, and buildings.

Mobile Things

Mobile things have two unique problems you'll need to be concerned with: #1 speed and #2 network availability.

Speed (and/or velocity) impacts network connectivity because it introduces signaling problems for the radio network. Rapid movement or changes in direction can greatly increase packet loss and greatly reduce the effectiveness of protocols like TCP. If your device moves quickly and/or changes direction/speed often, you'll likely want a UDP or RTP based protocol to allow customization of how you deal with packet loss and delay. This also means that you're going to want to reduce the size of each message, but potentially increase the frequency, and develop novel ways to handle losses. If you design your device to just use a TCP connection and delegate all this work to the networking stack (without serious TCP tuning), you're going to have a lot of problems that most network engineers (outside wireless telecom) are just NOT used to dealing with. Be prepared for sleepless nights and sporadic failures if you use TCP.
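To make that concrete, here's a minimal fire-and-forget sketch of the style I'm describing (the collector host name, port, and message layout are all made up for illustration): each datagram is tiny and self-describing, and carries a sequence number so the receiving side can detect gaps and decide for itself whether a missed reading is worth chasing.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

public class TelemetrySender {
    public static void main(String[] args) throws Exception {
        // Hypothetical collector endpoint; substitute whatever your backend actually is.
        InetAddress collector = InetAddress.getByName("telemetry.example.com");
        try (DatagramSocket socket = new DatagramSocket()) {
            int seq = 0;
            while (true) {
                // Tiny, self-describing message: sequence number + timestamp + one reading.
                ByteBuffer buf = ByteBuffer.allocate(20);
                buf.putInt(seq++);
                buf.putLong(System.currentTimeMillis());
                buf.putDouble(readSensor());
                socket.send(new DatagramPacket(buf.array(), buf.position(), collector, 9999));
                // No retries here: the receiver sees gaps in the sequence numbers and the
                // application decides whether a particular missed reading is worth re-requesting.
                Thread.sleep(1000);
            }
        }
    }

    static double readSensor() {
        return Math.random(); // stand-in for a real sensor read
    }
}

The point isn't these specific fields; it's that loss handling lives in your application where you can tune it, instead of inside a TCP stack that assumes a well-behaved wired network.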

Network availability introduces a similar problem, as your device may enter and leave areas where it cannot communicate "at all". As above, your protocol needs to define "what do we do when my device falls off the network for a few minutes/hours/days?" This is a big deal when using TCP based protocols, because most connection based protocols account for this by waiting around to see if the connection can be reestablished, then using retry mechanisms to "guarantee" delivery.
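In practice the answer usually ends up being some flavor of store-and-forward on the device itself. As a rough sketch (the interface and the buffer limit below are invented for illustration): keep readings in a bounded buffer while the radio is down, drop the oldest when the buffer fills, and only discard a reading once the uplink has actually accepted it.

import java.util.ArrayDeque;
import java.util.Deque;

public class OfflineBuffer {
    private static final int MAX_BUFFERED = 10_000;   // cap so memory doesn't grow unbounded
    private final Deque<String> pending = new ArrayDeque<>();

    // Called for every reading, whether or not the network is up.
    public synchronized void record(String reading) {
        if (pending.size() == MAX_BUFFERED) {
            pending.removeFirst();                     // drop the oldest, keep the newest
        }
        pending.addLast(reading);
    }

    // Called whenever the device notices it is back on the network.
    public synchronized void flush(Uplink uplink) {
        while (!pending.isEmpty() && uplink.isUp()) {
            uplink.send(pending.peekFirst());          // only discard once the send succeeds
            pending.removeFirst();
        }
    }

    // Hypothetical transport; in practice this wraps whatever radio/protocol the device uses.
    public interface Uplink {
        boolean isUp();
        void send(String message);
    }
}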

A major failure folks tend to have with mobile things is that they tend to test their devices in unrealistic controlled environments (stationary in a lab) and rate their performance based on these criteria. When the devices begin operating in the real world, the factors above rear their ugly heads and the reliability of the overall system is severely impacted.

Stationary Things

Stationary things are arguably easier to deal with since, once connected, they don't have the problems mentioned above for mobile things. More importantly, they are generally tested in a manner similar to how they are deployed...that is, stationary and with network connectivity. There is one problem, shared with mobile things but aggravated for stationary ones, that is generally much more troublesome...namely:

"What do I do if I cannot get on the network?"

When designing a stationary thing, how can you handle a device that is deployed in a location with poor or nonexistent connectivity? Generally, a stationary device will need more connectivity options (3G, 4G, WiFi, X.25, satellite, Ethernet, Bluetooth), since once someone has placed the device, it is unlikely to roam in and out of cellular connectivity (if it can't get a connection, it will never get a connection).

As more devices become connected, designing network protocols and tuning the stack to the device's attributes become more important in everyday life.

Monday, August 10, 2015

Please use ANSI-92 SQL Join Syntax

Way back in 1992, a standard was published for SQL. In it, the syntax for specifying joins in SQL databases was finally standardized. This was a good thing because (for folks who've been doing this a while) trying to decode the vagaries of how various databases handled outer joins could drive you bonkers, yielding different results depending on the parse order of things in the FROM and WHERE clauses. Unfortunately, some DBMS vendors took a while to adopt this standard (Oracle didn't support it until 2001), but at this point, if you're using a major RDBMS and NOT using it, you're probably mired in a tradition or habit that bears changing.

Here are some examples of what I mean.

Old School (no standard syntax + implementation and evaluation of predicates varied by vendor [and query]):

select * from customer, order
where customer.customerid = order.customerid

ANSI way (agreed to standard, predicate evaluation has some rules that are followed):

select * from customer
  inner join order on customer.customerid = order.customerid

Now for the crusty folks who are going to argue "I've always done it this way and there's no quantifiable reason to switch" I give you a photo to illustrate my perspective

For folks who don't understand why it matters, I'll give a quick conceptual overview (that is slightly inaccurate, but serves to illustrate my point). In the above trivial example it IS a little arbitrary, but let's take a more involved example:

select * from customer, order, address, product
  where customer.customerid = order.customerid and customer.customerid = address.customerid and order.status = 'SHIPPED' and product.size = 'XL'

Can you spot the defect? (This is still pretty trivial, but hopefully you've been burned by this a few times and will catch on.) Compare this to:

select * from order
  inner join customer on order.customerid = customer.customerid
  inner join address on address.customerid = customer.customerid
  inner join product on <--wait what are we joining on here...cartesian product time...what are we actually joining on?
where order.status = 'SHIPPED' and product.color = 'WHITE' and product.size = 'XL'

This isn't a quantitative evaluation, and the readability of the second form is somewhat subjective. But if one makes the conceptual leap to say that the "join" clause describes how to link the tables together and the "where" clause describes how to filter the joined rows, the second form begins to make sense. The first form with 16 tables and a few outer joins thrown in becomes an exercise in cognitive overload, whereas the second form (in most cases) is easier to mentally parse and reason about.

Monday, June 15, 2015

Amdahl's Law, Gustafson's Law, and user experience stretching to optimize parallel web design

There are two widely known laws in parallel computing that are counterpoints to each other, but with some thought they can be applied together to achieve dramatic improvements. The two laws are Amdahl's Law and Gustafson's Law. Roughly speaking, Amdahl's says you cannot speed a system up through parallelization to be faster than the slowest serial portion. Using a simple metaphor, if you have a car, a bicycle, and a turtle traveling in a pack, you can only move as fast as the turtle. The counterpoint to this is Gustafson's law, which roughly states that for parallel problems, adding more resources can increase the net work completed. Using our above metaphor, if we add more turtles (or cars or bicyclists) we still increase the total distance travelled per unit time. I do apologize to computer scientists as this is a very rough approximation, but this way of thinking about the situation can help us understand how to use some trickery and sleight of hand to make things "impossibly" faster/better.
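For the textbook-inclined, the usual formulations of the two laws look roughly like this, where p is the fraction of the work that can be parallelized and n is the number of processors:

S_{\mathrm{Amdahl}}(n) = \frac{1}{(1 - p) + p/n} \qquad\qquad S_{\mathrm{Gustafson}}(n) = (1 - p) + n \cdot p

Amdahl caps the speedup of a fixed amount of work; Gustafson points out that with more processors you can simply do more total work in the same wall-clock time.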

Using these concepts, let's talk about a real problem. Suppose we have a web application that needs to do 15 seconds of computing in order to present a visual representation of some data to a user. We happen to know that the slowest portion takes 5 seconds and must be performed serially. Furthermore, let's suppose that we have a system that, based on a user's name, can compute the following:

  • Compute user's favorite number - 1 second
  • Compute user's favorite food - 2 seconds
  • Compute user's favorite pet - 3 seconds
  • Compute user's Birthday - 4 seconds
  • Compute user's favorite color - 5 seconds

Applying Amdahl's law, we can conclude that the fastest parallel implementation takes 5 seconds to finish computing (and needs only 3 processors to get there). That is, if we started all 5 computations at the same time, we would not be fully complete until 5 seconds had passed. Optimally, with three processors: one processor could compute favorite color, one processor could compute Birthday + favorite number, and one processor could compute favorite food + favorite pet. This would shorten the entire transaction time to 5 seconds.
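As a rough sketch of that schedule (the class and task names are mine, and the "computations" are just sleeps standing in for real work), a plain fixed pool of 3 threads finishes in about 5 seconds as long as we hand it the longest tasks first:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FanOutDemo {

    // Stand-in for one of the hypothetical "compute the user's favorite X" calls.
    static String compute(String what, int seconds) throws InterruptedException {
        TimeUnit.SECONDS.sleep(seconds);
        return what;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3); // our 3 "processors"
        long start = System.nanoTime();

        // Longest tasks first, so the 5 / 4+1 / 3+2 packing falls out naturally.
        List<Future<String>> results = List.of(
                pool.submit(() -> compute("favorite color", 5)),
                pool.submit(() -> compute("birthday", 4)),
                pool.submit(() -> compute("favorite pet", 3)),
                pool.submit(() -> compute("favorite food", 2)),
                pool.submit(() -> compute("favorite number", 1)));

        for (Future<String> result : results) {
            result.get(); // block until every computation has finished
        }
        System.out.printf("all results ready in ~%d seconds%n",
                (System.nanoTime() - start) / 1_000_000_000L);
        pool.shutdown();
    }
}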

Thinking about the problem in terms of Gustafson's Law, however, we also discover that adding another 2 processors gives us an opportunity to do more work (and/or support more capacity). Put another way, suppose we also know we can do the following:

  • Compute user's favorite song - 1 second
  • Compute user's shoe size - 2 seconds
  • Compute user's favorite flavor ice cream - 3 seconds
  • Compute user's favorite drink - 4 seconds

Knowing this, by simply adding two more processors, we can return almost double the data for our users. Furthermore, applying some web trickery, pipelining, and user experience magic, we can potentially fool the user into thinking that every operation takes 1 second. For example, suppose we know that a user will typically only look at two data items at a time, that they will typically look at each screen for 2 seconds, that we can display data that is up to 15 seconds old, that caching the data on the client is possible, and that there is no strong preference for which data item they want to look at first (yes, these are quite a few suppositions, but not unusual ones). Then we could do the following:

  • break the display into 5 sections (in order of display):
    1. favorite song + favorite number: Reachable after 1 second
    2. shoe size + favorite food: Reachable after 3 seconds
    3. favorite flavor ice cream + favorite pet: Reachable after 5 seconds
    4. Birthday + favorite drink: Reachable after 7 seconds
    5. favorite color: Reachable after 9 seconds
  • Add more processors and schedule the work as follows (there are other combinations that also work):
    • Processor 1: favorite number, then favorite food (number is done after 1 second, food after 3 seconds...processor is idle for 4 seconds)
    • Processor 2: favorite song, then shoe size (song is done after 1 second, shoe size after 3 seconds...processor is idle for 4 seconds)
    • Processor 3: favorite pet, then Birthday (pet is done after 3 seconds, birthday after 7 seconds...processor is fully utilized)
    • Processor 4: favorite ice cream, then favorite drink (ice cream is done after 3 seconds, favorite drink after 7 seconds, processor is fully utilized)
    • Processor 5: favorite color (color is done after 5 seconds, processor is idle for 2 seconds)

By fiddling with the user experience, we've stretched our window of time and amount of compute resources available for the longer running serial operations. This means we could even optimize a bit further if we wanted by moving things around in the experience. What we've effectively done is given ourselves an opportunity to give the appearance that a 15 second operation actually takes no longer than 1 second. Note, this DOES require some compromise, but the general approach is one that can yield practical benefits even when traditional logic (and thinking purely from a sound theoretical perspective) might say it isn't possible.

Friday, June 12, 2015

The Programmers Code

  1. Prior to writing code, I will search the internet, I will ask intelligent questions, and I will realize that many of the answers I get may not be correct. I will test the answers and validate them before repeating them to someone else. I will not reinvent prior art without adding value
  2. Revision control tools are my staple; I will live and die by version control tools. Distributed, Centralized, Lock and Release, or Update and Commit...they all work and they are the foundation upon which I build everything. I will strive to learn how to use these tools to manage branches and tags, and how to properly share my code with my fellow programmers with whichever tool my team happens to use
  3. On all fronts I will respect the code that already exists and seek to understand why it is the way it is
  4. Giving credit when due, I will respect my peers, those who are less skilled, and those who are more skilled. I will not assume I'm smarter than everyone else, nor will I assume that esoteric and confusing work was the work of someone more intelligent than me. If I stumble across code that looks like a rabid squirrel ran across someone's keyboard I will seek out the author and understand the reason behind the incoherence
  5. Remembering that all programming languages and cultures are different, I will learn the idioms of whichever programming language I happen to be using. I will not code Ruby in Java, FORTRAN in Assembler, nor any other "language 1" in "language 2"
  6. At no point will I name classes, frameworks, files, or other artifacts for mythical creatures, movie characters, or other seemingly clever things. I will name things appropriate for the domain I'm working in (note, if you're writing a system working with the aforementioned things...you have a special dispensation)
  7. My comments and aspirations will be attainable. I will never insert comments like //TODO or //FIXME unless I personally take responsibility for these actions and have the ability to rectify them in the very near future
  8. Manual testing is a minimum; I will think about how to test my code before I write it. If possible with my tools, I will write automated and repeatable tests before writing implementations. A compiler is NOT a testing tool
  9. Every line of code will be tested...every time I change it...if it's difficult see previous item
  10. Realizing that code is complex, I will use an IDE; vi and emacs are for gunslingers or people on vt100 terminals. If I don't know what a vt100 terminal is, I will only use vi when ssh-ing into a remote server...and then only when necessary.
  11. Sed, grep, awk, less, vi, regular expressions, bash are things I will know and understand. Even if I have no need in my job, I will know them and love them (even when I hate them)
  12. Creative code styles are forbidden, I will respect the format. Nobody wins whitespace and bracing arguments, whatever is currently in place is what I'll use. Leave these debates for internet trolls and ivory tower architects, they're the only ones who should care
  13. One language or framework is not always the best for everything. I will refrain from trying to solve every problem with whatever my favorite tool happens to be. I will politely and strongly argue logical and rational points about the merits and shortfalls of frameworks and languages. I will not wage Jihad against languages and frameworks I do not understand
  14. Daily builds and checkins are too infrequent. I will constantly commit code and modularize my changes to illustrate progress at all times
  15. Everything I write will be obsolete in four years. I will not cling to code, coding paradigms or system metaphors beyond their useful lifespan

Friday, June 5, 2015

Simple thoughts versus simple solutions

Often we are hampered because we think of a "simple solution" and it ends up being "simple to think about" but very complicated in practice. Something like "dig up the rock poking out of the front yard" seems really simple. All you need is a shovel and some ambition, right? ...until you realize the rock is a 5 ton monster that, in fact, the utility company drilled a hole through to route your gas line.

Sometimes, as in this case, it might be better to do the more "complicated" solution, i.e. go to the store, get some dirt, cover the rock up, and plant new grass...because the "first/simplest" thing you thought about has some unknown complexities. Put another way..."simple to think about" doesn't equate to "simple to execute".

Tuesday, May 26, 2015

the myth of asynchronous JDBC

I keep seeing people (especially in the Scala/Typesafe world) posting about async JDBC libraries. STOP IT! Under the current APIs, async JDBC belongs in a realm with unicorns, tiger squirrels, and 8' spiders. While you might be able to move the blocking operations, queue requests, and keep your "main" worker threads from blocking, JDBC is synchronous. At some point, somewhere, there's going to be a thread blocked waiting for a response.
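To be concrete about what the "async" wrappers actually do, here's a hedged sketch (the pool size, JDBC URL handling, and table name are all made up): the caller gets a CompletableFuture back immediately, but one pool thread still sits parked inside executeQuery() for the entire round trip.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NotReallyAsync {

    // A dedicated pool for the "async" database work.
    private static final ExecutorService DB_POOL = Executors.newFixedThreadPool(4);

    // Looks non-blocking to the caller, but a DB_POOL thread blocks inside
    // executeQuery() for the full duration of the database round trip.
    static CompletableFuture<Integer> countOrders(String jdbcUrl) {
        return CompletableFuture.supplyAsync(() -> {
            try (Connection conn = DriverManager.getConnection(jdbcUrl);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("select count(*) from orders")) {
                rs.next();
                return rs.getInt(1); // the worker thread was blocked this whole time
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, DB_POOL);
    }
}

Queue up more of these than the pool has threads and you're right back to waiting; the blocking just happens somewhere you're no longer looking.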

It's frustrating to see so many folks hyping this and muddying the waters. Unless you write your own client for a DBMS, and have a DBMS that can multiplex calls over a single connection (or use some other strategy to enable this capability), db access is going to block. It's not impossible to make the calls completely async, but nobody's built it yet. Yes, I know ajdbc is taking a stab at this capability, but even IT uses a thread pool for the blocking calls (by default).

Someday we'll have async database access (it's not impossible...well, it IS with the current JDBC specification), but no general purpose RDBMS has it right now. The primary problems with the hype/misdirection are that #1 inexperienced programmers don't understand that they've just moved the problem, and will use the APIs and wonder why the system is so slow ("oh, I have 1000 db calls queued up waiting for my single db thread to process the work"), and #2 it belies a serious misunderstanding of the difference between async JDBC (not possible per the current spec) and async db access (totally possible/doable, but rare in the wild).

Installing Varnish on CentOS 6

I recently had a need to install Varnish on CentOS. While it's very simple, a key step missing from the official instructions is to install the epel-release package. I'm not sure why yum can't resolve the dependency on CentOS out of the box, but the following steps worked for me:

sudo yum install epel-release
sudo rpm --nosignature -i https://repo.varnish-cache.org/redhat/varnish-4.0.el6.rpm
sudo yum install varnish

Saturday, April 25, 2015

Refactoring 101 for complete beginners

Having worked in the field for a while, I find that new folks have a problem with refactoring. I think the primary problems are:

  • Code without tests is dangerous and frankly scary to refactor
  • It's really easy to copy/paste code, and if you don't have to support and bug fix things, the merits of clean and concise code are lost
While I don't claim to fix anything, here's a quick guide on how/why to refactor (for complete newbies):
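By way of a tiny, hypothetical illustration of the copy/paste point above (the class and names are made up): pull the duplicated logic into one well-named method, and pin the behavior down with a check so the next change isn't scary.

public class RefactorDemo {

    static class Customer {
        final String firstName;
        final String lastName;
        Customer(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Before: the same formatting rule copy/pasted into every place that needed it.
    static String billingLabelBefore(Customer c)  { return c.lastName.toUpperCase() + ", " + c.firstName; }
    static String shippingLabelBefore(Customer c) { return c.lastName.toUpperCase() + ", " + c.firstName; }

    // After: one well-named method, so there is exactly one place to fix bugs
    // and exactly one thing to test.
    static String displayName(Customer c) { return c.lastName.toUpperCase() + ", " + c.firstName; }
    static String billingLabel(Customer c)  { return displayName(c); }
    static String shippingLabel(Customer c) { return displayName(c); }

    // A tiny check that the refactor preserved the old behavior.
    public static void main(String[] args) {
        Customer c = new Customer("Ada", "Lovelace");
        if (!displayName(c).equals(billingLabelBefore(c))) {
            throw new AssertionError("refactor changed behavior");
        }
        System.out.println("refactor preserved behavior: " + displayName(c));
    }
}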

Wednesday, April 22, 2015

The Hazard of Not Taking Things Personally at Work

"Don't take it personally" is an oft-repeated platitude in the work environment. While I think it's unhealthy to take problems at work and things out of your control as personal affronts, I think a healthy dose of taking personal ownership of your work is "a good thing" (tm). I say this because the counterpoint to not taking things personally is not giving a damn, and I think that is a far worse situation than having folks who are personally invested and passionate about their work.

In my experience, the most successful folks I've interacted with take a very deep personal interest in their life's work. They are passionate, appropriately loyal, and care a great deal about the quality of the product or service they provide. Folks that punch the clock, point fingers, and skip home after their "8 (or so) hours of physical presence" are a huge problem and all too often a key source of low quality work and cumbersome process. More dangerously, this leads to a culture of "not my job" and causes folks who might otherwise be outstanding performers to ask themselves "why should I put forth any extra effort?" or "what's in it for me?".

So I amend the platitude and say this: take personal ownership of your work. While I'm not about to spout phrases like "Arbeit Macht Frei", I DO think it is better if one approaches one's work with a sense of ownership, responsibility, and a certain amount of pride in what one does for a living.

Thursday, March 19, 2015

Installing virtualbox guest additions on Centos 7 minimal image

I've just spent some time setting up a bare CentOS 7 image to support development, stripping out as much as possible. While CoreOS is probably a better choice, we run Red Hat in production at this point, and CentOS is a better fit for the time being. The problem I've found is that many of the instructions available via Google were written against prerelease versions or espouse manually installing random RPMs instead of just using yum. While I get that this "works", I'm not a huge fan of the approach and would rather do everything with the package manager. After a LOT of scouring and trial and error, I finally found the "magic" combination.

The winner is courtesy of http://www.pc-freak.net/. I've taken these instructions and tweaked them slightly for my purposes.

The original poster did this on a more "full fledged" version of CentOS with a Windows host, and my instructions are for a minimal install using an OS X host (though I'm certain the OP's instructions will work too). First, mount the guest additions CD from the VirtualBox GUI (the original link has a photo, but it's under Devices -> Insert Guest Additions CD Image). Then log into the console as root. Once you've done this, enter the following commands:

yum -y update kernel*
shutdown -r now

After this step, you'll have to log in again. Then run the following:

yum install -y gcc kernel-devel kernel-headers dkms make bzip2 perl
export KERN_DIR=/usr/src/kernels/`uname -r`
mkdir /media/cdrom
mount /dev/cdrom /media/cdrom
cd /media/cdrom
./VBoxLinuxAdditions.run

Additionally, there are some instructions that I believe will work, but that either install more than I really wanted or that I didn't discover until afterward. Additionally (pun intended), a key point I missed originally is that the minimal ISO is missing default packages (perl, I think, was the culprit) that you need to properly build the guest additions, so instructions for the "full blown" image will fail for mysterious reasons (i.e. the "build" symlink is broken).

  • This seems like it would work too, but installs more than I really wanted to over my cell phone connection
  • I found this after I wrote my post and it is virtually (pun intended) identical to my instructions.

Examples I tried that did NOT work or were too convoluted:

  • centos.org forum had all sorts of crazy "wget" hackery suggested...While it may work, I like the epel approach better
  • this version on stackexchange CAN get you there, but you have to read all the comments to figure out what is really required for a minimal install

I realize my post is a bit of a duplicate, but as there's no good way to get rid of historical anachronisms on the internet, this is an attempt to boost the ranking of the better approaches

Monday, March 16, 2015

My (potentially bad) parenting advice

Parents of the world, I have one piece of advice that will give you a tool to help your child become happy, healthy, and productive. Give them a shovel and tell them they can do anything they want with it. Note, if you are in an urban area, this might be BAD advice, so suburban and rural people listen on, urban folks find a friend in the burbs with a yard and then follow along.

A kid with a shovel is an amazing thing to watch. Kids who are impossible to pry from the Wii/PS3, who don't like soccer/football/whatever, who might otherwise be surly or withdrawn...will become captivated by the idea that they can explore and possibly find buried treasure, fossils, and rocks, and will dig/play for hours. Add water to the mix and the possibilities are endless: sand castles, mud castles, mud pies, mud pits, waterfalls, ponds, you name it!

Too often in our modern world, we think of parenting as an activity that requires structure, supervision, and direction. I think excessive amounts of structure, supervision, and direction can actually be a bad thing; they remove creativity and adventure.

Note: to my neighbors, be careful walking through my back yard; there are some holes back there and you could break a leg. Note #2: follow this advice at your own risk...some kids shouldn't be trusted with sharp implements...

Friday, January 16, 2015

Accuracy Versus Precision

In a recent elevator conversation with the team, we stumbled onto a side conversation about the difference between accuracy and precision. I've always defined them this way:

  • Accuracy - The nearness of a value to the "real" value
  • Precision - The resolution of a measurement

It turns out this is not entirely correct and these definitions, while technically accurate in certain fields, are not universally held to be true. In fact, the above definition of precision is almost completely wrong for most other engineering and scientific disciplines. A more accurate (see what I did there?) set of definitions would be something like:

  • Accuracy - The nearness of a value to the "real" value
  • Precision - The probability of repeated measurement yielding the same result
  • Measurement Resolution - The resolution of a measurement
Source: Wikipedia

Thursday, January 15, 2015

Trail of tears architecture anti-pattern

I'm currently struggling with another project suffering from what I dub the "Trail of Tears" architecture anti-pattern. This is a situation where the architecture is littered with the remains of legacy code that never seems to get removed. This leads to mounting maintenance costs and a fragile architecture that becomes increasingly difficult to manage. It also has the side effect of creating a "broken window" problem where there is no clear "correct" way to do things, and the necessary rigor around adhering to standards and best practices (I HATE that term...but oh well) rapidly falls apart.

Historically, the only way I've seen to combat this is to rigorously support opportunistic refactoring, otherwise known as following the "Boy Scout Rule". While this has its own problems (folks breaking seemingly unrelated things while trying to clean them up, and inflated scope for features, being two key ones), it has proven to be the only effective countermeasure.

The rub in solving this problem is that it necessarily forces a tempo drop when changing directions as we must now account for refactoring "things that work" so that they meet the new standards and patterns as they evolve. This is unfortunate, but trust me...it's absolutely necessary if you want a maintainable and healthy system running for years to come. My advice...don't walk the trail of tears.

Monday, January 5, 2015

Minecraft is the new Doom

I just read Coelacanth: Lessons from Doom and realized that Minecraft is the new Doom. If you don't believe me, find a teenager who has a computer (or smartphone, or tablet) and hasn't played Minecraft.

Doom revolutionized gaming, not by making the FPS technologically possible (though this is a big deal), but because it spawned a multitude of user-generated mods and made it relatively easy and open to do so. This, in turn, incited myriads of hacker youth to build their own mods, edit levels, even (in my case) buy books on graphics programming and learn about BSPs, GPUs, ASM coding, optimizing C code, and other esoterica that I would otherwise have ignored.

Frankly, almost the entire gaming industry owes Doom a debt of gratitude for inspiring the current flock (actually, we're probably on the second or third generation now) of hackers who saw success building their own games with toolkits provided by the makers of Doom. This seems to have been largely ignored by the business side of the gaming industry until quite recently. Some games, such as Mod Nation Racers, the Tony Hawk series, and Little Big Planet, have tapped into the idea of enabling level editing (it's pretty standard fare at this point), but there's still a gap between that and truly modding the game. In these games you generally can't affect fundamental parts of the game, and the editors are, in fact, quite oversimplified (not entirely a bad thing, BTW).

Minecraft and Doom, on the other hand, are built to enable hackers to expand what is possible within the game via Mods. This enables players who are aspiring developers to actually enhance the capabilities of the game beyond what the original developers had imagined. The idea of making things modifiable isn't new, but actually endorsing a tinker/hacker culture and finding ways to still make money on the product is commendable.