Thursday, December 4, 2014

Why software estimates change and inflate

As a software developer (Architect?) I find myself in a constant battle with clients and project managers about changing estimates and their inaccuracy. I've lost count of the number of times I've given an estimate and then had to revise it, to the dismay of folks who assumed seemingly subtle changes would move the estimate down or allow it to remain the same, only to see it creep up. There are a number of variables that increase the estimation risk and I'll briefly touch on the major factors.

Major factors are:

  • Changing requirements/assumptions - a fundamental truth is that any change incurs overhead. The act of changing the design to remove a requirement is work too...remember, even simplifying the design is work. Removing a requirement mandates revalidating the design against the new requirements (even if they're simplified).
  • Changing the team structure - an experienced dev is much more effective than a newbie. Moreover, a person well versed in a particular solution is often more effective than a more experienced resource who is unfamiliar with the existing code. Creating estimates for an unknown team is tremendously difficult and often leads to large inflations to account for the risk of getting an unvetted resource.
  • Work fills to accommodate available time - if you give a developer 40 hours to complete a task, they will almost always take AT LEAST that amount of time. Even if it seems to be a simple task, they will spend extra time to analyze options, test, and otherwise use available time even if they COULD have potentially "just done the work" in four hours.
  • Estimates are just starting points - the harsh reality is that estimates for non-trivial software are starting points and evolve as more information becomes available. The more analysis you do without obtaining more information, the higher the chance that the estimate is based on faulty information (especially when it involves the factors mentioned above).

The short version is that "all software development is design". Any change anywhere changes that design and thus creates more work. Agile proponents realize this (maybe implicitly) and combat this problem by locking down design for periods of time to help move things forward (with real deliverables). Long drawn out design cycles cause extra work that too often is underwater.

Tuesday, November 25, 2014

Easily changing java versions in OSX

In OSX, they've frankly done a pretty good job of enabling multiple versions of java at the same time, but just as frankly it's somewhat obscure and inconvenient to manage multiple versions. It's not mind bogglingly difficult but for us oldsters who are lazy, I created a convenient way to switch versions inspired by (though nowhere nearly as awesome as) rvm for ruby.

  1. Download all of the versions of java you want to use from the appropriate location: java 1.6, java 7, or java 8 (you need at least one to start with).
  2. Add the following lines to ~/.bash_profile:

     jvm() {
         export JAVA_HOME="$(/usr/libexec/java_home -v "$1")"
         java -version
     }

  3. Either source .bash_profile by typing ". ~/.bash_profile" or simply close your terminal and relaunch.

At this point you can change versions of java by typing:

jvm 1.6*
jvm 1.7*

Yes, there's more to it, refer to java_home for more version matching options, and it could be way more awesome, but this should be a quick help for those who just need a simple way to switch when troubleshooting/testing jvm version issues and you want to quickly change JDKs in an automated fashion. Note, this also works with fix pack and minor versions, you just need to refer to the version pattern matching of the '-v' option for java_home to know how to use it.

edit - I originally had an alias pointing to a function until a gracious commenter pointedly asked why I did it that way. Not having an answer I eliminated the alias. This shows the strength of my convictions about the "right way" to do things...

Thursday, September 25, 2014

Why OSX sucks and you should use ubuntu instead

OK, I confess, I use OSX almost exclusively and have for a number of years now. I DO have a number of Ubuntu machines, VMs, and servers in my stable, but my go-to device is a macbook pro (actually two of them, one for work, one for fun). I love the hardware, but the OS, and specifically its lack of a software package management tool, has a level of suckyness that irritates me.

Now, don't get me wrong, OSX suckyness is nothing compared to windows, but it seems to be frozen in 2004 and is not moving forward at a pace I think is acceptable considering the huge advances Ubuntu has made in a very short time frame. In the same vein, the UI for OSX is awesomely polished and user friendly, but there are some major pain points I can't seem to get past.

My Points

Ubuntu, being a Debian variant, has an awesome software package management system. More importantly, just about anything you could ever want is ALREADY THERE in some shape or form. OSX has homebrew and macports...which both suck and are just plain confusing. Why in the world there is a need to do a recompile on a platform as tightly controlled as OSX when Ubuntu can deploy binary packages is a complete mystery to me.

This having been said

Apple is a hardware and user experience company, not a software company. Your hardware awesomely rawks, keep it up. Your software is pretty darn good, but you need to partner with Canonical and/or an open source company to get a decent package management solution (or just fork Debian...or just partner with Canonical). Your development tools are horrific. Please contact a professional developer who also does open source, not a sycophantic Apple Fanboi, to help fix the problem.

Monday, August 18, 2014

It's not NoSQL versus RDBMS, it's ACID + foreign keys versus eventual consistency

The Background

Coming from a diverse background and having dealt with a number of distributed systems, I routinely find myself in a situation where I need to explain why foreign keys managed by an ACID compliant RDBMS (no matter how expensive or awesome) lead to a scalability problem that can be extremely cost prohibitive to solve. I also want to clarify an important point before I begin: scalability doesn't equate to a binary yes or no answer. Scalability should always be expressed as a cost per unit of scale, and I'll illustrate why.

Let's use a simplified model of a common web architecture.

In this model, work is divided between application servers (computation) and database servers (storage). If we assume that a foreign key requires validation at the storage level, no matter how scalable our application layer is, we're going to run into a storage scaling problem. Note: Oracle RAC is like this. At the end of the day, no matter how many RAC nodes you add, you're generally only scaling computation power, not storage.

To circumvent this problem, the logical step is to also distribute the storage. In this case, the model changes slightly and it begins to look something like this.

In this model, one used by distributed database solutions (including high end ACID compliant databases such as Oracle RAC or Exadata or IBM pureScale), information storage is distributed among nodes responsible for storage and the nodes don't share a disk. In the database scaling community, this is a "shared nothing" architecture. To illustrate this a little further, most distributed databases in a shared nothing architecture work in one of two ways. For each piece of data they either:

  • Hash the key and use that hash to lookup the node with the data
  • Use master nodes to maintain the node to data association

So, problem solved right? In theory, especially if I'm using a very fast/efficient hashing method, this should scale very well by simply adding more nodes at the appropriate layer.
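To make the first option concrete, here is a minimal sketch of hashing a key to find its node. It assumes a fixed list of nodes; the HashRouter class and the node names are made up for illustration and don't come from any particular database. It uses plain modulo hashing for brevity, which reshuffles most keys whenever a node is added, so real systems tend to use something like consistent hashing instead.

```java
import java.util.List;

// Hypothetical router: maps a record key to the storage node that owns it.
class HashRouter {
    private final List<String> nodes;

    HashRouter(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    // Hash the key and reduce it modulo the node count.
    // Math.floorMod keeps the index non-negative even when hashCode() is negative.
    String nodeFor(String key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    public static void main(String[] args) {
        HashRouter router = new HashRouter(List.of("node-0", "node-1", "node-2"));
        for (String key : List.of("customer:42", "order:1001", "order:1002")) {
            System.out.println(key + " -> " + router.nodeFor(key));
        }
    }
}
```

The lookup itself is cheap and needs no coordination between nodes, which is why this looks like it should scale by simply adding nodes.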

The Problem

The problem has to do with foreign keys, ACID compliance, and the overhead they incur. Ironically, this overhead actually has a potentially serious negative impact on scalability. Moreover, our reliance on this model and its level of abstraction often blinds us to bottlenecks and leads to mysterious phantom slowdowns and inconsistent performance.

Let's first recap a couple of things (a more detailed background can be found here for those that care to read further).

  • A foreign key is a relation in one table to a key in another table that MUST exist for an update or insert to be successful (it's a little more complicated than that, but we'll keep it simple)
  • ACID compliance refers to a set of rules about what a transaction means, but in our context, it means that for update A, I must look up information B

Here's the rub, even with a perfectly partitioned shared nothing architecture, if we need to maintain ACID compliance with foreign keys, we run into a particular problem. If the Key for update A is on one node, and the Key for update B is on a different node... we require a lookup across nodes of the cluster. The only way to avoid this problem... is to drop the foreign key and/or relax your ACID compliance. It's true that perfect forward knowledge might allow us to design the data storage in such a way that this is not really a problem, but reality is otherwise.
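A rough sketch of why this hurts, assuming a toy cluster where each node is just an in-memory set of keys. ShardedStore and its method names are hypothetical, and the remoteLookups counter merely stands in for a real network round trip.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical shared-nothing cluster: each node holds only its own keys.
class ShardedStore {
    private final List<Set<String>> nodes = new ArrayList<>();
    private int remoteLookups = 0;

    ShardedStore(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashSet<>());
        }
    }

    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // Insert with no foreign key: touches only the node that owns the key.
    void insert(String key) {
        nodes.get(nodeFor(key)).add(key);
    }

    // Insert that enforces a foreign key: the parent must already exist.
    // If the parent lives on another node, the check costs a cross-node hop.
    boolean insertWithForeignKey(String key, String parentKey) {
        int owner = nodeFor(key);
        int parentOwner = nodeFor(parentKey);
        if (parentOwner != owner) {
            remoteLookups++; // stands in for a network round trip
        }
        if (!nodes.get(parentOwner).contains(parentKey)) {
            return false; // constraint violated, reject the write
        }
        nodes.get(owner).add(key);
        return true;
    }

    int remoteLookups() {
        return remoteLookups;
    }

    public static void main(String[] args) {
        ShardedStore store = new ShardedStore(4);
        store.insert("customer:42");
        for (int i = 0; i < 1000; i++) {
            store.insertWithForeignKey("order:" + i, "customer:42");
        }
        System.out.println("cross-node lookups: " + store.remoteLookups());
    }
}
```

With keys spread evenly, most constrained writes end up paying for the extra hop, and a real ACID system would also have to hold that check and the write together in one transaction spanning both nodes.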

So, at the end of the day, when folks are throwing their hats into the ring about how NoSQL is better than RDBMS, they're really saying they want to use databases that are either:

  • ACID compliant and they'll eschew foreign keys
  • Not ACID compliant

And I think we can see that, from a scalability perspective, there are very good reasons to do this.

Friday, August 15, 2014

Things to remember about information security

As more businesses look to cloud application providers for solutions, the need for developers to understand secure coding practices is becoming much more important. Gone are the days when a developer would write an application that only ran in a secure environment; now applications can be moved to locations where previously well managed security gaps are exposed to the internet at large. Developers now more than ever need to understand basic security principles and follow practices to keep their applications and data safe from attackers.

To make things more secure, a developer needs to first understand and believe the following statements:

  • You don't know how to do it properly
  • Nothing is completely secure
  • Obscurity doesn't equal security
  • Security is a continuum

You don't know how to do it properly

If I had a nickel for every developer who thought they invented the newest, greatest, cleverest encryption/hashing routine, I'd be a millionaire. Trust me, if you aren't working for the NSA or doing a doctorate on the subject, there are thousands of people who can defeat your clever approach...worse yet, even if you ARE in the aforementioned groups there are still SOME folks who can defeat your approach. Which means:

Nothing is completely secure

The only way to completely secure a system or data is to completely destroy it. This is a mathematical fact, don't argue, just trust me on this. If ONE person can access the information, someone else can. MAYBE if it's in your head and your head alone it is pretty secure, but there are ways of getting that information too...some of which can be unpleasant. So these two things having been said, I want to add the clarifying statement that:

Obscurity doesn't equal security

As someone who has witnessed back doors get exploited numerous times, thinking you can just "hide the key under the rock" and hope for the best is not a sound policy. Don't get me wrong, making targets less obvious is great... please do it... but be wary of relying on this as your sole security measure, it will be discovered. Which leads to my final point:

Security is a continuum

Remember how security isn't absolute? Well this is the reassertion of that statement. When having discussions, the question isn't "is it secure (yes/no)?" it should be "is it secure enough (yes/no)?" and "what are our threat vectors?". Subtly changing the question from being absolutely yes or no can open up a discussion and let you objectively begin to measure your risk.
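Going back to the first point, "you don't know how to do it properly" in practice means reaching for a vetted primitive instead of inventing one. As a minimal sketch, assuming the task is storing password hashes, the following uses PBKDF2 from the standard Java library. The PasswordHashing class, the iteration count, and the key size are illustrative assumptions, not recommendations; pick parameters based on current guidance and your own threat model.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical helper that delegates all of the cryptography to the JDK.
class PasswordHashing {
    // Illustrative parameters only; tune them for your environment.
    private static final int ITERATIONS = 100_000;
    private static final int KEY_BITS = 256;

    // A fresh random salt per password, from a cryptographic RNG.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derive a hash with the standard PBKDF2WithHmacSHA256 algorithm.
    static byte[] hash(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } finally {
            spec.clearPassword();
        }
    }

    // Recompute the hash and compare using MessageDigest.isEqual.
    static boolean verify(char[] password, byte[] salt, byte[] expected)
            throws GeneralSecurityException {
        return MessageDigest.isEqual(hash(password, salt), expected);
    }
}
```

Notice there is nothing clever here, which is the point: every interesting decision was made by people who do this for a living.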

Monday, August 11, 2014

Avoid hibernate anemia and reduce code bloat

One of my beefs with Hibernate as an ORM is that it encourages anemic domain models that have no operations and are simply data structures. This, coupled with java's verbosity, tends to make code unmaintainable (when used by third party systems) as well as cause developers to focus on THINGS instead of ACTIONS. For example, take the following class that represents a way to illustrate part of a flight booking at an airline:

import java.util.Date;

public class Flight {
    public Date start;
    public Date finish;
    public long getDuration() {
        return finish.getTime() - start.getTime();
    }
}

This is the core "business" requirement for a use case in this model in terse java. From an OO perspective, start and finish are attributes, and getDuration is an operation (that we happen to believe is mathematically derived from the first two fields). Of course, due to training and years of "best practices" brainwashing, most folks will immediately and mindlessly follow the java bean convention, making all the member variables private and "just generating" the getters and setters. That makes the same functional unit above look like the following:

import java.util.Date;

public class Flight {
    private Date start;

    public Date getStart() {
        return start;
    }

    public void setStart(Date start) {
        this.start = start;
    }

    private Date finish;

    public Date getFinish() {
        return finish;
    }

    public void setFinish(Date finish) {
        this.finish = finish;
    }

    public long getDuration() {
        return finish.getTime() - start.getTime();
    }
}

Wait, we're not done yet, if we want duration to be persisted, we'll move the logic to another class and add getters and setters:

import java.util.Date;

public class Flight {
    private Date start;

    public Date getStart() {
        return start;
    }

    public void setStart(Date start) {
        this.start = start;
    }

    private Date finish;

    public Date getFinish() {
        return finish;
    }

    public void setFinish(Date finish) {
        this.finish = finish;
    }

    // Duration is now a stored field; the calculation lives in FlightHelper.
    private long duration;

    public long getDuration() {
        return this.duration;
    }

    public void setDuration(long input) {
        this.duration = input;
    }
}

public class FlightHelper {
    public static long getDuration(long finish, long start) {
        return finish - start;
    }
}


This "Helper" or "Business Delegate" pattern is yet another area where things go wonky very quickly. Usually, to keep things "pure" folks will put all logic in the helper (or delegate, I'm not sure if there's a difference) and the model will have no logic. This really makes troubleshooting where the logic is contained very difficult. In addition, having a computed and stored field is fraught with potential for errors. Java folks will typically make the case that this class is really a Data Transfer Object (DTO)... OK, fine, but that's like saying an elephant is actually an herbivore...

But it gets worse...

What I often see happen among java circles is that this is a death spiral of bloat in the interest of "best practices". A typical next step is that folks invariably realize that serializing hibernate objects to remote servers or tiers that don't have access to hibernate becomes a huge challenge due to hibernate's technique of using AOP to actually replace the real object with a dynamic proxy. To get around this, developers invariably create another layer of DTOs or "Value Objects" as well as a mapping layer to map between these two domains.

In conversation with most java developers about "why are we doing it this way?" I get blank stares and the best answer I've heard is "because that's the way we do it" or often a link to a web site explaining how to do it and why which ultimately is really just a clever way of saying "I don't know". Crafty individuals will then start talking about java patterns and all sorts of other artificial explanations that never explain "why", but simply re-endorse "how".

A way to mitigate this problem is to start decomposing application components functionally and realize that data persistence is in fact a first order operation in most systems. This means that persisting data should be an atomic, single step operation (hint: if you need a transaction manager, the call is NOT atomic). Additionally, putting these behind web services means that the idea of persisting data becomes an internal responsibility and not something a caller needs to know or care about.

Put another way, hide our persistence layer behind an API and don't create superfluous classes that need to be shared with third parties. So, in the example above, you could do something like:

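import java.util.Date;

public class FlightService {
    public Date getStart(long id) {
        throw new UnsupportedOperationException(); //...implementation here...
    }
    // Create a flight and return the identifier
    public long createFlight(Date start, Date finish) {
        throw new UnsupportedOperationException(); //...implementation here...
    }
    // Returns duration
    public long setStartAndFinish(long id, Date start, Date finish) {
        throw new UnsupportedOperationException(); //...implementation here...
    }
    public Date getFinish(long id) {
        throw new UnsupportedOperationException(); //...implementation here...
    }
    public long getDuration(long id) {
        throw new UnsupportedOperationException(); //...implementation here...
    }
}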

This preserves the idiomatic java, plus enables us to completely hide the implementation details from the caller. Yes, it introduces a transaction and granularity problem that we immediately need to solve... and should force us (unless we really want to do it the hard way) to start thinking about the API contract for atomic operations. I think this is the important distinction and shouldn't be forgotten. Worry about what your design is supposed to DO first because, at the end of the day, the OPERATION is more important than the MODEL.
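To show what an atomic operation behind that kind of API might look like, here is a minimal sketch that assumes an in-memory map standing in for whatever persistence layer is really back there. InMemoryFlightService and its inner Times class are hypothetical names for illustration; the point is only that start and finish change together in one step and duration is derived rather than stored.

```java
import java.util.Date;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the persistence hidden behind the API.
class InMemoryFlightService {
    // Start and finish travel together so they can be swapped in one step.
    private static final class Times {
        final long start;
        final long finish;

        Times(Date start, Date finish) {
            this.start = start.getTime();
            this.finish = finish.getTime();
        }
    }

    private final Map<Long, Times> flights = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // Create a flight and return the identifier.
    long createFlight(Date start, Date finish) {
        long id = nextId.getAndIncrement();
        flights.put(id, new Times(start, finish));
        return id;
    }

    // Replace both times in a single map operation and return the new duration.
    long setStartAndFinish(long id, Date start, Date finish) {
        Times updated = new Times(start, finish);
        if (flights.replace(id, updated) == null) {
            throw new IllegalArgumentException("unknown flight " + id);
        }
        return updated.finish - updated.start;
    }

    Date getStart(long id) {
        return new Date(find(id).start);
    }

    Date getFinish(long id) {
        return new Date(find(id).finish);
    }

    // Duration is computed on demand, so it can never disagree with the times.
    long getDuration(long id) {
        Times times = find(id);
        return times.finish - times.start;
    }

    private Times find(long id) {
        Times times = flights.get(id);
        if (times == null) {
            throw new IllegalArgumentException("unknown flight " + id);
        }
        return times;
    }
}
```

The caller never sees a Flight object at all, so there is nothing to proxy, serialize, or map into yet another layer of DTOs.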

Monday, June 23, 2014

Success, Failure, and Tradition

A quick post about my reality:

Success is attainable through failure, and tradition is a crutch.

I'm amazed at how many people think that they can envision some favorable outcome and attain it on the first try, flawlessly, with no course adjustments. This is as silly as imagining that a child can learn to walk simply by possessing the desire and some careful instruction. From my observation, anything even moderately complex and less automatic than breathing requires trial and error and the most important skill to learn is how to accept failure with open eyes and learn from it.

In fact, I'd suggest that a critical part of learning something new is attaining an understanding of what failures led to WHY it's done a certain way and not one of a million other ways. Sometimes the answer is buried in history and in fact, there might not even be a great reason for doing it that way anymore. One warning sign I use as my gauge on when this may have happened is when discussing an issue and someone (perhaps even myself) cannot answer 5 whys. In short, precedents and tradition are convenient ways to avoid failures others have already learned from, but can also lead to the pitfall of "doing it the way we've always done it".

Being afraid of trying something new because it hasn't been done or "everyone knows" it isn't done that way and it "might fail" is not a way to be successful. To truly succeed, abolish fear of failure and replace it with fear of not learning. Additionally, take a look at traditions and understand why they exist and what they are really teaching.

Friday, June 20, 2014

We are not our code

For many people (myself included), creating software is difficult, rewarding, and enjoyable. There is a feeling of pleasure in the act of creating something that is ultimately useful for someone... either for commerce, pleasure, or even the mundane (writing software to keep a deicer working... yeah, that's a pun). It's important to keep things in the proper context though: you are not your code, I am not mine, we are not our code. To be effective, we must decouple our ego from our code and embrace Egoless Programming.

Software should not be an extension of ourselves because software is much more transient than even we are, it's supposed to be, that's what makes it cool. Technology changes, the state of the art moves forward, patterns evolve, die, reemerge in a new form. In a way, deconstructing what's been done before and reforming it into something new is where the value of solid software development comes from, not from building the "ultimate answer machine".

The problem is, many people get so intimately involved with their software, their baby, that they cannot decouple their ego from their code; it becomes an extension of themselves. Moreover, they are not able to accept criticism of their baby: their baby isn't ugly! To be truly effective in technology, you must accept your baby might possibly be ugly and be able to love and embrace that ugliness.

The good news is that really good technologists love ugly and beautiful babies equally. Crappy software is just as difficult to create... heck maybe even MORE difficult, but that doesn't mean it isn't valuable. Moreover, just like people, code will age: it will get wrinkles, warts, it will break, become fragile, and eventually need to be retired to that from which it came. That's OK, it's the way things should be and it's reality. Failing to acknowledge that reality will hamper your progress in creating new things because there will be no more room for new things, you'll be stuck always thinking about the old things and clinging to the prison of "the ultimate answer machine" you think you created 20 years ago.

Take the red pill

Wednesday, June 18, 2014

Through The Looking Glass Architecture Antipattern

An anti-pattern is a commonly recurring software design pattern that is ineffective or counterproductive. One architectural anti-pattern I've seen a number of times is something I'll call the "Through the Looking Glass" pattern. It's named so because APIs and internal representations are often reflected and defined by their consumers.

The core of this pattern is that the software components are split in a manner that couples multiple components in an inappropriate manner. An example is software that is split between a "front end" and a "back end", but they both use the same interfaces and/or value objects to do their work. This causes a situation where almost every back-end change requires a front-end change and vice versa. As particularly painful examples, think of using GWT (in general) or trying to use SOAP APIs through javascript.

There are a number of other ways this pattern can show up... most often it can be a team structure problem. Folks will be split by their technical expertise, but then impede progress because the DBAs who are required to make database changes are not aware of what the front-end application needs, so the front-end developers end up spending a large amount of time "remote controlling" the database team. It can also be a situation where there is a desire to expose a set of web service APIs for front-end applications, but because the back-end service calls only exist for the front-end application, there end up being a number of chicken and egg situations.

In almost every case, the solution to this problem is to either #1 add a translation layer, #2 simplify the design, or #3 restructure the work such that it can be isolated on a component by component basis. In the web service example above, it's probably more appropriate to use direct integration for the front end and expose web services AFTER the front-end work has been done. For the database problem, the DBA probably should be embedded with the front-end developer and "help" with screen design so that they have a more complete understanding of what is going on. An alternate solution to the database problem might be to allow the front-end developer to build the initial database (if they have the expertise) and allow the DBA to tune the design afterwards.

I find that it's most often easier and more economical to add a translation layer between layers than to try to unify the design, unless the solution space can be clearly limited in scope and scale. I say this because modern rapid development frameworks ([g]rails, play...) now support this in a very simple manner and there is no great reason to NOT do it...except maybe ignorance.
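For anyone who hasn't seen one, a translation layer can be tiny. The sketch below is plain java rather than anything framework specific, and every name in it (CustomerRecord, CustomerView, CustomerTranslator) is invented for illustration. The back end and front end each get their own representation, and one small class is the only code that knows about both.

```java
// Hypothetical back-end representation, shaped by how the data is stored.
final class CustomerRecord {
    final long id;
    final String firstName;
    final String lastName;

    CustomerRecord(long id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Hypothetical front-end representation, shaped by what the screen needs.
final class CustomerView {
    final String displayName;

    CustomerView(String displayName) {
        this.displayName = displayName;
    }
}

// The translation layer: the only code that knows about both shapes.
final class CustomerTranslator {
    private CustomerTranslator() {
    }

    static CustomerView toView(CustomerRecord record) {
        return new CustomerView(record.firstName + " " + record.lastName);
    }

    public static void main(String[] args) {
        CustomerRecord record = new CustomerRecord(7L, "Ada", "Lovelace");
        System.out.println(toView(record).displayName);
    }
}
```

If the back end later splits or renames its name columns, only toView changes and the front end never notices.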

Tuesday, June 10, 2014

Women in technology

When I stop and look around at work, I notice something that bothers me. Why is the ratio of women to men so low in technology? I've read quite a few posts from women who have lamented the locker room mentality of many technology shops and, having been guilty of similar (often cringeworthy) behavior, I can certainly understand why this could be a significant detractor...but it seems like this can't be the sole reason. Only being intimately familiar with the North American technology workplace, I also wonder if this is true globally. I seem to recall on a recent visit to a Chinese shop there being a higher number of women in the workplace.

I recall a conversation with my daughter from last night that immediately revealed my own bias. Her car was having problems with the cooling system and I asked her if she had fixed it yet (my son and I both took a look and just shrugged). After her reply in the affirmative, I asked if she took it to a mechanic...and she stated "No, I fixed it myself". Shame on me for first asking if she asked her boyfriend how to fix it (please, feminists, no bricks through my window) and shame on me again for staring at her slackjawed at her response to the negative. Frankly I was amazed, confused, proud, and also quite a bit troubled by my own response.

She went on to explain how she texted all her male friends and realized that #1 the ones that knew anything about cars were unresponsive and #2 the rest of them were utterly clueless about repairing vehicles. After this realization, she stated she went out, popped the hood, found the problem (a cooling system hose was leaking), removed it, replaced it, and moved on sans male intervention or assistance. I would have completely expected this from one of my sons, but even now, I'm positively glowing with pride at her self sufficiency. Why?

To give context, the woman in question is 20 and just graduated from a four year institution with her Bachelor's degree last month, so she's smart. The same person held down a job for almost the entirety of her college career, and at one point actually had two jobs (with a full class load). In high school, she was a middle distance runner and also played in the band, which imposed a brutal schedule that often had her busy from 6am until past 6pm... after which she would then do her homework. She's fastidious; watching her do homework and research was almost painful for me because ... well, I'm not an awesome student and just never really understood how she could do it. Moreover, she just signed up for another class over the summer to get another endorsement for her teaching certificate (her quote "Yay, now I can take college classes for FUN").

All that having been said, it seems like I should have been completely unsurprised that she went and fixed what she ultimately stated was an obvious and easy problem with her vehicle. I like to think of myself as an enlightened male of the modern age, but there is a significant sexist bias in my thought that I need to work on. Moreover, I think this bias pervades our culture so much that we're shortchanging ourselves. Having worked with a number of rockstar quality technologists who happened to be female, I wonder how many there might be that are shying away for whatever reason. Having to constantly find new talent and knowing how difficult it is to find quality technologists, I lament the fact that we seem to be underutilizing 50% of the potential talent. Being the father of two daughters transitioning into adulthood, now more than ever I'm going to challenge myself personally to make sure I keep any unconscious bias at bay. Please do the same...

Tuesday, May 27, 2014

Testing Love and Polyamorous TDD

The rigor and quality of testing in the current software development world leaves a lot to be desired. Additionally, it feels like we are living in the dark ages, with mysterious edicts about the "right" way to test being delivered by an anointed few vocal prophets who spend their effort evangelizing, with little or no effort given to educating the general populace about why it is "right". I use the religious metaphor because to me it seems a very large amount of the rhetoric is intended to sway people to follow a particular set of ceremonies without doing a good job of explaining the underpinnings and why these ceremonies have value. I read with interest a post by David Heinemeier Hansson titled TDD is dead. Long live testing that pretty much sums up my opinion of the current state of affairs in this regard. A number of zealots proclaim TDD to be the "one true way", but there is not a lot of evidence that this is actually true.

Yes, Test Driven Development (TDD) is a good practice, but it is NOT necessarily superior to: integration testing, penetration testing, operational readiness testing, disaster recovery testing, and any of a large number of other validation activities that should be a part of a software delivery practice. Embracing and developing a passion for all manner of testing are important parts of being a well rounded, enlightened, and effective software developer. Since I have this perspective, I'm particularly jostled by the perspective outlined in Bob Martin's treatise on how Monogamous TDD is the one true way. In direct reaction to this post, I propose we start to look at software validation as an entire spectrum of practices that we'll just call Polyamorous TDD. The core tenets of this approach are that openness, communication, the value of people, and defining quality are more important than rigorous adherence to specific practices. Furthermore, we should promote the idea that the best way to do things often depends on what particular group of people are doing them (note, Agile folks, does this sound familiar?).

I chose the term Polyamory instead of Polygamy or Monogamy for the following reasons:

  1. It implies there are multiple "correct" ways to test your code, but you are not necessarily married to any one, or even a specific group of them
  2. It further suggests that testing is about openness and loving your code instead of adhering to some sort of contract
  3. On a more subtle level, it reinforces the notion that acceptance, openness, and communication are valued over strict adherence to a particular practice or set of practices.

All this is an attempt to promote the idea that it's more important that we come together to build understanding about the values provided by better validating our code than to convert people to the particular practice that works for us individually. To build this understanding, we need to more actively embrace new ideas, explore them, and have open lines of communication that are free of drama and contention. This will not happen if we cannot openly admit the notion that there is more than one "right" way to do things and we keep preaching the same tired story that many of us have already heard and have frankly progressed beyond. It's OK to be passionate about a particular viewpoint, but we still need to be respectful and check our egos at the door when it comes to this topic.

A final tangential note regarding Uncle Bob's apparent redefinition of the word fundamentalism in his post: as far as I can see, the word was never actually used with the definition he chose. While I understand what he was trying to say, he was just wrong, and DHH's use of the word, based on the definition I've seen, is still very apt. From the dictionary:

1 a often capitalized :  a movement in 20th century Protestantism emphasizing the literally interpreted Bible as fundamental to Christian life and teaching
  b :  the beliefs of this movement
  c :  adherence to such beliefs
2 :  a movement or attitude stressing strict and literal adherence to a set of basic principles <Islamic fundamentalism>  <political fundamentalism>

Uncle Bob, please try to be careful when rebuffing folks on improper word usage and try not to invent new definitions of words, especially when you're in a position of perceived authority in our little world of software development. Express your opinion or facts, and be careful when you state an opinion as if it were a fact; it only leads to confusion and misunderstanding.

Friday, May 9, 2014

The brokenness of java in the cloud

In the cloud, it's important to be able to have computers talk to each other and invoke commands on each other. This is called Remote Method Invocation (RMI). By language, I have the following recap:

  • Python works
  • Ruby works
  • JavaScript works
  • Java ... makes you work

Java RMI is a godawful mess that should be killed now. RMI is a very simple thing until you let an argument of architects come up with "the best solution" and then it becomes a convoluted mess of edge cases. At its simplest, RMI involves serializing and deserializing some input parameters and then running some logic, then doing the same with some output parameters... gosh, that almost sounds like HTTP to me (hint, it really IS).
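To make the "serialize, run, serialize back" idea concrete, here is a minimal sketch in Python. It is not how Java RMI or any particular library works; the names (`REGISTRY`, `encode_call`, `handle_request`, `decode_result`, `call`) are all made up for illustration, and the network hop is replaced by a plain function call that stands in for something like an HTTP POST.

```python
import json

# Hypothetical registry of functions the "remote" side is willing to run.
REGISTRY = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}


def encode_call(method, *args):
    """Caller side: serialize the method name and input parameters."""
    return json.dumps({"method": method, "args": list(args)}).encode("utf-8")


def handle_request(payload):
    """Remote side: deserialize inputs, run the logic, serialize the output."""
    request = json.loads(payload.decode("utf-8"))
    func = REGISTRY.get(request["method"])
    if func is None:
        error = {"error": "unknown method: " + request["method"]}
        return json.dumps(error).encode("utf-8")
    result = func(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def decode_result(payload):
    """Caller side: deserialize the output, raising if the remote side failed."""
    response = json.loads(payload.decode("utf-8"))
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["result"]


def call(method, *args, transport=handle_request):
    """Full round trip. `transport` stands in for the network hop
    (assumed here to be something like an HTTP POST carrying the bytes)."""
    return decode_result(transport(encode_call(method, *args)))
```

Calling `call("add", 2, 3)` goes through all three steps and hands back the sum. Swapping `transport` for a function that actually sends the bytes to another machine is the only part that touches the network.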

I don't know exactly where things went wrong with java RMI, but there are registries, incantations, and special glyphs you need to paint on your door on alternate Tuesdays to make it work correctly. In python, ruby, and javascript, I can do RMI 15 different ways while drinking a beer and cooking a steak. In java, I repeatedly need to stop, read a book, ask a professor, read the book again, build some custom software to bypass a limitation (like spanning subnets), then ultimately rejoice in the amount of effort I put into my masterful solution.

In some ways, I think dynamic scripting languages scare off hard core engineers because engineers enjoy a good challenge, and a lot of things that engineers love to spend time doing become brainlessly simple when using the right tool for the job. Note, this entire post was spawned by trying to get jmeter to coordinate a bunch of remote hosts to run a load test distributed over remote EC2 instances... You'd think this should be an easy task, but it turns out to be more difficult than one might think (mostly due to java RMI limitations).

Wednesday, April 16, 2014

Continuous integration versus delayed integration

A vigorous area of debate in the development and architecture community exists around the value of Continuous Integration. To give context, when a software development team gets to a certain amount of concurrent work that involves multiple teams making changes in the same codebase, there are two main ways to handle this:

  • Each individual stream of work takes a copy of the code and starts working on its own
  • All teams coordinate their work streams within the same codebase

In my experience, the former method seems like a great idea, especially when folks start worrying about "what happens when project A impacts project B's timeline?". Everybody just works on their own feature and, at the end of each team's work stream, all the individual teams' efforts are merged back together in a shared location. The great advantage here is that if Feature A is pulled from the delivery, Feature B can be deployed independently. What is frequently overlooked is the fact that eventually, Features A and B still need to be mixed together. The longer we wait to do this, the greater the risk of integration problems and the less time we have to resolve them.

On the other hand, having everybody push work into the same development stream immediately introduces some problems that need to be resolved at build time. It's immediately necessary to have a process for reconciling competing Features and for disabling a Feature should it not be complete before delivery. The major downside of continuous (or at least frequent) integration is that individual developers and Feature teams are a bit slower day to day as they uncover unanticipated problems due to competing requirements in the various work streams, but the upside is that these are discovered early in the process instead of at a very late stage.

In short, delayed integration doesn't actually save any time in the long run, as the integration problems will eventually still manifest themselves, just later in the process when you have less time to fix them. To all folks doing multi-stream development, seriously investigate limiting your work to "one branch per environment", and work out how to independently configure the environments to know which features they should enable.
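As a rough illustration of configuring each environment to know which features it should enable, here is a small Python sketch. Everything in it is hypothetical: the environment names, the feature names, and the `is_enabled` and `build_menu` helpers are invented for the example, and a real setup would more likely load this table from per-environment configuration than hard-code it.

```python
# Hypothetical per-environment feature table. All of the code for both
# features lives in the same codebase; only this configuration differs.
FEATURES_BY_ENVIRONMENT = {
    "development": {"feature_a": True, "feature_b": True},
    "staging": {"feature_a": True, "feature_b": True},
    "production": {"feature_a": False, "feature_b": True},
}


def is_enabled(feature, environment, config=FEATURES_BY_ENVIRONMENT):
    """Return True only if the feature is explicitly switched on.
    Unknown environments and unknown features default to off."""
    return config.get(environment, {}).get(feature, False)


def build_menu(environment):
    """Example call site: unfinished work stays integrated but hidden."""
    menu = ["home", "account"]
    if is_enabled("feature_a", environment):
        menu.append("feature_a")
    if is_enabled("feature_b", environment):
        menu.append("feature_b")
    return menu
```

With this table, `build_menu("production")` leaves out Feature A while development and staging still exercise it, so pulling Feature A from a delivery becomes a configuration change instead of a late merge.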