Tuesday, August 30, 2011

Lucid Charts and Gliffy for online diagramming

My one sentence evaluation:

If you MUST use Internet Exploder, then Gliffy is your solution; otherwise, download Chrome (or another HTML5-capable browser) and use Lucid Charts.

The long Version:

As a long-time Visio user, it used to be one of two Microsoft tools I missed when working on Linux/Mac. Beyond not really working on Linux, it's just too damn expensive for the amount of time I spend using it. I'm an occasional user who needs to slap together something that is only slightly more professional than a napkin (although I've used napkins on occasion) or ASCII art for illustrating network diagrams or software architectural components.

In the last 5 years or so, a couple of online tools have emerged that let you do this inside your web browser. I've used and been a paying customer of both Gliffy and Lucidcharts, and I'll say they are both pretty good tools; if you're someone like me, they can meet your needs.

Lucidcharts is a better put-together product from a usability perspective, has more widgets, and seems to be innovating much faster. For example, Lucidcharts has iPhone templates, works on the iPad, and can import Visio files. This last one is a killer feature that Gliffy has failed to implement for almost 4 years. Lots of excuses, but no feature. Gliffy has abandoned their customers by not implementing this, and regardless of how difficult it might have been to implement, I personally think this feature alone will propel Lucidcharts ahead of Gliffy.

Gliffy uses Flash and therefore will work on old browsers and IE, but you're likely out in the cold trying to use Gliffy on an iPad. All told, I'm very satisfied with Lucidcharts and highly recommend it. Gliffy isn't a horrible product, and I'm sure it might find a niche in large corporations that don't already use Visio and are stuck using IE, but it just feels clunky in my hands and hasn't seemed to keep pace with what I need.

edit
I just noticed this post, "five best online diagramming tools", and neither Gliffy nor Lucid Charts is listed... not sure why; maybe they aren't in the five best?


Tuesday, August 23, 2011

Push versus pull deployment models

There are two deployment models: push and pull. In a "pull" deployment model, the individual servers contact a master server, download their configuration and software, and configure themselves. In a "push" model, a master server pushes the configuration and software to the individual servers and runs commands remotely. Neither technique is right or wrong, but each has specific advantages and disadvantages that you should understand when making a decision. In the Ruby hosting world, Heroku is an example of a "push" deployment, while Engine Yard is an example of a "pull" deployment.

In the pull model, each server has information about how to obtain its configuration; when it boots (or whatever the configuration-triggering event happens to be), it can proceed without intervention from the master server. In the push model, because the master server is orchestrating things, it typically needs a connection back to every application server. This can lead to performance and scalability problems when trying to deploy hundreds or thousands of application server images simultaneously.

On the other hand, the pull model doesn't typically have a way to ensure that all servers and software are launched in a particular sequence. In a push scenario, the master server can coordinate which servers come up in which order. If you have a situation where application A depends on application B running, it might be a better idea to use a push deployment for those two.

In addition, the pull model starts to get wonky if you want to attempt "instantaneous" deployments. Then the application servers either need to poll frequently or "long poll", OR you need to simulate a push deployment ;).
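To make the pull half concrete, here's a minimal sketch of a pull-style agent in javascript. The master here is just an in-memory object and all the names are mine, purely for illustration; a real agent would poll an HTTP endpoint, but the shape of the loop is the same:

```javascript
// Hypothetical in-memory "master" -- a real one would be an HTTP endpoint.
var master = {
  version: 1,
  config: { app: "web", workers: 2 },
  fetchConfig: function () {
    // An agent would do an HTTP GET here.
    return { version: this.version, config: this.config };
  }
};

// Pull agent: it knows where the master is and keeps itself up to date.
function PullAgent(master) {
  this.master = master;
  this.appliedVersion = 0;
}

PullAgent.prototype.poll = function () {
  var latest = this.master.fetchConfig();
  if (latest.version > this.appliedVersion) {
    // "Configure ourselves" -- in real life: download packages, restart, etc.
    this.appliedVersion = latest.version;
    this.current = latest.config;
  }
};

var agent = new PullAgent(master);
agent.poll();           // first poll picks up version 1
master.version = 2;     // master publishes a new config
master.config = { app: "web", workers: 4 };
agent.poll();           // next poll converges on version 2
```

Note that the master never initiates a connection; with thousands of agents, the polling interval becomes the knob that trades deployment freshness against load on the master.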

From a security perspective, since a pull deployment needs to connect back to a master server, it can potentially open up some security holes. In a push deployment, the master server has access to the application server, but the application server has no access back to the master. This means it is much less likely that a security breach on one of your application servers can corrupt your master server.

Both models work, and they are not mutually exclusive, but understanding their key differences is important.

Some other references:
Automated deployment systems: push vs. pull

Configuration management: push vs. pull

Thursday, August 18, 2011

Groupon makes me tired

I read a blog post a while back comparing Groupon to a Ponzi scheme, but I think it's more accurately Tulip Mania.

I discount the value Groupon can have for a business that is missing traffic to its store, and I struggle to figure out how a business gets a loyal customer from this tactic. As a business, it seems to me Groupon will only create disloyal customers, and the best I can hope for from it is a sort of eBay-like place to either dump excess inventory or fill in slow periods where I have trouble getting customers into my place of business.

When looking at the fundamentals, Groupon has only overhead. The only arguable asset is their consumer base, and I think it provides no intrinsic value, as these customers represent a pool of people who will buy something at 50% or more off. I hate to break the bad news, but those people aren't hard to find. Worse yet, the switching cost from Groupon to "whatever else" is almost zero.

For Groupon to survive, they need to figure out how to make both their consumers and their business customers "sticky".

Tuesday, August 16, 2011

The overhead of annual enrollment in the US

For folks employed in "traditional" jobs, there is a common event in the modern age that raises collective blood pressure. It's known as "annual enrollment", and it is a period of time when most employees change/adjust various benefits offered by their company. In particular, medical insurance is a common thing to adjust.

While this is a normal cost of doing business, I think many folks underestimate the real costs of switching. Sure, there's the cost for the HR department to go out to 20 different brokers and try to get the "cheapest/best" plan for their employees, but there are also a number of hidden costs. For example, every year my health plan changes, I spend at least 5-10 hours futzing around with various billing changes as well as filling out forms, etc.

When I quote 10 hours, many folks say "you're crazy, it only took me 15 minutes". I think these folks greatly underestimate the amount of time spent because they're only counting the time to fill out the form. Because of these hidden costs, I imagine many employers scratch their heads trying to figure out why using a "cheaper/better" plan didn't show up on the bottom line.

Examples are:
Spending 15 minutes at the front desk of their physician/dentist/orthodontist changing their information.
Reading about the plan and figuring out what is and isn't covered.
Figuring out if their physician is "in-network" or not.

For dual income families:
Analyzing the differences between the two plans and figuring out which one is best.
Filling out "coordination of claims" paperwork (and heaven help someone who is divorced and needs to coordinate THAT nightmare).

All these costs erode the bottom line because they take time away from performing other duties. I'm not sure a national health plan is the solution for this, but the current system certainly is a recurring drain on resources for every business except insurance companies.

Monday, August 15, 2011

Heroku is a bus, Engineyard is a car

Engineyard and Heroku are two widely used Ruby on Rails hosting providers.

A common question is: Which one should I use?
The answer everyone gives: It depends!

Having deployed the same application in both environments, I thought I'd highlight some of the important differences.

#1 "Ease of Use"
Heroku blows Engineyard away. You install the gem and can deploy your application in minutes. There are also commands you can run on your local machine to get information about your application.
Engineyard is moving forward, but it is still pretty technical. It's really easy if you have a public github repo, but anything other than that starts to get "more complicated" quickly.

#2 "Architecture"
Engineyard gives you a "real" virtual machine. This means you've actually got a single-CPU virtual host that you can ssh into and effectively do whatever you want with.
Heroku gives you a sandbox with walls around it, and I think it's a shared environment. It's actually kinda difficult to figure out exactly what they're running, as you cannot log onto the machine directly.

#3 "Startup Price"
Heroku gives you a free (as in beer) environment.
Engineyard lets you run a trial environment for free for a period of time, but you eventually have to pay for it... even if nobody ever visits your site.

#4 "Flexibility"
Heroku lets you do anything you want, as long as they've preconfigured it to enable you to do it.
Engineyard gives you ssh capability to the machine, which means you can do anything you want even if they didn't think it would be a good idea.

Overall, I'd say Heroku is like taking the bus: if enough people want to go to the same place at the same time, it's more economical. Engineyard is like buying a car: it's going to be a bit more expensive and you're going to need to know how to drive, but it is a much more flexible solution.

Friday, August 12, 2011

The economics of scaling software development teams

One common theme in the development world is the vast difference in productivity between individual practitioners. Among folks with similar backgrounds, some are easily 10x more productive at delivering functional software. Software development is obviously not alone in having to deal with this, but there are some particular attributes of software development that make dealing with this mismatch particularly difficult.

The first problem is that many people seem to have the mistaken impression that if one has developers who produce output, then the team's output is simply the sum of the individual outputs. This is just not true by any reasonable measure, and it is a contributing factor to why many outsourcing projects end up a smoldering mess. Mathematically, each developer adds a cumulative amount of drag if the software they are writing needs to communicate with software other folks on the team are writing.

As an example, suppose we have 1 developer working on a project, and he has a perfect understanding of the requirements. The overhead for this person to write the software is the cost of typing it into a computer, compiling (if necessary), and deploying it. Obviously the frequency and cost of compiling and deploying are a factor, but this is typically a fixed cost.

Now let's add another developer who is going to work on the same software as the first. We now incur the overhead of the first developer communicating what he is doing to the second (and vice versa) PLUS the overhead of bringing the new developer up to speed on how things work. While bringing the new developer up to speed is a one-time cost, the communication overhead is ongoing. Generally speaking, the communication overhead can be represented as a complete graph with a formula of cost = (count * (count - 1)) / 2.

What does this mean? Well, for starters, compared to a team of two, the communication overhead for a team of 10 is 45x higher, and for a team of 50 it is 1225x higher! More importantly, this assumes unrealistically perfect communication, so the real factor can be much larger depending on the effectiveness of communication. As an example, using high-latency communication or off-cycle teammates can often double the cost of each network interconnect.

But wait! We know there are software projects with tens, and even hundreds, of developers; how can they possibly be successful?

For starters, many teams simply absorb this cost, and they largely end up much less successful than they COULD be. Any project manager, or even developer, is probably familiar with the sad problem of having an initial spike of productivity with a small team, then looking on in dismay when an influx of new people to the project drops their delivery velocity.

A better approach is to reduce the amount of communication necessary between subgroups within the team. This can be done by choosing a software architecture that has well-defined boundaries and focusing the communication between the independent teams on those boundaries. For example, many places do this by having a "backend team" that writes database access code and a "frontend team" that writes the GUI.

So in our example of a team of 10, if we broke it into two teams of 5 that intercommunicate via one interface, our communication overhead would be reduced from 45 to 21... that cuts it down by more than half!
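The arithmetic above is easy to sketch in code; here's a quick javascript version of the complete-graph formula and the split-team variant (the function names are mine, purely for illustration):

```javascript
// Communication links in a fully connected team of `count` people.
function overhead(count) {
  return (count * (count - 1)) / 2;
}

// Two isolated sub-teams that talk through a single shared interface.
function splitOverhead(teamA, teamB) {
  return overhead(teamA) + overhead(teamB) + 1;
}

overhead(2);         // 1
overhead(10);        // 45 -- 45x the two-person team
overhead(50);        // 1225
splitOverhead(5, 5); // 21 -- same 10 people, less than half the links
```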

What does this mean? It means that by following these guidelines, project managers and architects now have another tool to account (literally) for a variety of team configurations. Information technology architecture is not JUST about software and hardware; wetware (people) and human factors need to be accounted for as part of the art.


Thursday, August 11, 2011

Ruby 1.9.2 changes and i18n on Mac OSX

We recently noticed some pretty interesting changes in Ruby 1.9.2. It appears that require no longer allows relative paths, and Ruby is now more unicody. This means if you are in a directory with two files, "foo.rb" and "bar.rb", you can no longer simply type "require 'foo'" inside bar.rb to use foo. Now, you need to either do "require './foo'" or "require_relative 'foo'".

A potentially more difficult change is in how Ruby handles character encodings. For the most part, this isn't a problem inside "normal" code and strings, but things get dicey if you start reading text files off a filesystem. This is especially dicey if you're on a Mac AND you work with western European data AND it involves money. If you save a file with a currency symbol on a Mac, then subsequently read the file on a machine (or in a tool) that uses/assumes UTF-8, you will not see €, you will see a Û.

To cut to the chase, if you're developing software on a Mac, make sure you change your tools to use UTF-8, NOT MacRoman, or you will at some point be scratching your head. Why? As a quick example, the Euro symbol in MacRoman is mapped to a different byte sequence than it is in UTF-8. More importantly, for international applications, many non-latin characters simply don't exist in MacRoman, and you won't be able to properly edit files with Asian and other non-latin-based characters.
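You can see the mix-up in miniature with Node.js. Node has no MacRoman codec, so I'm writing the MacRoman byte out by hand (0xDB is the MacRoman encoding of the Euro sign); the rest is just standard Buffer decoding:

```javascript
// In MacRoman, the Euro sign is the single byte 0xDB.
var macRomanEuro = Buffer.from([0xdb]);

// In UTF-8, the same character is the three-byte sequence E2 82 AC.
var utf8Euro = Buffer.from('€', 'utf8');
utf8Euro.length; // 3

// Decode the MacRoman byte with a Latin-1-ish assumption and you get Û,
// which is exactly the garbage described above.
macRomanEuro.toString('latin1'); // 'Û'

// Decoded as UTF-8, a lone 0xDB isn't even a valid sequence, so you get
// the replacement character instead.
macRomanEuro.toString('utf8'); // '\uFFFD'
```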


Wednesday, August 10, 2011

git and github are not change management tools

I recently stumbled across a problem with git that is going to cause no end of headaches for the uninitiated. Git repositories are fundamentally insecure, and the audit trail is dodgy when folks are either #1 intentionally malicious or #2 ignorant of how git works.

For the back story, I have a number of github accounts: one is used for code I use while blogging, another is for internal projects at work, and yet another was for a client I was working with. A while back I noticed that commits apparently made with my "client" github account showed up in my blog.

Confused, I verified my public/private key pairs against what was in github and was truly stumped as to how this was happening. While scratching my head, I remembered that there is the concept of a "global" config in git and ran the following command:

git config --global -l

Uh oh! It turns out I had a global config set... When I went back and looked, EVERY commit I had done was as this erroneous user.

The problem seems common to distributed source code revision management systems, but it is fundamental to git: every copy of every repository is trusted to maintain its own copy of the revision history. After realizing this, I also realized that I could rewrite history, push to the central repository, and effectively delete and/or amend the revision history in the master repository.


As an example, I tried the following:
git clone git@github-personal:mikemainguy/mainguy_blog.git
git config --replace-all user.name "Scooby Doo"
git config --replace-all user.email sdoo@doo.com
echo "Scooby Doo" >> README
git commit -a -m "innocent looking change"
git push

Now when examining the history in github, my commit shows up as having come from some cat named Scooby Doo! Worse yet, there's no apparent way to figure out which github account actually pushed the change.

Beyond that, I can rewrite history in my local repository, push it out, and make old changes disappear, and nobody will be able to see what happened.

For example:
echo "SECRET STUFF" >> README2
git add README2
git commit README2 -m "whoops"
git push
Check your central repo, you see README2

git filter-branch --index-filter 'git rm --cached --ignore-unmatch README2' -f
git push -f

Now check github: the commit is still there (I'm not sure it can be completely removed), but the file is gone... It's as if it never existed!

So, applying this knowledge to my screwed up commits I ran the following:
git filter-branch --env-filter 'export GIT_AUTHOR_NAME="Mike Mainguy"' -f
git filter-branch --env-filter 'export GIT_COMMITTER_NAME="Mike Mainguy"' -f
git filter-branch --env-filter 'export GIT_AUTHOR_EMAIL=mike.mainguy@gmail.com' -f
git filter-branch --env-filter 'export GIT_COMMITTER_EMAIL=mike.mainguy@gmail.com' -f
And, after a forced push (git push -f), all the craziness is gone...

Knowing these details is important if you're using git/github, because most people coming from a centralized source code control tool would find this behavior a little disconcerting (if not just plain wrong). The thing to remember is that every person you trust to push to your repository can effectively remove/rewrite history to their own liking. If accountability and an audit trail are important, you'll likely need to adopt a "pull" model and have someone manually verify/rewrite each commit.

Tuesday, August 9, 2011

Avoid Bowfinger Syndrome

I just read a great blog post about what it costs to write AAA games and the author used a term I love called "Bowfinger Syndrome". Bowfinger is a movie starring Steve Martin and Eddie Murphy in which Bobby Bowfinger (Martin) tries to make a movie with only $2000. While a very funny movie, it hits home in the software development world in many ways.

Too often, software projects fail because folks grossly underestimate the costs involved and then spend time trying to work around the lack of budget to actually finish the project. There are a number of reasons for this; I'll give a quick list:

#1 A prototype of some software was written and someone extrapolates the costs to build the "real software". This is a mistake: prototypes are like Hollywood cutout towns... they may LOOK like real software, but they don't necessarily WORK like real software. If you're eyeballing the hours necessary to build a prototype and trying to estimate the effort to build the finished product, step back and imagine how you would do this if you were trying to convert a Hollywood set into a real city.




#2 The costs to write an initial version of some software are buried and not really understood. This happens when domain experts or highly skilled developers write initial versions of software and then folks think that they can economize by using unskilled or domain ignorant developers to write something else or a new version of the same software. Put another way, if 1 surgeon can do 1 surgery in 1 hour, that doesn't mean 5 trained monkeys can do 5x more surgeries in the same amount of time.



#3 A contract does not mean the software can be delivered. If your business depends on software and your plan is to somehow get a lawyer to deliver it, you're not going to have a business for long. Put another way, just because someone is foolish enough to sign a contract and agree to build you a ladder to the moon, doesn't mean it's a good idea to start pre-selling tickets.



Photos:
utahstories.com
times3online.com
www.web20lawyer.com





Monday, August 8, 2011

javascript sleep method

For newcomers to javascript, it might come as a surprise that there is no sleep method. Worse yet, if you search the internet, you'll find all manner of really... really bad ways to simulate one. One of my favorite "rotten tomatoes" is something like this:
alert('start');
  // Busy-wait loop: pegs the CPU and freezes the entire browser UI.
  var date = new Date();
  var curDate = null;
  do { curDate = new Date(); }
  while (curDate - date < 5000);
  alert('finish');

Note, I borrowed this horrible example from stack overflow. If you're lucky, that example will not completely crash your browser. A much better solution is something like this:
alert('start');
  setTimeout(function() {alert('finish')},5000);

The obvious problem is that if you truly want to simply pause for 5 seconds in the middle of a really long method, the anonymous function is not going to help you out very much... unless you do something like this:
alert('start');
  //lots of code
  var a = 'foo';
  setTimeout(function(){
    alert(a);
    //lots more code
  },5000);
If you find this solution lacking, you're probably due for some refactoring of your code, as you are likely writing highly procedural code, and it's likely that javascript is going to cause you other, more serious problems. This is on stack overflow here; upvote if you think it's a good solution.
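The kind of refactoring I mean looks roughly like this: rather than one long method that wants to block in the middle, break it into named steps and let setTimeout chain them. The step names are invented for the example, and I'm using 50ms instead of 5 seconds to keep the demo quick:

```javascript
// One long procedural routine, split at the point where we wanted to "sleep".
function stepOne(state) {
  state.push('one');          // ...lots of code...
  setTimeout(function () {    // schedule the rest; nothing blocks
    stepTwo(state);
  }, 50);
}

function stepTwo(state) {
  state.push('two');          // ...lots more code...
}

var state = [];
stepOne(state); // returns immediately; 'two' is appended ~50ms later
```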

Friday, August 5, 2011

Adding methods at runtime in javascript, ruby, and java

OK, the title is a ruse: you can't easily add methods to java classes at runtime. But I'll illustrate how to do it in javascript and ruby. For example, let's suppose we have a javascript object and we want to add a say_hello method to it:

var myObj = {};
myObj.say_hello() // doesn't work
myObj.say_hello = function() {
  return "hello";
}
myObj.say_hello(); //works


This is because javascript treats functions as first class citizens and doesn't even bother with the concept of "classes" as something other than special functions.

Same thing in ruby:
myObj = Object.new
myObj.say_hello # doesn't work
def myObj.say_hello
  "hello"
end
myObj.say_hello # works

There's a subtle difference here. The ruby syntax seems a little strange to me and it wasn't obvious how to do this. In javascript, it's very obvious that you're assigning a new function to the attribute (that you're adding). In ruby, using def in this manner seems out of place...

Ruby seems to attribute special meaning to the class and object definition where javascript treats methods (functions) just like any other variable.

Digging around a little, the ruby situation gets a little strange:

myObj2 = Object.new
myObj2.say_hello # doesn't work (this makes sense because we only defined the method on one instance)
Object.respond_to? :say_hello # false ??? I "kinda" get it
myObj.respond_to? :say_hello # true ??? the singleton method counts


If I wanted, I COULD have added the method to the class in ruby, and then every instance would get it, e.g.:
myObj = Object.new()
myObj.say_hello # doesn't work
class Object
  def say_hello
    "hello"
  end
end
myObj.say_hello # works
myObj2 = Object.new()
myObj2.say_hello #works
Object.respond_to? :say_hello #works


It really depends on whether you want to "backport" new methods to all objects of a class or only add the method to a particular instance. To do the equivalent in javascript, you'd do something like:
var myObj = {};
myObj.say_hello() // doesn't work
Object.prototype.say_hello = function() {
  return "hello";
}
myObj.say_hello(); //works
var myObj2 = {};
myObj2.say_hello() //works
(new Array()).say_hello() //works!


The difference between how the two languages accomplish this is pretty minor; the more important thing to understand is the difference between adding a method to an instance (object) versus a class definition (prototype). Not having a complete understanding of these differences can cause a lot of problems and subtle bugs in your code.

Thursday, August 4, 2011

Object Oriented Javascript

Suppose we need to create a counter that starts at zero, increments to 4, and then restarts at 0 (a "you can't count to five" counter). There are many ways to do this, but a straightforward way someone might do this in javascript would be:
var global_count = 0;
function increment() {
    global_count ++;
    if (global_count > 4) {
        global_count = 0;
    }
}

This is pretty common for beginners and works until your systems start to get complicated and your global_count gets clobbered, or multiple people need to have counters. For multiple counters, one might start with a naming convention and add variables like "global_count2, global_count3", et cetera. This doesn't solve the problem of the variables getting clobbered in other parts of your system, and it likely makes it worse because now there are more permutations to mix up and accidentally change.

As an example of what I'm talking about
var global_count = 0;
function increment() {
    global_count++;
    if (global_count > 4) {
        global_count = 0;
    }
}

increment(); //works fine

//Somewhere else a smart guy decides to do this
global_count = 52;


To prevent this sort of thing, OO has the concept of information hiding. In java, a straightforward way to avoid someone clobbering your count is to encapsulate the "count" variable and only access it via a method that performs your business logic.
public class BeanCounter {
  private int count = 0;

  public void increment() {

    count++;
    if (count > 4) {
       count = 0;
    }
  }
  public int getCount() {
    return count;
  }
}

So now, when this java example is literally (for the most part) translated to javascript, it looks like this:
function BeanCounter() {
    var count = 0;

    this.increment = function() {
        count++;
        if (count > 4) {
           count = 0;
        }
    }

    this.getCount = function() {
        return count;
    }

}
var my_counter = new BeanCounter();

But a more javascripty way to do this would probably be:
var bean_counter = function () {
    var that = {};
    var count = 0;
    that.increment = function () {
        count++;
        if (count > 4) {
           count = 0;
        }
    };
    that.getCount = function () {
        return count;
    };
    return that;
};
var my_count = bean_counter();
my_count.increment();
my_count.getCount();

In both of these examples, we can see some key differences between javascript and java. First, there really isn't anything formally defining a "class"; we simply have functions with different mechanisms to extend and enhance them. In addition, there's more than one way to implement something that fulfills some of the capabilities a more traditional OO language (like java) might give us. In this case, the thing we wanted was the ability to ensure we could have things that would only count to 4, then reset back to 0.
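To see the encapsulation actually doing its job, here's a quick exercise of the second (closure-based) version; the counter wraps after 4, each counter is independent, and there is no public field for a "smart guy" elsewhere to clobber:

```javascript
// Same closure-based counter as above, repeated so this snippet stands alone.
var bean_counter = function () {
    var that = {};
    var count = 0;
    that.increment = function () {
        count++;
        if (count > 4) {
           count = 0;
        }
    };
    that.getCount = function () {
        return count;
    };
    return that;
};

var a = bean_counter();
var b = bean_counter();
for (var i = 0; i < 5; i++) { a.increment(); }
a.getCount(); // 0 -- five increments wrapped back around to zero
b.increment();
b.getCount(); // 1 -- each counter keeps its own private count
a.count;      // undefined -- there is no public variable to clobber
```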

There are some advantages to the second approach when it comes to inheritance and polymorphic behavior that I won't go into here, but for a more in-depth look at different ways to apply (and mis-apply) OO concepts in javascript, I highly recommend JavaScript: The Good Parts by Douglas Crockford.

Wednesday, August 3, 2011

Git for dummies

OK, the title of this post is a lie... git is decidedly NOT for dummies. Git is more for really smart, mentally gifted people OR for people working on exceedingly complex software projects that have complicated merging and revision history requirements. Neither of these groups should be dummies, or you will have a serious problem being effective. For the context of this tutorial, a "dummy" is defined as somebody who knows and understands how to use SVN or CVS in a team environment.

So if you're still reading, I will walk you through the simplest workflow I can discover for git that works and doesn't cause too many complications. For the sake of simplicity, we're going to assume you're working on a project hosted at github that already exists. I've created a public repo if you'd like to follow along at home. Assuming you already have git installed, you should be able to "clone" the repository to your local machine by issuing the following command:

git clone git://github.com/mikemainguy/mainguy_blog.git

edit: I used the wrong url originally.
At this point you should now have a subdirectory called mainguy_blog with a "README" file inside it.

Assuming that everybody is working on a single branch of development, the workflow is pretty simple.
  • edit files (vi README)
  • add files to staging area (git add README)
  • commit changes (git commit README)
  • pull changes from remote (git pull)
  • push changes to remote (git push)

One thing you will note is that the number of steps is a bit different than you might be used to with svn or cvs. In particular, git seems to have added some steps. With SVN, the workflow would typically be:
  • edit files (vi README)
  • update from remote (svn update)
  • commit changes to remote (svn commit)

With Git, we've added the notion of a local staging area AND a local repository. This will really confuse dummies like me at first, and I cannot emphasize enough that you need to think about the implications of this. I guarantee you THINK you get it, but the practical implication of not grokking it is that you will likely do tremendously stupid things for a period of days once you get into a larger team and/or someone starts to try doing some fancy merging.

So now, we're going to walk through a "normal" multiuser scenario.
  1. User 1 edits README and adds it, and commits to their local repository
  2. User 2 edits the same file in a different place, adds it and commits to their local repository
  3. User 1 pushes their change to github
  4. User 2 tries to push their changes to github, but they discover that user1 has already pushed their changes.
  5. User 2 pulls their changes from github
  6. Git automerges the file because there are no conflicts
  7. User 2 pushes their changes to github
When we look at the github history, we see something interesting... there is an additional commit added at the end indicating that git/User2 merged some files. Aside from the extra workflow steps, this is an additional point of confusion for quite a few newcomers.

In short, a workflow to make git work the way a dummy like me would expect follows:

vi README
git add README
git commit README
git pull --rebase


Now here is where things can get tricky. In the SVN world, if you have merge conflicts, you fix them and move along, committing the results when you've "fixed" things. With git, on the other hand, you need to "add" the fixed files back in and continue the rebase. So, if you have no conflicts, you're actually done at this point, but if you have a merge conflict, you need to do the following steps:


vi README
git add README
git rebase --continue


Once this is finished, push your changes back to the remote repo


git push


additional warning

When rebasing, if you get a conflict, do NOT commit the files, only ADD them. If you commit the files, you will condemn yourself to a hurtful place where your commit history shows conflicts with things you didn't even edit.

I think git is a wonderful tool, but it has a much steeper learning curve than its simpler and kinder cousins svn and cvs. While my perspective is skewed by years of using SVN and CVS, I think it is pretty safe to say that these tools have millions of users and I am not the only person to go through the pain of "figuring out" git. The addition of remote/local repositories and a staging area seems to be a common point of confusion for newcomers who've arrived at git from the SVN/CVS world.