Friday, December 16, 2011

Java classes, objects, and instances demystified (hopefully)

A great many people are competent java developers, but have only a vague understanding of the difference between a "public static method" and a "public method", or the difference between a class and an object. As this was confusing to me at first, I thought I would give a quick overview.

A class defines a template for what data and operations are available when you tell the JVM to create an object.

So, for example:

public class BlogPost {
    public String text = "";
    public static BlogPost latest;
    public static BlogPost create(String input) {
        latest = new BlogPost();
        latest.text = input;
        return latest;
    }
    public int getTextSize() {
       return text.length();
    }
}

When you compile this class, it creates a file that the java runtime can later use to load objects into memory. Each of those objects has a single attribute called "text", which is a reference to a String object. Additionally, the class itself has an attribute called "latest" that is a reference to a BlogPost object, and a static method called "create" that accepts a reference to a String, creates a new BlogPost, stores it in the "latest" class variable, sets the text attribute on that new BlogPost, and finally returns the reference stored in "latest". In addition, there is an instance method called "getTextSize()" that returns the result of calling the instance method "length()" on the "text" instance variable of the object.

So far, if you've done any amount of java programming, this shouldn't be too shocking or eye opening. However, there are some subtle and not-so-subtle nuances at play here. First, and most commonly confused: static methods cannot access instance variables.

Why?

Let's talk through this... when java is running, class definitions are broken into two pieces: the data piece and the method piece. The data piece is independent for every new instance; the method piece is identical for every instance created and unique to the particular class. The methods you can call are unique to the CLASS, not to any instance of the class. Likewise, static variables on the class are also unique to the CLASS, not the instance. So, for example, the method area for our "BlogPost" has two method references: one for "create", which expects to receive a String reference and will return a BlogPost reference, and one for "getTextSize", which expects to receive a reference to a BlogPost instance and will return the integer length of the text field stored on the BlogPost reference it received.

When we are in the create method, there is no implicit BlogPost instance available to look at... since the "length" method and the "text" instance variable both need a BlogPost object loaded into memory, there's no way to access them from a static context.
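To make that concrete, here's a minimal sketch (BrokenBlogPost is my own hypothetical class, not from the example above) of what the compiler tells you if a static method tries to touch an instance variable:

public class BrokenBlogPost {
    public String text = "";
    public static int brokenTextSize() {
        // The next line will NOT compile:
        // "non-static variable text cannot be referenced from a static context"
        // return text.length();
        return 0; // placeholder so this sketch compiles
    }
}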

Put another way, when you define an instance method, even though you don't tell the JVM explicitly, it automatically knows that it MUST have a reference to an instance of its defining class (BlogPost) already loaded into memory in order to perform its operations.

This gets to the heart of what I think a lot of people don't get about java (not ALL languages/VMs do it this way, but java DOES). Another way to code the getTextSize() method above would be:

    public static int getTextSize(BlogPost myPost) {
        return myPost.text.length();
    }

Some people would think this method is more efficient because you don't "waste" memory having all kinds of copies of the function loaded into memory. The fact is, java is not that naive: ALL methods are effectively singletons, and there will only ever be one copy of the method implementation in memory. When you declare an instance method, you're simply telling java that this method ALWAYS REQUIRES an instance of the class as an implicit parameter. In addition, the language gives you a nice syntax where you don't have to explicitly pass the instance as a parameter. There really is no difference at runtime between the memory consumption of the two implementations.
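To see the equivalence for yourself, here's a minimal sketch (the Demo class and its static getTextSize variant are my own illustration, reusing the BlogPost class from above):

public class Demo {
    public static int getTextSize(BlogPost myPost) {
        return myPost.text.length();
    }
    public static void main(String[] args) {
        BlogPost post = BlogPost.create("hello world");
        // Instance call: the JVM passes "post" as the hidden "this" parameter.
        System.out.println(post.getTextSize());   // prints 11
        // Static call: the instance is passed explicitly instead.
        System.out.println(getTextSize(post));    // prints 11
    }
}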

For a little more discussion, take a look at Stack Overflow. Hopefully this will help clarify things for some folks.

Friday, December 9, 2011

The java collections framework for newbies

I don't consider myself a java expert by any measure, but there's a disturbing thing I've noticed. There are a LOT of people who claim to be "java developers", but they have zero clue what the "java collections framework" is. This post is designed for folks who keep getting stumped on interview questions or are mystified when someone starts talking about the difference between a Set and a List (for example).
If you google "java collections framework for dummies" you'll find this link, which has a more complete, if fairly dense, explanation. I'm going to do you one better and give a rule of thumb you can use without thinking about it.
At the root of things, a collection is something you can store other things inside. Just like in real life, a collection of marbles is just a "bunch" of marbles. The big difference among the collections framework implementations is what they each DO with the marbles, and that's what you need to understand.
For example, let's consider the ArrayList... everybody and their brother should know this... if not, you are not a java developer, go read a book. Some special things about an ArrayList: it stores entries in the order they are added, you can select an element by its index, and it can contain duplicates of the same element. From a performance perspective, it is VERY fast to add things and to look them up by index. On average, though, it is slow to see if a particular object is present, because you must iterate over the elements of the list to find it.
Next, let's talk about the HashSet... I realize that this might sound vaguely drug related to the uninitiated, but a HashSet has some interestingly different characteristics from a list. First off, a HashSet has no concept of order or index: you can add things to it and iterate over it, but you cannot look things up by index, nor are there any guarantees about what order things will be presented in when you loop over the members. Another interesting characteristic is that it cannot contain duplicates. If you try to add the same object twice, it will NOT fail; add() will just return false and you can happily move on.
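For instance, here's a quick sketch (assuming the usual java.util imports) showing that add() reports whether the set actually changed:

Set<String> tags = new HashSet<String>();
System.out.println(tags.add("java"));  // true -- "java" was added
System.out.println(tags.add("java"));  // false -- duplicate, silently ignored
System.out.println(tags.size());       // 1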
Last but not least, there is the Hashtable (or its slightly more dangerous cousin, the HashMap). This is used to store key/value pairs. Instead of keying things by an index (like an ArrayList), you can key them by just about anything you want. You can do things like myMap.put("foo","bar"), and then myMap.get("foo") will return "bar"...
There is a LOT more to this, but with this quick reference you can at least begin to do useful things in java.
Examples of using a List
List<String> myList = new ArrayList<String>();
myList.add("Second Thing");
myList.add("Second Thing");
myList.add("First Thing");

System.out.println(myList.get(0));
will output
Second Thing
An interesting thing to note is that the size of this is 3
System.out.println(myList.size());
will output
3
The following:
for (String thing: myList) {
  System.out.println(thing);
}
will always output:
Second Thing
Second Thing
First Thing
Next, let's look at a set:
Set<String> mySet = new HashSet<String>();
mySet.add("Second Thing");
mySet.add("Second Thing");
mySet.add("First Thing");
The first difference we can see is that
System.out.println(mySet.size());
returns
2
Which makes complete sense if you understand that sets cannot contain duplicates (and you understand how the equals method of String works... ;) Another interesting thing: the following:
for (String thing: mySet) {
  System.out.println(thing);
}
might output:
Second Thing
First Thing
or it might output:
First Thing
Second Thing
It so happens that it returns the second version on my machine, but it's really JVM/runtime specific (it depends on how the HashSet is implemented, how hashCode is implemented, and a bunch of other variables I don't even fully understand).
More importantly, the following will likely be much faster for LARGE collections:
System.out.println(mySet.contains("Third Thing"));
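For contrast, here's a sketch of the same check against the list from earlier; both run fine, but they do very different amounts of work as the collections grow:

// O(n): walks the list element by element looking for a match
System.out.println(myList.contains("Third Thing")); // false
// O(1) on average: hashes the key and jumps straight to its bucket
System.out.println(mySet.contains("Third Thing"));  // false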
Finally, the granddaddy of the entire framework, the Hashtable.
 Hashtable<String, String> myMap = new Hashtable<String, String>();
 myMap.put("a", "Second Thing");
 myMap.put("b", "Second Thing");
 myMap.put("c", "First Thing");

 System.out.println(myMap.get("a"));

Will output
Second Thing
and the following:
for (Map.Entry<String, String> entry: myMap.entrySet()) {
    System.out.println(entry.getKey() + "=" + entry.getValue());
}
will output (iteration order is not guaranteed, so you may see a different order):
b=Second Thing
a=Second Thing
c=First Thing
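As an aside on that "more dangerous cousin" remark: one practical difference (a sketch, assuming the usual java.util imports) is that Hashtable is synchronized and rejects nulls, while HashMap is unsynchronized and tolerates them:

HashMap<String, String> lenient = new HashMap<String, String>();
lenient.put("maybe", null);           // fine -- HashMap allows null values (and one null key)
Hashtable<String, String> strict = new Hashtable<String, String>();
strict.put("maybe", null);            // throws NullPointerException at runtime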
Hopefully with these examples, you can get an idea of the capabilities of the collections framework. There is much much more to it and I encourage ANYONE doing java development to spend time playing around and learning the different characteristics of the various components as I've only lightly skimmed the surface.

Tuesday, December 6, 2011

Is your team a cross country team or a soccer team?

While touring a college campus with my daughter, one of her prospective cross country teammates said something that gave me pause. In effect, her statement was that she really liked cross country because everybody on the team was always pushing for you to do your best. Also, she continued, it's nice to know that you either succeeded or failed because of your own effort and training, not because of anyone else. I've thought about this for quite some time, and I realize there is a VERY big distinction between "individual" sports like wrestling, swimming, or cross country... and "team" sports like soccer, football, or basketball. These differences are important not just on the playing field, but in any situation that requires teamwork.

On "team" sports, you very often will have competition within the team that actually works against the team's objective. Additionally, individual team members may have to forgo performing at the best of their ability because of a particular game situation. For example, there are many good reasons for a soccer player to NOT dribble the ball when they have possession: They could be covered by a good defender, they might not actually be a very good dribbler, or they might have a teammate in a much better position to do something productive. This means at any given moment, they need to not only take into account their own situation, but the situation of 21 other people plus a ball and a referee or three.

With an individual sport, it is the performance of the individual that is paramount. In cross country, I suppose there are situations where it might be good to lose a few places individually in order to help pace a teammate, but there is not really as complex an interaction with the other players on the field. Put another way, the SPORT of cross country doesn't require as much social intelligence as something like soccer... it is much more purely sport for sport's sake and a measure of an individual's ability to perform.

At the end of the day, both styles of team are important and beneficial, but from my observation, there are some interesting implications. First, on a "team" sport, there are often social conflicts due to the complex interplay of individuals and game situations. On an "individual" sport, I think these conflicts are less common (or severe). It seems like this is because even if two cross country teammates are in fierce competition with each other... they're only helping the team and each other out and making both stronger. In contrast, if two soccer teammates are in fierce competition with each other, nothing good will happen and it will likely destroy the team.

When working with teams in general, it's important to understand whether the situation is an "individual" situation or a "team" situation. If you're trying to motivate a team where it's most important that each individual do their best, and the individual's contribution ONLY has positive effects, foster and reward the individual regardless of the performance of everyone else. If, on the other hand, your team has more complex interactions, it becomes more important to let players know when they might need to behave in a more altruistic manner for the good of the team.

Wednesday, November 30, 2011

How to ask intelligent questions

Smart technical people (aka Hackers) have likely dedicated thousands, tens of thousands, or hundreds of thousands of hours of their lives to learning, understanding, and generally figuring out how things work. If you find yourself asking for help, I'd suggest approaching it like you would approach asking advice from someone like Tony Hawk on how to do a kick-flip. (I apologize to folks who don't know who Tony Hawk or Ryan Sheckler are, and further apologize if you don't know what a kick-flip is... you really should get out more. I further apologize for suggesting that programming a computer is in any way as difficult as performing a kick-flip; I've skateboarded since I was... like 13, and still cannot pull one off without doing a no-comply. It's just a useful metaphor that non-technical folks might be able to understand.)

To illustrate some approaches:

BAD "Tony, I've never even stood on a skateboard before, but I REALLY want to learn. I've got a spare 5 minutes, could you teach me how to do a kick-flip? oh and could you give me a free skateboard? I don't have one yet and you get them for free so I figure you wouldn't mind giving one to me. By the way, I wouldn't bother you, but Ryan Scheckler seems too busy to help me."

BETTER "Tony, I never land my kick-flips, the board seems to land on it's side, any tips?"

MORE BETTER "Tony, I've been skating for 5 years and for some reason I land my kick-flips on the side of the board. I looked up some tips online and saw your video on youtube, and I think the problem is I'm not flicking my foot hard enough. I've considered getting a different pair of shoes... I brought my board with me, if I showed you could you give me some advice?"

OK, hopefully everybody understands the problems with approach #1. You're asking a professional to teach you something that has taken him decades to learn and perfect, and to dilute it into a 5 minute lesson (for free). You're unprepared and unequipped to even begin to solve the problem... this is like people who want hackers to troubleshoot their "broken computer" while sitting at the bar after work (while the computer is at home). You haven't even TRIED to solve the problem yourself... this means (to many) you don't even possess the dedication or resolve to even TRY. This is kinda like when someone has a problem, the local hacker tries to help, and the person requesting assistance decides to go out for coffee. Approach #1 in the hacker community will either get no response, or get the equivalent of "FSCK OFF!".

Approach number two is better because it shows dedication and preparation, and makes no assumptions about how much time/effort could or should be put into it. An appropriate response to #2 would be "Practice harder". Approach #2 is likely to get terse responses, but you won't likely hear any expletives.

Approach number three is most likely to elicit the best response. You've done your homework, you're dedicated, you're prepared, you've got a solution you aren't sure of... rock on! If a hacker doesn't give helpful advice in #3, remember: you're still asking for free advice... I'd personally expect to PAY for personal skateboarding training from a professional skateboarder -- get it, PROFESSIONAL, like, that's how they put food on the table. Free advice from a professional is known as "a favor" and should be treated as such...

For folks who have been in the open source community for any length of time, there is an old article explaining how to ask questions the smart way. While I think that writeup is spot on, I think it's a bit inaccessible. Listen, getting computers to do what you want is very difficult and VERY challenging work (if it wasn't, the job wouldn't pay so well and hackers wouldn't be in such high demand). Yes, I realize it isn't as rare as skateboarding talent, but hopefully the metaphor helps explain why many hackers seem NOT to be helpful... it's not THEM, it's YOU!

I apologize in advance for anyone who is easily offended.

Honestly I don't care if you're offended, but I figure I'd put a token apology in because people who are easily offended are often shallow enough to be placated by a token apology.

Thursday, November 3, 2011

stop branching! agile is soccer, not american football

One trend I've noticed with git users is a habit of creating a lot of branch and merge activity. The oft-repeated mantra is "branching is (easy/cheap/safe) in git so I do it a lot". When working on an agile project, though, this behavior can cause serious problems.

To illustrate the point, compare american football to soccer: american football has highly specialized players and positions, as well as a variety of tightly choreographed set pieces. In contrast, soccer has a much lower degree of specialization, and while there are some choreographed set pieces, the majority of the game is spent reacting to the situation as it evolves. Traditional development methodologies are like american football: they divide the work up among highly specialized players and then try to replay an intricate set of movements to make the play "work". Agile methodologies are more like soccer (or, to a lesser degree, rugby) in that the advantage doesn't come from following the choreography (or even rehearsing it), but from reacting to the current situation on the field and having visibility and vision into the current state of the field.

When teams start creating a lot of branches and working in isolation for long periods (relative to release frequency), they are often making assumptions about how the plan is supposed to work. Unless this has been worked out well in advance, it often leads to a "massive catastrophic merge" when everybody tries to come back together. To maintain an agile development process, it's important to react to interdependent changes as early as possible and reinforce the notion of a team of generalists who must react and move based on the current situation, NOT by following a plan that was written months before.

So, if you're on an agile team of 5 and each of you is working on multiple independent branches and not sharing them on a daily basis, you're probably trying to play american football. Instead of developing vision and dealing with the ebb and flow of the game as it unfolds, you're trying to rehearse what you think your role in the project is supposed to be so that you can execute your portion perfectly at the appropriate time.

Wednesday, November 2, 2011

I beat ruby on rails by 6 months

Waay back in 2003, I got tired of writing the same boilerplate crud apps and longed for a "better way to do things", so I wrote a rapid development framework called thrust. It used turbine, velocity, and torque to build an entire web application scaffold from an xml database schema definition. I look at the code now and kinda chuckle and shake my head, but something I realized is that it predates the public release of ruby on rails by a good six months. Moreover, it predates the closest analogue I can find in the java space (Spring Roo) by a good 6-7 years! I'm not just tooting my own horn, because I remember talking to other people who all said things like "we should just use conventions" and "this stuff is just boilerplate, why don't we generate/template it?", but it seems like most folks just built internal-only proprietary solutions.

A couple of lessons/observations:

  1. Promotion is everything... rails languished in relative obscurity until some folks started evangelizing it. My solution died on the vine as I moved on to bigger and better things.
  2. Timing is important, but not MOST important. Being first can be an advantage or a liability. Grails got to learn from rails and avoid some of the wonkiness (for example).
  3. Sometimes it's good to go back and look at what you've done for inspiration. I had forgotten about velocity templates... which are pretty useful. I also didn't realize that Maven (arrgh) originated from the Turbine project (which is what my framework was built upon).
  4. Great ideas seem to burst on the market in a short period of time, and one or two solutions end up dominating. It seems that tech trends infect large numbers of developers simultaneously and then go away.

Monday, October 24, 2011

Gorillarinas, Putting the agile skirt on a waterfall Gorilla

Fact: putting a skirt on a Gorilla doesn't make it any more graceful

Ballerina gorilla

Are your agile initiatives Gorillarinas?

If you're working in a large organization and trying to "be agile", it often turns into a strange situation where only a superficial set of changes is made, and folks wonder why their initiative isn't delivering the expected benefits. This is remarkably similar to putting a skirt on a Gorilla and then wondering why it isn't suddenly graceful.

I'm not saying you cannot train a Gorilla to dance ballet; I'm an optimist at heart and believe anything is possible. But don't make the mistake of thinking that taking an IT organization that has built up the overhead gunk and crud of a huge process and turning it into a highly responsive and lean software delivery organization is any less difficult than turning a Gorilla into a ballerina. The effort will be large, there will be casualties, and you will likely need expert help (or therapy).

As a member of a consulting organization, I get the benefit of meeting up with my compatriots at Redpoint to discuss the challenges, successes, and failures of working to transform hulking and clumsy organizations into something leaner, meaner, and more agile. One theme that recurs in these discussions is that many traditional processes are deliberately dehumanizing. They view people as interchangeable components that can be replaced at a moment's notice with no impact to the overall project. After all, if you've run out of ballerinas, you can surely substitute anybody (or anything) in a tutu to do the dance, right? This is a key failing and just plain wrong; even in organizations that throw around slogans and mission statements proclaiming how important everyone's individual contribution is, all traditional processes turn developers, QA analysts, and even Bob the dinosaur into "Resources".

Some ways to know if you've got a Gorillarina:

  • Daily standups, er, status report meetings that ALWAYS last 1 hour (or longer)
  • A variety of torture devices, er, task/bug tracking tools... one for the "official" stuff and one for the "agile" stuff...
  • ScrumMaster who's responsible for the work and assigns tasks to team members!
  • Team members who sit around and wait for someone to tell them what to do
  • Time-boxes that are well intended, but just never seem to work
  • Planning starts with announcing a release date, then estimating the effort involved
  • Retrospectives are about collecting lessons learned for our PMO to track!
  • We do Pair Programming (a senior guy reviews all commits), Test-Driven Development (20 page formal test plans reviewed by QA AND the analyst team), and use Continuous Integration (we do a massive build the week before a release)
  • User stories, requirements stories, development stories, test stories, and all sorts of other “types of stories” allow us to track the work!
  • Our impediments list helps track why we miss our commitments!
Some things you can do to pave the way toward having a more graceful Gorillarina:
  • Recognize that you are not going to instantly get results by doing standups, using notecards, or renaming your project manager the "ScrumMaster"
  • I know it's been said 1000x, but empower your team(s) to solve problems, if your process is getting in the way, ditch it... or fix it...
  • Recognize that "agile" is not a technology thing, it's a business thing... everybody needs to have skin in the game.
  • Agility is about values reified as techniques and practices, not merely practices!
  • If you're in the Chicago area, drop us a line at agile@redpointtech.com
  • Find another organization that has a track record of producing results and let them help you (although our sales department would really like you to use the link above ;) )

One last note, and something that was brought up while editing: agile is not JUST the practices (jeez, there we go again...); it is as much about building shared values and attitudes about what the team is doing. Many organizations think they can send a couple of guys to "scrummaster training" and they'll come back and infect the entire organization. Helping transform teams and entire organizations from one mindset to another is a job that requires practice, experience, and skills that are not acquired in a one (or two) week class. It takes a certain type of person to fulfill this mission, and if your goal is REAL transformation, it is in your best interest to seek out folks who have a proven track record. It can be a daunting task for someone inexperienced in the field, and if you think you can send a Business Analyst to a training course and then have them run around your organization waving the magic "agile" wand at projects, you'll most often end up with a bunch of "agile best practices" but no real transformation.

Special thanks to Si Alhir and Steve Fastabend for feedback and editorial assistance....

Wednesday, October 19, 2011

Promiscuous programming

How many folks out there are promiscuous programmers? You know who you are: every project you work on, you meet a new technology or language and feel compelled to "try it out"... without giving the right amount of consideration to the language that is currently being used. Worse yet, you seem compelled to badmouth a language that has been really good to you (I love you, java ;) ) and always compare the imperfections of your programming wife (java, your syntax is really bloated) to the sexy cool stuff from one of your programming girlfriends (ruby, I love your monkey patching).

I'm not saying that this is necessarily a bad thing; I think it is very important to have breadth in technology, and learning new programming languages is a way to become a better programmer. It's more important, though, to have an objective perspective about the REAL comparison and not just get infatuated with every new thing that wanders by because you think (or worse yet, someone else thinks) it's better. Listen to me folks, the grass is rarely greener on the other side of the fence... It's different, and frankly it MIGHT be greener... but even if it IS greener, that doesn't necessarily make it better.

So where am I going with this? Am I suggesting that being a promiscuous programmer is a bad thing? Not really... and in that respect perhaps the metaphor is a bit off. But you have to be suspicious of advice about women from a guy who's only ever been with one woman, because... he has a noted lack of experience. What I'm saying is: when you're starting out, playing the field and figuring out what you like is important, but it's also important to commit to something and really "know" it. Make sure that changing languages is a conscious decision, and that you weigh the benefits and drawbacks of the switch, lest you spend your entire career as a programming slut flitting between one fancy new thing and another.

Thursday, October 13, 2011

Two factor developer personality type scoring

I was recently sitting through a technical discussion and thinking about how different people were reacting to the information in the presentation. From this I started to think about how two pairs of related factors seem to influence how people react to new technology. Here's a quick chart showing them, as well as examples of prototypical statements a person at the extreme of each quadrant might make about Source Code Control:

For a bit of definition:

  • The Conservative - Liberal continuum is a measure of how inclined you are to try new technologies. An extreme conservative feels no need to use anything new and will never adopt anything new unless the old thing completely fails to work any more. An extreme liberal would be compelled to change technological approaches before even finishing a prototype because a shiny new technology showed up on their radar.
  • The Skeptic - Believer continuum is a measure of how much critical thinking you apply to a new technology's capabilities. An extreme skeptic would tend to disregard pundits and anecdotal evidence and require solid proof. An extreme believer will have a religious-like belief in a technology that has no basis in fact.

The reason I put this chart together is to help understand that these things are not binary switches. In addition, some traits are orthogonal (Skeptic and Conservative), but some are in opposition (Skeptic and Believer). Moreover, different techniques are necessary when interacting with folks, depending on where they fall on both continuums.

Conservatives will generally require a lot of selling, or you will need to burn the bridge behind them, or they will keep falling back into the "good ol' way of doing it". Conservative Believers will be the hardest lot to convince, because the sheer amount of time they've been doing something (no matter how stupid it might be) is itself a reinforcing factor, and no amount of factual evidence to the contrary will sway them. On the other hand, a Conservative Skeptic can often be swayed with evidence and a good business case for doing things "the new way".

Liberals will generally require more oversight, as they will tend to embrace and proliferate new technologies. Liberal Believers are the most difficult, because they will do something just because they read about it on hacker news and it sounded cool. These guys often need to be brought back down to earth lest your systems be riddled with one-off science experiments. Liberal Skeptics (I'd put myself largely in this camp) are the easiest for me personally to deal with, but I think that's because I'm one of them. For the most part, the closer your team members are to each other on the quadrants, the more likely you are to get consensus on what to do (or NOT do, for some groups).

Wednesday, October 12, 2011

My javascript parseInt("08") surprise!

I recently had to debug a problem that was causing a javascript function to return the incorrect value. The code in question was left padding numbers less than 10 with a 0: so 1 becomes "01", 2 becomes "02", and 10 stays "10".
Number.prototype.to_s = function() {
    if (this < 10) {
        return '0' + this.toString();
    } else {
        return this.toString();
    }
}
This works fine... in one direction.
I kept running into a problem when I tried to parse this back into an integer.
So I'd do something like
parseInt((8).to_s())
and the result would be 0. What I didn't realize is that a leading "0" when parsing a string indicates the number is base-8 (octal), and therefore "08" isn't a valid number. I would, however, have expected some sort of error message or NaN instead of 0. The problem is that javascript only returns NaN if the first character in the string is not a number... so it happily saw that '0' was a number, but '8' was not (in base 8), so it returned 0. The robust fix is to always pass the radix explicitly: parseInt("08", 10) returns 8.
Aside: I see that this method of specifying base-8 is now supposed to be deprecated

Tuesday, October 11, 2011

JIRA, Pivotal Tracker, and Playnice.ly for issue tracking

I've used each of these tools on at least one project now (JIRA for quite a few) and thought I would share my observations about when each one would be most appropriate.

Pivotal Tracker

Pivotal Tracker is only offered as a cloud-based, hosted solution, but has a pretty impressive list of customers. On the upside, they've fully embraced the cloud concept and offer a well documented set of RESTful APIs for integrating with third party systems. You can connect issues to SCM commits using post-commit hooks from svn or git (or anything, with a little effort).

I find the Pivotal Tracker user interface to be a bit confusing, but pretty usable out of the box and geared toward agile methods. Another big plus is that it can auto-estimate your actual velocity and burndown, which helps you get your arms around your real velocity. In addition, it has some out of the box reporting.

playnice.ly

playnice.ly is also a hosted solution, but of the three it has taken the most interesting approach to bug tracking. They've jumped on the gamification bandwagon and have tried to craft an experience that is actually enjoyable, instead of the typical cold-war soviet style user interaction that most bug/issue tracking systems seem to espouse. While innovative, they are missing some features and are decidedly still in the Minimum Viable Product stage.

You can integrate with git via post-commit hooks, but third party integration doesn't exist, er, is still in beta, and frankly the documentation was somewhat difficult to find (hint -- the link is in the footer). I can appreciate the idea of keeping things simple, and this product will certainly work well for a small agile team or an independent who's just keeping track of things for himself. It's hard to say how this would scale to a larger team or a very complex project, as it is deliberately bare bones, and I suspect this could work against you on a largish project or team.

JIRA

Last, but certainly not least, is JIRA from Atlassian. I'd consider this the 800lb gorilla of bug/issue tracking, and I've used it for years on multiple projects with a variety of teams and workflows. Frankly, you can do anything you want with this product... They can host it, or you can run it internally with a slew of different databases. It integrates with just about everything, and it has a plugin system that folks have used to build a vast array of plugins to do just about anything you want.

The default user experience is horrendous, and to bridge the impedance mismatch between what an agile/small team might need and the default JIRA config, you're going to waste a lot of time either fetching plugins or configuring things to work how you want. JIRA's biggest strength is also its biggest weakness, and it would be difficult for me to recommend JIRA for smaller teams or startups, as its enormity imposes some serious overhead --- either you eat the time trying to figure out the default UI, or you eat the overhead of configuring it to be less complicated...

Here are the users I think would be a good fit for the tools:

  • A small team who just doesn't want to lose track of bugs or tasks, and has an interest in keeping an informal and deliberately fun culture - playnice.ly
  • A similarly small team, but one that finds a need for stronger organization and reporting options and/or if you think the idea of bug tracking being fun is somehow off putting - pivotal tracker
  • A large team or large organization that has thousands or hundreds of thousands of issues to track and will end up with a full time person (or more) to help manage them. Additionally, if you need to host in-house, JIRA is the way to go.

Thursday, October 6, 2011

Should I use mongodb, couchdb, or redis?

In the current nosql fervor, there is an important distinction that seems to get missed repeatedly. There are two (OK, three) really important factors that these tools use to distinguish themselves, and many people completely miss the point.

The first factor is durability -- does the data actually get saved to a disk somewhere and, if so, how often, and how much might I lose if something "goes wrong"? Redis and mongodb users might be somewhat surprised to learn that, by default, they can lose data should the process crash or shut down. While you can configure them to work around this issue, you're going to slow things down substantially by doing so, and therefore lose the big advantage they've been designed to provide. In short, redis is a great alternative to something like memcached, but it is not really an alternative to something like couchdb.

Which brings me to the second factor, searchability (I couldn't think of a better term) -- key-value stores are typically not designed to be easy to search, but to fetch values by a particular key really quickly. Document stores are designed to enable more dynamic searching, often at the expense of some other attribute like speed, memory, or disk space.

Lastly, there's speed -- couchdb can be fast, but it's not really going to compare to mongo or redis at real-time updates. If real-time is your most important factor, couch is probably not your best solution (actually, it certainly isn't your best solution).

So in the crop of current contenders (in no particular order) I'll give you my winners in certain use cases:

  • A fast disposable cache based on discrete keys: Redis... it's fast, it's widely known, it's easy to set up and use and more flexible than memcached (although memcached is also a good choice).
  • A durable and searchable document store that slowly accumulates more data and needs some concept of versioning (maybe like wikipedia or a blog engine): Couchdb
  • A quasi-durable searchable document store with quickly changing values (like a real-time status reporting application... maybe facebook or twitter): Mongodb

As for the other 9,999 choices that currently exist, I'd say don't dig around too much or agonize over your choice unless there is a specific and very important problem your application needs to solve that is difficult or complicated with these solutions. Should you get into that situation (like maybe needing to find directions like google maps), then you'll need to expand your horizons and look into other solutions. My recommendation is to start with one of these three and only go to a different solution when necessary. You could spend six months researching all of the possibilities and at the end have nothing but outdated research. Pick something and run with it; only then will you understand the problem and be able to make a better/more informed decision for your scenario.

More importantly, you'll probably notice that for many real-world solutions, it might make sense to use all three of these (or more). I think part of what causes problems in "fair" comparisons of technology is that folks think they can pick the "single best solution for all problems", and that's just not a realistic perspective.

Wednesday, September 28, 2011

Amazon silk using webkit and SPDY

After reading the recent blog post by Amazon about their "cloud enabled" browser called silk, I went into a minor panic. I couldn't find any immediate information about the rendering engine or any technical details about what they are actually doing. After some digging, I uncovered some job postings that seem to confirm they are using webkit for their rendering and are leveraging SPDY on the network protocol layer.

This is great news for developers, as webkit is already well established, and most of us won't be immediately impacted by SPDY (well, actually, your ajax experience might be impacted, but that's another topic entirely). All told, this isn't as big a technological change on the front end, and is more a story about amazon trying to use their infrastructure to make the mobile browsing experience better. Frankly, this is a scaled up and modernized version of what blackberry did years ago (are they still doing that?).

NOTE to the Silk team -- please implement an HTML5 native datetime picker on your device that doesn't suck, so everyone else will follow. This is a sore point in the mobile webkit experience right now, and a recurring problem that gets solved in a variety of "not so good" ways.

Friday, September 23, 2011

Using git in an agile environment

Git, and most of the git workflows I've seen on the internet, are not designed (or well thought through) for most agile development. By agile development I mean a small (1-10 person) team of professional developers all working together in short cycles (1-3 weeks) delivering functional software.

Why is git not designed for agile? Because git, or more importantly the workflows most readily apparent on the internet, all focus on the idea of isolating changes or playing around with branches instead of getting stuff done. This is great if you have a benevolent master whose sole job is to integrate disparate changes into a common code base (you know, someone like Linus Torvalds), but awful if you have a team of 10 people hacking away on a common codebase. Aside from fixing some very epic fails in the cvs and svn designs, git by itself doesn't really help in this situation. I'd dare to say that git actually gets in the way 3x more than it helps.

To make matters worse, if you peruse the internet, you'll find a plugin called gitflow that can, er, will really make things bad. It adds yet another layer of indirection and overhead that is totally unnecessary on a small team of professional developers working together in short cycles to deliver functional software (you'll notice that I've said the same thing again here). Gitflow is great if you have lots of time to merge and integrate and otherwise play around with your tools, but not so great if you're on a small team of professional developers trying to work together in short cycles to deliver functional software... because it's just overhead.

I think I've made my first point about the type of team and type of goals that the normal usage of git will cause problems with. If you're not in that group, and/or you like playing around with git, and/or you like sitting down at night with a copy of "Pro Git" and spending hours analyzing gitk diagrams... read the rest of this posting with caution, because it's probably inflammatory and wrong in your world view.

First off... if you're working together writing a rails/struts/springmvc application, there are common files that everyone is going to be editing. That's just the way it works; if you try to architect your solution by carving your application up into 10 pieces and giving a piece to every developer, you're probably making a very big mistake. The fact is, a version control tool is about SHARING CODE, and software development teams are supposed to WORK TOGETHER, not in isolation. Please repeat that sentence until you git it (hahah, I crack me up sometimes)...

Anyway, for MANY projects, especially small agile projects... branching should be considered a bad thing. Every branch incurs the overhead of keeping it up to date, resolving conflicts, and reintegrating (and testing) changes. These costs grow combinatorially as the team gets larger (n(n-1)/2 pairwise integration paths for n people, for the math folks), and your objective should be to minimize them because they don't add value.

The simplest way to minimize these costs? Have everyone integrate with a central repository (like github) and resolve incoming conflicts/test their code before pushing back to central. If you follow that workflow, you're essentially doing centralized svn/cvs style coding on top of a decentralized tool, and there's NOTHING wrong with that.

So, how do you do it? First, if you're using github, DON'T FORK; clone the repo to your local machine. Once you've done that, follow this workflow (more detail can be found here):

  1. write software and test it
  2. git commit -a -m 'My work here is done'
  3. git pull --rebase
  4. test and/or resolve conflicts
  5. git push origin 
  6. repeat


That's it... don't let the DVCS gods make you feel ashamed, just do it and all will be well. There are certainly a few wrinkles that might happen in this workflow... notably, #1 while you're fixing your rebase, someone else does a push, or #2 someone doesn't use rebase and is creating branches and merge commits.

#1 is easily fixable by perusing the interweb
#2 is a bit more tricky, but also fixable by perusing the interweb

In short, don't create branches on top of branches and screw around merging all over the place; rebase is your friend, use him wisely and all will be well. Another note I will add: NEVER do a "git push -f" (force push) in this mode. You will screw with the head of everybody on the team, and they will likely need to start drinking early in the morning to figure out what is going on.

Wednesday, September 21, 2011

cross platform mobile development with phonegap, rhomobile, and titanium

I've been working on a cross platform mobile application for keeping track of how much time I spend doing things. Basically, it's a consultant's helper to make sure I'm spending appropriate amounts of time on the things that move me forward, and to keep me aware of how much time I'm spending on things that aren't going in the right direction.
Anyway, having done mobile web development and knowing the pain there, as well as having put together a few android/iOS "hello world" applications, I did NOT want to go down the "build 3 different versions" road. I understand that there are compelling reasons to do this, but they aren't compelling enough (in my situation).
Knowing that there are a number of solutions in this space, I narrowed it down to the things that seem to have #1 the most innovation and #2 the highest potential to NOT be a dead end career wise. My list came down to titanium mobile, rhomobile, and phonegap. First off, titanium certainly has the best press corps in my opinion; the downside is that they appear to be going down the IBM path. That is, they're trying to build an "everything but the kitchen sink" heavyweight solution that is pretty off putting. In addition, the requirement to download proprietary development tools is just not where I want to go at this moment. While their solution looks very complete, I can't help feeling it will end up being a dead end career wise. What I mean by this is that I think titanium will eventually end up going the way of coldfusion - it will be widely used and very lucrative, but you'll be branded as a "titanium guy" for the rest of your career. For the reasons I mentioned, I actually downloaded the sdk, but didn't do any development with it.
Next is Rhomobile. One thing that excites me about Rhomobile AND phonegap is their distributed build systems. This means I can actually build the cross platform components and NOT necessarily have the full SDK installed on my machine. Another upside to Rhomobile is that it uses ruby (a language I happen to like), but that upside quickly turns into a downside. I spent a number of hours trying to figure out how the browser components interacted with the ruby components and ultimately gave up. Rhomobile, I like the idea, but your solution is WAAAY too complicated and confusing. The other downside to Rhomobile is the same sort of "proprietary lock in" that I think will plague titanium. While I like the concept, I cannot get on board with rhomobile at this point.
Last, but certainly not least, is phonegap. I actually used this for the first time almost a year ago, and at that time I wasn't really impressed. An upcoming project I may take on mentioned they're probably going to use it, so I took another look. Holy crap! Not only have they made the barrier to entry almost zero, they have the simplest solution imaginable (given the current state of the art). Basically, after signing up, you fork a github repo with the libraries pre-assembled (there aren't many), then start coding html and javascript... Once you're ready to deploy to devices, you push... then log into their build server and hit "update"... a few minutes later you have a set of barcodes on your screen that let you deploy your application to blackberry, android, ios, webos, and symbian. I'm not sure how great the blackberry, webos, and symbian applications are, but the android version seems to work very well, and I'm going to test ios in the near future.
Using phonegap along with the ripple mobile emulator in chrome makes me super productive, and I can use the development tools of my choosing. For now, phonegap + ripple + whatever text editor are my tools of choice, and I highly recommend trying them out. As for rhomobile: guys... I'm rooting for you, but it's just too complicated and I don't have enough spare time to figure it out. Titanium... I think you have a market in the corporate enterprise space, but you're not likely to attract many indie guys, as you just seem tooo... heavy.

Tuesday, September 20, 2011

What's the difference between macports and homebrew?

Anyone using a mac for software development has probably run into the need for some gnu/open source software that isn't pre-packaged. One of the great failings of Mac OSX is its lack of a real package manager. Luckily, users stepped up and built some solutions: Fink, MacPorts, and HomeBrew.

I've never used fink, but I hear it's pretty good. Being a debian/ubuntu guy as well, I'm familiar with apt-get, so it's probably a decent tool... but having no direct experience with it, I can't really comment.

This brings me to the two tools I HAVE used: MacPorts and HomeBrew. I started off with macports because it was the one that had the packages I was looking for. Then, on advice from folks I was working with (I believe the comment was "why are you still using macports, everyone is using homebrew now"), I downloaded and started using HomeBrew. But frankly, I'm unimpressed.

As far as I can tell, the only reason anyone would use homebrew is if they stumbled across one of the web sites/blog posts with raving fanboys flipping the bird at all the uneducated macports users. When digging around, I did find this fairly objective blog post, which leads me to believe that homebrew is really... not that different. The biggest difference I see is that macports has almost everything I want, whereas homebrew is missing huge quantities of useful software, so I ended up needing macports anyway.

I like the idea that homebrew will apparently use binary packages in some situations, and frankly, I don't understand why that isn't the standard. After all, debian and redhat have been doing this for years, and it's much more efficient than wasting your users' time recompiling for a tightly controlled platform. Anyway, my advice: macports works for me, homebrew also works, and they seem to work together... so it really doesn't matter which one you use, but you'll still need macports anyway, because homebrew is missing about 6000 packages that are already in macports.

Tuesday, September 13, 2011

Using Devise for authentication in rails 3

I recently started a new Rails 3 project and was going to use devise for authentication. While devise is very powerful, the documentation was a touch confusing to me, and all the other blog posts I found just confused me further. What follows are my steps to get up and running with a minimum of effort and thinking.
Step 0: put devise into your Gemfile and run
bundle install
Next, generate the devise install scripts:
$ rails generate devise:install
Devise will spit out:
create  config/initializers/devise.rb
create  config/locales/devise.en.yml

===============================================================================

Some setup you must do manually if you haven't yet:

1. Setup default url options for your specific environment. Here is an
example of development environment:

config.action_mailer.default_url_options = { :host => 'localhost:3000' }

This is a required Rails configuration. In production it must be the
actual host of your application

2. Ensure you have defined root_url to *something* in your config/routes.rb.
For example:

  root :to => "home#index"

3. Ensure you have flash messages in app/views/layouts/application.html.erb.
For example:

<p class="notice">
<%= notice %></p>
<p class="alert">
<%= alert %></p>

===============================================================================
Gotcha number 1 is that you need TWO entries in routes.rb if you want to enable auto sign up. In addition to the root route described above, you'll also need the following line:
  devise_for :users
Next, generate the active record files for your User account (I used User, why use anything else?).
$ rails g active_record:devise User
Which spits out the following:
create  db/migrate/20110831002655_devise_create_users.rb
create  app/models/user.rb
invoke  test_unit
create    test/unit/user_test.rb
create    test/fixtures/users.yml
insert  app/models/user.rb


After that, migrate your database:
$ rake db:migrate
==  DeviseCreateUsers: migrating ==============================================
-- create_table(:users)
-> 0.0112s
-- add_index(:users, :email, {:unique=>true})
-> 0.0007s
-- add_index(:users, :reset_password_token, {:unique=>true})
-> 0.0006s
==  DeviseCreateUsers: migrated (0.0128s) =====================================
Generate a controller for your home page (optional if you've got another controller).
$ rails g controller home
Which spits out this:
create  app/controllers/home_controller.rb
invoke  erb
create    app/views/home
invoke  test_unit
create    test/functional/home_controller_test.rb
invoke  helper
create    app/helpers/home_helper.rb
invoke    test_unit
create      test/unit/helpers/home_helper_test.rb
remove app/public/index.html < this is important!
rm app/public/index.html
Edit app/controllers/home_controller.rb and make it do something. NOTE: we've added the devise before_filter to require login/signup; this is what tells devise that the controller should be secured.
class HomeController < ApplicationController
  before_filter :authenticate_user!
  def index
    render :text => "Welcome #{current_user.email}!"
  end
end
Next, generate the devise views. I don't think this is strictly necessary, but I'm assuming you'll want to customize these after you get going.

$ rails g devise:views
Output should look like this:
      invoke  Devise::Generators::SharedViewsGenerator
      create    app/views/devise/mailer
      create    app/views/devise/mailer/confirmation_instructions.html.erb
      create    app/views/devise/mailer/reset_password_instructions.html.erb
      create    app/views/devise/mailer/unlock_instructions.html.erb
      create    app/views/devise/shared
      create    app/views/devise/shared/_links.erb
      invoke  form_for
      create    app/views/devise/confirmations
      create    app/views/devise/confirmations/new.html.erb
      create    app/views/devise/passwords
      create    app/views/devise/passwords/edit.html.erb
      create    app/views/devise/passwords/new.html.erb
      create    app/views/devise/registrations
      create    app/views/devise/registrations/edit.html.erb
      create    app/views/devise/registrations/new.html.erb
      create    app/views/devise/sessions
      create    app/views/devise/sessions/new.html.erb
      create    app/views/devise/unlocks
      create    app/views/devise/unlocks/new.html.erb
Run your app and you will have auto-signup and authentication enabled. For more information about how to further customize your config, see the devise web site.

Monday, September 12, 2011

defining IaaS, PaaS, and SaaS for cloud computing

Looking through cloud literature, it seems we've run out of three letter acronyms (TLAs), so we're now using four letter acronyms (FLAs?). Chief among these are the "as a service" acronyms, which describe what level of the stack is handled by the provider. Wikipedia has some sort of explanation buried in the cloud computing page, but I thought I'd give the abridged version.

At the lowest level is "Infrastructure as a Service" (IaaS): you get a virtual server with connectivity to the internet. Examples are Amazon EC2 and Rackspace Cloud. In these offerings, you can install almost anything you want (you typically get root access in some manner) and can even install the operating system of your choice. This is a good choice if you're a sysadmin who just doesn't want to muck around with physical hardware.

The next level up is "Platform as a Service" (PaaS): you get some sort of development platform hosted remotely, and you have a mechanism to write and deploy applications that run on that remote platform. Often you cannot log onto the server directly, and you are sandboxed into a virtual environment with pretty stringent restrictions on what you can do within it. Examples of this are Heroku for ruby and CloudBees run@cloud for java. This type of stack is good for a developer who needs to deploy an application but doesn't know anything about the underlying technology.

At the highest level is "Software as a Service" (SaaS): This is a hosted application that you simply use. Examples would be things like Lucid Chart or Gliffy. You just use this stuff and don't have any software installation or maintenance tasks.

A few providers blur the lines between IaaS and PaaS by offering hybrid solutions. An example of this is Engine Yard, which provides what really is IaaS, but is largely centered around the ruby/rails platform. Amazon Elastic Beanstalk is an EC2 based java PaaS that also has more flexibility than something like RUN@cloud. I would say these last two are probably the most flexible PaaS solutions, but they come with a bit more added complexity. They're probably the best for production deployments, but might be overkill for small applications with a limited user base.

Friday, September 9, 2011

Another Ruby 1.9.2 gotcha: hashes are not arrays

In ruby 1.8.7, the following works

Michael-Mainguys-MacBook-Pro:~ michaelmainguy$ irb
ruby-1.8.7-p352 :001 > foo = {"foo","bar"}
=> {"foo"=>"bar"}
ruby-1.8.7-p352 :002 > foo["foo"]
=> "bar"
ruby-1.8.7-p352 :003 >


In 1.9.2, it fails rather curiously.

Michael-Mainguys-MacBook-Pro:~ michaelmainguy$ irb
ruby-1.9.2-p290 :001 > foo = {"foo","bar"}
SyntaxError: (irb):1: syntax error, unexpected ',', expecting tASSOC
foo = {"foo","bar"}
^
(irb):1: syntax error, unexpected '}', expecting $end
from /Users/michaelmainguy/.rvm/rubies/ruby-1.9.2-p290/bin/irb:16:in `'
ruby-1.9.2-p290 :002 >


Matz et al, get it together... this is a pretty significant change and probably warrants a deprecation warning instead of a mysterious failure that in no way indicates what is really happening. This will seriously dampen adoption of the 1.9.x series of ruby for newbies who used the hash constructor the old (arguably incorrect) way.

Newbies, if you try to upgrade to ruby 1.9.2 and you get these strange error messages... it means you were trying to create hashes by passing comma separated values. If you really wanted a hash, the ruby way would be

foo = {"foo"=>"bar"}

or, for an array
foo = ["foo","bar"]

by doing {"foo","bar"}, ruby was helpfully (in 1.8.7) translating your call into
foo = Hash.new(["foo","bar"])... note, in the cases I'm finding this, the folks really wanted an array, but because it "kinda" worked, they didn't realize what had happened.
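To make the difference concrete, here's a quick irb-style sketch of the literals side by side (variable names are just for illustration):

hash  = {"foo" => "bar"}    # => {"foo"=>"bar"}  the hash literal 1.9.2 expects
array = ["foo", "bar"]      # => ["foo", "bar"]  what most of these folks actually wanted
hash2 = Hash["foo", "bar"]  # => {"foo"=>"bar"}  what 1.8.7 quietly did with {"foo","bar"}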

Maven central and rubygems.org

I was thinking about some of the differences between the ruby and java ecosystems and one comparison point I thought would be telling was how many packaged, open source, third party components are readily available. From my perspective, the current "way to get things" in java is maven, and the equivalent in ruby is a ruby gem. Conveniently, there is a central repository for both of these things, maven.org and rubygems.org.

It's interesting that they have almost an identical number of components: Maven has 29,815 and rubygems has 28,078. Of the top 10 downloaded from each, 1/2 of the maven artifacts are components to interact with maven repositories. Rounding out the maven list are: junit, commons-lang, and commons-collections. On the rubygems side, ALL of the top 10 are related to or used by rails.

I've heard folks say that ruby or java has more components available than the other, but it just doesn't seem that way when looking at the numbers. I've yet to find a kind of library that is NOT available in both ruby/gems and java/maven.

Facebook/twitter integration? both have in spades...

library to simplify communicating over a serial port, it's in there...

Frankly, the rumors that one or the other of these ecosystems has more open source component support are largely false. For MOST applications, I'm comfortable saying you cannot go wrong with either of these. To me, ruby seems to get more productivity from developers, but java isn't going anywhere soon.

I DO have a question though, is there something similar to maven and ruby gems for C#?

Tuesday, August 30, 2011

Lucid Charts and Gliffy for online diagramming

My one sentence evaluation:

If you MUST use Internet Exploder then Gliffy is your solution, otherwise download Chrome (or another html5 capable browser) and use Lucid Charts.

The long Version:

As a long time Visio user, it used to be one of two microsoft tools I missed when working on linux/Mac. Beyond not really working on linux, it's just too damn expensive for the amount of time I spent using it. I'm an occasional user who needs to slap something together that is only slightly more professional than a napkin (although I've used napkins on occasion) or ascii art for illustrating network diagrams or software architectural components.

In the last 5 years or so, a couple of online tools have emerged that let you do this inside your web browser. I've used and been a paying customer of both Gliffy and Lucidcharts and will say they are both pretty good tools; if you're someone like me, they can meet your needs.

Lucidcharts is a better put-together product from a usability perspective, has more widgets, and seems to be innovating much faster. For example, Lucidcharts has iphone templates, works on iPad, and can import visio. This last one is a killer feature that Gliffy has failed to implement for almost 4 years... lots of excuses, but no feature. Gliffy has abandoned their customers by not implementing this and, regardless of how difficult it might have been to implement, I personally think this feature alone will propel Lucid Charts ahead of Gliffy.

Gliffy uses flash and therefore will work on old browsers and IE, but you're likely out in the cold trying to use Gliffy on an iPad. All told, I'm very satisfied with Lucid Charts and highly recommend it. Gliffy isn't a horrible product and I'm sure it might find a niche in large corporations that don't use Visio already and are stuck using IE, but it just feels klunky in my hands and hasn't seemed to keep pace with what I need.

edit
I just noticed this post, five best online diagramming tools, and neither Gliffy nor Lucid Charts is listed... not sure why, maybe they aren't in the five best?


Tuesday, August 23, 2011

Push versus pull deployment models

There are two deployment models: push and pull. In a "Pull" deployment model, the individual servers contact a master server, download their configuration and software, and configure themselves. In a "Push" model, a master server pushes the configuration and software to the individual servers and runs commands remotely. While neither technique is right or wrong, they both have some specific advantages and disadvantages that you should understand when making a decision. In the ruby hosting world, an example of a "push" deployment is heroku, while Engine Yard is an example of a "pull" type deployment.

In the Pull model, each server has information about how to obtain its configuration; when it boots (or whatever the configuration triggering event happens to be), it can proceed without intervention from the master server. In the Push model, because the master server is orchestrating things, it will typically need to have a connection back to every application server. This can lead to performance and scalability problems when trying to deploy 100s or 1000s of application server images simultaneously.

On the other hand, the pull model doesn't typically have a way to ensure that all servers and software are launched in a particular sequence. In a push scenario, the master server can coordinate which servers come up in which order. If you have a situation where application A depends on application B running, it might be a better idea to have a push deployment for these two.

In addition, the pull model starts to get wonky if you want to attempt "instantaneous" deployments. Then either the application servers need to poll frequently or "long poll", OR you need to simulate a push deployment ;).

From a security perspective, since a pull deployment needs to connect back to a master server, it can potentially open up some security holes. In a push deployment, the master server has access to the application server, but the application server has no access back to the master. This means it is much less likely that a security breach on one of your application servers can corrupt your master server.

Both models work, and they are not mutually exclusive, but understanding their key differences is important.
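To make the two models concrete, here's a toy sketch in ruby (the master URL, the servers list, and the apply_config/deploy.sh helpers are all made up for illustration):

require 'net/http'

# Pull: each app server polls the master and configures itself.
def pull_loop
  loop do
    config = Net::HTTP.get(URI("http://master.example.com/config"))
    apply_config(config)  # hypothetical helper that applies whatever the master handed back
    sleep 300             # the polling interval is exactly why "instantaneous" pull deploys get wonky
  end
end

# Push: the master walks its server list in sequence and drives each host,
# which is what makes ordered startup easy and 1000s of servers hard.
def push_deploy(servers)
  servers.each { |host| system("ssh #{host} deploy.sh") }
end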

Some other references:
Automated deployment systems: push vs. pull

Configuration management: push vs. pull

Thursday, August 18, 2011

Groupon makes me tired

I read a blog post a while back comparing groupon to a ponzi scheme, but I think it's more accurately compared to Tulip Mania.

I discount the value Groupon can have for a business that is missing traffic to their store, and I struggle to figure out how a business gets a loyal customer from this tactic. It seems to me Groupon will only create disloyal customers; as a business, the best I can hope for is a sort of ebay-like place to either dump excess inventory or fill in slow periods where I have trouble getting customers into my place of business.

When looking at the fundamentals, Groupon has only overhead. The only arguable asset is their consumer base, and I think it provides no intrinsic value as these customers represent a pool of people who will buy something for 50% or more off. I hate to break the bad news, but those people aren't hard to find. Worse yet, the switching cost from groupon to "whatever else" is almost zero.

For Groupon to survive, they need to figure out how to make both their consumers and their business customers "sticky".

Tuesday, August 16, 2011

The overhead of annual enrollment in the US

For folks employed in "traditional" jobs, there is a common event in the modern age that raises collective blood pressure. It's known as "annual enrollment" and it is a period of time where most employees change/adjust various benefits offered by their company. In particular, medical insurance is a common thing to adjust.

While this is a normal cost of doing business, I think many folks underestimate the real costs of switching. Sure, there's the cost for the HR department to go out to 20 different brokers and try to get the "cheapest/best" plan for their employees, but there are also a number of hidden costs. For example, every year my health plan changes, I spend at least 5-10 hours futzing around with various billing changes as well as filling out forms etc.

When I quote 10 hours, many folks say "you're crazy, it only took me 15 minutes". I think these folks greatly underestimate the amount of time spent because they're only worrying about the time to fill out the form. Because of these hidden costs, I imagine many employers scratch their heads trying to figure out why using a "cheaper/better" plan didn't show up on the bottom line.

Examples are:
Spending 15 minutes at the front desk of their physician/dentist/orthodontist changing their information.
Reading about the plan and figuring out what is and isn't covered.
Figuring out if their physician is "in-network" or not.

For dual income families:
Analyzing the differences between the two plans and figuring out which one is best.
Filling out "coordination of claims" paperwork (and heaven help someone who is divorced and needs to coordinate THAT nightmare).

All these costs erode the bottom line because they take time away from performing other duties. I'm not sure a national health plan is the solution for this, but the current system certainly is a recurring drain on resources for every business except insurance companies.

Monday, August 15, 2011

Heroku is a bus, Engineyard is a car

Engineyard and heroku are two widely used ruby on rails hosting providers.

A common question is: Which one should I use?
The answer everyone gives: It depends!

Having deployed the same application in both environments, I thought I'd highlight some of the important differences.

#1 "Ease of Use"
Heroku blows engineyard away. You install the gem and can deploy your application in minutes. There are also commands you can run on your local machine to get information about your application.
Engineyard is moving forward, but it is still pretty technical. It's really easy if you have a public github repo, but anything other than that starts to get "more complicated" quickly.

#2 "Architecture"
Engineyard gives you a "real" virtual machine. This means you've actually got a single CPU virtual host that you ssh into and effectively do whatever you want.
Heroku gives you a sandbox with walls around it, and I think it's a shared environment. It's actually kinda difficult to figure out exactly what they're running as you cannot log onto the machine directly.

#3 "Startup Price"
Heroku gives you a free (as in beer) environment.
Engineyard lets you run a trial environment for free for a period of time, but you eventually have to pay for it... even if nobody ever visits your site.

#4 "Flexibility"
Heroku lets you do anything you want, as long as they've preconfigured it to be possible.
Engineyard gives you ssh capability to the machine, which means you can do anything you want even if they didn't think it would be a good idea.

Overall, I'd say Heroku is like taking the bus: if enough people want to go the same place at the same time, it's more economical. Engineyard is like buying a car: it's going to be a bit more expensive and you're going to need to know how to drive, but it is a much more flexible solution.

Friday, August 12, 2011

The economics of scaling software development teams

One common theme in the development world is the vast difference in productivity between individual practitioners. Among folks with similar backgrounds, some are easily 10x more productive at delivering functional software. Software development is obviously not alone in having to deal with this, but there are some particular attributes of software development that make dealing with this mismatch particularly difficult.

The first problem is that many people seem to have the mistaken impression that if one has developers who produce output, then the team's output is simply the sum of the individual outputs. This is just not true by any reasonable measure and is a contributing factor to why many outsourcing projects end up a smoldering mess. Mathematically, each developer adds a cumulative amount of drag if the software they are writing needs to communicate with software other folks on the team are writing.

As an example, suppose we have 1 developer working on a project, and he has a perfect understanding of the requirements. The overhead for this person to write the software is the cost of typing it into a computer, compiling (if necessary), and deploying it. Obviously the frequency and cost of compiling and deploying are a factor, but this is typically a fixed cost.

Now let's add another developer who is going to work on the same software as the first. We now incur the overhead of the first developer communicating what he is doing to the second (and vice versa) PLUS the overhead of bringing the new developer up to speed on how things work. While bringing the new developer up to speed is a one time cost, the communication overhead is ongoing. Generally speaking, the communication overhead can be represented as a complete graph with a formula of cost = (count * (count - 1)) / 2.

What does this mean? Well, for starters, compared to a team of two, the communication overhead for a team of 10 is 45x higher and a team of 50 is 1225x higher! More importantly, this assumes unrealistically perfect communication, so the real factor can be much larger depending on the effectiveness of communication. As an example, using high latency communication or off-cycle teammates can often double the cost of each network interconnect.

But wait! We know there are software projects with tens, and even hundreds, of developers. How can they possibly be successful?

For starters, many teams simply absorb this cost and they largely end up much less successful than they COULD be. Any project manager or even developer is probably familiar with the sad problem of having an initial spike of productivity with a small team, then looking on in dismay when an influx of new people to the project drops their delivery velocity.

A better approach is to reduce the amount of communication necessary between subgroups within the team. This can be done by choosing a software architecture that has well defined boundaries and focusing the communication between the independent teams on those boundaries. For example, many places do this by having a "backend team" that writes database access code and a "frontend team" that writes the GUI.

So in our example of a team of 10, if we broke it into two teams of 5 who intercommunicated via one interface, our communication overhead would be reduced from 45 to 21... that cuts it down by more than half!
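If you want to check the arithmetic, a few lines of ruby will do it (channels is just a made-up name for the complete-graph formula above):

def channels(n)
  n * (n - 1) / 2
end

channels(2)          # => 1
channels(10)         # => 45    (45x the team of two)
channels(50)         # => 1225
2 * channels(5) + 1  # => 21    (two teams of 5 talking through one interface)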

What does this mean? It means by following these guidelines, project managers and architects now have another tool to account (literally) for a variety of team configurations. Information technology architecture is not JUST about software and hardware, but also wetware (people) and human factors need to be accounted for as part of the art.


Thursday, August 11, 2011

Ruby 1.9.2 changes and i18n on Mac OSX

We recently noticed some pretty interesting changes in ruby 1.9.2. It appears that require no longer searches the current directory and ruby is now more unicody. This means if you are in a directory with two files "foo.rb" and "bar.rb", you can no longer simply type "require 'foo'" inside bar.rb to use foo. Now, you need to either do "require './foo'" or "require_relative 'foo'".
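In code, the change looks like this (assuming foo.rb and bar.rb sit in the same directory):

# bar.rb
require 'foo'           # worked in 1.8.x, raises LoadError in 1.9.2
require './foo'         # works in 1.9.2, but is relative to the working directory
require_relative 'foo'  # works in 1.9.2 and is relative to this file, usually what you want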

A potentially more difficult change is in how ruby handles character encodings. For the most part, this isn't a problem inside "normal" code and strings, but things get dicey if you start reading text files off a filesystem. This is especially dicey if you're doing this and you're on a mac AND you work with western european data AND it involves money. If you save a file with a currency symbol on a mac, then subsequently read the file on a machine (or a tool) that uses/assumes utf-8, you will not see €, you will see a Û.

To cut to the chase, if you're developing software on a Mac, make sure you change your tools to use utf-8, NOT macroman, or you will at some point be scratching your head. Why? As a quick example, the Euro symbol in macroman is mapped to a different byte than it is in UTF-8. More importantly, for international applications, macroman simply has no non-latin characters, so you won't be able to properly edit files with asian and other non-latin based scripts.
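If you're stuck with a legacy macroman file, ruby 1.9.2 can transcode it as you read it. A minimal sketch (the file name is made up, and this assumes your ruby build lists macRoman in Encoding.list):

text = File.read("prices.txt", :encoding => "macRoman:UTF-8")
text.encoding  # => #<Encoding:UTF-8>, with € intact instead of Û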


Wednesday, August 10, 2011

git and github are not change management tools

I recently stumbled across a problem with git that is going to cause no end of headaches for the uninitiated. Git repositories are fundamentally insecure and the audit trail is dodgy when folks are either #1 intentionally malicious or #2 ignorant of how git works.

For the back story, I have a number of github accounts: one is used for code I use while blogging, another is for internal projects at work, and yet another was for a client I was working with. A while back I noticed that commits apparently made with my "client" github account showed up in my blog.

Confused, I verified my public/private key pairs against what was in github and was truly stumped as to how this was happening. While scratching my head, I remembered that there is the concept of a "global" config in git and ran the following command:

git config --global -l

Uh oh! It turns out I had a global config set... When I went back and looked, EVERY commit I had done was as this erroneous user.

The problem seems common to distributed source code revision management systems, but is fundamental to git. Every copy of every repository is trusted to maintain its own copy of revision history. After realizing this, I also realized that I could rewrite history, push to the central repository, and effectively delete and/or amend the revision history in the master repository.


As an example, I tried the following:
git clone git@github-personal:mikemainguy/mainguy_blog.git
git config --replace-all user.name "Scooby Doo"
git config --replace-all user.email sdoo@doo.com
echo "Scooby Doo" >> README
git commit -a 
git push

Now when examining the history in github, my commit shows up as having come from some cat named scooby doo! Worse yet, there's no apparent way to figure out which github account actually pushed the change.

Beyond that, I can rewrite history in my local repository, push it out, and make old changes disappear, and nobody will be able to see what happened.

For example:
echo "SECRET STUFF" >> README2
git add README2
git commit README2 -m "whoops"
git push
Check your central repo and you'll see README2.

git filter-branch --index-filter 'git rm --cached --ignore-unmatch README2' -f
git push -f

Now check github, the commit is still there (I'm not sure if that can be completely removed), but the file is gone... It's as if it never existed!

So, applying this knowledge to my screwed up commits I ran the following:
git filter-branch --env-filter 'export GIT_AUTHOR_NAME="Mike Mainguy"' -f
git filter-branch --env-filter 'export GIT_COMMITTER_NAME="Mike Mainguy"' -f
git filter-branch --env-filter 'export GIT_AUTHOR_EMAIL=mike.mainguy@gmail.com' -f
git filter-branch --env-filter 'export GIT_COMMITTER_EMAIL=mike.mainguy@gmail.com' -f
And, after one more forced push, all the craziness is gone...

Knowing these details is important if you're using git/github because most people coming from a centralized source code control tool would find this behavior a little bit disconcerting (if not just plain wrong). The thing to remember is that every person you trust to push to your repository can effectively remove/rewrite history to their own liking. If accountability and an audit trail are important, you'll likely need to adopt a "pull" model and have someone manually verify/rewrite each commit.

Tuesday, August 9, 2011

Avoid Bowfinger Syndrome

I just read a great blog post about what it costs to write AAA games and the author used a term I love called "Bowfinger Syndrome". Bowfinger is a movie starring Steve Martin and Eddie Murphy in which Bobby Bowfinger (Martin) tries to make a movie with only $2000. While a very funny movie, it hits home in the software development world in many ways.

Too often, software projects fail because folks grossly underestimate the associated costs and then spend time trying to work around the lack of budget to actually finish the project. There are a number of reasons for this and I'll give a quick list:

#1 A prototype of some software was written and someone extrapolates the costs to build the "real software". This is a mistake; prototypes are like hollywood cutout towns... they may LOOK like real software, but they don't necessarily WORK like real software. If you're eyeballing the hours necessary to build a prototype and trying to estimate effort to build the finished product, step back and imagine how you would do this if you were trying to convert a hollywood set into a real city.




#2 The costs to write an initial version of some software are buried and not really understood. This happens when domain experts or highly skilled developers write initial versions of software and then folks think they can economize by using unskilled or domain-ignorant developers to write something else or a new version of the same software. Put another way, if 1 surgeon can do 1 surgery in 1 hour, that doesn't mean 5 trained monkeys can do 5x more surgeries in the same amount of time.



#3 A contract does not mean the software can be delivered. If your business depends on software and your plan is to somehow get a lawyer to deliver it, you're not going to have a business for long. Put another way, just because someone is foolish enough to sign a contract and agree to build you a ladder to the moon, doesn't mean it's a good idea to start pre-selling tickets.








Monday, August 8, 2011

javascript sleep method

For newcomers to javascript, it might come as a surprise that there is no sleep method. Worse yet, if you search the internet, you'll find all manner of really... really bad ways to simulate this. One of my favorite "rotten tomatoes" is something like this:
alert('start');
  var date = new Date();
  var curDate = null;
  do { curDate = new Date(); }
  while(curDate-date < 5000);
  alert('finish');

Note, I borrowed this horrible example from stack overflow. If you're lucky, that example will not completely crash your browser. A much better solution is something like this:
alert('start');
  setTimeout(function() {alert('finish')},5000);

The obvious problem is that if you truly want to simply pause for 5 seconds in the middle of a really long method, the anonymous function is not going to help you out very much... unless you do something like this:
alert('start');
  //lots of code
  var a = 'foo';
  setTimeout(function(){
    alert(a);
    //lots more code
  },5000);
If you find this solution lacking, you're probably due for some refactoring, as you are likely writing highly procedural code and javascript is going to cause you other, more serious problems. This is on stack overflow here; upvote if you think it's a good solution.

Friday, August 5, 2011

Adding methods at runtime in javascript, ruby, and java

OK, the title is a ruse: you can't easily add methods to java classes at runtime. But I'll illustrate how to do it in javascript and ruby. For example, let's suppose we have a javascript object (function) and we want to add a say_hello method to it:

var myObj = {};
myObj.say_hello() // doesn't work
myObj.say_hello = function() {
  return "hello";
}
myObj.say_hello(); //works


This is because javascript treats functions as first class citizens and doesn't even bother with the concept of "classes" as something other than special functions.

Same thing in ruby:
myObj = Object.new
myObj.say_hello # doesn't work
def myObj.say_hello
  "hello"
end
myObj.say_hello # works

There's a subtle difference here. The ruby syntax seems a little strange to me and it wasn't obvious how to do this. In javascript, it's very obvious that you're assigning a new function to the attribute (that you're adding). In ruby, using def in this manner seems out of place...

Ruby seems to attribute special meaning to the class and object definition where javascript treats methods (functions) just like any other variable.

Digging around a little, the ruby situation gets a little strange:

myObj2 = Object.new;
myObj2.say_hello # doesn't work (this makes sense because we only defined the method on one instance)
Object.respond_to? :say_hello #false ??? I "kinda" get it
myObj.respond_to? :say_hello #nil  ??? OK, now I'm pretty confused


If I wanted, I COULD have added the method to the class in ruby and then every instance would get it, e.g.
myObj = Object.new()
myObj.say_hello # doesn't work
class Object
  def say_hello
    "hello"
  end
end
myObj.say_hello # works
myObj2 = Object.new()
myObj2.say_hello #works
Object.respond_to? :say_hello #works


It really depends on whether you want to "backport" new methods to all objects of a class or only add the method to a particular instance. To do the equivalent in javascript, you'd do something like
var myObj = {};
myObj.say_hello() // doesn't work
Object.prototype.say_hello = function() {
  return "hello";
}
myObj.say_hello(); //works
var myObj2 = {};
myObj2.say_hello() //works
(new Array()).say_hello() //works!


The difference between how the two languages accomplish this is pretty minor; the more important thing to understand is the difference between adding a method to an instance (object) or a class definition (prototype). Not having a complete understanding of these differences can cause a lot of problems and subtle bugs in your code.

Thursday, August 4, 2011

Object Oriented Javascript

Suppose we need to create a counter that starts at zero, increments to 4, and then restarts at 0 (a "you can't count to five" counter). There are many ways to do this, but a straightforward way someone might do this in javascript would be:
var global_count = 0;
function increment() {
    global_count ++;
    if (global_count > 4) {
        global_count = 0;
    }
}

This is pretty common for beginners and works until your systems start to get complicated and your global_count gets clobbered or multiple people need to have counters. For multiple counters, one might start with a naming convention and add variables like "global_count2, global_count3" et cetera. This doesn't solve the problem of the variables getting clobbered in other parts of your system and it likely makes it worse because now there are more permutations to mix up and accidentally change the value of.

As an example of what I'm talking about:
var global_count = 0;
function increment(my_val) {
    my_val++;
    if (my_val > 4) {
        my_val = 0;
    }
}

increment(global_count); //works fine

//Somewhere else a smart guy decides to do this
global_count = 52;


To prevent this sort of thing, OO has the concept of information hiding. In java, a straightforward way to avoid someone clobbering your count is to encapsulate the "count" variable and only access it via a method that performs your business logic.
public class BeanCounter {
  private int count = 0;

  public void increment() {

    count++;
    if (count > 4) {
       count = 0;
    }
  }
  public int getCount() {
    return count;
  }
}

So now, when this java example is literally (for the most part) translated to javascript, it looks like this:
function BeanCounter() {
    var count = 0;

    this.increment = function() {
        count++;
        if (count > 4) {
           count = 0;
        }
    }

    this.getCount = function() {
        return count;
    }

}
var my_counter = new BeanCounter();

But a more javascripty way to do this would probably be:
var bean_counter = function () {
    var that = {};
    var count = 0;
    that.increment = function () {
        count++;
        if (count > 4) {
           count = 0;
        }
    };
    that.getCount = function () {
        return count;
    };
    return that;
};
var my_count = bean_counter();
my_count.increment();
my_count.getCount();

In both these examples, we can see some key differences between javascript and java. First, there really isn't anything formally defining a "class". We simply have functions with different mechanisms to extend and enhance them. In addition, there's more than one way to implement something that fulfills some of the capabilities that a more traditional OO language (like java) might give us. In this case the thing we wanted was the ability to ensure we could have things that would only be able to count to 4, then reset back to 0.

There are some advantages to the second approach when it comes to inheritance and polymorphic behavior that I won't go into here, but for a more in-depth look at different ways to apply (and mis-apply) OO concepts in javascript, I highly recommend JavaScript: The Good Parts by Douglas Crockford.

Wednesday, August 3, 2011

Git for dummies

OK, the title of this post is a lie... git is decidedly NOT for dummies. Git is more for really smart, mentally gifted people OR for people working on exceedingly complex software projects that have complicated merging and revision history requirements. Neither of these groups should be dummies or you will have a serious problem being effective. For the context of this tutorial, a "dummy" is defined as somebody who knows and understands how to use SVN or CVS in a team environment.

So if you're still reading, I will walk you through the simplest workflow I can discover for git that works and doesn't cause too many complications. For the sake of simplicity, we're going to assume you're working on a project hosted at github that already exists. I've created a public repo if you'd like to follow along at home. Assuming you already have git installed, you should be able to "clone" the repository to your local machine by issuing the following command:

git clone git://github.com/mikemainguy/mainguy_blog.git

edit: I used the wrong url originally.
At this point you should now have a subdirectory called mainguy_blog with a "README" file inside it.

Assuming that everybody is working on a single branch of development, the workflow is pretty simple.
  • edit files (vi README)
  • add files to staging area (git add README)
  • commit changes (git commit README)
  • pull changes from remote (git pull)
  • push changes to remote (git push)

One thing you will note is that the number of steps is a bit different than you might be used to with svn or cvs. In particular, git seems to have added some steps. With SVN, the workflow would typically be:
  • edit files (vi README)
  • update from remote (svn update)
  • commit changes to remote (svn commit)

With Git, we've added the notion of a local staging area AND a local repository. This will really confuse dummies like me at first and I cannot emphasize enough that you need to think about the implications of this. I guarantee you THINK you get it, but the practical implication of not grokking it is that you will likely do tremendously stupid things for a period of days once you get into a larger team and/or someone starts to try doing some fancy merging.

So now, we're going to walk through a "normal" multiuser scenario.
  1. User 1 edits README and adds it, and commits to their local repository
  2. User 2 edits the same file in a different place, adds it and commits to their local repository
  3. User 1 pushes their change to github
  4. User 2 tries to push their changes to github, but they discover that user1 has already pushed their changes.
  5. User 2 pulls their changes from github
  6. Git automerges the file because there are no conflicts
  7. User 2 pushes their changes to github
When we look at the github history we see something interesting... there is an additional commit that was added at the end to indicate git/User2 merged some files. Aside from the extra workflow steps, this is an additional point of confusion for quite a few newcomers.

In short, a workflow that makes git work the way a dummy like me would expect follows:

vi README
git add README
git commit README
git pull --rebase


Now here is where things can get tricky. In the SVN world, if you have merge conflicts, you fix them and move along, committing the results when you've "fixed" things. With git, on the other hand, you need to "add" fixed files back in and continue the rebase. So, if you have no conflicts, you're actually done at this point, but if you have a merge conflict, you need to do the following steps:


vi README
git add README
git rebase --continue


Once this is finished, push your changes back to the remote repo


git push


additional warning

When rebasing, if you get a conflict, do NOT commit the files, only ADD them. If you commit the files you will condemn yourself to a hurtful place where your commit history shows conflicts with things that you didn't even edit.

I think git is a wonderful tool, but it has a much steeper learning curve than its simpler and kinder cousins svn and cvs. While my perspective is skewed by years of using SVN and CVS, I think it is pretty safe to say that these tools have millions of users and I am not the only person to go through the pain of "figuring out" git. The addition of remote/local and a staging area seems to be a common point of confusion for newcomers who've arrived at git from the SVN/CVS world.