Friday, September 23, 2011

Using git in an agile environment

Git and most of the workflows I've seen on the internet are not designed (or well thought through) for most agile development.  By agile development I mean a small (1-10) team of professional developers all working together in short cycles (1-3 weeks) delivering functional software.

Why is git not designed for agile?  Because git, or more importantly, the workflows most readily apparent on the internet all focus on the idea of isolating changes or playing around with branches instead of getting stuff done.  This is great if you have a benevolent master whose sole job is to integrate disparate changes into a common code base (you know, someone like Linus Torvalds), but awful if you have a team of 10 people hacking away on a common codebase.  Aside from fixing some very epic fails in the cvs and svn designs, git by itself doesn't really help in this situation.  I'd dare to say that git actually gets in the way 3x more than it helps.

To make matters worse, if you peruse the internet, you'll find a plugin called gitflow that will really make things bad.  It adds yet another layer of indirection and overhead that is totally unnecessary on a small team of professional developers working together in short cycles to deliver functional software (you'll notice that I've said the same thing again here).  Gitflow is great if you have lots of time to merge and integrate and otherwise play around with your tools, but not so great if you're on a small team of professional developers trying to work together in short cycles to deliver functional software... because it's just overhead.

I think I've made my first point about the type of team and the type of goals for which the normal usage of git causes problems.  If you're not in that group and/or you like playing around with git, and/or you like sitting down at night with a copy of "Pro Git" and spending hours analyzing gitk diagrams... read the rest of this posting with caution because it's probably inflammatory and wrong in your world view.

First off... If you're working together writing a rails/struts/springmvc application, there are common files that everyone is going to be editing.  That's just the way it works.  If you try to architect your solution by carving your application up into 10 pieces and giving a piece to every developer, you're probably making a very big mistake.  The fact is, a version control tool is about SHARING CODE and software development teams are supposed to WORK TOGETHER, not in isolation.  Please repeat that sentence until you git it (hahah, I crack me up sometimes)...

Anyway, for MANY projects, especially small agile projects... branching should be considered a bad thing.  Every branch incurs the overhead of not only keeping it up to date, but also resolving conflicts and reintegrating (and testing) changes.  These costs are n^n as the team gets larger (actually, it might be n^(n-1), maybe a math guy could help me out here?) and your objective should be to minimize them because they don't add value.

The simplest way to minimize these costs?  Have everyone integrate with a central repository (like github) and resolve incoming conflicts/test their code before pushing back to central.  If you follow that workflow, you're essentially using the central svn/cvs style coding on top of a decentralized tool, and there's NOTHING wrong with that.

So, how do you do it?  First, if you're using github, DON'T FORK; just clone the repo to your local machine.  Once you've done that, follow this workflow (more detail can be found here):

  1. write software and test it
  2. git commit -a -m 'My work here is done'
  3. git pull --rebase
  4. test and/or resolve conflicts
  5. git push origin 
  6. repeat
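
The steps above can be sketched as a small Python wrapper around the same git commands.  This is only a sketch, not a definitive tool: the function names (workflow_commands, run_workflow) and the test_command parameter are made up for illustration, and it assumes git is on your PATH and that you're inside a clone with a remote named origin.

```python
import subprocess


def workflow_commands(message, remote="origin"):
    """Return the git commands for steps 2, 3 and 5 as argument lists."""
    return [
        ["git", "commit", "-a", "-m", message],  # step 2: commit local work
        ["git", "pull", "--rebase"],             # step 3: replay it on top of central
        ["git", "push", remote],                 # step 5: share it
    ]


def run_workflow(message, test_command=None, remote="origin"):
    """Run commit, pull --rebase, optional tests, then push.

    check=True raises CalledProcessError as soon as a command fails
    (nothing to commit, a rebase conflict, a failing test), so nothing
    is pushed until every earlier step has succeeded.
    """
    commit, pull, push = workflow_commands(message, remote)
    subprocess.run(commit, check=True)
    subprocess.run(pull, check=True)
    if test_command is not None:
        # step 4: test_command is a hypothetical argument list for your test runner
        subprocess.run(test_command, check=True)
    subprocess.run(push, check=True)
```

If the pull stops on a conflict, you resolve it by hand, run git rebase --continue, re-run your tests and push yourself; the sketch deliberately doesn't try to automate that part.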

That's it... don't let the DVCS gods make you feel ashamed, just do it and all will be well.  There are certainly a few wrinkles that might happen in this workflow... notably, #1: while you're fixing your rebase, someone else does a push, or #2: someone doesn't use rebase and is creating branches and merge commits.

#1 is easily fixable by perusing the interweb (usually it's just another "git pull --rebase" before you push)
#2 is a bit more tricky, but also fixable by perusing the interweb

In short, don't create branches on top of branches and screw around merging all over the place.  Rebase is your friend, use him wisely and all will be well.  Another note I will add is: NEVER do a "git push -f" (force push) in this mode.  You will screw with the head of everybody on the team and they will likely need to start drinking early in the morning to figure out what is going on.


gelisam said...

Math guy here: it's actually n*(n-1), not n^(n-1).

But that's assuming all n branches pull from the (n-1) other branches. From what I can read from their website, this is exactly the opposite of the gitflow branching model. Like you, they use a centralized sharing style on top of a decentralized tool.
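
To make that count concrete, here is a minimal Python sketch.  The function names are made up for illustration, and it assumes the worst case described above, where every one of the n branches pulls from each of the other (n-1) branches.  If you only count pairs of branches that need to be kept in sync, regardless of direction, you get half of that.

```python
def directed_pulls(n):
    """Pulls needed if each of n branches pulls from the other n - 1."""
    return n * (n - 1)


def undirected_pairs(n):
    """Distinct pairs of branches, ignoring who pulls from whom."""
    return n * (n - 1) // 2


# Growth for a few team sizes: quadratic, not exponential.
for n in (1, 2, 3, 10):
    print(n, directed_pulls(n), undirected_pairs(n))
```

For a team of 10 with one branch each, that is 90 pulls, or 45 pairs.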

I think feature branches might be useful in the case where a feature is so big that you can't fit it all in a single commit; for example, if the feature is so big that you need several developers to collaborate on it. If short cycles mean you will never work on such a big feature, then I think you can safely continue to ignore branches.

Mike Mainguy said...

Thank god for math guys... I went back to try and figure out where I got my O from and I'm thinking the problem is a bit more convoluted than I originally expected.

For n = 1, the answer with gitflow is 2
For n = 2, the answer is 6
For n = 3, the answer is 11

When I draw this out on a notecard with all the possible combinations of sharing, these are the numbers I get.

Am I making sense?

Mike Mainguy said...

BTW, the problem is, if you don't pull from all complementary branches, you get the wildcard of "what is going to totally hork my world" ... this is a fat tail/wild swan problem that totally horks many projects.

gelisam said...

Hmm. I don't think we are counting the same thing. If I was drawing on notecards, they would contain fully connected graphs with n vertices, and I am counting the number of undirected edges. What is your n, and what are you drawing?

By the way, if all of your branches are synchronized with each other, then they all have the same contents, and there is no reason to use branches at all. Might as well, as your post suggests, commit everything to the mainline.