Wednesday, August 10, 2011

git and github are not change management tools

I recently stumbled across a problem with git that is going to cause no end of headaches for the uninitiated. Git repositories are fundamentally insecure and the audit trail is dodgy when folks are either #1 intentionally malicious or #2 ignorant of how git works.

For the back story, I have a number of github accounts: one is used for code I use while blogging, another is for internal projects at work, and yet another was for a client I was working for. A while back I noticed that commits apparently made with my "client" github account were showing up in my blog repository.

Confused, I verified my public/private key pairs against what was in github and was truly stumped as to how this was happening. While scratching my head, I remembered that there is the concept of a "global" config in git and ran the following command:

git config --global -l

Uh oh! It turns out I had a global config set... When I went back and looked, EVERY commit I had done was recorded as this erroneous user.
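To avoid this in the first place, the identity can be pinned per repository so it overrides whatever the global config says. Here is a minimal sketch of that, run in a throwaway sandbox; the account names and example.com addresses are made up for illustration.

```shell
#!/bin/sh
# Sketch: a per-repository identity overrides the global one.
# All names and addresses below are hypothetical.
set -eu

# Use a throwaway HOME so the real ~/.gitconfig is never touched.
sandbox=$(mktemp -d)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
# Identity environment variables would take precedence over config, so clear them.
unset XDG_CONFIG_HOME GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# The "wrong" identity, set globally (the situation described above).
git config --global user.name "Client Account"
git config --global user.email "client@example.com"

git init -q "$sandbox/blog"
cd "$sandbox/blog"

# With no local setting, the global value is what git will use.
before=$(git config user.email)

# Pin the identity for this repository only (stored in .git/config).
git config user.name "Blog Account"
git config user.email "blog@example.com"
after=$(git config user.email)

echo "hello" > README
git add README
git commit -q -m "first commit"
author=$(git log -1 --format='%an <%ae>')

echo "before: $before"
echo "after:  $after"
echo "author: $author"
```

Running `git config user.email` inside a repository before the first commit is a cheap way to check which identity will be recorded.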

The problem seems common to distributed source code revision management systems, but is fundamental to git. Every copy of every repository is trusted to maintain its own copy of revision history. After realizing this, I also realized that I could rewrite history, push to the central repository and effectively delete and/or amend the revision history in the master repository.


As an example, I tried the following:
git clone git@github-personal:mikemainguy/mainguy_blog.git
git config --replace-all user.name Scooby
git config --replace-all user.email sdoo@doo.com
echo "Scooby Doo" >> README
git commit -a 
git push

Now when examining the history in github, my commit shows up as having come from some cat named Scooby Doo! Worse yet, there's no apparent way to figure out which github account actually pushed the change.
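The same experiment can be reproduced without touching github. This is only a sketch: a local bare repository stands in for the central one, and the sandbox paths are assumptions of mine. It shows that the author and committer recorded on the central side are just whatever the pushing clone had configured.

```shell
#!/bin/sh
# Sketch: the identity stored in a commit is whatever the local config says.
# A local bare repository stands in for the central (github) one.
set -eu

sandbox=$(mktemp -d)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git init -q --bare "$sandbox/central.git"
# Cloning an empty repository prints a harmless warning.
git clone -q "$sandbox/central.git" "$sandbox/work"
cd "$sandbox/work"

# Claim to be somebody else, exactly as in the example above.
git config user.name "Scooby"
git config user.email "sdoo@doo.com"
echo "Scooby Doo" >> README
git add README
git commit -q -m "edit README"
git push -q origin HEAD

# Ask the central repository what it recorded.
recorded=$(git --git-dir="$sandbox/central.git" log -1 --all \
    --format='author=%an <%ae> committer=%cn <%ce>')
echo "$recorded"
```

Nothing in the stored commit says which account or key was used for the push; that information, if it exists at all, lives in the server's own logs.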

On top of that, I can rewrite history in my local repository, push it out, and make old changes disappear, and nobody will be able to see what happened.

For example:
echo "SECRET STUFF" >> README2
git add README2
git commit README2 -m "whoops"
git push
Check your central repo and you will see README2.

git filter-branch --index-filter 'git rm --cached --ignore-unmatch README2' -f
git push --force

Now check github, the commit is still there (I'm not sure if that can be completely removed), but the file is gone... It's as if it never existed!
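One caveat for anyone trying this at home: in the clone where the rewrite was done, git filter-branch keeps a backup of the refs it rewrote under refs/original/, so the "deleted" file is still recoverable there until that backup is removed. A rough sketch in a scratch repository (the user name and paths are placeholders):

```shell
#!/bin/sh
# Sketch: after filter-branch, the original history survives locally
# under refs/original/. Names and paths are hypothetical.
set -eu

sandbox=$(mktemp -d)
# FILTER_BRANCH_SQUELCH_WARNING skips the warning pause in newer git versions.
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1 FILTER_BRANCH_SQUELCH_WARNING=1
unset XDG_CONFIG_HOME GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > README
git add README
git commit -q -m "first commit"
echo "SECRET STUFF" > README2
git add README2
git commit -q -m "whoops"

branch=$(git symbolic-ref --short HEAD)

# Same rewrite as above: drop README2 from every commit on the branch.
git filter-branch -f --index-filter \
    'git rm --cached --ignore-unmatch README2' >/dev/null 2>&1

# Is the file reachable from the rewritten branch? From the backup ref?
if git cat-file -e "HEAD:README2" 2>/dev/null; then in_head=yes; else in_head=no; fi
if git cat-file -e "refs/original/refs/heads/$branch:README2" 2>/dev/null; then
    in_backup=yes
else
    in_backup=no
fi
secret=$(git show "refs/original/refs/heads/$branch:README2")

echo "in rewritten branch: $in_head"
echo "in backup ref:       $in_backup"
echo "recovered content:   $secret"
```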

So, applying this knowledge to my screwed up commits I ran the following:
git filter-branch --env-filter 'export GIT_AUTHOR_NAME="Mike Mainguy"' -f
git filter-branch --env-filter 'export GIT_COMMITTER_NAME="Mike Mainguy"' -f
git filter-branch --env-filter 'export GIT_AUTHOR_EMAIL="mike.mainguy@gmail.com"' -f
git filter-branch --env-filter 'export GIT_COMMITTER_EMAIL="mike.mainguy@gmail.com"' -f
And all the craziness is gone...
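The commands above rewrite every commit on the branch, which is fine for a one-person repository. If other people have commits in there too, a more careful variant is to do it in one pass and only touch commits made with the wrong address. This is a sketch of that idea, not what I actually ran; "client@example.com" and "Someone Else" are stand-ins.

```shell
#!/bin/sh
# Sketch: one filter-branch pass that only rewrites commits made with
# the wrong identity. The "wrong" and "collaborator" identities are hypothetical.
set -eu

sandbox=$(mktemp -d)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1 FILTER_BRANCH_SQUELCH_WARNING=1
unset XDG_CONFIG_HOME GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Client Account"
git config user.email "client@example.com"

# One commit made with the wrong account, one by a collaborator.
echo one > file
git add file
git commit -q -m "by the wrong account"
echo two >> file
git -c user.name="Someone Else" -c user.email="else@example.com" \
    commit -q -a -m "by a collaborator"

# Rewrite author and committer only where the wrong address was recorded.
git filter-branch -f --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "client@example.com" ]; then
    export GIT_AUTHOR_NAME="Mike Mainguy"
    export GIT_AUTHOR_EMAIL="mike.mainguy@gmail.com"
fi
if [ "$GIT_COMMITTER_EMAIL" = "client@example.com" ]; then
    export GIT_COMMITTER_NAME="Mike Mainguy"
    export GIT_COMMITTER_EMAIL="mike.mainguy@gmail.com"
fi
' >/dev/null 2>&1

first=$(git log --reverse --format='%an <%ae> / %cn <%ce>' | sed -n 1p)
second=$(git log --reverse --format='%an <%ae> / %cn <%ce>' | sed -n 2p)
echo "$first"
echo "$second"
```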

Knowing these details is important if you're using git/github because most people coming from a centralized source code control tool would find this behavior a little bit disconcerting (if not just plain wrong). The thing to remember is that every person you trust to push to your repository can effectively remove/rewrite history to their own liking. If accountability and audit trail are important, you'll likely need to adopt a "pull" model and have someone manually verify/rewrite each commit.
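If you host the central repository yourself, one partial mitigation is to have it refuse history rewrites outright. The sketch below assumes a bare repository you control (the paths and user are invented) and uses git's receive.denyNonFastForwards and receive.denyDeletes settings; it does not cover whatever controls a hosted service like github offers, and it does nothing about forged author names.

```shell
#!/bin/sh
# Sketch: a central bare repository configured to reject rewritten history.
# Paths and identities are hypothetical.
set -eu

sandbox=$(mktemp -d)
export HOME="$sandbox" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git init -q --bare "$sandbox/central.git"
# Refuse pushes that are not fast-forwards, and refuse branch deletion.
git --git-dir="$sandbox/central.git" config receive.denyNonFastForwards true
git --git-dir="$sandbox/central.git" config receive.denyDeletes true

git clone -q "$sandbox/central.git" "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > README
git add README
git commit -q -m "first commit"
git push -q origin HEAD

branch=$(git symbolic-ref --short HEAD)
tip_before=$(git --git-dir="$sandbox/central.git" rev-parse "refs/heads/$branch")

# Rewrite the commit locally, then try to force it onto the central repo.
git commit -q --amend -m "rewritten history"
if git push -q --force origin HEAD 2>/dev/null; then
    result=accepted
else
    result=rejected
fi

tip_after=$(git --git-dir="$sandbox/central.git" rev-parse "refs/heads/$branch")
echo "forced push: $result"
```

With this in place a forced push fails even for people who are otherwise trusted to push, so rewriting central history requires access to the server itself.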

12 comments:

me said...

Actually, I guess they ARE change management tools, they just need to be used in a different manner than one might use a centralized source code repository.

Andos said...

“nobody will be able to see what happened”

Anybody that ever pulled from your public repository will notice the change.

SaltwaterC said...

Cat? Scooby Doo is a dog!

Ted said...

My comment from http://news.ycombinator.com/item?id=2867948:

You can amend history in any repository, but all other repositories still have the original state. Hence, someone who tries to tamper with git history in such a way can't do it undetectably. It will be blatantly obvious to all that the history has been rewritten, and all other copies of the repositories will contain the original history.

If you are going to use git in a centralized way, I suggest you use Gerrit. In addition to providing code review functionality, Gerrit also gives you user authentication and per-user access controls. This allows you to restrict what a user can do when he pushes, so that he can only update a branch (i.e., push new content), and not delete a branch or do a "force push" (which is what you would need to do if you want to replace a branch with entirely new content).

It's also possible to customize Gerrit to only allow a user to push changes that he or she wrote herself, which will give you a much more strict audit trail. And you can set these access control parameters on a per-branch basis, so you could allow the release manager to push new changes onto the vendor branch, but all changes to the production branch must be committed by the person submitting the change, and go through code review.

So the basic take-away from the article is that git is a distributed SCM, not a centralized SCM; and if you want to use git in a centralized SCM fashion, don't do it incompetently --- instead you should use Gerrit, which is designed as a wrapper to Git so it can be a secure, auditable, centralized repository.

Ted said...

One addendum: if you push changes to the vendor branch, they don't end up on the production branch (if it requires code review) until the merge commit is reviewed by a second developer.

You still have the problem that the 2nd developer may not do a good job reviewing all of the new code brought in by the merge commit, but that's a problem with code reviews in general, and it's a social problem, not a technical one....

me said...

Thanks for the tip about gerrit, I need to check it out.

One clarification, my point about "nobody will be able to see what happened" is that once things are in the repo and you've done subsequent pulls, it's difficult to tell exactly where the commit came from. You need to review every commit as it comes in and if you wait 2 weeks and want to go investigate, you're out of luck (AFAIK).

Douglas Cuthbertson said...

You might be interested in Fossil SCM (http://fossil-scm.org/). It doesn't allow the history to be changed and it's a nice self-contained repository.

Ted said...

Well of course it's difficult to see where the commit came from; git isn't a centralized SCM, by design. The fault was in trying to use it that way, and letting any random person push stuff into a repository without human review.

Gerrit is designed to allow a centralized git repository to have access controls and to add that human review.

Anonymous said...

You know what's insecure? I could login to my company's main server and do `rm -rf ~/Projects`!

If you "trust" someone with commit rights, that means something. If you don't trust, use pull requests.

Anonymous said...

git filter-branch and other tools you mention are essential for many uses, for example when you accidentally push your ssh private key or database configuration to GitHub. Rewriting history is also useful in many other ways, e.g. rebasing.

If you want security with Git, it's done with GPG's cryptographic signatures. Git integrates closely with GPG and that's the way it's designed to work.

Anonymous said...

Nice post.

What I have realized is that the problem with tools like Git is the community of people with blind faith who, in spite of every effort to highlight an issue, are not able to accept it and instead rely on the founding fathers to show the way.

Everybody around me seems to chatter Git, Git, but nobody truly understands its features and deficiencies.

me said...

I thought I'd add a link to a GREAT article about git.
git is simpler than you think