Git and GitHub are not change management tools
I recently stumbled across a problem with git that is going to cause all manner of headaches for the uninitiated. Git repositories are fundamentally insecure, and the audit trail is dodgy when folks are either #1 intentionally malicious or #2 ignorant of how git works.
For the back story, I have a number of GitHub accounts: one for code I use while blogging, another for internal projects at work, and yet another for a client I was working with. A while back I noticed that commits apparently made with my "client" GitHub account were showing up in my blog repository.
Confused, I verified my public/private key pairs against what was in github and was truly stumped as to how this was happening. While scratching my head, I remembered that there is the concept of a "global" config in git and ran the following command:
git config --global -l
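Uh oh! It turns out I had a global config set... When I went back and looked, EVERY commit I had made was attributed to this erroneous user. The SSH key only decides which account is allowed to push; the name and email recorded in each commit come straight from git's config.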
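One way to avoid this mix-up is to set the identity per repository and ask git where each value comes from. This is only a sketch: it works in a throwaway directory, and the name and email address are placeholders I made up, not values from my real setup.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Set an identity for this repository only (stored in .git/config).
# The name and address below are placeholders.
git config --local user.name "Blog Author"
git config --local user.email "blog@example.com"

# List every user.* value git can see and the file each one comes from.
# A line pointing at your home directory's .gitconfig is the global value.
git config --show-origin --get-regexp '^user\.'

# The configured value for this repository: local config wins over global.
effective_email=$(git config user.email)
echo "commits in this repo will use: $effective_email"
```

Running the `--show-origin` line in each of your working copies is a quick way to spot a repository that is silently falling back to the global identity.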
The problem seems common to distributed source code revision management systems, but is fundamental to git. Every copy of every repository is trusted to maintain its own copy of the revision history. After realizing this, I also realized that I could rewrite history, push to the central repository, and effectively delete or amend the revision history in the master repository.
As an example, I tried the following:
git clone git@github-personal:mikemainguy/mainguy_blog.git
cd mainguy_blog
git config --replace-all user.name Scooby
git config --replace-all user.email sdoo@doo.com
echo "Scooby Doo" >> README
git commit -a
git push
Now when examining the history in github, my commit shows up as having come from some cat named scooby doo! Worse yet, there's no apparent way to figure out which github account actually pushed the change.
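To see exactly what git recorded, you can print the author and committer fields of a commit. The sketch below repeats the experiment in a temporary local repository (no GitHub involved) and assumes no `GIT_AUTHOR_*` or `GIT_COMMITTER_*` environment variables are in play, which is why it unsets them first.

```shell
# Make sure environment overrides do not mask the config values.
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Claim to be somebody else; git does not check this against anything.
git config user.name "Scooby"
git config user.email "sdoo@doo.com"

echo "Scooby Doo" >> README
git add README
git commit -q -m "add README"

# %an/%ae are the author fields, %cn/%ce are the committer fields.
recorded=$(git log -1 --format='%an <%ae> | %cn <%ce>')
echo "$recorded"
```

Both fields are plain text taken from config, so neither one tells you which account actually pushed.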
On top of that, I can rewrite history in my local repository, push it out, and make old changes disappear, and nobody will be able to see what happened.
For example:
echo "SECRET STUFF" >> README2
git add README2
git commit README2 -m "whoops"
git push
Check your central repo: you will see README2.
git filter-branch --index-filter 'git rm --cached --ignore-unmatch README2' -f
git push --force
Now check GitHub: the commit is still there (I'm not sure whether it can be completely removed), but the file is gone... It's as if it never existed!
So, applying this knowledge to my screwed up commits I ran the following:
git filter-branch --env-filter 'export GIT_AUTHOR_NAME="Mike Mainguy"' -f
git filter-branch --env-filter 'export GIT_COMMITTER_NAME="Mike Mainguy"' -f
git filter-branch --env-filter 'export GIT_AUTHOR_EMAIL=mike.mainguy@gmail.com' -f
git filter-branch --env-filter 'export GIT_COMMITTER_EMAIL=mike.mainguy@gmail.com' -f
And all the craziness is gone...
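The four passes above overwrite the identity on every commit, including ones that were already correct. A narrower variant, sketched here against a temporary repository, does it in a single pass and only touches commits carrying the wrong address. The address `client@example.com` is a stand-in I made up for the erroneous identity, and `FILTER_BRANCH_SQUELCH_WARNING` just silences the notice newer git versions print before running filter-branch.

```shell
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# One commit made with the wrong (placeholder) identity...
git config user.name "Client Account"
git config user.email "client@example.com"
echo a > a && git add a && git commit -q -m "made with the wrong identity"

# ...and one made with the right identity.
git config user.name "Mike Mainguy"
git config user.email "mike.mainguy@gmail.com"
echo b > b && git add b && git commit -q -m "made with the right identity"

# Single pass: rewrite author and committer only where the email matches.
git filter-branch -f --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "client@example.com" ]; then
    export GIT_AUTHOR_NAME="Mike Mainguy"
    export GIT_AUTHOR_EMAIL="mike.mainguy@gmail.com"
fi
if [ "$GIT_COMMITTER_EMAIL" = "client@example.com" ]; then
    export GIT_COMMITTER_NAME="Mike Mainguy"
    export GIT_COMMITTER_EMAIL="mike.mainguy@gmail.com"
fi
' -- --all >/dev/null

# Every distinct author/committer email pair left in the history.
identities=$(git log --format='%ae %ce' | sort -u)
echo "$identities"
```

As with the original commands, the rewritten commits get new hashes, so the result still has to be force pushed.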
Knowing these details is important if you're using git/github because most people coming from a centralized source code control tool would find this behavior a little bit disconcerting (if not just plain wrong). The thing to remember is that every person you trust to push to your repository can effectively remove/rewrite history to their own liking. If accountability and audit trail are important, you'll likely need to adopt a "pull" model and have someone manually verify/rewrite each commit.
Comments
Anybody that ever pulled from your public repository will notice the change.
You can amend history in any repository, but all other repositories still have the original state. Hence, someone who tries to tamper with git history in such a way can't do it undetectably. It will be blatantly obvious to all that the history has been rewritten, and all other copies of the repositories will contain the original history.
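The detection argument can be tried out locally. This is a rough sketch with a temporary bare repository standing in for the central one and two clones named `alice` and `bob` (both names are made up): Alice rewrites a pushed commit, and Bob's clone can tell that the new tip does not descend from what he had.

```shell
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

work=$(mktemp -d)
git init -q --bare "$work/central.git"

# Alice publishes a commit.
git clone -q "$work/central.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "SECRET STUFF" > README2
git add README2
git commit -q -m "whoops"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Bob clones while the original commit is still the tip.
git clone -q "$work/central.git" "$work/bob"
before=$(git -C "$work/bob" rev-parse HEAD)

# Alice rewrites the commit and force pushes over it.
echo "nothing to see" > README2
git commit -q -a --amend -m "whoops"
git push -q --force origin "$branch"

# Bob fetches and checks whether his old tip is an ancestor of the new one.
git -C "$work/bob" fetch -q origin
after=$(git -C "$work/bob" rev-parse "origin/$branch")
if git -C "$work/bob" merge-base --is-ancestor "$before" "$after"; then
    verdict="fast-forward"
else
    verdict="history rewritten"
fi
echo "$verdict"

# Bob's copy of the original commit still holds the original content.
original=$(git -C "$work/bob" show "$before:README2")
echo "$original"
```

The check only works for people who fetched before the rewrite, which is the limitation the post is complaining about.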
If you are going to use git in a centralized way, I suggest you use Gerrit. In addition to providing code review functionality, Gerrit also gives you user authentication and per-user access controls. This allows you to restrict what a user can do when he pushes, so that he can only update a branch (i.e., push new content), and not delete a branch or do a "force push" (which is what you would need to do if you want to replace a branch with entirely new content).
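This is not Gerrit, and it has none of Gerrit's per-user controls, but stock git has two repository-wide settings that cover the "no force push, no branch deletion" part: `receive.denyNonFastForwards` and `receive.denyDeletes`. The sketch below assumes you control the central bare repository (hosted services expose similar options through their own settings) and uses a temporary directory in its place.

```shell
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

work=$(mktemp -d)
git init -q --bare "$work/central.git"

# On the central repository: refuse history rewrites and branch deletion.
git -C "$work/central.git" config receive.denyNonFastForwards true
git -C "$work/central.git" config receive.denyDeletes true

# A developer publishes a commit.
git clone -q "$work/central.git" "$work/dev" 2>/dev/null
cd "$work/dev"
git config user.name "Dev"
git config user.email "dev@example.com"
echo one > file
git add file
git commit -q -m "first"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# The developer rewrites that commit and tries to force it through.
git commit -q --amend -m "first, rewritten"
if git push -q --force origin "$branch" 2>/dev/null; then
    result="accepted"
else
    result="rejected"
fi
echo "force push was $result"

# What the central repository still records as the tip of the branch.
central_tip=$(git -C "$work/central.git" log -1 --format=%s "$branch")
echo "$central_tip"
```

These settings apply to everyone who can push, so they are a blunt instrument compared with per-branch, per-user rules.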
It's also possible to customize Gerrit to only allow users to push changes that they wrote themselves, which will give you a much stricter audit trail. And you can set these access control parameters on a per-branch basis, so you could allow the release manager to push new changes onto the vendor branch, but all changes to the production branch must be committed by the person submitting the change, and go through code review.
So the basic take-away from the article is that git is a distributed SCM, not a centralized SCM. If you want to use git in a centralized SCM fashion, don't do it incompetently; instead you should use Gerrit, which is designed as a wrapper around Git so that it can be a secure, auditable, centralized repository.
You still have the problem that the 2nd developer may not do a good job reviewing all of the new code brought in by the merge commit, but that's a problem with code reviews in general, and it's a social problem, not a technical one....
One clarification, my point about "nobody will be able to see what happened" is that once things are in the repo and you've done subsequent pulls, it's difficult to tell exactly where the commit came from. You need to review every commit as it comes in and if you wait 2 weeks and want to go investigate, you're out of luck (AFAIK).
Gerrit is designed to allow a centralized git repository to have access controls and to add that human review.
If you "trust" someone with commit rights, that means something. If you don't trust, use pull requests.
If you want security with Git, it's done with GPG's cryptographic signatures. Git integrates closely with GPG and that's the way it's designed to work.
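For reference, the configuration side of signing looks roughly like this. It is a sketch only: the key ID is a placeholder, and the commands that actually sign and verify are left as comments because they need a real GPG key on the machine.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Placeholder key ID; substitute the ID of a key you actually own.
git config user.signingkey 0123456789ABCDEF

# Ask git to sign every commit made in this repository.
git config commit.gpgsign true
signing=$(git config --get commit.gpgsign)
echo "commit signing enabled: $signing"

# With a real key in place, the workflow would be along these lines:
#   git commit -S -m "signed change"     # sign one commit explicitly
#   git verify-commit HEAD               # check the signature on a commit
#   git log --show-signature -1          # show signature status in the log
```

A signature ties a commit to whoever holds the key, which is a stronger claim than the free-text name and email fields.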
What I have realized is that the problem with tools like Git is the community of people with blind faith who, in spite of all efforts to highlight an issue, are not able to accept it and instead rely on the founding fathers to show the way.
Everybody around me seems to chatter "Git, Git", but nobody truly understands its features and deficiencies.
git is simpler than you think