Tuesday, June 12, 2012

Source Control

I was just asked about source control by a friend who has only had experience with VSS. Thought I would share my rant:

Git (and other DVCSs) are a revolution in source control. They really are. Honest. I'll come back to why later, but let's concentrate on what you know: VSS.

VSS has a very bad reputation nowadays. This is primarily because VSS is known for corrupting its repository (http://www.codinghorror.com/blog/2006/08/source-control-anything-but-sourcesafe.html), but also because of its concurrency strategy, lock-modify-unlock (http://svnbook.red-bean.com/en/1.7/svn.basic.version-control-basics.html#svn.basic.vsn-models.lock-unlock), which has gone out of fashion. Most VCSs (e.g. SVN, CVS) use copy-modify-merge (http://svnbook.red-bean.com/en/1.7/svn.basic.version-control-basics.html#svn.basic.vsn-models.copy-merge) because this allows more developers to work on the same bit of code at the same time. This strategy is great, but the risk is that you get problems when you merge. Obviously you get conflicts when you merge your changes into a file that has moved on from the version you checked out, but some people keep whole versions of the codebase unmerged for extended periods of time. These can be versioned as well and are known as branches.
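To make the two strategies concrete, here is a toy Python sketch. It is not how VSS or SVN actually work inside; the names LockedFile and merge3 are made up for illustration, and the merge assumes nobody adds or removes lines, which real merge tools obviously have to cope with.

```python
class LockedFile:
    """Lock-modify-unlock: only one user at a time may edit the file."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.locked_by = None

    def try_lock(self, user):
        """Return True if the user now holds the lock, False if someone else does."""
        if self.locked_by not in (None, user):
            return False
        self.locked_by = user
        return True

    def checkin(self, user, new_lines):
        """Store the edit and release the lock; refused if the user has no lock."""
        if self.locked_by != user:
            return False
        self.lines = list(new_lines)
        self.locked_by = None
        return True


def merge3(base, mine, theirs):
    """Copy-modify-merge: naive line-by-line three-way merge.

    Simplifying assumption: all three versions have the same number of lines.
    Returns the merged lines plus the indexes where both sides changed the
    same line differently (the conflicts a human has to resolve).
    """
    assert len(base) == len(mine) == len(theirs)
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:          # both sides agree (or neither changed it)
            merged.append(m)
        elif m == b:        # only they changed it
            merged.append(t)
        elif t == b:        # only I changed it
            merged.append(m)
        else:               # both changed it differently
            merged.append(m)
            conflicts.append(i)
    return merged, conflicts


base = ["a = 1", "b = 2", "c = 3"]

# Locking: bob simply has to wait until alice is done.
f = LockedFile(base)
print(f.try_lock("alice"), f.try_lock("bob"))

# Merging: alice and bob both edit their own copies at the same time.
print(merge3(base, ["a = 10", "b = 2", "c = 3"], ["a = 1", "b = 2", "c = 30"]))
print(merge3(base, ["a = 10", "b = 2", "c = 3"], ["a = 99", "b = 2", "c = 3"]))
```

The first merge goes through cleanly because the edits touch different lines; the second reports a conflict on line index 0 because both people changed the same line.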

The rationale behind branches is sensible. If you have major project N that is going to take 6 months to build, then you will want to a) keep it in source control and b) not have it messing up the code you are maintaining. So a 'branch' of the code is taken and you have the benefits of source control. The trouble comes when merging the branch back in: quite often the code is so divergent that you have to do a lengthy and risky re-integration. (Funny thing is that VSS has these integration problems too.) Lots of people cleverer than me have struggled with this: http://accurev.com/blog/2012/03/07/avoiding-merge-hell/ , http://martinfowler.com/bliki/SemanticConflict.html . Anyway, the bottom line is that merging is hard in many VCSs.
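Here is a small Python sketch of the nastiest kind of merge problem, the semantic conflict from the Fowler link. Everything in it is invented for illustration (the file names, merge_files, smoke_test), and the merge works on whole files rather than lines to keep it short. The mainline renames a function while the long-lived branch adds a new caller of the old name. No file was changed on both sides, so the merge reports nothing, yet the merged code is broken.

```python
# Three hypothetical versions of a tiny codebase: file name -> source text.
base = {"pricing": "def total(prices):\n    return sum(prices)\n"}

# Mainline renamed total() to order_total().
mainline = {"pricing": "def order_total(prices):\n    return sum(prices)\n"}

# The branch left pricing alone and added a report that calls total().
branch = {
    "pricing": base["pricing"],
    "report": "def report(prices):\n    return 'Total: %d' % total(prices)\n",
}


def merge_files(base, ours, theirs):
    """File-level three-way merge: conflict only if both sides changed a file."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o
        elif o == b:        # only their side touched it
            result = t
        elif t == b:        # only our side touched it
            result = o
        else:               # both sides touched it
            conflicts.append(name)
            result = o
        if result is not None:
            merged[name] = result
    return merged, conflicts


def smoke_test(codebase):
    """Load every file into one namespace and call report()."""
    namespace = {}
    try:
        for name in sorted(codebase):
            exec(codebase[name], namespace)
        return namespace["report"]([1, 2, 3])
    except NameError as err:
        return "broken: %s" % err


merged, conflicts = merge_files(base, mainline, branch)
print(conflicts)            # textual conflicts found by the merge
print(smoke_test(branch))   # the branch on its own
print(smoke_test(merged))   # the merged result
```

The longer a branch lives, the more of these pile up, and no merge tool will flag them for you. Only building and testing the merged code will.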

The other issue with traditional VCSs is that they make commits something you do rarely. In a normal VCS you commit to a repository only when you are finished. This goes against the meme 'commit early and often' http://www.codinghorror.com/blog/2008/08/check-in-early-check-in-often.html . Wouldn't it be great to create checkpoints whenever you've finished a bit of code? Or if you just wanted to hide something somewhere? Also, what happens if you want to share your codebase with 2 or more servers? Well, in step DVCSs.

Distributed Version Control Systems such as Git and Mercurial (also known as hg) make the repository local. So you can have your own personal source control system on your own machine. This means you have all the benefits of a VCS locally. If your server goes down, you have the code. If you make a mistake, you can roll back. It's awesome and it promotes commit early and often. Git is even better because you can branch really easily and, more importantly, merge really easily, so all this merge hell that occurs with other systems is lessened (if you use it right). You can pull code in from as many sources as you like and push up to them and everything. Git is so horrendously powerful that you can rewrite commit history and cherry-pick commits. This is dangerous too (with great power comes great responsibility). It also means there's a crap load to learn. Also, on Windows you have to use a Linux-style shell, and the tools are best used on the command line. This is also a shock to Windows devs. Luckily GitHub has brought out a Windows client that makes the experience more palatable. In short, Git is much, much better than any other VCS I have used, despite its learning curve.
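If "the repository is local" sounds abstract, this toy Python model may help. It is only a sketch of the idea, not Git's or Mercurial's real storage format; the Repo class and its methods are invented for illustration, and push only handles the simple case where the other side has nothing new.

```python
import hashlib


class Repo:
    """Toy repository: every clone carries the full commit history."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id, message, snapshot)
        self.head = None   # id of the latest commit, None when empty

    def commit(self, message, snapshot):
        """Record a checkpoint locally; no server is involved."""
        payload = repr((self.head, message, sorted(snapshot.items())))
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        self.commits[commit_id] = (self.head, message, dict(snapshot))
        self.head = commit_id
        return commit_id

    def clone(self):
        """Copy the whole history, not just the latest files."""
        other = Repo()
        other.commits = dict(self.commits)
        other.head = self.head
        return other

    def history(self):
        """Commit ids from head back to the very first commit."""
        ids, current = [], self.head
        while current is not None:
            ids.append(current)
            current = self.commits[current][0]
        return ids

    def log(self):
        return [self.commits[c][1] for c in self.history()]

    def rollback(self, steps=1):
        """Move head back by `steps` commits (assumes that many parents exist)."""
        for _ in range(steps):
            self.head = self.commits[self.head][0]
        return self.commits[self.head][2]

    def push(self, remote):
        """Send our commits to another repo; refuse if it has work we lack."""
        if remote.head is not None and remote.head not in self.history():
            return False  # diverged: you would have to pull and merge first
        remote.commits.update(self.commits)
        remote.head = self.head
        return True


server = Repo()
server.commit("initial import", {"app.py": "print('v1')"})

laptop = server.clone()
laptop.commit("checkpoint: half-done refactor", {"app.py": "print('v2-wip')"})
laptop.commit("refactor finished", {"app.py": "print('v2')"})

print(server.log())         # the server has not seen the local checkpoints yet
print(laptop.push(server))  # share them when you are ready
print(server.log())
```

The point is that both commits on the laptop happen without touching the server, and if the server dies the laptop still has every commit. Pushing to a second server is just another call to push.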

I've not mentioned TFS. It's basically the next VSS, but with a CI server and an issue tracker built in. It's fucking awful. Read this article and comments:

I've used VSS, SVN, StarTeam, TFS, Hg and Git in anger. SVN, Hg and Git are the only ones I'd return to. SVN just looks like a dinosaur compared to the other two.

Hope that helps.