Thursday, July 19, 2007

A Subversion User Looks at Git

Subversion was, until yesterday, the only SCM system that I understood well enough to use. Today, I feel I can add Git to that list. The disclaimer is that what follows is mostly an understanding gained from reading documentation. Git appears to have an excellent documentation set but, if those documents mislead in some way, I have likely been misled too. Having said this, I'm not going to couch this in weasel words in order to appear circumspect. This is my current understanding of Git and its pros and cons. I may be wrong.

Basic Git Architecture

From an architectural perspective, Git is gloriously simple. There are four essential objects: blobs, trees, commits and tags:

A blob is strictly a piece of file content. I believe that blobs are generally segmented along file boundaries, but I haven't yet worked out if blobs are also used to track portions of a file. Blobs are named by the SHA1 hash of their contents. This can lead to a performance problem if your files are large. As a pathological case, I created a Git repository of several 500MB AIFF files: it took rather a long time and ate all my RAM. That's hardly the normal case, however.
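To make the naming scheme concrete, here is a minimal Python sketch of how a blob's name can be computed. One detail beyond what is said above: Git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the content, rather than the bare content alone. The function name git_blob_id is my own invention for illustration and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object name Git would give a blob holding `content`."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes,
    # so the name depends only on the content, never on the file name.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always produces the same name, whatever the file is called.
print(git_blob_id(b"hello\n"))
```

Because the whole content has to be read and hashed to produce the name, the cost grows with file size, which is consistent with the slow AIFF experiment.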

A tree assembles blobs and other trees into a hierarchical structure, matching the on-disk hierarchy of your files. A tree is essentially a mapping between a blob's name (i.e. its SHA1) and file name. Trees are stand-alone objects in the history of a project - they don't contain any information about where they came from.
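The name-to-blob mapping can be sketched in Python as well. This is a simplified illustration, not Git's own code: it assumes every file is a plain non-executable file (mode 100644), every directory uses mode 40000, and the helper names object_id and tree_id are hypothetical. Each entry is serialised as the mode, the file name, a NUL byte and the 20 raw bytes of the referenced object's SHA1, and the tree is then named by hashing that listing.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> str:
    """Hash an object body with Git's "<kind> <size>\\0" header."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: dict) -> str:
    """Name a tree. `entries` maps a file name to ("blob" | "tree", hex id)."""

    def sort_key(item):
        name, (kind, _hex_id) = item
        # Git orders directory names as if they ended in "/".
        return (name + "/" if kind == "tree" else name).encode()

    body = b""
    for name, (kind, hex_id) in sorted(entries.items(), key=sort_key):
        mode = b"40000" if kind == "tree" else b"100644"
        body += mode + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return object_id(b"tree", body)


# One blob, referenced under two different names in two different trees.
hello = object_id(b"blob", b"hello\n")
print(tree_id({"README": ("blob", hello)}))
print(tree_id({"NOTES": ("blob", hello)}))
```

Note that nothing in the tree records a parent, an author or a date, which is why a tree on its own says nothing about where it came from.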

The commit object refers to a tree - specifically, the state of the tree after that commit is applied - and contains some information about who committed and what was done. It is the commit object, rather than its related tree, which connects the commit to its predecessor (or predecessors, in the case of a merge).
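A toy Python model may help to show how commits, rather than trees, carry the history. This is only a sketch of the relationships described above: the Commit class, its field names and the history function are my own, and the tree ids are placeholder strings rather than real hashes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    tree: str             # name of the tree as it stands after this commit
    message: str          # what was done
    parents: tuple = ()   # no parents for a root, two or more for a merge


def history(commit):
    """Yield every commit reachable from `commit`, each exactly once."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        yield current
        stack.extend(current.parents)


root = Commit("tree-0", "initial import")
feature = Commit("tree-1", "add feature", (root,))
fix = Commit("tree-2", "fix bug", (root,))
merge = Commit("tree-3", "merge feature and fix", (feature, fix))

for c in history(merge):
    print(c.message, "->", c.tree)
```

The links only point backwards, from a commit to its tree and to its parents, so there is no way in this model to get from a tree back to the commit that introduced it.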

The tag object simply collects the SHA1 sum of an object, the object's type and a symbolic name. My understanding is that you could, in principle, tag a blob, a commit or a tree. I'm not completely certain whether one should tag commits or trees, but I suspect commits would be the correct object. It's not clear that one can reach a commit from a given tree object.

One nice feature of Git is that it allows you to undo or change a commit after it has been made. Here's one example of where it's super useful: I work between a desktop and a laptop machine. Using Subversion, when I have to move machines, I commit my work in progress and then update the machine I'm moving onto. This is generally fine, but it means there are a lot of commits in the repository that represent points at which I wouldn't normally commit code - where things are broken, incomplete or don't compile. With Git and some care, you can commit your work in progress, pull the changes to another machine and then undo the last commit.

The Directory Index

One concept that exists in Git that doesn't exist in Subversion in quite the same way is the notion of the Directory Cache. The directory cache is a file which describes a tree, although the tree which it describes may not exist in the repository yet. As you work, you add changes to this cache and when you commit, the tree described by the directory cache is written to the repository with an associated commit object. The key line from the documentation here is: "creating a new tree always involves a controlled modification of the index file" (ref: core-intro.txt).

The index file is not so very different in practice from Subversion's idea of having added files that are not yet committed. The index file is Git's representation of the same.

Having said that, Git's notion of "adding" a file is slightly different from Subversion's. In SVN, you're telling SVN to "start tracking file X". In Git, you're saying "take a snapshot of the content of file X and store it in the index for the next commit". As a result, you have to - at least in principle - perform a "git add filex.c" every time you change filex.c. There is, however, some syntactic sugar in the form of "git commit -a" which adds all the modifications to known files and commits in one step.

This is pretty powerful: how often have you done some work on a feature and cleaned up some headers as you went by? When you're done, you have to look at each file you've changed and perhaps do a number of commits to specific files. In git, you can just decide not to "git add" those clean-ups to the index until after you've committed the meat of your work.
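The snapshot behaviour of "git add" can be sketched with a toy model in Python. Everything here is an illustration under my own assumptions: the ToyRepo class and its methods are hypothetical, file contents are plain strings, and the all_tracked flag is only a rough analogue of "git commit -a".

```python
class ToyRepo:
    def __init__(self):
        self.worktree = {}   # file name -> content currently on disk
        self.index = {}      # file name -> content captured by the last add
        self.commits = []    # each commit is a frozen copy of the index

    def add(self, name):
        # Snapshot the content as it is right now, not a reference to the file.
        self.index[name] = self.worktree[name]

    def commit(self, all_tracked=False):
        if all_tracked:
            # Rough analogue of "git commit -a": refresh every known file,
            # and drop files that have disappeared from the working copy.
            for name in list(self.index):
                if name in self.worktree:
                    self.add(name)
                else:
                    del self.index[name]
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


repo = ToyRepo()
repo.worktree["filex.c"] = "feature, first draft"
repo.worktree["headers.h"] = "tidied headers"
repo.add("filex.c")                            # headers.h is deliberately not added
repo.worktree["filex.c"] = "feature, second draft"

print(repo.commit())                  # {'filex.c': 'feature, first draft'}
print(repo.commit(all_tracked=True))  # {'filex.c': 'feature, second draft'}
```

The first commit contains the content as it was when it was added, and the header clean-up stays out of both commits until it is added explicitly.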

Branching and Merging

Branches, and merges between those branches, are a central concept in Git. Given that this was developed to track the Linux kernel, this is hardly surprising.

Given that NIB files are not mergeable with any common merge algorithm, it's not clear that this style of working would be terribly good for Cocoa development. The documentation does not say a whole lot about what happens to binary files. It's not that Git is unsuitable for handling NIBs - far from it. I just observe that the approach of frequently repeated branch-and-merge operations rather depends on a high probability of clean automatic merges to be bearable. The fact that a git merge will automatically commit in the absence of conflicts suggests that this expectation underlies the design.

Having said that, it's no easier to merge a NIB in Subversion. It's just that merging isn't so commonplace an operation in SVN. The correct solution, of course, is for Apple to make NIB files more easily mergeable.

Repository Layout

One thing that I already love about Git is that it does not depend on putting a dot-directory in every directory in a working copy (recall that every Git working copy is also a repository). There's one .git directory at the root of the repository and absolutely nothing else. Anyone who has had to check RTFd files into Subversion and then edit those files with TextEdit will be cheering right about now. For those who haven't, understand that RTFd files are actually bundles, and bundles are directories. Thus, Subversion adds a .svn directory inside your RTFd file. When TextEdit saves this file, the .svn directory is lost and the file appears disconnected from its history.

For this fact alone, I'm looking at the implications of switching to Git.

What's Missing

Currently, the only Subversion feature that appears to me to be obviously missing from Git is the concept of svn:externals. I use externals a fair bit in my SVN projects, and I'm not yet certain how one could replicate them in Git.

You can add so-called "remote tracking branches" in Git, in which your repository tracks a branch in the repository you originally created yours from or, indeed, arbitrary branches from arbitrary repositories. This lets you switch your working copy to another branch from somewhere else, but it doesn't let you attach an arbitrary tree to an arbitrary point in your tree.

I suspect the approach might be to import source from some remote repository, create a tracking branch and then merge between the tracking branch and some subdirectory of your working copy. I have not yet seen any documentation on how to do this, nor on how to do it if the other repository is not using Git but, say, Subversion.

Conclusions

Git's rethink of the entire content management problem enables some powerful new capabilities. I write this whilst on holiday with very sporadic net access. I've been coding away in my Subversion-managed projects, but unable to commit in sensible chunks without internet access. With Git, there would be no problem whatsoever.

Because few operations depend on the network, Git's performance is excellent for most common operations and cases.

Git is confusing and alien to someone raised on CVS and Subversion, that much is certain. I feel like I understand the component parts of Git, but that I'm not necessarily entirely understanding their implications and interactions just yet. It also feels like Git gives you slightly more rope with which to hang yourself, but I do recall feeling that way about Subversion when I started using it. With SVN, I've come to trust that my usual workflow and conventions don't produce broken results and, when I'm doing something new there are good docs to back me up. I suspect I could reach the same position with Git quite easily.

Finally, I continue to ask myself whether using Git would really confer serious advantages to a (usually) solo Cocoa developer. The answer is that I'm currently not sure, for the following reasons:


  • I rarely have several branches in active development at any one time. Even if I have multiple SVN branches, I'm usually only working on one at a time.
  • Git offers nothing new to the problem of merging NIBs.
  • Git's optimistic assumption that automatic merges will rarely conflict is somewhat less likely to hold for Cocoa projects than for, say, the Linux kernel.
  • I don't often collaborate with large numbers of people on projects.
  • Git's pretty confusing, even after reading the docs twice.
  • Everyone else uses Subversion (see the point about svn:externals).


Where does Git provide compelling improvements?


  • Much cleaner handling of bundle files.
  • The ability to revert a commit is something I've, er, occasionally had reason to wish existed in SVN :-)
  • Working disconnected on a laptop no longer requires either (a) a gigantic checkin once you get home or (b) picking apart your changes to commit separate features.
  • Performance is a feature.
  • The ability to explicitly define the contents of a commit in a structure other than the current state of the working copy is pretty nice.


There will certainly be more on this as time goes on. I've been hearing too much buzz about Git from people whom I respect to ignore it. I don't hear anything about arch, monotone, BitKeeper, codeville, SVK or darcs from anywhere except the nerdiest of SCM nerds.

Reader Comments (88)

Hi Fraser,

You have raised some very thought-provoking points. Git has questioned some of the assumptions of other SCM's and has tried to be less tied to those assumptions. Hence the flexibility that you found.

Your comments have not addressed some questions that occurred to me. I understand that you are very new to Git and may not have ready answers. I was wondering about source code. Does Git track the changes line by line? Can it give you a diff of what changed? Is there a way of drawing a line in the sand and saying that "exactly this was the way it was, as of this date/tag/commit", even if a blob has been revoked later? Does Git track changes to directories or moved/renamed modules?

Are you aware that there are a number of SVN GUI programs available that make it easy to commit an arbitrary list of modules separately? Another advantage is that you can compare any two versions of a source module, and the differences will be highlighted. A set of changes can be revoked without branching, as well.

I ask these questions because, not only do I want to know more about Git, I hope these questions will help you to decide which SCM best suits your needs.

Thanks,
Travis

July 21, 2007 | Unregistered Commenter Travis Risner

The same reasons you explain are the ones that made me decide to move from Subversion to Mercurial (http://selenic.com/mercurial).

July 21, 2007 | Unregistered Commenter Francesc Esplugas

Thanks for the summary. I've been wanting to look at git for some time (also being a heavy subversion user), and your post gives me some idea of what to expect.

Do you know if there's any way of using git from Xcode? Mostly curiosity, since even with subversion I almost always use it from the command line and not from within Xcode.

July 21, 2007 | Unregistered Commenter Diego Zamboni

Regarding Travis's questions:

The fundamental difference between git and other SCMs is that git tracks the contents of files rather than the file itself. For every commit, it takes a snapshot of all the files that you added/changed and keeps a copy of that entire file. So there is no way to 'revoke' a file afterwards. Deleting it just keeps it out of future commits. (So yes, checking out a commit is 'exactly as it was').

Git can give you a diff of what changed since the last commit or between any two commits, but it does so by comparing the snapshots it took. This, like the SHA1 hashes, can become quite slow for large files but for source code that's not usually a problem.

Git doesn't track directories as such (although there is currently a discussion among the developers and it might be added in the future), but it does keep track of their names as part of the commit, so as long as there is at least one file in a directory (or a subdirectory of it), renames are handled.

Only recently was support for something like modules added. Git calls it subprojects I believe but I haven't used it yet so I can't give you an answer there.

For some more information on this, Linus recently gave a talk at Google about Git that covers some of the questions you asked. You can find it at http://www.youtube.com/watch?v=4XpnKHJAok8.

July 22, 2007 | Unregistered Commenter Enno Ruijters

Great post. I have not looked at Git much, but thought I'd point out a couple of things about Subversion from your post.

It is not particularly difficult to revert a commit in Subversion. See http://blogs.open.collab.net/svn/2007/07/second-chances-.html as an example.

I did not understand the point you were making about adding files after you commit and the advantage that Git brings. Subversion can certainly do the same thing. As someone else pointed out, Subversion also has a number of great GUI clients that make this easier.

One thing that I think I read into your article that you did not specifically say, is that it sounds like with Git when you add a file it sort of freezes the content it added. So if you were to make future changes to the file and then commit, the original content would get committed, not the latest changes. This is different than Subversion and could potentially be useful.

Another thing to come back to was your comment about reversing commits. I thought one of the points of a distributed version control system like Git was that you could and would do lots of these checkpoint commits. Therefore you get more version control features during the development process. I guess what I am saying, is that I do not understand why you would want to reverse these commits. It seems like the point of Git is to empower you to make these commits without impacting others.

Anyway, thanks for the post.

Mark

July 22, 2007 | Unregistered Commenter Mark Phippard

Mark,

The link you referred to about reverting commits in Subversion says:

"The only way to truly remove something from a repository is to dump the repository to a file, carefully remove the parts you do not want from the dump file, and then reload the repository."

Certainly, you can make a further commit in Subversion that undoes the effect of your previous commit, but you can't easily say "throw away the last commit I made" as you can in Git.

The main attraction of removable commits is when I want to hop from my desktop to my laptop. I can commit whatever I was doing on the desktop - however broken it was - then move to the laptop, pull the changes, merge and remove the last commit.

July 22, 2007 | Unregistered Commenter fraserspeirs

Similarly to you, I decided to read up on distributed SCM while on holiday.

I looked at Git a bit, but I chose to look at Mercurial in more depth. Git and Mercurial are very similar, but these are the more attractive features:
- Mercurial is simpler for the CVS/SVN user
- Far fewer commands to learn
- No need to pack your repository
- Works on Linux, Mac OS X, and Windows
- Has Mercurial Queues (MQ) extension.

Be sure to read up on MQ: http://hgbook.red-bean.com/hgbookch12.html

The model I've come up with for contributing to projects that use SVN publicly is to use hgsvn to create a Mercurial mirror of an SVN repo. Now I can work offline, or even just have faster access to all history.
Then I use MQ to develop a patch set (say one patch for the logging system, one patch for the documentation, and another for the data access layer). Then I can work on the patches while keeping up to date (by using hgsvn) and then finally manually apply the patches to SVN.

Works a treat! No need to get *insert favourite project here* to convert to a distributed SCM.

David

July 23, 2007 | Unregistered Commenter David Roussel

I definitely prefer Git over SVN for the current open source project I am working on. Git just makes collaboration on a large project that much easier. I still think SVN is an excellent tool though. I'm not sure that either Git or SVN is better than the other, only that one is more appropriate for a given project depending on your workflow and project requirements.

October 2, 2009 | Unregistered Commenter Deceth

>git doesn't let you attach an arbitrary tree to an arbitrary point in your tree
Use 'git submodule'

November 4, 2009 | Unregistered Commenter Karthik
