Understanding Git Submodules
Monday, May 11, 2009 at 12:39PM
I've been using Git full-time for over a year now, but I had not yet adopted Git's submodule feature for my projects. Git submodules are functionally similar to Subversion's svn:externals mechanism, but submodules do appear slightly alien and confusing at first (and second) glance.
So I went deep and here, for the internet, is my best explanation: Git Submodules are basically the same as svn:externals, except that Git submodules are locked to a specific revision and don't automatically track the external project's HEAD.
Git submodules behave more like svn:externals that are managed by Piston than by Subversion's default externals.
In my experience, to understand Git, you have to understand its implementation. Git is very driven by its model layer and, once you understand the model layer, I find that everything else follows quite logically.
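As you may know, Git stores file contents as blobs, and trees which describe the layout of those blobs in the filesystem. A commit points at one tree, and the commit ID is the SHA-1 hash of the commit object's contents; blobs and trees are named by the SHA-1 of their own contents in the same way. I simplify slightly, but that's the core. Keep this in mind.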
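To make the "ID is a hash of the contents" idea concrete, here is a minimal sketch that computes a blob's ID twice: once with 'git hash-object' and once by hand. It assumes a SHA-1 based Git and that the 'sha1sum' and 'awk' utilities are installed (on a Mac, 'shasum' would stand in for 'sha1sum'); the variable names are mine.

```shell
# Git hashes a short header ("<type> <size>" plus a NUL byte) followed by the content.
content='hello'

# ID as computed by Git for a blob containing "hello" and a newline (6 bytes).
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

# The same ID computed by hand: header, NUL byte, then the 6 bytes of content.
manual_id=$(printf 'blob 6\000%s\n' "$content" | sha1sum | awk '{print $1}')

echo "git:    $git_id"
echo "manual: $manual_id"
```

The two lines printed should be identical. Commits and trees are hashed the same way, just with a different type in the header.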
Git submodules are implemented using two moving parts: the .gitmodules file and a special kind of entry in a tree object. These together triangulate a specific revision of a specific repository which is checked out into a specific location in your project.
Each submodule's entry in the .gitmodules file contains two parts:
[submodule "FooKit"]
path = FooKit
url = git@github.com:fspeirs/fookit.git
The submodule's definition contains a path, which is the location in your repository where the submodule should be placed. The `url` is the URL of the repository to clone from. This example is a GitHub URL but it could equally be a path to a repository on your system. Thus far, Git knows where to get your submodule and where to put it.
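Because .gitmodules uses the same syntax as Git's own config files, you can read it back with 'git config -f'. The following is a small sketch that writes the FooKit example into a scratch directory and queries it; the scratch directory and variable names are only for illustration.

```shell
set -e
dir=$(mktemp -d) && cd "$dir"

# A .gitmodules file like the one above (the indentation is optional).
cat > .gitmodules <<'EOF'
[submodule "FooKit"]
	path = FooKit
	url = git@github.com:fspeirs/fookit.git
EOF

# Keys take the form submodule.<name>.<property>.
path=$(git config -f .gitmodules --get submodule.FooKit.path)
url=$(git config -f .gitmodules --get submodule.FooKit.url)

echo "clone $url into $path"
```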
The second question is which commit should be checked out into the submodule's path. You tell Git this by adding the submodule path to your index and committing.
Let's try an example. This is a directory which contains a Git repository called "a" and another called "super". We will add "a" as a submodule of "super":
[/tmp/git]$ ls -l
total 0
drwxr-xr-x 4 fspeirs wheel 136 May 11 11:03 a
drwxr-xr-x 4 fspeirs wheel 136 May 11 11:03 super
The first thing to do is run "git submodule add" in super:
[/tmp/git/super(master)]$ git submodule add /tmp/git/a ProjectA
Initialized empty Git repository in /private/tmp/git/super/ProjectA/.git/
[/tmp/git/super(master)]$ git submodule status
-85ab8ba4edf9168ab051ded7ddbbe20861b71528 ProjectA
[/tmp/git/super(master)]$ ls ProjectA/
a.txt
Having done that, let's look at the impact of that command on the project "super":
[/tmp/git/super(master)]$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: .gitmodules
# new file: ProjectA
#
We have the new .gitmodules file, which should be checked in, and a new file called "ProjectA", which is the "path" of our submodule. Let's commit these two now:
[/tmp/git/super(master)]$ git commit -m "added submodule"
[master]: created ffba648: "added submodule"
2 files changed, 4 insertions(+), 0 deletions(-)
create mode 100644 .gitmodules
create mode 160000 ProjectA
Note the mode "160000" on ProjectA - that's a special mode which marks the entry as a reference to a commit in another repository (Git calls this a "gitlink"). It's different from normal files and directories.
Now, if we look at the contents of the Git index, we'll see the SHA-1 for the tracked files:
[/tmp/git/super(master)]$ git ls-files --stage
100644 831cdc0dc1b88e69aa9943cf09907ae1bcd031fc 0 .gitmodules
160000 85ab8ba4edf9168ab051ded7ddbbe20861b71528 0 ProjectA
100644 16f5c2d3aa9656fc424352e4cfaa2523c809778b 0 super.txt
Notice the SHA for ProjectA - 85ab8ba - this is the SHA-1 of the commit to which the ProjectA submodule is locked. Commit 85ab8ba does not exist in the "super" repository - it refers to a commit in the submodule's repository.
So Git now knows the three things required to set up your submodules when cloning a project:
- The what comes from the "url" property in the submodule's entry in your .gitmodules file.
- The where comes from the corresponding "path" entry in .gitmodules.
- The when, if you will, comes from the SHA-1 stored in the superproject's index for the submodule's path.
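Here is a rough sketch of those three pieces working together on a fresh clone. It rebuilds "a" and "super" in a scratch directory, clones "super", and lets 'git submodule init' and 'git submodule update' fill in ProjectA. The demo identity and scratch paths are made up, and the '-c protocol.file.allow=always' setting is only there because recent Git releases refuse to clone submodules from local paths by default.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build "a" and "super", then add "a" to "super" as ProjectA.
git init -q a
(cd a && echo a > a.txt && git add a.txt && git commit -q -m "initial a")
git init -q super
(cd super && echo s > super.txt && git add super.txt && git commit -q -m "initial super")
(cd super && git -c protocol.file.allow=always submodule --quiet add "$work/a" ProjectA \
  && git commit -q -m "added submodule")

# In a fresh clone, ProjectA starts out as an empty directory.
git clone -q super clone
cd clone
git submodule --quiet init                                      # copies the url into .git/config
git -c protocol.file.allow=always submodule --quiet update      # clones and checks out the recorded commit

recorded=$(git ls-files --stage ProjectA | awk '{print $2}')    # SHA-1 from the index
checked_out=$(git -C ProjectA rev-parse HEAD)                   # SHA-1 actually checked out
echo "recorded:    $recorded"
echo "checked out: $checked_out"
```

Both lines should print the same SHA-1: the commit recorded in the superproject is exactly what ends up checked out at the path.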
Working in a Submodule
The checked out submodule is, of course, a full Git repository in itself and you should treat it that way. It is perfectly possible to make changes in your checked-out submodule. As you commit in your submodule, the SHA-1 of the submodule's HEAD will advance away from the SHA-1 that the superproject has stored in its index.
To return to the example, suppose some change is made in ProjectA:
[/tmp/git/super(master)]$ cd ProjectA/
[/tmp/git/super/ProjectA(master)]$ echo "b" >> a.txt
[/tmp/git/super/ProjectA(master)]$ git commit -a -m "Added B"
[master]: created 82b6450: "Added B"
1 files changed, 1 insertions(+), 0 deletions(-)
[/tmp/git/super/ProjectA(master)]$ cd ..
[/tmp/git/super(master)]$ git submodule init
Submodule 'ProjectA' (/tmp/git/a) registered for path 'ProjectA'
[/tmp/git/super(master)]$ git submodule status
+82b64501654dca53ba570827d8d3e7d465abbae5 ProjectA (heads/master)
[/tmp/git/super(master)]$ git ls-files --stage | grep ProjectA
160000 85ab8ba4edf9168ab051ded7ddbbe20861b71528 0 ProjectA
Notice that, now, the SHA-1 of the submodule's head is at 82b6450, whilst the superproject is expecting 85ab8ba4. There are two ways Git shows you that you're out of sync:
- "git submodule status" will show a "+" in front of the SHA-1 of the HEAD of any submodule that has advanced from the SHA-1 stored in the superproject.
- Running "git status" in the superproject will show the submodule as modified.
If you want to commit the superproject to using the new HEAD of the submodule, simply add and commit the submodule's directory as you would any other file:
[/tmp/git/super(master)]$ git submodule status
+82b64501654dca53ba570827d8d3e7d465abbae5 ProjectA (heads/master)
[/tmp/git/super(master)]$ git add ProjectA
[/tmp/git/super(master)]$ git commit -m "Advanced ProjectA to new HEAD"
[master]: created 37750a6: "Advanced ProjectA to new HEAD"
1 files changed, 1 insertions(+), 1 deletions(-)
[/tmp/git/super(master)]$ git ls-files --stage
100644 831cdc0dc1b88e69aa9943cf09907ae1bcd031fc 0 .gitmodules
160000 82b64501654dca53ba570827d8d3e7d465abbae5 0 ProjectA
100644 16f5c2d3aa9656fc424352e4cfaa2523c809778b 0 super.txt
Notice how 'git ls-files --stage' and 'git submodule status' now show the same SHA-1 for ProjectA?
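If you'd like to replay that sequence yourself, the following sketch does the whole thing in a scratch directory and captures the first character of 'git submodule status' before and after the superproject commit. The identity, paths and variable names are assumptions for the demo, and '-c protocol.file.allow=always' is only needed because recent Git releases block local-path submodule clones by default.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q a
(cd a && echo a > a.txt && git add a.txt && git commit -q -m "initial a")
git init -q super
cd super
echo s > super.txt && git add super.txt && git commit -q -m "initial super"
git -c protocol.file.allow=always submodule --quiet add "$work/a" ProjectA
git submodule --quiet init
git commit -q -m "added submodule"

# Commit inside the submodule: its HEAD moves past the recorded SHA-1.
(cd ProjectA && echo b >> a.txt && git commit -q -a -m "Added B")
before=$(git submodule status ProjectA | cut -c1)   # "+" marks a submodule that is out of sync

# Record the submodule's new HEAD in the superproject.
git add ProjectA
git commit -q -m "Advanced ProjectA to new HEAD"
after=$(git submodule status ProjectA | cut -c1)    # a space means it matches the index

echo "before: [$before] after: [$after]"
```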
Gotchas for those used to svn:externals
The big thing to remember is that, unlike svn:externals, updating your superproject from a master repository does not automatically update the checked-out submodules; you have to run 'git submodule update' yourself. If you think about it, this makes sense: the submodules are locked to specific commits in their respective repositories.
It's also important to remember the distributed nature of what you're doing. If you advance HEAD in a submodule and then commit the new SHA-1 in the superproject, push the submodule's changes before you push the superproject's. If you don't, your superproject will contain references to commits that only exist in your local clone of the subproject.
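As a sketch of that second gotcha, the script below advances ProjectA, records the new SHA-1 in "super", and then asks the upstream repository "a" whether it has ever seen that commit. It only has it once the submodule is pushed. The branch name "shared", the demo identity and the scratch paths are all hypothetical; I push to a separate branch because "a" is not a bare repository, and Git refuses pushes to a checked-out branch.

```shell
set -e
work=$(mktemp -d) && cd "$work"
# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q a
(cd a && echo a > a.txt && git add a.txt && git commit -q -m "initial a")
git init -q super
cd super
echo s > super.txt && git add super.txt && git commit -q -m "initial super"
git -c protocol.file.allow=always submodule --quiet add "$work/a" ProjectA
git commit -q -m "added submodule"

# Advance the submodule locally and record the new SHA-1 in the superproject.
(cd ProjectA && echo b >> a.txt && git commit -q -a -m "Added B")
git add ProjectA && git commit -q -m "Advanced ProjectA to new HEAD"
recorded=$(git ls-files --stage ProjectA | awk '{print $2}')

# Helper (hypothetical name): does the upstream repository "a" contain the commit?
upstream_has() {
  if git -C "$work/a" cat-file -e "$1^{commit}" 2>/dev/null; then echo present; else echo missing; fi
}

before_push=$(upstream_has "$recorded")
# Push the submodule first, here to the hypothetical branch "shared".
git -C ProjectA push -q origin HEAD:refs/heads/shared
after_push=$(upstream_has "$recorded")

echo "before push: $before_push, after push: $after_push"
```

Anyone cloning "super" between those two points would be handed a SHA-1 that they cannot fetch. Newer versions of Git also accept 'git push --recurse-submodules=check' in the superproject, which refuses to push while referenced submodule commits are unpushed.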
Wrapping Up
This post does not attempt to cover every command for working with Git submodules. In particular, you should be aware of the 'git submodule init' and 'git submodule update' subcommands - read the man page for those.
Git submodules really aren't that complex or scary. They have comparatively few moving parts and, to my mind, enforce a certain welcome stability and discipline in your use of external projects.
Reader Comments (6)
Thanks Fraser. This is by far the best introduction to git submodules I've encountered.
It strikes a balance that most other git documentation[*] fails to achieve: most is either way too technical and implementation focussed (i.e. the 'official' docs and man pages) or too 'type this and you get that' kind of instructions that make me feel like a trained monkey :)
[*] except gitready.com, of course. they're awesome, too :)
A side note: if you're using svn:externals to track HEAD you're playing with fire, since somebody could come along and change the external repo and break your build. i.e.: if you tag a version of your repo and it contains a svn:externals reference that tracks HEAD then that tag is basically worthless, since you can't predict what you'll get from the external reference.
Best practice with svn is to always refer to a tagged version of the external repo with svn:externals, since then you have some determinism in your own repo.
The only really annoying part of git submodules is if I'm not aware that a given project uses submodules, I can get quite confused until I figure that out and run `git submodule init`. I wish the initial checkout would just tell me that the project contains submodules and that I should run `git submodule init`, or possibly just go ahead and do that automatically for me.
The most non-obvious part of the whole git system is submodules. Problems come when you need to revert your git repository with submodules to some point in the past. You need to revert submodules manually and you need to keep in mind the exact SHA-1 of the required commit. Correct me if I'm wrong.
Roman, I think you are incorrect. I think after you revert your superproject to a given commit, run 'git submodule update' and it should revert your submodules to the correct commit, no?