www.aquamentus.com
  What is Git?
  Running it for the very first time
  Creating a local git area
  Checking out an existing git area
  How to work in a git area
  Undoing
  Branching
  Working with "source" repos
  Tagging
  3-Stage cheat-sheet
    diffing
    pushing/submitting
    pulling/updating

What is Git?

Git is a version control system similar to CVS, subversion, or perforce. It was architected and developed by none other than computing legend Linus Torvalds for managing the linux kernel.

So what sets it apart from the others? Most VCSs are centralized -- that is, there's a repository (a nice black box somewhere) storing every version of every file you'd want, and consumers check files out from it and check them back in when they're done. Git, however, turns the idea of centralization on its head. Whenever you "check out" a repository, you're actually making a completely equivalent, standalone, local copy. (Which is why they call it "cloning" instead of "checking out".) The reason for this is that it was designed for linux: a single code base edited concurrently by thousands of people, any of whom may not have network access at a given time.

A second important thing you need to know about git is that it has this idea of a "staging area". A local change first goes into the staging area before it can finally go into the repository. Note that the staging area is not just a list of files, but the actual file contents as well -- if you stage a file and then continue editing it, the additional edits won't get swept to the repository until you stage them as well!

(git also calls the staging area "the index". An unbreakable law of creating your own VCS is that existing concepts have to change existing terminology by one word, and new concepts have to create two new ones.)
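To see the "contents, not just file names" point in action, here is a minimal sketch. The temp directory, the repo-local identity, and the commit message are assumptions made so that it runs anywhere without touching your real setup.

```shell
# Sketch only: a throwaway repo under a temp dir, with a repo-local identity.
work=$(mktemp -d)
cd "$work"
git init -q Foo
cd Foo
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"

echo "first edit" > Foo.c
git add Foo.c                     # the index now holds the "first edit" contents
echo "second edit" >> Foo.c       # further editing; this part is NOT staged
git commit -q -m "Commit whatever was staged"

committed=$(git show HEAD:Foo.c)  # contents that reached the repository
leftover=$(git diff Foo.c)        # edits still sitting only in the local file
echo "committed: $committed"
```

The commit contains only the first edit; the second one still shows up in git diff until you stage it too.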

So if you really want to take an abstraction step backward, there are really four possible places between which you shuffle data:

local <--> index <--> local repository <--> remote repository

whereas almost every other system I've used (CVS, SVN, perforce) has this:

local <--> remote repository

and one other (RCS) doesn't even do that:

local == repository

Veteran VCS users will be happy to know that git carries on several long-standing VCS traditions:

Running it for the very first time

Since git needs to report who's doing checkins and checkouts, the very first thing you have to do is tell it who you are. (And you only need to do this once, regardless of how many repositories you create or clone.)

If your name is Garrosh Hellscream and your email address is crappy_warchief@orgrimmar.gov, run these:

% git config --global user.name "Garrosh Hellscream"
% git config --global user.email "crappy_warchief@orgrimmar.gov"
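
If you want to confirm what git recorded, you can read the values back with --get. The sketch below points HOME at a scratch directory (an assumption, purely so the example never touches your real ~/.gitconfig); on your own machine you would skip that part.

```shell
# Sketch only: run the config commands against a scratch HOME inside a subshell.
scratch_home=$(mktemp -d)
recorded=$(
  export HOME="$scratch_home"     # assumption: keeps the real global config untouched
  unset XDG_CONFIG_HOME
  git config --global user.name "Garrosh Hellscream"
  git config --global user.email "crappy_warchief@orgrimmar.gov"
  # Read both settings back from the global config file.
  echo "$(git config --global --get user.name) <$(git config --global --get user.email)>"
)
echo "$recorded"
```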

Creating a local git area

Creating a git area is nondestructive and non-over-inclusive, so you can create a new git area in either an empty directory or one that already has files in it.

Assuming your highly-creative project name is Foo, run this:

% mkdir Foo
% cd Foo
% git init   # the current dir is 'Foo', so that's the repo you just made

This creates a hidden .git dir for you that you can usually ignore.

Assuming your Foo project has input files Foo.c, Foo.h, and Bar.c, run this:

% git add Foo.c Foo.h Bar.c

This does not add files to the repository; it adds them to the staging area. "git stage" would have been a less misleading name, but hey.

Now that the files are staged, run this:

% git commit

Note that you did not specify any files whatsoever! The files that get committed are the ones currently in the staging area.
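
Put together, the whole create-stage-commit sequence might look like the sketch below. The temp directory, the empty stand-in files, the repo-local identity, and the -m message (used so no editor pops up) are all assumptions for illustration.

```shell
# Sketch only: build the Foo repo in a scratch location.
work=$(mktemp -d)
cd "$work"
mkdir Foo
cd Foo
git init -q
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"

touch Foo.c Foo.h Bar.c                    # empty stand-ins for the real source files
git add Foo.c Foo.h Bar.c                  # stage them
git commit -q -m "Initial import of Foo"   # commit whatever is staged

tracked=$(git ls-files | wc -l | tr -d ' ')   # number of files in the repo
commits=$(git rev-list --count HEAD)          # number of commits so far
echo "$tracked files, $commits commit"
```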

Checking out an existing git area

Most of the time you're not just creating a repository in isolation. Usually there's already a repo set up, and you want to work on that. The first step is to essentially copy the repo to your local area. Assuming the project is called Foo, you'd commonly do something like this:

% cd /home/ghellscream
% git clone /path/to/existing/repo/Foo Foo

This creates a /home/ghellscream/Foo dir containing everything from /path/to/existing/repo/Foo.
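
Here is a self-contained sketch of the same thing. Since /path/to/existing/repo/Foo doesn't exist on your machine, it first fabricates an "existing" repo under a temp directory; those paths and the file contents are assumptions.

```shell
# Sketch only: make a stand-in "existing" repo, then clone it.
work=$(mktemp -d)
existing="$work/existing/Foo"     # hypothetical stand-in for /path/to/existing/repo/Foo
mkdir -p "$existing"
cd "$existing"
git init -q
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "int main(void) { return 0; }" > Foo.c
git add Foo.c
git commit -q -m "Add Foo.c"

mkdir "$work/home"                # hypothetical stand-in for /home/ghellscream
cd "$work/home"
git clone -q "$existing" Foo      # copies the whole repo, history included

cd Foo
origin_url=$(git config --get remote.origin.url)   # clone remembers where it came from
cloned_commits=$(git rev-list --count HEAD)
echo "cloned from $origin_url"
```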

How to work in a git area

Unlike perforce, you don't have to declare that you're going to edit something; in git, you just change whatever you want. Let's suppose you make a brilliant change to your Foo.c file.

To see what's different between it and what's staged, run this:

% git diff Foo.c

Then you can stage your brilliant change:

% git add Foo.c

Now that it's staged, you can look at what's different between the staged version and what's in the repository with the following:

% git diff --staged Foo.c

If you want to see which files you've edited without sorting through diff output, you can run this:

% git status

(SHORTCUT: commit has a -a switch which automatically stages all edited files. It won't add new files though.)
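
The following sketch walks one edit through both diffs and shows what git status reports at each step. The scratch repo and file contents are assumptions; the --short switch just makes status print one terse line per file.

```shell
# Sketch only: scratch repo with one committed file.
work=$(mktemp -d)
cd "$work"
git init -q Foo
cd Foo
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "v1" > Foo.c
git add Foo.c
git commit -q -m "Add Foo.c"

echo "brilliant change" >> Foo.c
before_add=$(git status --short)          # edited but not staged
unstaged_diff=$(git diff Foo.c)           # local file vs. staging area

git add Foo.c
after_add=$(git status --short)           # edited and staged
staged_diff=$(git diff --staged Foo.c)    # staging area vs. repository
left_unstaged=$(git diff Foo.c)           # nothing left between file and index

echo "before add: [$before_add]"
echo "after add:  [$after_add]"
```

In the short status format, the first column describes the staging area and the second the local file, so the M moves from the second column to the first once you run git add.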

You can see the history of a repository (or a specific file) with:

% git log
...
% git log Foo.c
...

Note that git log does not show you the actual diffs by default. Like perforce, git thinks it's funny to require two steps to see the history of diffs. The second step is to run git show $COMMIT, where $COMMIT is the checksum string of whatever one (1!) commit you want to see. (Alternatively, git log -p prints each commit's diff inline.)
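
As a sketch of the two-step dance: the scratch repo and commit messages below are assumptions, and the --format=%H switch is only there to grab the checksum without copy-pasting it.

```shell
# Sketch only: scratch repo with two commits touching Foo.c.
work=$(mktemp -d)
cd "$work"
git init -q Foo
cd Foo
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "v1" > Foo.c
git add Foo.c
git commit -q -m "Add Foo.c"
echo "brilliant change" >> Foo.c
git commit -q -a -m "Make a brilliant change"

git log --oneline Foo.c                   # step 1: the history, without diffs
commit=$(git log -1 --format=%H Foo.c)    # checksum of the newest commit touching Foo.c
shown=$(git show "$commit")               # step 2: that one commit, diff included
echo "$shown"
```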

You can delete files with:

% git rm Foo.c
...

Undoing

If you do a git add on a file (Foo.c) and decide you don't want it included in the next commit, you can remove it from the index with this:

% git reset Foo.c

If you made edits to a file and you don't want those edits any more, you can revert it like so:

% git checkout -- Foo.c
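
Both undo steps in one sketch, assuming a scratch repo whose Foo.c originally contains the single line "original":

```shell
# Sketch only: scratch repo with one committed file.
work=$(mktemp -d)
cd "$work"
git init -q Foo
cd Foo
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "original" > Foo.c
git add Foo.c
git commit -q -m "Add Foo.c"

echo "regrettable edit" >> Foo.c
git add Foo.c                     # oops, staged it

git reset -q Foo.c                # unstage: the edit stays in the local file
staged=$(git diff --staged)       # empty again
still_edited=$(git diff Foo.c)    # the edit is still visible here

git checkout -- Foo.c             # throw the local edit away too
contents=$(cat Foo.c)
echo "$contents"
```

Note the asymmetry: reset only empties the staging area, while checkout overwrites your local file, so the second step really does lose the edit.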

Branching

The default branch in git is called "master". (As opposed to "trunk".)

You can start a new branch named Asdf like so:

% git branch Asdf

However, you're still on the master branch, so to switch over you run:

% git checkout Asdf

You can see you're now on the Asdf branch:

% git branch
* Asdf
  master
%

You can edit files at will in any branch you like; the procedure of editing-staging-committing is the same. But now suppose you want to merge some of the changes from one branch to another. To do that:

  1. git checkout the destination branch for the merge
  2. Run git merge $BRANCH, where $BRANCH is the source branch for the merge

Hopefully you have no conflicts, in which case git commits the merge for you and you're done. If you do have conflicts, you can view them with git diff, then go fix them, add, and commit. (git inserts conflict markers, just like everyone else.)

If you're done with branch Asdf, you can delete it, like so:

% git branch -d Asdf

(Note: '-d' requires that Asdf has been merged back into the current branch before it will delete it. If you want to discard the changes, use '-D'.)
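
The full branch lifecycle can be sketched as follows. The scratch repo and file contents are assumptions, and the default branch name is read from git rather than hard-coded, since some installations name it something other than "master".

```shell
# Sketch only: scratch repo with one commit on the default branch.
work=$(mktemp -d)
cd "$work"
git init -q Foo
cd Foo
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "v1" > Foo.c
git add Foo.c
git commit -q -m "Add Foo.c"
main_branch=$(git symbolic-ref --short HEAD)   # usually "master"

git branch Asdf                   # create the branch
git checkout -q Asdf              # switch to it
echo "side work" > Bar.c
git add Bar.c
git commit -q -m "Add Bar.c on Asdf"

git checkout -q "$main_branch"    # 1. check out the destination branch
git merge -q Asdf                 # 2. merge the source branch into it
git branch -q -d Asdf             # merged, so -d is happy to delete it

branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "remaining branches: $branches"
```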

Working with "source" repos

Every repo is a standalone copy, so the commit you do is really only to your local repo. To push your changes into another repo, you do this:

% cd /path/to/existing/repo/Foo
% git pull /home/ghellscream/Foo master

This essentially merges Garrosh's 'master' branch into the repo's current branch. (So, in a sense, you can see that cloning a repo is pretty much just branching it.)

A great followup question is how you can sync to get changes from the central area. You do almost the exact same thing as before, except that the 'clone' command stores where the remote repo is so you can use the special string 'origin':

% cd /home/ghellscream/Foo
% git pull origin

(Side tip: there is a 'push' command as well. It refuses to push when the other repo has commits you don't have yet, so you have to pull and merge first, but it's good to know you don't have to cd all the time when you use the central-repository model.)
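
Here is a sketch of the round trip in both directions. The two repos live under a temp directory as stand-ins for /path/to/existing/repo/Foo and /home/ghellscream/Foo; those paths, the file contents, and the commit messages are assumptions.

```shell
# Sketch only: a "central" repo plus Garrosh's clone of it.
work=$(mktemp -d)
central="$work/central/Foo"       # stand-in for /path/to/existing/repo/Foo
mine="$work/ghellscream/Foo"      # stand-in for /home/ghellscream/Foo
mkdir -p "$central" "$work/ghellscream"
cd "$central"
git init -q
git config user.name "Central Keeper"
git config user.email "keeper@example.com"
echo "v1" > Foo.c
git add Foo.c
git commit -q -m "Start Foo"
branch=$(git symbolic-ref --short HEAD)   # usually "master"

git clone -q "$central" "$mine"
cd "$mine"
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "v2" >> Foo.c
git commit -q -a -m "Garrosh's change"

cd "$central"
git pull -q "$mine" "$branch"     # bring Garrosh's commit into the central repo

echo "v3" >> Foo.c
git commit -q -a -m "Central change"

cd "$mine"
git pull -q origin                # sync the clone from where it was cloned
central_head=$(git -C "$central" rev-parse HEAD)
my_head=$(git rev-parse HEAD)
echo "central: $central_head"
echo "mine:    $my_head"
```

Both pulls here are simple fast-forwards, so no merge commits get created and the two repos end up at the same commit.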

You might wonder how you can see the changes before you do either of these pulls. (You usually want to run 'diff' before 'add', yes?) The equivalent here is to run this:

% cd /path/to/existing/repo/Foo
% git fetch /home/ghellscream/Foo master
% git log -p HEAD..FETCH_HEAD

Perfectly straightforward. /eyeroll. What's going on here is that the 'fetch' command essentially copies the requested repo/branch to the local area as a "fetched" copy. The special syntax on 'log' says to show the commits that are in that fetched copy but not yet in the current HEAD.
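
A sketch of previewing before pulling, using the same kind of throwaway setup (the temp paths, identities, and file contents are assumptions):

```shell
# Sketch only: a "central" repo and a clone that is one commit ahead.
work=$(mktemp -d)
central="$work/central/Foo"       # stand-in for /path/to/existing/repo/Foo
mine="$work/ghellscream/Foo"      # stand-in for /home/ghellscream/Foo
mkdir -p "$central" "$work/ghellscream"
cd "$central"
git init -q
git config user.name "Central Keeper"
git config user.email "keeper@example.com"
echo "v1" > Foo.c
git add Foo.c
git commit -q -m "Start Foo"
branch=$(git symbolic-ref --short HEAD)

git clone -q "$central" "$mine"
cd "$mine"
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "v2" >> Foo.c
git commit -q -a -m "Garrosh's change"

cd "$central"
git fetch -q "$mine" "$branch"                  # download only; nothing is merged
incoming=$(git rev-list --count HEAD..FETCH_HEAD)
preview=$(git log -p HEAD..FETCH_HEAD)          # the commits and diffs you would get
untouched=$(cat Foo.c)                          # the local file has not changed yet
echo "$incoming incoming commit(s)"
echo "$preview"
```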

Since remote repos are common in git, git lets you alias them so that you don't have to type a full path over and over. Let's create an alias named Moo for the existing repo:

% git remote add Moo /path/to/existing/repo/Foo
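
Once the alias exists, you can use it wherever you would have typed the path. A minimal sketch, assuming a stand-in "existing" repo under a temp directory and a fresh repo that adds it as Moo:

```shell
# Sketch only: an "existing" repo, and a second repo that aliases it as Moo.
work=$(mktemp -d)
existing="$work/existing/Foo"     # stand-in for /path/to/existing/repo/Foo
mkdir -p "$existing"
cd "$existing"
git init -q
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "v1" > Foo.c
git add Foo.c
git commit -q -m "Start Foo"
branch=$(git symbolic-ref --short HEAD)
existing_head=$(git rev-parse HEAD)

mkdir "$work/mine"
cd "$work/mine"
git init -q
git remote add Moo "$existing"    # create the alias
git fetch -q Moo                  # use the alias instead of the path

remotes=$(git remote)                       # lists the aliases this repo knows
moo_head=$(git rev-parse "Moo/$branch")     # fetched branches show up as Moo/<branch>
echo "remotes: $remotes"
```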

Tagging

You can add tags to the git repo fairly easily. Find the commit name for what you want to tag (let's assume it's a1b2c3d4e5f6, since it's some hexadecimal gobbledygook), decide on a tag name (it has to be "foo"), and run this:

% git tag foo a1b2c3d4e5f6

All this really does is create an alias to a commit name. You can have multiple tags pointing to a single commit, but you cannot have a single tag point to multiple commits.

The restrictions on tag names are all related to the fact that they're used in multiple contexts:

You're allowed to use slashes though, and in fact git really wants you to have at least one (since they denote hierarchy, and having at least one is equivalent to having a namespace).

The cool thing with tags is you can use them in lots of git commands:

% git diff foo HEAD   # everything that's different between the 'foo' commit and now
% git branch foo-debug foo   # create a branch of 'foo' called 'foo-debug'
% git grep MOO foo  # look for /MOO/ in the 'foo' commit
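
A runnable sketch of those three commands, assuming a scratch repo with two commits where the tag goes on the first one (the real checksum is read from git instead of using the made-up a1b2c3d4e5f6):

```shell
# Sketch only: scratch repo, tag the first of two commits.
work=$(mktemp -d)
cd "$work"
git init -q Foo
cd Foo
git config user.name "Garrosh Hellscream"
git config user.email "crappy_warchief@orgrimmar.gov"
echo "MOO" > Foo.c
git add Foo.c
git commit -q -m "Add Foo.c"
commit=$(git rev-parse HEAD)      # the commit name to tag
echo "later work" > Bar.c
git add Bar.c
git commit -q -m "Add Bar.c"

git tag foo "$commit"             # 'foo' is now an alias for that commit

tagged=$(git rev-parse foo)                  # resolves back to the commit name
changed=$(git diff --name-only foo HEAD)     # files that differ between 'foo' and now
git branch foo-debug foo                     # branch starting at the tagged commit
hits=$(git grep -l MOO foo)                  # files in the 'foo' commit matching MOO
echo "changed since foo: $changed"
echo "grep hits: $hits"
```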

3-Stage cheat-sheet

For reference, the three stooges (ha!) are:

  1. local-to-index
  2. index-to-localrepo
  3. localrepo-to-remoterepo

diffing

  1. git diff
  2. git diff --staged
  3. git fetch origin master; git diff FETCH_HEAD..HEAD # (maybe)

pushing/submitting

  1. git add
  2. git commit
  3. git push

pulling/updating

  1. N/A?
  2. N/A?
  3. git pull


Chris verBurg
2015-03-08