Using Git

Debian Women IRC Training Session held by David Paleino, 25-Nov-2010

This is an introductory tutorial on git. It covers the basics you need to understand how git works and to start using it.

Requirements

In this tutorial it is assumed that:

Technical requirements:

Introduction

What is git? Git is a Distributed Version Control System. The important part here is Version Control System: it is software that lets you track changes in files, compare different versions, and do other nice things, like going back to a previous version of a certain file.

Git is used by many modern software projects, so it's good to know a bit about how it works. I won't go into much detail; I'll just explain the basics needed to understand how it works and to start using it.

Theory

''Distributed'' version control system

So far we looked at git as a Version Control System, but git is a Distributed VCS. Being distributed is an architectural property of git, which has some pros and some cons. Other "famous" VCSes, like CVS and SVN, are called Centralized VCSes. The difference is that with a Centralized VCS you need a connection to the central server, where all the data is kept, to do many operations. Think of the log operation: SVN needs to connect to the server to retrieve it.

With Distributed VCS (and git is only one of them), this doesn't happen: every copy of the repository is a full copy. This means that operations are generally faster and, moreover, that you can just use git on your local computer, without having a server.

Obviously, a Distributed VCS also has its cons. The most important problem I see with it is the higher number of "conflicts". This is because with Centralized VCSes, a commit is usually refused if it conflicts with the server's copy. With Distributed VCSes, instead, anyone can commit anything in her local repo, and conflicts only show up at "push" time (we'll see what "push" means later).

The git storage model

Every object inside a git repository is identified by a unique string, called the hash. It is usually the SHA-1 checksum of some properties which we'll talk about later.

A git object can be one of blob, tree, commit or tag. Let's see these one at a time:

If we think about git's storage model, we distinguish a working area, an index and a repository:
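As a rough illustration of the object types and of the three areas, the sketch below builds a throwaway repository and asks git what kind of object each name refers to. The temporary directory, the file name, the identity and the tag name are made up for the example; git cat-file and git tag are standard git commands.

```shell
set -e
# Throwaway sandbox; nothing here touches your real repositories.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"

echo "hello" > greeting.txt      # working area: a plain file on disk
git add greeting.txt             # index: the file is now staged
git commit -q -m "Add greeting"  # repository: the commit is stored under .git/

# Ask git for the type of each object reachable from the last commit.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:greeting.txt)
echo "$commit_type $tree_type $blob_type"

# An annotated tag is the fourth kind of object.
git tag -a v0.1 -m "Example tag"
tag_type=$(git cat-file -t v0.1)
echo "$tag_type"
```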

Practice

Creating a repository

For the practice part, I wanted to use a real-world project, instead of making up some repository by myself. I chose to use GNU Hello, please download the tarball from http://ftp.gnu.org/gnu/hello/hello-2.6.tar.gz. We will create a repository from this source code.

So, let's get the tarball:

$ wget http://ftp.gnu.org/gnu/hello/hello-2.6.tar.gz

Once it's finished, unpack it:

$ tar zxvf hello-2.6.tar.gz

This will create a hello-2.6/ directory. Now, enter this directory, and we'll start playing with git :)

First of all, we need to configure our username and our e-mail. This information will be used in our commits, and will be visible in the repository history. To set it, we use git config. In particular, since we're complete beginners, we want to set a global username/email. To do so, let's do:

$ git config --global user.name "Debian Woman Attendant"
$ git config --global user.email "attendant@debian.org"

Obviously use your data :))

The --global switch makes these settings global: they are written to ~/.gitconfig and apply to any git repository on your computer. Check that file after you've given those two commands. You'll see the data you entered.

The username and email can also be set on a per-repository basis: in this case, you'll need to do it after creating the repository, and without the --global switch. Without --global, git writes the data into ./.git/config, i.e. locally.
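To see the difference between the two levels without touching your real configuration, here is a small sketch. It assumes it is fine to point HOME (and XDG_CONFIG_HOME) at a scratch directory for the duration of the script; the names "Global Name" and "Local Name" are placeholders.

```shell
set -e
# Point HOME at a scratch directory so the example cannot touch your real ~/.gitconfig.
sandbox=$(mktemp -d)
export HOME="$sandbox/home"
export XDG_CONFIG_HOME="$HOME/.config"
mkdir -p "$HOME" "$sandbox/repo"

# Global setting: written to ~/.gitconfig (here, inside the sandbox).
git config --global user.name "Global Name"

# Per-repository setting: written to .git/config of that repository only.
cd "$sandbox/repo"
git init -q .
git config user.name "Local Name"

global_name=$(git config --global --get user.name)
effective_name=$(git config --get user.name)   # the local value wins inside this repository
echo "global: $global_name, effective here: $effective_name"
```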

We've set our username and our e-mail. Now, we need to create the git repository. So, we have a hello-2.6/ directory: enter it, and issue:

$ git init

You'll see something like:

Initialized empty Git repository in /tmp/dw/hello-2.6/.git/

git init simply creates a .git/ directory, with some default values in it. To see the status of a repository, launch:

$ git status

It will show you tracked/untracked files, and the status of the index. Also, it will show you on what "branch" you are (I'll cover branches later).

Indexing

We still don't have anything in the repository, though. So, let's add the source code:

$ git add .

The "." is common Unix-syntax -- it means "current directory". So, we're effectively adding everything. Now, check git status again. You'll see that something changed. What you see now is the status of the index.

We could still remove things from the index, without leaving traces in the repository history. Let's do it! Let's remove AUTHORS from the index, and put it back to the untracked status:

$ git rm --cached AUTHORS

Now, check git status again. You'll see Changes to be committed (the index) and Untracked files.
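The same steps can be tried in a scratch directory. This is only a sketch: hello.c and AUTHORS are stand-in files created on the spot, not the real GNU Hello sources, and git status --porcelain is used because its short output is easier to read in a script.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
echo "int main(void) { return 0; }" > hello.c   # stand-in file
echo "Someone" > AUTHORS                        # stand-in file

git add .                      # stage everything in the current directory
git rm -q --cached AUTHORS     # take AUTHORS out of the index; the file stays on disk

# Short status: "A " marks a staged new file, "??" an untracked one.
status=$(git status --porcelain)
echo "$status"
```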

Committing

We want to commit the files in the index: this will create a commit, with a hash, and will be kept in the repository history. Let's do it:

$ git commit

This command will open your $EDITOR (mine is nano, check yours), where you should write a commit message. If you don't write one, the commit will abort (yes, you need a commit message).

Let's say it's Initial commit. Now, git status again. The files from index are gone! They've been committed to the repository, and a log has been kept.

You can see the log with:

$ git log

It will show you the author (with the data you set before), the timestamp, the commit hash and the commit message.

You can also see the last commit with:

$ git show

It will automatically open $PAGER (more, less, ...), and show you the contents of your last commit. git show also accepts a hash as argument. My commit hash is: 11aab8486d20490b16b1b7d847e1cb1e4f7aa2fe . This will be different for each of you. It isn't necessary to write the full hash -- usually the first 7-8 characters are enough. So, we can also use:

$ git show 11aab848

git also supports a number of symbolic names, but I won't go into this, since I believe it's more than basic (I'm talking about HEAD, HEAD^, HEAD~2 and so on)
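For the curious, here is a minimal sketch of what those symbolic names resolve to. It uses git rev-parse, which prints the hash a name stands for; the repository, the identity and the three commits are invented for the example.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"

# Three small commits, so that HEAD has a parent and a grandparent.
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

head=$(git rev-parse HEAD)              # the last commit
parent=$(git rev-parse 'HEAD^')         # its parent
grandparent=$(git rev-parse 'HEAD~2')   # two commits back
short=$(git rev-parse --short HEAD)     # abbreviated hash, also accepted by git show
git log --oneline
```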

So, we left AUTHORS out of our repository... poor people! No credit for their work! Let's fix this:

$ git add AUTHORS
$ git commit -m 'Also add AUTHORS'

-m is short for --message: it avoids opening up $EDITOR.

Now, we'll edit some files, see the differences, and commit them. First, let's pretend we wrote bits of the current source.

Let's add our name to AUTHORS :) Let's also add something to ChangeLog. Whatever you want, it's just an example.

Now, git status. You'll see two lines starting with modified. You can see the differences you introduced:

$ git diff

(Optionally, git diff filename will show you only the differences in that file.)

If you're happy with the diff, let's commit it. You can either git add them one by one, and then git commit or just use:

$ git commit -a -m "Some message"

The -a switch will add everything to the index (from the tracked files, it won't touch untracked ones). You'll see:

[master 3295347] Some message
2 files changed, 5 insertions(+), 0 deletions(-)

master is the branch we're currently in. The string after it is the (abbreviated) commit hash -- you can use it in most commands (git show <commit>, git log <commit>, and so on). Then come the log message and the diffstat output.
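To check that -a really leaves untracked files alone, a sketch like the following can help. The file names and contents are placeholders chosen for the example.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "Original author" > AUTHORS
git add AUTHORS
git commit -q -m "Initial commit"

echo "Me" >> AUTHORS           # change a tracked file
echo "scratch" > notes.txt     # create an untracked file

git diff --stat                # summary of the unstaged changes to tracked files
git commit -q -a -m "Some message"

# notes.txt was never added, so -a did not touch it.
remaining=$(git status --porcelain)
echo "$remaining"
```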

Branching

What is a branch?

Think of your git repository as a river. At a certain point, development can diverge from the main flow and it can stay on its own, or merge back to the main river. Now, our master is the main river.

Let's make a branch, let's call it debian:

$ git branch debian

To change to this new branch, use git checkout:

$ git checkout debian

A shortcut for the above two commands is:

$ git checkout -b debian

Ok, so, we checked out the debian branch. To confirm it, execute git branch, without arguments. It will show you the current local branches, and a * will be prepended to the branch you're currently in.

To go back to the master branch, just: git checkout master.

For the moment, we'll stay in the debian branch: *debian. Inside this branch, let's pretend we are going to do the packaging work.

So, let's create a debian/ directory. If the directory is empty, git status won't show it. This is expected behaviour: git doesn't track empty directories. To trick it into doing so, you can add an empty file to that directory. I usually add a .gitignore (it's a special file used by git) for this purpose. So, let's do all this:

$ mkdir debian
$ touch debian/.gitignore

Now git status will show an untracked debian/. Add it and commit it.
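Put together, the branch steps look roughly like the sketch below, run in a scratch repository. The name of the initial branch is read from git rather than assumed, since it may be master or something else depending on the git setup; README is a stand-in file.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)   # e.g. "master"

git checkout -q -b debian        # create the branch and switch to it
mkdir debian
touch debian/.gitignore          # placeholder so that the directory can be tracked
git add debian
git commit -q -m "add debian/"

git branch                       # the current branch is marked with *
current=$(git symbolic-ref --short HEAD)

git checkout -q "$main_branch"   # back on the main branch, where debian/ does not exist
```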

Let's go back to the master branch:

$ git checkout master

Now, we want to make these two branches diverge, to simulate a real-world branching: change any file you want, anything, and commit it.

You can use a GUI (gitg, gitk), or something from the console (git show-branch) to see how the branches diverge. Since the GUIs are easy, we'll use the console one :o)

$ git show-branch

You'll see that the two branches have the initial commit in common, but then they have different commits. Let's merge the changes in debian into master:

$ git merge debian

You'll see something like:

Merge made by recursive.
0 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 debian/.gitignore

It is now that conflicts will happen, if any. If in the debian branch we changed one of the files we changed just before the merge, there could've been a conflict.

It could be useful to run git mergetool after a merge, to solve conflicts. It will use one of several possible programs to handle the conflict. I won't go into detail here -- for basic usage, I'd say that manual resolution of the merge is enough.
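To see what a conflict and its manual resolution look like, here is a sketch that provokes one on purpose. The branch name topic, the file and its contents are invented; the resolution simply overwrites the file, where in real life you would edit the conflict markers by hand.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "line" > file.txt
git add file.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# Change the same line differently on two branches.
git checkout -q -b topic
echo "topic version" > file.txt
git commit -q -a -m "Change on topic"
git checkout -q "$main_branch"
echo "main version" > file.txt
git commit -q -a -m "Change on main"

# The merge stops and reports the conflict instead of creating a commit.
if git merge topic >/dev/null 2>&1; then
    merge_result="clean"
else
    merge_result="conflict"
fi
conflicted=$(git diff --name-only --diff-filter=U)   # files that are still unmerged

# Manual resolution: fix the file, stage it, then conclude the merge with a commit.
echo "merged version" > file.txt
git add file.txt
git commit -q -m "Merge branch 'topic'"
words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')   # commit hash plus its two parents
echo "$merge_result in $conflicted"
```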

We merged the debian branch into master. Let's see the log, you should see something like:

commit cdfd20167aa05f74f4785ef7aa03355d51add5b3
Merge: 7e9ff3a 2ba81df
Author: David Paleino <dapal@debian.org>
Date:   Thu Nov 25 23:45:28 2010 +0100

    Merge branch 'debian'

commit 7e9ff3a18dc114b4ce1e1a96f1dd3ecd696f064d
Author: David Paleino <dapal@debian.org>
Date:   Thu Nov 25 23:43:49 2010 +0100

    New author

commit 2ba81dfaeb919e6a0c634be54fe363b11487d65a
Author: David Paleino <dapal@debian.org>
Date:   Thu Nov 25 23:42:42 2010 +0100

    add debian/

commit 3295347b1dbd7b5925dca7fcc6858af51a710ada
Author: David Paleino <dapal@debian.org>
Date:   Thu Nov 25 23:32:55 2010 +0100

    Some message

commit f6aa148a5ce1331de6d17e770a8efbb98ad32344
Author: David Paleino <dapal@debian.org>
Date:   Thu Nov 25 23:11:57 2010 +0100

    Also add AUTHORS

commit 11aab8486d20490b16b1b7d847e1cb1e4f7aa2fe
Author: David Paleino <dapal@debian.org>
Date:   Thu Nov 25 23:03:42 2010 +0100

    Initial commit

Reverting

The last bit I wanted to show for the local workflow is git revert.

Let's suppose I see something I don't like in my git log. For the sake of example, let's say it's (my) commit 2ba81dfaeb919e6a0c634be54fe363b11487d65a , i.e. the one where we added the debian/ directory. Please remember that your commit hash will be different, so check your log to get the correct hash.

So, I don't like it. What should I do now? I can use the git revert command.

This command basically takes the diff from a commit, applies it in reverse, and leaves conflicts, if any, for you to resolve. So, let's do it:

$ git revert 2ba81dfaeb919e6a0c634be54fe363b11487d65a

$EDITOR (nano, vim, etc) opens up again. There is a default commit message; you can leave it as-is, or (better) explain why you're reverting the change. For the sake of simplicity, let's leave the default. Save the message, and quit the editor.

You'll see:

Finished one revert.
[master 37ce99f] Revert "add debian/"
0 files changed, 0 insertions(+), 0 deletions(-)
delete mode 100644 debian/.gitignore

This means that a revert is a commit too, with a Committer, an Author, a Timestamp, and a Hash. In principle, you can even revert a revert (you shouldn't do it :) ).

NOTE: Reverting a merge is not that easy. You should also specify the mainline parent. Read about the -m switch of git revert. It also has some nasty (IMHO) side-effects, so don't merge if you're not absolutely sure.
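A plain (non-merge) revert can be tried safely in a scratch repository. In this sketch, --no-edit is used so that the default message is kept without opening $EDITOR; the files and the identity are placeholders.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
mkdir debian
touch debian/.gitignore
git add debian
git commit -q -m "add debian/"
unwanted=$(git rev-parse HEAD)

# --no-edit keeps the default commit message instead of opening $EDITOR.
git revert --no-edit "$unwanted" >/dev/null

subject=$(git log -1 --format=%s)
count=$(git rev-list --count HEAD)   # the history grows; nothing is erased
echo "$subject ($count commits)"
```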

Distributed Workflow

Git is a distributed VCS, so this is a fundamental part. It lets you share your work, and your development, with other people. So, let's set the repository we've been working on aside.

Cloning + Pushing

I prepared an online repository of the GNU Hello source we're using. This is usually what you'll find for existing projects: an online repository. You can copy its contents, i.e. clone it, with the command git clone. So, let's clone the repository (ah, before that, get out of hello-2.6/):

$ git clone git://gitorious.org/debian-women/hello.git

You'll see something like:

Cloning into hello...
remote: Counting objects: 263, done.
remote: Compressing objects: 100% (161/161), done.
remote: Total 263 (delta 101), reused 263 (delta 101)
Receiving objects: 100% (263/263), 626.93 KiB | 120 KiB/s, done.
Resolving deltas: 100% (101/101), done.

Now, enter hello/, and poke around a bit: git log, git show...

It's a clean repository, but you got it from the web. If it were a real repository, it probably wouldn't show just one commit, nor just one branch.

Now, let's see where we got this repository from. You can get/set info for the place where you got this repository with git remote. There is a default remote, it is called origin. It is also the default one where pushes will go. Let's see it:

$ git remote show origin

Currently, we only care about these two lines:

Fetch URL: git://gitorious.org/debian-women/hello.git
Push  URL: git://gitorious.org/debian-women/hello.git

It means that we're syncing from the Fetch URL, and are pushing back to the Push URL. These don't need to coincide, they can be different. Like, say, if you're keeping a patched version of some software somewhere: you'd fetch from your upstream, and push to your own location.

We can also add a remote without removing our previous work. For that, we will use git remote add. The syntax is like:

git remote add <remote_name> <remote_url>

Usually, if you want to be able to push back to the repository, you'll need to use a git+ssh:// or ssh:// url. A git://git.[...] usually doesn't permit pushes. (TBH, I've never seen one that permits them, but better say usually don't than never do).

git remote add is especially useful when you're creating a brand new repository, i.e. you're not cloning from anywhere.
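Here is a sketch of that "brand new repository" case. Since no writable server is assumed, a bare repository on the local disk stands in for it; its path, like the identity and the README file, is made up for the example.

```shell
set -e
sandbox=$(mktemp -d)
# A bare repository on the local disk stands in for the server.
git init -q --bare "$sandbox/hello.git"

mkdir "$sandbox/work"
cd "$sandbox/work"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)

# Register the remote under the conventional name "origin", then push to it.
git remote add origin "$sandbox/hello.git"
git push -q -u origin "$branch"

remote_url=$(git config --get remote.origin.url)
pushed=$(git --git-dir="$sandbox/hello.git" rev-parse "refs/heads/$branch")
echo "origin -> $remote_url"
```

With a real server, the URL given to git remote add would be the git+ssh:// or ssh:// address of the repository instead of a local path.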

Questions and Answers

QUESTION: Does the commit's tree object contain the whole project's tree or only changed files?

ANSWER: A commit's tree object represents what the whole repository looked like at a certain point in time; so yes, it contains references to the whole project tree.

QUESTION: Files go to index by doing: git commit?

ANSWER: No. Files go to the index with git add; git commit then creates a commit from what is in the index and puts it in the repository.

QUESTION: If I copy all files without the .git directory I lose all commit history?

ANSWER: Yes, the .git directory is the one keeping all the history, the commits, everything. And one just needs a .git directory to recreate a repository.

QUESTION: Is the index in the .git directory?

ANSWER: Yes, it's kept there.

QUESTION: Will push fail if there are conflicts? If so, this con can be a pro, because the shared repository (if one is used) will always be in a working state.

ANSWER: Yes, the push will fail. I said earlier that the higher number of conflicts was a con of distributed VCS's, but git will let you solve the conflict and re-push.

About the "shared repository will always work" bit: centralized VCSes also solve conflicts, but they do so automatically (oops). This could lead to wrong fixes -- git, instead, is a "stupid" content tracker, so it asks for your help any time it needs it. So, while SVN would automatically fix the conflict for you (but you don't really know for sure what will end up in the repository), in git the push fails, and you deal with the conflict locally (and then re-push).

QUESTION: "And one just needs a .git directory to recreate a repository": what's the command to recreate it?

ANSWER: You just git clone from that directory. Let's say, you have a .git/ of some project -- just rename it to project.git, and then "git clone project.git" -- you'll end up with a project/ directory with everything in. The rename is made to make everything easier, you could just do git clone .git myotherdir (and everything will end up in myotherdir/)
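The rename-and-clone recipe can be tried on a scratch project. In this sketch the directory names project, project.git and restored are placeholders, and the .git directory is moved away to simulate having lost the working files.

```shell
set -e
sandbox=$(mktemp -d)
mkdir "$sandbox/project"
cd "$sandbox/project"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
original=$(git rev-parse HEAD)

# Keep only the .git directory, as if the working files had been lost.
mv "$sandbox/project/.git" "$sandbox/project.git"

# Cloning from it brings back the files together with the history.
cd "$sandbox"
git clone -q "$sandbox/project.git" "$sandbox/restored"
restored=$(git -C "$sandbox/restored" rev-parse HEAD)
```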

QUESTION: how can I get current setting of global username and email? (I just want to know setting before I change it :) ) ~/.gitconfig ?

ANSWER: To get a specific value, you can use git config --get. So, git config --global --get user.name. You can also list all values with git config -l, and you can edit them with git config, or by opening an editor on ~/.gitconfig or ./.git/config (depending on whether you want to edit the global or the local configuration).

QUESTION: Can I patch the value of user.name, etc into the files managed by git?

ANSWER: Yes, you can have a separate config per repo, just skip the --global. But the question opens up for other replies: I think it is really about git filter-branch. The git committer is embedded in the Commit object (remember, I said it had metadata too). So, when you change the committer, the object can't keep the same hash: it must change. So, advanced things like filter-branch (which lets you rewrite the history of a git repository) shouldn't EVER be done on a public repository, because that will mean tons of conflicts, and headaches.

However, it's interesting to see how one can temporarily override the configured user-name and user-email. git commit can read some environment variables: GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL, GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL. These, if set, will override what's in the git config. In general, for any configuration variable, git will check, in order:

~/.gitconfig -> ./.git/config -> an environment variable, if it exists

So, if you set something in ./.git/config , it will override the global one. If you set a GIT_* environment variable, it will override anything else also.

./.git/config extends ~/.gitconfig. That means that one doesn't have to copy over all the content from the global configuration.
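The override order for the identity can be checked with a short experiment. This sketch sets a name in the repository configuration, then overrides only the author through the environment for a single commit; all names and addresses are placeholders.

```shell
set -e
# Start from a clean slate in case these are already set in your shell.
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Configured Name"
git config user.email "configured@example.org"
echo "one" > file.txt
git add file.txt
git commit -q -m "Uses the configured identity"
first_author=$(git log -1 --format=%an)

# For this single command, the environment variables take precedence.
echo "two" > file.txt
GIT_AUTHOR_NAME="Temporary Name" GIT_AUTHOR_EMAIL="temp@example.org" \
    git commit -q -a -m "Uses the environment"
second_author=$(git log -1 --format=%an)
second_committer=$(git log -1 --format=%cn)   # the committer variables were not set
echo "$first_author / $second_author / $second_committer"
```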

QUESTION: How can I get list of files that have been changed? git log just shows author, timestamp, comment and I don't need content of files (I need just list of them).

ANSWER: The list of files changed within each commit can be seen with git log --raw. Every git command has lots of options, and manpages are usually our friends. For example, it would be possible to combine git log with --raw and --pretty, to get some nicer output than just --raw

The same goes for git show: we can just pass a format string to --pretty.

This isn't "basic usage", however. Or use a GUI, I use gitg too sometimes.
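As one possible combination, the sketch below uses --name-only (a sibling of --raw that prints just the file names) together with a --pretty format string. The format '%h %s' (short hash and subject) and the two stand-in files are choices made for the example, not the only way to do it.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "a" > AUTHORS
git add AUTHORS
git commit -q -m "Add AUTHORS"
echo "b" > ChangeLog
git add ChangeLog
git commit -q -m "Add ChangeLog"

# One header line per commit (short hash and subject), followed by the files it touched.
log_output=$(git log --name-only --pretty=format:'%h %s')
echo "$log_output"

# The same idea for a single commit: an empty format leaves only the file names.
last_files=$(git show --name-only --pretty=format: HEAD)
```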

QUESTION: Does git preserve file modes of the managed files or do we need work arounds?

ANSWER: It preserves the executable bit of file modes (not full permissions or ownership), and it also notices when you change it, i.e. a commit could consist of only a filemode change.
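A mode-only commit can be reproduced with the sketch below. It assumes a filesystem where the executable bit is meaningful (the usual case on Linux); run.sh is a made-up script, and git ls-files -s is used to print the mode git has recorded.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
printf '#!/bin/sh\necho hi\n' > run.sh
git add run.sh
git commit -q -m "Add script"
mode_before=$(git ls-files -s run.sh | cut -d' ' -f1)

chmod +x run.sh                  # no content change, only the mode
git commit -q -a -m "Make script executable"
mode_after=$(git ls-files -s run.sh | cut -d' ' -f1)
echo "$mode_before -> $mode_after"
```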

QUESTION: git show shows a huge file. Any comment about it?

ANSWER: This is because it's the first commit, where we imported everything. Usually, it's better to do "atomic" commits (I'd say this is generally good practice). So usually you won't see that huge output and, anyway, it's using less by default, so you shouldn't have many problems with it :)

QUESTION: Can I just copy that folder to my webserver and let other people clone it with git clone http://myserver/hello-2.6 ?

ANSWER: Yes, you can. If you don't need the actual files in that directory, it's usually better to just share the .git directory, usually called like project.git. This is called a "bare repository".

QUESTION: Is there some way to simplify: git add + git commit -m ? or it just works one by one?

ANSWER: Yes, there is. If you just want to commit all files, just use the -a switch of git commit. So, git commit -a -m "Message" will just commit everything currently tracked.

QUESTION: One thing that always seems strange to me is having the repository as a part of the workspace. Is it easy to have the repo in a different directory tree? Or do I need additional repos for that?

ANSWER: You can, but this is a bit advanced, I'd say. Read the git-config manpage about core.worktree.

QUESTION: E.g. I have a web application under git and I want to deploy it, but without the .git directory. What's the best way to do it?

ANSWER: The same answer as the one before: explicitly setting core.worktree in ./.git/config to a different path.

QUESTION: I've created an empty directory (test1) but git status shows nothing. Then I did echo test > test1/testfile, and the output of git status shows me # test1/ but no testfile. Is this expected behaviour?

ANSWER: Yes, it's expected behaviour. As said before, git doesn't track empty directories: that's why we touched an empty .gitignore inside the empty directory. Once the directory contains a file, git status lists the untracked directory as a whole, without naming each file inside it.
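A different technique from core.worktree, shown here only as a sketch, is to export the committed files with git archive and unpack them where they should be deployed. The directories app and deploy and the file index.html are invented for the example; the result contains no .git directory.

```shell
set -e
sandbox=$(mktemp -d)
mkdir "$sandbox/app" "$sandbox/deploy"
cd "$sandbox/app"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "<h1>hello</h1>" > index.html
git add index.html
git commit -q -m "Initial commit"

# git archive writes a tar stream of the files in the last commit;
# tar unpacks it into the deploy directory.
git archive HEAD | tar -x -C "$sandbox/deploy"

ls -A "$sandbox/deploy"
```

The trade-off is that the deployed copy is a plain snapshot: it has to be exported again after every new commit.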
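The behaviour from the question can be reproduced as follows. This sketch uses git status --porcelain for compact output and the -uall option, which asks git to list every untracked file individually.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .

mkdir test1
empty_dir_status=$(git status --porcelain)    # empty directory: nothing is reported

echo test > test1/testfile
collapsed=$(git status --porcelain)           # only the directory name is listed
expanded=$(git status --porcelain -uall)      # -uall lists each untracked file
printf '%s\n%s\n' "$collapsed" "$expanded"
```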

QUESTION: What happens if you revert a change in a branch that isn't the one that you currently have checked out?

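ANSWER: Nothing will happen to that branch: git revert only ever creates its commit on the branch you currently have checked out. To revert a change on another branch, check that branch out first.

QUESTION: If I have a branch, and I merge it, but then want to revert, how do I find out the hash?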

ANSWER: git log is the solution.

QUESTION: clone option, just clone master branch... or maybe others?

ANSWER: git clone will only clone the master branch. I mean, the other branches will be *fetched*, but no local branch will be created for them. To fix this, I usually do the following for each branch I'm interested in:

$ git checkout -b mybranch -t origin/mybranch

(-t origin/mybranch means "track mybranch from origin")
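Here is a sketch of that situation with a local stand-in for the remote. The repository names upstream and clone are placeholders, and mybranch is the same illustrative branch name used above.

```shell
set -e
sandbox=$(mktemp -d)
# Build a small "upstream" repository with two branches.
mkdir "$sandbox/upstream"
cd "$sandbox/upstream"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch mybranch
default_branch=$(git symbolic-ref --short HEAD)

# Clone it: only the branch that was checked out upstream gets a local branch.
git clone -q "$sandbox/upstream" "$sandbox/clone"
cd "$sandbox/clone"
local_before=$(git for-each-ref --format='%(refname:short)' refs/heads)
git branch -r                      # remote-tracking branches, including origin/mybranch

# Create a local branch that tracks the remote one.
git checkout -q -b mybranch -t origin/mybranch
upstream_of_mybranch=$(git rev-parse --abbrev-ref 'mybranch@{upstream}')
echo "mybranch tracks $upstream_of_mybranch"
```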

QUESTION: Are there differences between git+ssh:// and git:// ?

ANSWER: Yes, there are. git:// is a simple protocol, which doesn't support (AFAIK) authentication. So, with git+ssh://, this protocol is encapsulated in SSH -- much like svn+ssh:// or cvs+ssh:// or others. With git://, you're using the git protocol directly, i.e. without auth support.

QUESTION: A correction to the git+ssh question: it was meant as opposed to ssh://.

ANSWER: Yes, there's a difference there too, but I don't have a use case handy, sorry. Apart from technicalities about the protocol used and what the server supports, I can't think of any difference from a "user" point of view.

QUESTION: How limited is the local git repo cloned with --depth 1, will I still be able to switch between branches and will it age to a less limited repo with time and git pull's ?

ANSWER: --depth specifies how much history to get from a repository. This is useful when, for example, you're only interested in the recent history of a large project. It also has limitations (at least in the git versions current at the time of this session): for example, you can't clone it, nor push from/into it. I must be honest, I haven't ever used --depth. I only read about it at the beginning, when I started "studying" git.
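What a shallow clone contains can be checked locally with the sketch below. The upstream repository and its three commits are invented for the example; a file:// URL is used because git ignores --depth when cloning from a plain local path.

```shell
set -e
sandbox=$(mktemp -d)
mkdir "$sandbox/upstream"
cd "$sandbox/upstream"
git init -q .
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.org"
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done
full_count=$(git rev-list --count HEAD)

# Ask for only the most recent commit.
git clone -q --depth 1 "file://$sandbox/upstream" "$sandbox/shallow"
cd "$sandbox/shallow"
shallow_count=$(git rev-list --count HEAD)
echo "full: $full_count, shallow: $shallow_count"
```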
