2014-12-16
-
Intro, Tools
-
gif from the internets
-
The repository
-
End
About 50 slides. About 50 commands.
Disclaimer: I am just a git user.
version control, svn or cvs for a moment.
-
Git is ubiquitous. GitHub, Bitbucket and many other hosts use it. Many projects use it (Linux, Ruby, Go, Erlang, Rails, Homebrew, ...)
-
Heroku and other PaaS providers use git-push-based workflows.
-
Could you imagine something like git, but for writers? The current Pro Git book is written on GitHub, together with a community, using git.
-
Tools like Docker borrow git's terminology when they say docker push or docker pull.
-
Git is used by teams of thousands of people.
-
It takes milliseconds to create a repository, even just to take a single note.
-
Facebook had a 54G repository at one point (https://twitter.com/feross/status/459259593630433280).
Git did not start as a full version control system:
Git is a content-addressable filesystem. Great. What does that mean?
It means that at the core of Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
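To make the key-value idea concrete, here is a minimal sketch of a content-addressable store. The class name ContentStore and its in-memory dict are assumptions made for illustration; real Git adds a small header before hashing and writes compressed files under .git/objects.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is derived from the content."""

    def __init__(self):
        self._objects = {}  # key (hex digest) -> content bytes

    def put(self, content: bytes) -> str:
        # The key is the SHA-1 hex digest of the content itself,
        # so identical content always yields the identical key.
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
key = store.put(b"HELLOWORLD\n")
print(key, store.get(key))
```

Storing the same bytes twice gives back the same key, which is why such a store never holds duplicates.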
-
tooling is better now
-
IDE-Support, Netbeans 7.4+, vim, ...
Setup
-
vcprompt
-
git-completion
-
~/.gitconfig (example: http://git.io/4EQNZw)
It makes life easier.
How big is an empty SVN repository? About 100k for the server part plus 124k for workdir.
An initial git repository is 56k. Without the hooks (40k) you have:
- 118 bytes of config (can be shortened),
- 23 bytes of HEAD
- 73 bytes of description (can be left blank, only used by GitWeb)
- 240 bytes of exclude example (can be left blank)
So essentially, you have a few bytes for empty directories and about 100 bytes in configuration.
$ sqlite3 wd/.svn/wc.db
SQLite version 3.7.12 2012-04-03 19:43:07
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .schema
CREATE TABLE ACTUAL_NODE ( wc_id INTEGER NOT NULL REFERENCES WCROOT (id), ....
CREATE TABLE EXTERNALS ( wc_id INTEGER NOT NULL REFERENCES WCROOT (id), ...
CREATE TABLE LOCK ( repos_id INTEGER NOT NULL REFERENCES REPOSITORY (id), ...
CREATE TABLE NODES ( wc_id INTEGER NOT NULL REFERENCES WCROOT (id), ...
CREATE TABLE PRISTINE ( checksum TEXT NOT NULL PRIMARY KEY, ...
CREATE TABLE REPOSITORY ( id INTEGER PRIMARY KEY AUTOINCREMENT, ...
CREATE TABLE WCROOT ( id INTEGER PRIMARY KEY AUTOINCREMENT, ...
CREATE TABLE WC_LOCK ( wc_id INTEGER NOT NULL REFERENCES WCROOT ...
CREATE TABLE WORK_QUEUE ( id INTEGER PRIMARY KEY AUTOINCREMENT, ...
CREATE VIEW NODES_BASE AS SELECT * FROM nodes WHERE op_depth = 0;
CREATE VIEW NODES_CURRENT AS SELECT * FROM nodes AS n ...
CREATE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (changelist);
CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath);
CREATE UNIQUE INDEX I_EXTERNALS_DEFINED ON EXTERNALS ...
CREATE INDEX I_EXTERNALS_PARENT ON EXTERNALS (wc_id, parent_relpath);
CREATE UNIQUE INDEX I_LOCAL_ABSPATH ON WCROOT (local_abspath);
CREATE INDEX I_NODES_PARENT ON NODES (wc_id, parent_relpath, op_depth);
...
$ tree .git/
.git/
├── HEAD
├── config
├── description
├── hooks
│ ├── applypatch-msg.sample
│ ├── commit-msg.sample
│ ├── post-update.sample
│ ├── pre-applypatch.sample
│ ├── pre-commit.sample
│ ├── pre-push.sample
│ ├── pre-rebase.sample
│ ├── prepare-commit-msg.sample
│ └── update.sample
├── info
│ └── exclude
├── objects
│ ├── info/
│ └── pack/
└── refs
├── heads/
└── tags/
$ cat .git/HEAD
ref: refs/heads/master
-
a symbolic reference to the branch you are on
-
default branch is master
-
at init-time the refs/heads/master file does not exist
$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
repositoryformatversion -- is for forward compatibility.
filemode = true -- do not ignore executable bits
bare = false -- has working directory
logallrefupdates -- enable the reflog
$ cat .git/info/exclude
# git ls-files --others --exclude-from=.git/info/exclude
# Lines that start with '#' are comments.
# For a project mostly in C, the following would be a good set of
# exclude patterns (uncomment them if you want to use them):
# *.[oa]
# *~
Like .gitignore, but not shared.
No server/client distinction. This is a full repo:
$ du -h .git/
40K .git/hooks
4.0K .git/info
0 .git/objects/info
0 .git/objects/pack
0 .git/objects
0 .git/refs/heads
0 .git/refs/tags
0 .git/refs
56K .git/
To understand what is going on, we will create a repository, create some files and commits.
How many Git repositories could you initialize in a second? About 100!
$ time git init
Initialized empty Git repository in ~/tmp/.git/
real 0m0.010s
user 0m0.002s
sys 0m0.007s
$ echo "HELLOWORLD" > README.md
Has anything changed in the repo?
No, there is just an untracked file.
[git:master?] $ git st
On branch master
Initial commit
Untracked files:
(use "git add <file>..." to include in what will be committed)
README.md
nothing added to commit but untracked files present (use "git add" to track)
Now add it.
[git:master?] $ git add README.md
[git:master] $ tree .git/
.git/
├── HEAD
├── config
├── description
├── hooks
│ └── ...
├── index
├── info
│ └── exclude
├── objects
│ ├── c6
│ │ └── 3053a6310c57fae01ecfde5cdf62d6c31111ea
│ ├── info
│ └── pack
└── refs
├── heads
└── tags
.git/
.
├── objects
│ ├── c6
│ │ └── 3053a6310c57fae01ecfde5cdf62d6c31111ea
. .
Hooray, we have a first object, its full key is
c63053a6310c57fae01ecfde5cdf62d6c31111ea
[git:master] $ git show c63053a6310c57fae01ecfde5cdf62d6c31111ea
HELLOWORLD
It's the content we added! Git calls this a blob.
But where does this SHA1 come from?
[git:master] $ git hash-object README.md
c63053a6310c57fae01ecfde5cdf62d6c31111ea
Anyone in this solar system, running
[git:master] $ echo "HELLOWORLD" | git hash-object --stdin
c63053a6310c57fae01ecfde5cdf62d6c31111ea
will get the same key for this exact content.
It's just the SHA1 of a small header plus the content:
$ python
>>> import hashlib
>>> hashlib.sha1(b"blob 11\x00HELLOWORLD\n").hexdigest()
'c63053a6310c57fae01ecfde5cdf62d6c31111ea'
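The same computation can be wrapped in a small function for content of any length. This is a sketch; git_blob_hash is a name chosen here, not a Git API. For the 11 bytes of "HELLOWORLD\n" it should reproduce the key that git hash-object printed above.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """SHA-1 over the header 'blob <size>', a NUL byte, and the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"HELLOWORLD\n"))
print(git_blob_hash(b""))  # the empty blob also has a key
```

The size in the header is the length in bytes, so the trailing newline that echo adds counts too.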
What is this git hash-object?
It's from git's plumbing.
This is what Pro Git says:
... but because Git was initially a toolkit for a VCS rather than a full user-friendly VCS, it has a bunch of verbs that do low-level work and were designed to be chained together UNIX style or called from scripts. These commands are generally referred to as plumbing commands, and the more user-friendly commands are called porcelain commands.
What is a blob now?
Git knows about four different kinds of object only and blob is the simplest one. It's basically file contents.
Let's see:
[git:master] $ git cat-file -t c63053a6310c57fae01ecfde5cdf62d6c31111ea
blob
It's small. Just 11 bytes.
[git:master] $ git cat-file -s c63053a6310c57fae01ecfde5cdf62d6c31111ea
11
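The type and size that git cat-file reports come from the header stored in front of the content. As a rough sketch, a loose object is the zlib-compressed bytes "<type> <size>", a NUL byte, then the content, stored in a file whose directory is the first two hex characters of the key. The function names and the temporary directory below are assumptions for illustration; this does not touch a real repository.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, obj_type: str, content: bytes) -> str:
    """Store content the way a loose object is laid out on disk."""
    raw = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    sha = hashlib.sha1(raw).hexdigest()
    # First two hex characters name the directory, the remaining 38 the file.
    directory = os.path.join(objects_dir, sha[:2])
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, sha[2:]), "wb") as f:
        f.write(zlib.compress(raw))
    return sha


def read_loose_object(objects_dir: str, sha: str):
    """Return (type, size, content), roughly what cat-file -t / -s / -p show."""
    with open(os.path.join(objects_dir, sha[:2], sha[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, int(size), content


with tempfile.TemporaryDirectory() as tmp:
    sha = write_loose_object(tmp, "blob", b"HELLOWORLD\n")
    obj_type, size, content = read_loose_object(tmp, sha)

print(sha[:2], sha[2:])
print(obj_type, size, content)
```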
The help page for git cat-file gives away the other object types:
[git:master] $ git cat-file -h
usage: git cat-file (-t|-s|...) <object>
...
<type> can be one of: blob, tree, commit, tag
-t show object type
-s show object size
Ok, there are
- blobs,
- trees,
- commits and
- tags.
Nothing more. Let's get back to our repo, and create more objects.
[git:master] $ git ci -m "Add README"
[master (root-commit) ede7e84] Add README
1 file changed, 1 insertion(+)
create mode 100644 README.md
Somewhere in some man page or git books it is recommended that you use the present tense for commit messages.
First line of commit message should not exceed 72/80 chars.
Then a blank line.
Then some detailed explanation, if necessary. People will thank you for following standards.
Let's look at our repo now. Now that you know that there are only blob, tree, commit and tag objects, and that we already had a single blob object after adding the file, can you guess the number of objects we have now? And of which types?
- certainly there will be a single commit object
- something else?
[git:master] $ tree .git
.git/
├── COMMIT_EDITMSG
├── HEAD
├── config
├── description
├── hooks
│ └── ...
├── index
├── info
│ └── exclude
├── logs
│ ├── HEAD
│ └── refs
│ └── heads
│ └── master
├── objects
│ ├── c0
│ │ └── a176e993d123ff67e7d065fee438e7e3ef7a92
│ ├── c6
│ │ └── 3053a6310c57fae01ecfde5cdf62d6c31111ea
│ ├── ed
│ │ └── e7e847cf64729e721dad8aa778a12b18306806
│ ├── info
│ └── pack
└── refs
├── heads
│ └── master
└── tags
c63053a6310c57fae01ecfde5cdf62d6c31111ea
is the blob.
Let's look at c0a176e993d123ff67e7d065fee438e7e3ef7a92 and ede7e847cf64729e721dad8aa778a12b18306806.
We saw ede7e84 in the commit output before. We could guess that this is the commit object. Let's see:
[git:master] $ git cat-file -t ede7e84
commit
Sure enough.
We have a single file, a commit. We haven't created any tags, so
c0a176e993d123ff67e7d065fee438e7e3ef7a92
will probably be a tree:
[git:master] $ git cat-file -t c0a176e993d123ff67e7d065fee438e7e3ef7a92
tree
Easy?
Let's pretty print all of them. Let's start with the blob (again):
[git:master] $ git cat-file -p c63053a6310c57fae01ecfde5cdf62d6c31111ea
HELLOWORLD
Then the tree:
[git:master] $ git cat-file -p c0a176e993d123ff67e7d065fee438e7e3ef7a92
100644 blob c63053a6310c57fae01ecfde5cdf62d6c31111ea README.md
Perms, type, sha, name.
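A tree object can be built by hand in the same way as a blob. The sketch below assumes the on-disk entry layout "<mode> <name>", a NUL byte, then the 20 raw bytes of the referenced SHA1; note that cat-file -p prints the fields in a different, human-readable order. The helper names hash_object and tree_entry are made up for this example. If the bytes match what Git wrote, the printed key should be the tree id shown above.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>', a NUL byte, and the body."""
    raw = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    return hashlib.sha1(raw).hexdigest()


def tree_entry(mode: str, name: str, sha_hex: str) -> bytes:
    # "<mode> <name>\0" followed by the 20 raw (not hex) bytes of the SHA-1.
    return f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha_hex)


blob_sha = hash_object("blob", b"HELLOWORLD\n")
tree_body = tree_entry("100644", "README.md", blob_sha)
tree_sha = hash_object("tree", tree_body)
print(tree_sha)
```

Because the tree contains the blob's key, any change to the file content changes the tree's key as well.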
And the commit:
[git:master] $ git cat-file -p ede7e84
tree c0a176e993d123ff67e7d065fee438e7e3ef7a92
author Martin Czygan <martin.czygan@gmail.com> 1418248242 +0100
committer Martin Czygan <martin.czygan@gmail.com> 1418248242 +0100
Add README
Tree, author, committer, message, dates.
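The commit is plain text hashed like the other objects. The following sketch rebuilds the body from the fields printed above; commit_body and hash_object are illustrative names, and the exact byte layout (blank line before the message, trailing newline) is an assumption about how this particular commit was written. Only if every byte matches will the key come out as ede7e84...

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>', a NUL byte, and the body."""
    raw = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    return hashlib.sha1(raw).hexdigest()


def commit_body(tree, author, timestamp, tz, message, parents=()):
    """Assemble the text of a commit object (author doubles as committer here)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # a root commit has no parent line
    lines.append(f"author {author} {timestamp} {tz}")
    lines.append(f"committer {author} {timestamp} {tz}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


body = commit_body(
    tree="c0a176e993d123ff67e7d065fee438e7e3ef7a92",
    author="Martin Czygan <martin.czygan@gmail.com>",
    timestamp=1418248242,
    tz="+0100",
    message="Add README",
)
print(body.decode())
print(hash_object("commit", body))
```

Since the timestamp and author are part of the hashed bytes, the same change committed a second later gets a different key.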
[git:master] $ tree .git/
.git/
.
.
├── objects
│ └── ...
└── refs
├── heads
│ └── master
└── tags
We now have a refs/heads/master file.
What's in there?
[git:master] $ cat .git/refs/heads/master
ede7e847cf64729e721dad8aa778a12b18306806
This is the SHA1 of our first commit.
[git:master] $ git cat-file -p $(cat .git/refs/heads/master)
tree c0a176e993d123ff67e7d065fee438e7e3ef7a92
author Martin Czygan <martin.czygan@gmail.com> 1418248242 +0100
committer Martin Czygan <martin.czygan@gmail.com> 1418248242 +0100
Add README
We would get the same output for each of the following:
[git:master] $ git cat-file -p refs/heads/master
[git:master] $ git cat-file -p master
[git:master] $ git show refs/heads/master
commit ede7e847cf64729e721dad8aa778a12b18306806
Author: Martin Czygan <martin.czygan@gmail.com>
Date: Wed Dec 10 22:50:42 2014 +0100
Add README
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c63053a
--- /dev/null
+++ b/README.md
@@ -0,0 +1 @@
+HELLOWORLD
show formats the date and adds colors and a diff.
Or just:
[git:master] $ git show master
...
The HEAD points to the current branch you are on. The branch you are planning to do your next commit on.
[git:master] $ cat .git/HEAD
ref: refs/heads/master
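Resolving HEAD, master or refs/heads/master to a commit is little more than reading small text files. Here is a simplified sketch; the function resolve and the temporary directory standing in for .git are assumptions for illustration, and real Git also consults packed refs and a few more lookup locations.

```python
import os
import tempfile


def resolve(git_dir: str, name: str):
    """Follow 'ref: ...' indirections until a SHA-1 is reached (None if unborn)."""
    candidates = [name, f"refs/{name}", f"refs/tags/{name}", f"refs/heads/{name}"]
    for candidate in candidates:
        path = os.path.join(git_dir, candidate)
        if os.path.isfile(path):
            with open(path) as f:
                value = f.read().strip()
            if value.startswith("ref: "):
                # Symbolic reference: keep following it.
                return resolve(git_dir, value[len("ref: "):])
            return value
    return None  # e.g. a branch that has no commit yet


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    unborn = resolve(git_dir, "HEAD")  # right after init: no master file yet
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("ede7e847cf64729e721dad8aa778a12b18306806\n")
    results = {n: resolve(git_dir, n) for n in ("HEAD", "master", "refs/heads/master")}

print(unborn)
print(results)
```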
So, one last time...
.git/
├── COMMIT_EDITMSG
├── HEAD
├── config
├── description
├── hooks
│ └── ...
├── index
├── info
│ └── exclude
├── logs
│ ├── HEAD
│ └── refs
│ └── heads
│ └── master
├── objects
│ ├── c0
│ │ └── a176e993d123ff67e7d065fee438e7e3ef7a92
│ ├── c6
│ │ └── 3053a6310c57fae01ecfde5cdf62d6c31111ea
│ ├── ed
│ │ └── e7e847cf64729e721dad8aa778a12b18306806
│ ├── info
│ └── pack
└── refs
├── heads
│ └── master
└── tags
-
config, boring stuff
-
description, boring stuff
-
info, boring stuff
-
objects, the key-value store
-
refs, branches
We still need to sort out two things we haven't talked about:
-
logs
-
index
We won't talk about the logs today (but you can inspect them on your own).
It's some internal bookkeeping.
- the index sits between the working dir and the object database
- it's the staging area
- everything you want to commit is gathered there
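The role of the staging area can be modelled in a few lines. This is a toy sketch under loud assumptions: ToyRepo is invented for illustration and keeps file contents directly, whereas the real index is a binary file that records paths, blob ids and file metadata.

```python
class ToyRepo:
    """Toy model: working dir -> index (staging area) -> commits."""

    def __init__(self):
        self.workdir = {}  # path -> content, what is on disk
        self.index = {}    # path -> content, what the next commit will contain
        self.commits = []  # list of (message, snapshot of the index)

    def add(self, path):
        # Staging copies the current working-dir state into the index.
        self.index[path] = self.workdir[path]

    def commit(self, message):
        # A commit records the index, not the working dir.
        self.commits.append((message, dict(self.index)))

    def unstaged(self):
        return sorted(p for p, c in self.workdir.items() if self.index.get(p) != c)


repo = ToyRepo()
repo.workdir["README.md"] = "HELLOWORLD\n"
repo.add("README.md")
repo.commit("Add README")
repo.workdir["README.md"] += "GIT'S STRANGE\n"  # edited, but not added
print(repo.unstaged())
print(repo.commits[-1][1])
```

An edit that was never added shows up as unstaged and is absent from the committed snapshot.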
Why wouldn't I just put everything I touched into a single commit and add a commit message like this:
* updated docs
* fixed that little bug
* refactored my git talk
A commit, in the best case, represents a single conceptual change to a project. Not necessarily a single file. This gives a clearer history and makes many other things easier (another talk).
Nothing you change gets committed automatically. In fact you have to add your changes each time you want to commit something.
Why is the index useful?
You can edit what you want in your files, but still keep your commits and your projects history clean.
Can you add partial changes to a file in svn? I guess you can, somehow (http://stackoverflow.com/q/75809/89391)
Git pro tip:
$ git add --patch
[git:master] $ echo "GIT'S STRANGE" >> README.md
[git:master+] $ git st
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: README.md
no changes added to commit (use "git add" and/or "git commit -a")
There it is:
Changes not staged for commit
Why not? Because of the index. I am under no pressure to touch only those files that I want to commit. I could even switch to a new branch of development at this point.
Let's actually do this.
[git:master+] $ git checkout -b getting-subjective
M README.md
Switched to a new branch 'getting-subjective'
Again, let's see what the repo looks like...
[git:getting-subjective+] $ tree .git/
.git/
├── COMMIT_EDITMSG
├── HEAD
├── config
├── description
├── hooks
│ └── ...
├── index
├── info
│ └── exclude
├── logs
│ └── ...
├── objects
│ ├── c0
│ │ └── a176e993d123ff67e7d065fee438e7e3ef7a92
│ ├── c6
│ │ └── 3053a6310c57fae01ecfde5cdf62d6c31111ea
│ ├── ed
│ │ └── e7e847cf64729e721dad8aa778a12b18306806
│ ├── info
│ └── pack
└── refs
├── heads
│ ├── getting-subjective
│ └── master
└── tags
There is no new object so far, since we haven't added README.md yet.
But there is a new ref, namely refs/heads/getting-subjective.
What is in there?
[git:getting-subjective+] $ cat .git/refs/heads/getting-subjective
ede7e847cf64729e721dad8aa778a12b18306806
This is the first commit's SHA1.
What do you guess is in HEAD? Remember, HEAD points to the name of the branch we are currently on:
[git:getting-subjective+] $ cat .git/HEAD
ref: refs/heads/getting-subjective
Sure enough.
Ok. Imagine for a moment what will happen.
We are on a new branch, because we want to keep some change separate, for now. The branch we are on is named getting-subjective.
The next commit would do the following:
It would record the change to README.md. It would record a commit and would move the branch pointer (in refs/heads/getting-subjective) forward to the new commit.
We would have three new objects. A new blob object. A new blob object implies a new tree, because the tree contains the SHA1 of the blob. And a new commit object.
Is this true?
We now add all the changes:
[git:getting-subjective+] $ git add README.md
And the tree?
[git:getting-subjective] $ tree .git/
.git/
├── COMMIT_EDITMSG
├── HEAD
.
.
├── objects
│ ├── 48
│ │ └── 21aa9297cfcd420f8efee1d8cef542ff3ca0fd
│ ├── c0
│ │ └── a176e993d123ff67e7d065fee438e7e3ef7a92
│ ├── c6
│ │ └── 3053a6310c57fae01ecfde5cdf62d6c31111ea
│ ├── ed
│ │ └── e7e847cf64729e721dad8aa778a12b18306806
│ ├── info
│ └── pack
└── refs
├── heads
│ ├── getting-subjective
│ └── master
└── tags
Ah, 4821aa9297cfcd420f8efee1d8cef542ff3ca0fd is new.
This is no surprise for us anymore:
[git:getting-subjective] $ git cat-file -t 4821aa9297cfcd420f8efee1d8cef542ff3ca0fd
blob
[git:getting-subjective] $ git cat-file -p 4821aa9297cfcd420f8efee1d8cef542ff3ca0fd
HELLOWORLD
GIT'S STRANGE
Now, let's commit that...
[git:getting-subjective] $ git ci -m "Well, that's just your opinion, man."
[getting-subjective d0cdeeb] Well, that's just your opinion, man.
1 file changed, 1 insertion(+)
How many objects do we have now? Correct. 6.
[git:getting-subjective] $ tree .git
.git/
.
├── objects
│ ├── 2c
│ │ └── 19f5ff0194af40c6f38e2f81c4c1e6b9b1832e
│ ├── 48
│ │ └── 21aa9297cfcd420f8efee1d8cef542ff3ca0fd
│ ├── c0
│ │ └── a176e993d123ff67e7d065fee438e7e3ef7a92
│ ├── c6
│ │ └── 3053a6310c57fae01ecfde5cdf62d6c31111ea
│ ├── d0
│ │ └── cdeeb0752c14a6ff486ce478b8dc02adc33c68
│ ├── ed
│ │ └── e7e847cf64729e721dad8aa778a12b18306806
│ ├── info
│ └── pack
.
[git:getting-subjective] $ cat .git/HEAD
ref: refs/heads/getting-subjective
[git:getting-subjective] $ cat .git/refs/heads/getting-subjective
d0cdeeb0752c14a6ff486ce478b8dc02adc33c68
Or shorter:
[git:getting-subjective] $ git rev-parse HEAD
d0cdeeb0752c14a6ff486ce478b8dc02adc33c68
[git:getting-subjective] $ git log
commit d0cdeeb0752c14a6ff486ce478b8dc02adc33c68
Author: Martin Czygan <martin.czygan@gmail.com>
Date: Wed Dec 10 23:39:53 2014 +0100
Well, that's just your opinion, man.
commit ede7e847cf64729e721dad8aa778a12b18306806
Author: Martin Czygan <martin.czygan@gmail.com>
Date: Wed Dec 10 22:50:42 2014 +0100
Add README
You can see all code changes (patches) with git log -p.
What happened in getting-subjective that did not happen in master?
[git:getting-subjective] $ git log master..getting-subjective
commit d0cdeeb0752c14a6ff486ce478b8dc02adc33c68
Author: Martin Czygan <martin.czygan@gmail.com>
Date: Wed Dec 10 23:39:53 2014 +0100
Well, that's just your opinion, man.
And what happened in master that did not happen in getting-subjective?
[git:getting-subjective] $ git log getting-subjective..master
Yes, nothing is correct.
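The two-dot range can be read as a set difference: everything reachable from the right side minus everything reachable from the left side. A minimal sketch over a hand-written parent map follows; the dictionaries and the function names reachable and rev_range are assumptions for illustration, using the abbreviated commit ids from this repo.

```python
def reachable(parents, start):
    """All commits reachable from start by following parent links."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def rev_range(parents, exclude, include):
    """Commits in 'exclude..include': reachable from include but not from exclude."""
    return reachable(parents, include) - reachable(parents, exclude)


# commit -> list of parent commits
parents = {"ede7e84": [], "d0cdeeb": ["ede7e84"]}
branches = {"master": "ede7e84", "getting-subjective": "d0cdeeb"}

print(rev_range(parents, branches["master"], branches["getting-subjective"]))
print(rev_range(parents, branches["getting-subjective"], branches["master"]))
```

The first range contains only d0cdeeb; the second is empty, matching the two git log calls.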
Ok, go back to our master.
[git:getting-subjective] $ git co master
[git:master] $ git log
commit ede7e847cf64729e721dad8aa778a12b18306806
Author: Martin Czygan <martin.czygan@gmail.com>
Date: Wed Dec 10 22:50:42 2014 +0100
Add README
[git:master] $ cat README.md
HELLOWORLD
HEAD is back at master:
[git:master] $ cat .git/HEAD
ref: refs/heads/master
And master points to the first commit, still.
[git:master] $ cat .git/refs/heads/master
ede7e847cf64729e721dad8aa778a12b18306806
[git:master] $ git merge getting-subjective
Updating ede7e84..d0cdeeb
Fast-forward
README.md | 1 +
1 file changed, 1 insertion(+)
This is a so-called fast-forward.
[git:master] $ git log
commit d0cdeeb0752c14a6ff486ce478b8dc02adc33c68
Author: Martin Czygan <martin.czygan@gmail.com>
Date: Wed Dec 10 23:39:53 2014 +0100
Well, that's just your opinion, man.
commit ede7e847cf64729e721dad8aa778a12b18306806
Author: Martin Czygan <martin.czygan@gmail.com>
Date: Wed Dec 10 22:50:42 2014 +0100
Add README
Are there any new objects? Look at the output, what do you think?
No. Because we fast-forwarded.
[git:master] $ tree .git
.git/
.
├── objects
│ ├── 2c
│ │ └── 19f5ff0194af40c6f38e2f81c4c1e6b9b1832e
│ ├── 48
│ │ └── 21aa9297cfcd420f8efee1d8cef542ff3ca0fd
│ ├── c0
│ │ └── a176e993d123ff67e7d065fee438e7e3ef7a92
│ ├── c6
│ │ └── 3053a6310c57fae01ecfde5cdf62d6c31111ea
│ ├── d0
│ │ └── cdeeb0752c14a6ff486ce478b8dc02adc33c68
│ ├── ed
│ │ └── e7e847cf64729e721dad8aa778a12b18306806
│ ├── info
│ └── pack
.
Still only 6 objects. But refs/heads/master changed.
[git:master] $ cat .git/refs/heads/master
d0cdeeb0752c14a6ff486ce478b8dc02adc33c68
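A fast-forward is possible when the current branch tip is an ancestor of the branch being merged; then the merge only moves a pointer. The sketch below models that with plain dictionaries. The names is_ancestor and merge_fast_forward are invented here, and real Git falls back to creating a merge commit where this sketch raises an error.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if ancestor can be reached from descendant via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def merge_fast_forward(parents, refs, current, other):
    """Move the current branch to the other branch's tip; no new objects."""
    if not is_ancestor(parents, refs[current], refs[other]):
        raise ValueError("not a fast-forward; a merge commit would be needed")
    refs[current] = refs[other]


parents = {"ede7e84": [], "d0cdeeb": ["ede7e84"]}
refs = {"master": "ede7e84", "getting-subjective": "d0cdeeb"}
commits_before = len(parents)

merge_fast_forward(parents, refs, "master", "getting-subjective")
print(refs["master"], len(parents) == commits_before)
```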
Git has simple internals.
There are only four object types. For a simple workflow, you don't even need tags.
Many things will fall out of this.
What are these objects?
$ git clone git@github.com:torvalds/linux
Cloning into 'linux'...
remote: Counting objects: 3927483, done.
...
If you git fetch, git fetches the objects and the references.
Once you fetched a remote, you can run things like:
$ git log master..origin/master
More things will fall out of this.
Things like reset and rebase are just tools to manipulate the DAG.
$ git reset --hard origin/master
Not necessarily trivial, but not mysterious either.
$ git rebase -i a64f821
But there are still many things to cover.
-
What is git reset really?
-
GitHub hype? Collaboration? Tarballs and patches?
-
Fixing history (reset, cherry-pick)
-
Merge strategies
-
Remotes