git: you're explaining it wrong

rjbs on 2008-01-25T15:38:27

I switched from Subversion to Git entirely, shortly after YAPC last year. Now I only use svn for work and other people's projects, and even that is because I'm just way too lazy to bother switching to git-svn just yet.

Git is a really great tool, and I have enjoyed using it much more than Subversion. By this, what I really mean is that it's generally gotten in my way a lot less while still providing even more tools for getting real work done. I know that it's all sorts of different under the hood, and I even understand some of the differences. This article about Git was interesting as lazy reading to get a bit more information about what's going on under the hood.

It's not an introduction to using git, though. I have seen quite a few introductions to it that look something like this explanation that Git is all about content and not metadata. I'm not sure who the intended audience is for this post, but I hope it isn't "people considering switching to Git."

I see a lot of people saying, "I tried to move to Git, but it was too confusing." I think the confusion must be coming from the hype, and not so much the product. Sure, Git can be really confusing, when you get into its weird features that come from its magical moon code, but it makes a very simple replacement for a lot of CVS or Subversion use cases. The problem is that once you've become part of the Git Mob, you become so excited by true decentralization, content-addressability, and other forms of advanced mtfnpy technologies that you can no longer be bothered to explain how to just check in a damned file change.

Fortunately, I have not yet tried to deal with any of the git commands that are rated GURPS Tech Level 8 or above, so I still bellyfeel Subversionthink.

Sam Vilain, who is way cool, wrote an introduction to git-svn for Subversion/SVK users, and it's very good, especially at explaining why Git is a good tool. There's also Git's own Git - SVN Crash Course, which isn't bad, but it does some weird things like put tagging and branching under one header (even though in Git they are, as is proper, two distinct things).

My point is this: if you have your own code or documents under source control, git is a very good tool to consider. It's very, very fast. It's very easy to use. It comes with a lot of nice tools (like, say, gitk). If that sounds nice, don't get distracted listening to how it will revolutionize your conception of source code control by eliminating metadata-based delta-logarithm concatenation sets. Just run git init and start committing. The two documents above are decent minimal introductions. Maybe I'll do a lightning talk on Git For Morons this year.
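As a minimal sketch of what "just run git init and start committing" can look like: the scratch directory, file name, and identity values below are made up for illustration.

```shell
# Hypothetical scratch project; directory and file names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Identity for this repository only (placeholder values).
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt                    # start tracking the file
git commit -q -m "Add notes"

echo "more" >> notes.txt
git commit -q -a -m "Extend notes"   # -a picks up changes to tracked files

git log --oneline                    # lists the two commits
```

That is the whole loop for a single-person project: edit, commit, repeat.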


thoughts on git

dagolden on 2008-01-25T17:01:02

I've looked at switching/using git at least twice in the last six months or so. Lately, it seems that native Win32 support is improving so that hurdle I had is dropping.

I get most of it, conceptually, but I find some parts of the application of it confusing. It desperately needs more cookbooks, more examples of "here's how I work with it day to day" and not just by Linux kernel developers who are (a) highly-technical already and (b) managing a vast, decentralized project with their own particular legacy of development patterns.

Here's a sample of things I wonder about, if you want to take a shot at addressing them:

  • Where do I publish if I don't run my own public server? I've seen one big public service (repo.or.cz) which seems like a big single point of failure. And if 10 of us are collaborating on, say project-wibble, is there supposed to be project-wibble-dagolden.git, project-wibble-rjbs.git, etc. there and elsewhere and we all just keep track of where everything is?
  • Why is the standard command the tutorials have for committing "git commit -a"? Why the "-a"? Or, rather, if that's the most common thing, why isn't it the default? If something else is supposed to be the common case, could someone explain how I use it practically in day-to-day development? I can follow the recipe but that doesn't mean I really know what I'm doing.
  • There seem to be too many rules of thumb and too much interaction with config files. Don't fetch this or you'll break things. If you push, you have to edit the config a certain way. Don't push to repositories with working files. Generally, there seems to be more than enough rope to hang myself with.

-- dagolden

Re:thoughts on git

rjbs on 2008-01-25T18:16:35

Yeah, it's very easy to do public git, but not well explained. If you have any webdav it's trivial. I'll see if I can put together an example, sometime, but basically you can just rsync your repo to webdav and let people pull from it.

I don't know why git commit doesn't have -a as default. I made an alias for "git ci" to "git commit -a" and that's that.

I don't find that I touch config files very often, after my initial setup, which was really not much: set up some aliases, set up my username. Done.
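For illustration, that kind of one-time setup might look roughly like this. It is a sketch, not rjbs's actual config: the identity values are placeholders, and the settings are made in a scratch repository so nothing in your real configuration is touched.

```shell
# Scratch repository so the example does not alter your real settings.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global these apply to this repository only;
# add --global to store them in ~/.gitconfig instead.
git config user.name "Example User"
git config user.email "user@example.com"
git config alias.ci "commit -a"

echo "one" > file.txt
git add file.txt
git ci -q -m "First commit"      # expands to: git commit -a -q -m ...

echo "two" >> file.txt
git ci -q -m "Second commit"     # tracked file, so no explicit git add
```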

Re:thoughts on git

Aristotle on 2008-01-26T20:05:39

The “-a” switch is because git has an indirection between the working copy and the repository: the index, i.e. a commit-ready snapshot of the tree. You don’t commit the state of the working copy directly; you copy things from the working copy to the index (this is called staging), and then commit the index. If you make a change in the working copy but you do not stage it, a plain commit will not contain the change from the working copy. Explicit staging is done using git-add.

The “-a” switch to git-commit means “if there is any difference between files in the working copy at paths tracked by the index, and those paths in the index, then automatically stage the working copy files before committing.”
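The difference is perhaps easiest to see in a throwaway repository. This is only a sketch; the file names and identity values are invented.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "v1" > a.txt
echo "v1" > b.txt
git add a.txt b.txt
git commit -q -m "Initial"

# Change both files, but stage only one of them.
echo "v2" > a.txt
echo "v2" > b.txt
git add a.txt
git commit -q -m "Update a only"          # commits the index, not the working copy

git status --short                        # b.txt is still listed as modified

git commit -q -a -m "Update b via -a"     # -a stages modified tracked files first
git status --short                        # prints nothing now
```

The second commit contains the new a.txt but the old b.txt, even though both were edited at the same time.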

I admit that having typed it out, this seems like pointless hoopla. I also remember that when I first read about git it seemed like pointless hoopla to me too. And yes, maybe the “-a” should be the default. Or should it? After having used git for a while, I find this indirection is not actually an ordeal worth mentioning at all. In fact it has grown on me: it’s easy to compose the commit instead of committing the exact state of the working copy, so I feel much less of a need for deliberation. I'm no longer constantly planning ahead for my commits while making changes.

Re:thoughts on git

dagolden on 2008-01-26T20:35:26

I wonder if that winds up being more helpful when working across tons of tiny C files across a large project -- staging in changes file by file as you edit and then when all of them are done making one big commit to the repository.

I suppose I find myself doing something similar with SVK. Small commits to my local repo as I make changes and then pushing the whole thing (often lumped together) up to the repository.

--dagolden

Re:thoughts on git

rjbs on 2008-01-26T21:10:26

Your explanation made my eyes start to glaze. :) Until the end, it's just begging the question, but with some jargon. That's a bit of the problem I saw, when posting. Of course, I know I do the same thing all the time, and I see it in all places. I think Schwern just posted something about it with p5p: people say, "It's that way because $something_stupid." Well... so... is that a good reason?

I think I'd be more interested in hearing you (or other git users) expound on whether the -a is a good or bad thing. The reasons that it exists are pretty obvious: that's how you stage commits.

I think it's quite likely that I'm missing out by always using -a (via my alias), but I'm not sure. I'm a Git Moron.

Re:thoughts on git

Aristotle on 2008-01-26T22:19:02

The reasons that it exists are pretty obvious: that’s how you stage commits.

Sorry, but now it’s you making assumptions. When someone who has only ever seen tutorials asks “why the -a,” I cannot presume that they know git makes a distinction between the working copy and what gets committed. So I can’t talk about its value before laying out the facts, which resulted in two paragraphs of expository prelude.

And indeed, judging by David’s reply, he appreciated the tack I took to answering his question. So with all due respect for Schwern, I think I made the right decision.

I think I’d be more interested in hearing you (or other git users) expound on whether the -a is a good or bad thing.

I don’t know that I can make such a judgment call. Is it good? Is it bad? All I can say is: it is. And maybe: it’s different. Whether it gives you anything worthwhile depends on your particular psychology, much like your choice of text editor, say. What it is is an extra freedom to decide what history will look like. I appreciate that greatly, particularly after coming from Subversion, where history tends to be Set In Stone, a fact that always chafed me.

But if you’re less particular than me about what’s in your commits, then that level of control may well feel superfluous or even burdensome to you, and you may be best served by your alias.

It’s there if you like it and you can opt out if not.

Re:thoughts on git

rjbs on 2008-01-27T01:44:40

What I meant was: I'd be more interested in hearing your opinion on whether it's the right design decision (-a for all, rather than, say, -f for certain files) based on your usage. That is: if it implies that you should rarely use -a because you should be using git add, do you personally think that's a great way to work?

Re:thoughts on git

Aristotle on 2008-01-27T03:50:41

Hmm. Honestly, I’m not trying to be ornery, I just can’t give a straight good/bad answer. It’s highly dependent on workflows.

In my case, I do “foo status” quite frequently, and always when I’m poised to “foo commit.”

When foo is git, the git-status is an opportunity to git-add the files I think I’m done editing for this particular set of changes. Sometimes while I continue in other files, I go back and make further changes in files I’ve already staged, because something occurred to me along the way. I don’t need to make a mental note then – I can just make the change and be sure that my commit won’t be contaminated. In that case I commit without “-a.”

Other times I make a series of very small changes, each committed immediately; then “git commit -a” is more convenient.

By and large, I’d say I tend to commit without “-a” more often than with, because it’s dead easy to stage missing things and “git commit --amend”, whereas it takes lots of care and double-checking to surgically slice things out of the working copy and then stage them in order to amend a contaminated commit.
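The "stage the missing things and amend" case could be sketched like this, assuming a scratch repository with made-up file names and a commit that has not been shared with anyone yet:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "v1" > code.txt
echo "v1" > docs.txt
git add code.txt docs.txt
git commit -q -m "Initial"

echo "v2" > code.txt
echo "v2" > docs.txt
git add code.txt
git commit -q -m "Update code and docs"   # oops: docs.txt was never staged

git add docs.txt                          # stage the missing file
git commit -q --amend --no-edit           # fold it into the previous commit
```

After the amend there are still only two commits, and the second one now contains both files.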

In contrast to all that, when foo is svn, I have to be a lot more careful about committing the specific files I touched (compare with git, where my intermittent git-adds build the list and state of files to commit step by step), and sometimes I have to go to comical lengths involving diff, patch, svn revert and editing the resulting diff in order to get the working copy in just the right condition for the commit.

So for me, having the working copy uncoupled from the repository through the index is a godsend: I can put together coherent changesets without it breaking me out of the flow by demanding that I postpone all changes to a file until after commit.

Is this a superior way of working? Beats me. I know it suits me a lot better than Subversion did. I don’t know if this is the way that everyone should work. I can only explain why someone might appreciate it. Git is a bit like Perl here: it’s not religious about forcing you into one particular paradigm. TMTOWTDI.

Re:thoughts on git

rjbs on 2008-01-27T13:10:18

The more I think about it, the more I think that my use of commit -a is likely to remain my way of working. I try to make changes in one related set, so committing it all at once is simplest. When I make weird one-off changes while working on something else, I end up committing it by hand by specifying its path on the command line.

I think the (in my experience) underhyped command that I need to use more often for having multiple sets of things changing is git-stash. I should try to use it more.
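A rough sketch of the git-stash workflow for that situation, with invented file names: shelve the half-done work, commit the unrelated change, then bring the work back.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial"

echo "half-finished feature" >> main.txt  # work in progress
git stash                                 # shelve it; working copy is clean again

echo "quick fix" > fix.txt
git add fix.txt
git commit -q -m "Unrelated quick fix"

git stash pop                             # restore the unfinished work
git status --short                        # main.txt shows as modified again
```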

Re:thoughts on git

ask on 2008-01-28T12:21:03

I use "-a" for small changes that I just want to push in, and individual commits or add, add, add, commit steps if I have a bunch of changes in progress or want to split a bigger change into smaller commits.

  - ask

Re:thoughts on git

jdavidb on 2008-04-07T20:58:42

I just learned all this today, so take it with a grain of salt, but to me having -a exist and that feature not be the default is exactly how I want to work. I have my working copy, with tons of changes I want to keep around but ignore for the moment, and then I want to go through 25 changed files and build up a list of changes to commit. Then I don't want to botch it all by typing "cvs commit -m'commit message' [oops I forgot to list my files!]" . As a total git newbie, I think this is great design, at least for the way I work.

Re:thoughts on git

grantm on 2008-01-26T21:43:43

  • Where do I publish if I don't run my own public server?

Git supports a number of different transports, but the simplest way to share is to put a git repository in a publicly accessible web directory (just a plain ordinary web server not a 'git server'). You would typically push to it using git over ssh or by rsyncing your repository directory to the server. Other people would pull from it using git over HTTP. They obviously can't commit directly, but they can send you patches, or they can publish their own repository containing a modified branch and you can pull/merge from there.

In a work environment as a replacement for CVS/subversion, the easiest thing is to push/pull over SSH to a central repository.
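The central-repository setup can be tried out entirely on one machine. In this sketch a local path stands in for the SSH URL you would use in practice (something of the form ssh://host/path/project.git), and the developer names are invented.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository (no working files) plays the shared, central role.
git init -q --bare central.git

# First developer: clone, commit, push.
# (Cloning an empty repository prints a harmless warning.)
git clone -q central.git alice
cd alice
git config user.name "Alice Example"      # placeholder identity
git config user.email "alice@example.com"
echo "first" > README
git add README
git commit -q -m "Add README"
git push -q origin HEAD                   # push the current branch
cd ..

# Second developer: a fresh clone already contains the full history.
git clone -q central.git bob
cd bob
git log --oneline
```

Each of the three directories holds a complete copy of the history, which is the point grantm makes about replicas.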

  • I've seen one big public service (repo.or.cz) which seems like a big single point of failure.

When using git, each user always has their own repository on the machine where they are working (the '.git' directory at the top of the project 'checkout'). They can push/pull to other repositories but each repository has a copy of all the data required to preserve the history and integrity of each commit. Far from being a single point of failure, the central repository is just one of many replicas.

  • And if 10 of us are collaborating on, say project-wibble, is there supposed to be project-wibble-dagolden.git, project-wibble-rjbs.git, etc. there and elsewhere and we all just keep track of where everything is?

Remember, each working copy is a repository. The simple scenario is that there is one shared repository that all team members can push to/pull from.

  • Why is the standard command the tutorials have for committing "git commit -a"?

Because those tutorials are wrong :-) Seriously though, best practise is to commit regularly in fine-grained chunks - this leads to painless merging. If you're in the habit of doing one big commit at the end of the day then I guess -a is a replacement for that but I wouldn't recommend it.

  • Why the "-a"? Or, rather, if that's the most common thing, why isn't it the default? If something else is supposed to be the common case, could someone explain how I use it practically in day-to-day development? I can follow the recipe but that doesn't mean I really know what I'm doing.

The '-a' simply means that the commit should include all modified files. The recipe I use more often is:

git commit -m "commit message" file1 file2 file3

I typically cut/paste the filenames from the output of git-status or use tab completion. Other people select the filenames in the standard git GUI. Of course in the event that my commit should include all files then I'd use '-a' but that's seldom the case for me.
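As a small illustration of that recipe (the three file names are placeholders), naming paths on the command line commits the current contents of just those files and leaves everything else alone:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "v1" > file1
echo "v1" > file2
echo "v1" > file3
git add file1 file2 file3
git commit -q -m "Initial"

# Edit all three, but commit only two of them by name.
echo "v2" > file1
echo "v2" > file2
echo "v2" > file3
git commit -q -m "Update file1 and file2" file1 file2

git status --short                        # only file3 remains modified
```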

  • There seem to be too many rules of thumb and too much interaction with config files. Don't fetch this or you'll break things. If you push, you have to edit the config a certain way. Don't push to repositories with working files. Generally, there seems to be more than enough rope to hang myself with.

I wouldn't disagree with you there. But that's arguably an inevitable side effect of the power and flexibility that git provides. Often people say "don't do X" because they did it one time and got a result they didn't expect. Over time, you get to understand what the different approaches are and when you'd choose one over another, then you get fewer unexpected results.

Although git provides a mind boggling array of commands, you really only need a small number of them to replace typical CVS/Subversion workflows.

Re:thoughts on git

Aristotle on 2008-01-27T04:27:34

I think the confusion of novices right now is also because DVCS have only just gone mainstream. In contrast to CVS or Subversion, almost everyone is only just figuring this stuff out, so novices don’t have any firm guidance – we’re all making this up as we go.

hosted git repository cookbook

dagolden on 2008-01-26T23:05:28

After complaining about the lack of cookbooks, I should lead by example and try to let others benefit from the time I spent figuring things out.

I'm not sure if this will convert me, but at least it isn't so damn painful to experiment[1] with anymore.

-- dagolden

[1] By "experiment" I mean not just controlling my own files but seeing how I would collaborate with others. Pull, push and all that.