Dear git-using lazyweb...

nicholas on 2009-08-09T18:02:23

Dear lazyweb,

Say one has a git repository, and one would like to produce bi-weekly e-mails summarising activity...

As I understand it, branches in git, whilst first class, don't store history. That is, right now there is a well-defined "head" for blead, but there is no state that tells me what the "head" for blead was yesterday, or even last week. So, if I want my summary to say "activity on blead", "activity on maint-5.10", "activity on target-the-SuperCollider-VM" etc., I don't have any way to infer the history of the branches from a single current repository.

So, I'm wondering, what happens if I conceptually have two repositories: one current at half-a-week-ago, and one current at right-now. Can I usefully tease apart some semblance of activity on each branch between "then" and "now"? I think that I can - I know the position of blead's head back then, and I know the position of blead's head right now, so any commit that is both an ancestor of "head now" and a descendant of "head then" is somewhere on the web of commits assignable to the branch blead. (Of course, if the web is particularly tangled with merges and branching, the same commit might also qualify, by the same test, as lying between the heads of maint-5.10 then and now. But I can choose an iteration order on branches, and report each commit exactly once, under the first branch that claims it.) And then, for each branch "now", seek out all commits that are ancestors of it and are not yet reported. And then, for each branch "then", do likewise. And finally, seek out any orphan commits.
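The first two passes described above can be sketched on a toy commit graph. This is only an illustration under stated assumptions, not a tool that reads a real repository: the `parents` mapping, the commit names and the function names are all made up, and the sketch assumes that anything reachable from a "then" head was already covered by an earlier summary. The "branch then" and orphan passes are left out.

```python
def ancestors(parents, tip):
    """Return tip plus every commit reachable from it via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def assign_commits(parents, then_heads, now_heads):
    """Report each new commit once, under the first branch that claims it."""
    # Assumption: everything reachable from a "then" head is old news.
    already = set()
    for tip in then_heads.values():
        already |= ancestors(parents, tip)

    report = {branch: [] for branch in now_heads}
    claimed = set()

    # Pass 1: ancestors of "head now" that are descendants of "head then".
    for branch, now_tip in now_heads.items():
        then_tip = then_heads.get(branch)
        if then_tip is None:
            continue  # branch did not exist "then"
        for commit in sorted(ancestors(parents, now_tip)):
            if commit in claimed or commit in already:
                continue
            if then_tip in ancestors(parents, commit):
                report[branch].append(commit)
                claimed.add(commit)

    # Pass 2: any other unreported ancestor of "head now" (merged-in work).
    for branch, now_tip in now_heads.items():
        for commit in sorted(ancestors(parents, now_tip)):
            if commit not in claimed and commit not in already:
                report[branch].append(commit)
                claimed.add(commit)
    return report


# Hypothetical history: D merges topic commits T1, T2 into blead.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C", "T2"],
    "T1": ["A"], "T2": ["T1"],
    "M1": ["A"], "M2": ["M1"],
}
then_heads = {"blead": "B", "maint-5.10": "M1"}
now_heads = {"blead": "D", "maint-5.10": "M2"}
print(assign_commits(parents, then_heads, now_heads))
```

The iteration order of `now_heads` is the "iteration order on branches": a commit reachable from several heads is reported under whichever branch comes first.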

But I wonder - do I actually need two repositories? I know that when git itself pulls from the remote, it only transfers the commit objects it doesn't yet have. So is there a way to do a "dry run" first to get the full list of incoming commit objects? Or a way to determine when commit objects were added to the repository, for example from the time stamp of the file on disk? That way I only need one repository, which feels cleaner.

Or is this already a solved problem? git shortlog isn't the solution, because it groups by author, whereas I want grouping first by branch, and then by time.

This post brought to you by the campaign to give SuperCollider programming the recognition it deserves.


branches are points, not lines

merlyn on 2009-08-09T19:23:06

Git is all about the commits. Commits have parentage. Commits are shared. And some commits have human-friendly names. Tags are one kind of name, and are semi-static. Branches are a different kind of name, but generally move over time. Also, branches are repo-local, whereas tags are generally shared.

I think if you define carefully what you want to know, there are lots of options on "git log" that will give you what you want. But in a proper repo, branches are generally rather ephemeral (provided you are using topic branches properly), so the "history of a branch" isn't all that useful.

easy peasy

rjbs on 2009-08-09T19:27:29

I find it helps to not talk about the "branch" master but rather the "head" master. When you separate the fact that heads can cause branching from their essence, you can think of more ways to use git.

So, make a new head called "last-summary". Start it at commit x, presumably the current location of master.

In two weeks, get the output of `git log last-summary..master` and you will see all the changes that are between the two named commits. With that summary generated, you can now move last-summary to the same place as master. master will keep moving, while last-summary serves as a bookmark.
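For a scripted summary, the bookmark cycle might look roughly like the sketch below. The function names are invented for illustration; the git commands themselves are the `git log last-summary..master` from above plus `git branch -f`, which is one way (an assumption here, not something stated in this thread) to move the bookmark. Note that `git branch -f` refuses to move a branch that is currently checked out.

```python
import subprocess


def summary_commands(bookmark="last-summary", head="master"):
    """Return the two git invocations for one reporting cycle, as argv lists."""
    return [
        # Commits reachable from head but not from the bookmark.
        ["git", "log", f"{bookmark}..{head}"],
        # Move (or create) the bookmark so that it points at head.
        ["git", "branch", "-f", bookmark, head],
    ]


def run_summary(repo_dir, bookmark="last-summary", head="master"):
    """Print the log since the bookmark, then advance the bookmark."""
    log_cmd, move_cmd = summary_commands(bookmark, head)
    log = subprocess.run(
        log_cmd, cwd=repo_dir, check=True, capture_output=True, text=True
    )
    print(log.stdout)
    # Only advance the bookmark once the log was produced successfully.
    subprocess.run(move_cmd, cwd=repo_dir, check=True)
```

Calling `run_summary("/path/to/repo")` once per reporting period would be the whole job for one branch; using one bookmark per branch of interest extends the idea to several heads.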

This is one of the many ways that heads are useful: they're mobile tags.

Even though the others make good points…

Aristotle on 2009-08-10T03:39:38

… I think you still want to look into git help reflog.
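The reflog is git's local record of where each ref has pointed over time, which is the "what was the head yesterday" state asked about in the original post. It only covers movements seen by this particular repository, and old entries are eventually pruned. git can answer the question directly with revisions such as `blead@{yesterday}`. Purely as an illustration, the sketch below reads the same information from reflog lines, assuming the usual on-disk layout (one line per movement, oldest first, in a file like `.git/logs/refs/heads/blead`); the function names and sample data are made up.

```python
def parse_reflog(lines):
    """Yield (old_sha, new_sha, unix_time, message) for each reflog line."""
    for line in lines:
        if not line.strip():
            continue
        # Layout: "<old> <new> Name <email> <unix time> <tz>\t<message>"
        meta, _, message = line.rstrip("\n").partition("\t")
        old_sha, new_sha, ident = meta.split(" ", 2)
        _who, stamp, _tz = ident.rsplit(" ", 2)
        yield old_sha, new_sha, int(stamp), message


def head_at(lines, when):
    """Return the sha the ref pointed to at unix time `when`, or None."""
    answer = None
    for _old, new_sha, stamp, _message in parse_reflog(lines):
        if stamp <= when:
            answer = new_sha  # assumes entries are in chronological order
    return answer


# Hypothetical reflog contents with placeholder hashes.
sample = [
    "0" * 40 + " " + "a" * 40
    + " A U Thor <author@example.com> 1249000000 +0100\tbranch: Created from HEAD\n",
    "a" * 40 + " " + "b" * 40
    + " A U Thor <author@example.com> 1249500000 +0100\tcommit: second\n",
]
print(head_at(sample, 1249400000))
```

With a real repository, the lines would come from something like `open(".git/logs/refs/heads/blead")`, and the returned hash could serve as the "then" end of a `git log then..now` range.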