Reading through all the conversations on p5p about what the new Perl 5 repository should be, I've noticed a definite split into two camps. And it's not SVN vs git. It's more profound than just what VCS you prefer. It's what development model you prefer. It boils down to this: Is branching something to be feared or embraced? Which camp you're in largely comes from what style of version control you're used to: traditional or distributed.
To the user of traditional version control systems -- RCS, CVS and SVN -- branching is something to be feared. Or, as Jesse rightly corrected me, *merging* is something to be feared. SVN made branching easy as pie but didn't deal with merging (something I hear they're working on in 1.5). Branches are necessary evils. They are short-lived and closed before they drift so far from the trunk that the merge becomes a nightmare. Everyone must commit to the trunk and there is a single master version.
To the user of the new breed of distributed version control systems -- darcs, arch, git, SVK... pretty much every new VCS since Subversion -- branching and merging are something to be embraced. These new systems looked at the problems of branching and merging and tackled them head-on in order to decentralize the development process. Decentralization does not necessarily mean "everyone has their own toy repository and releases their own toy distribution and oh god what a coordination nightmare", which is the usual read when people first hear about distributed version control. Rather it means that not everything has to happen in the trunk, in one great brittle revision.
But the trunk-centric model is not chosen because it's a good thing; it's chosen because the tools make the alternative so horrid. It's been going on for so long that most people have had it drilled into their heads that branching is to be feared because branching is bad. But branching is not bad; the tools are weak.
Think about any other aspect of your software development model and the drive is towards decentralization. We encapsulate our code into separate modules and then deal with the integration issues, because it's better than the alternative of shoving everything into one file. Server management? We have per-developer sandbox servers, testing servers, staging servers and production servers. Any production house that pushes its changes directly into production gets laughed at. Software design? There's no longer a single architect writing software design documents for mere programmers to implement; it's a collaborative, iterative process broken into chunks and spread out amongst the whole team.
Let's talk about the great bugaboo of traditional version control: keeping a branch up to date with trunk. You make a branch. You commit some work. Meanwhile, people are committing to trunk. You want to update your branch with those changes. How? And once you've updated your branch once, what if you want to do it again later? Traditional version control does a very poor job of this, requiring the user to do a lot of the bookkeeping themselves. It rapidly gets messy and it's very easy to completely hose the branch or generate piles and piles of spurious conflicts.
For distributed version control, this sort of thing is their bread and butter. "Pulling" the latest changes from trunk into a branch is generally one simple command. "Pushing" your branch's changes back into trunk is as well. And you can do this as many times as you like; the VCS handles all the bookkeeping for you. Maintaining a branch is no longer a perilous affair. And that allows you to embrace, rather than fear, branching.
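To make that bookkeeping concrete, here is a minimal Python sketch of what the tool is tracking on your behalf. It is a toy model under loud assumptions, not how any particular VCS stores its data: `Branch`, `pending_revisions` and `pull` are hypothetical names, revisions are plain integers, and "merging" is reduced to remembering which trunk revisions a branch has already absorbed.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """A toy branch: a name plus the trunk revisions already merged into it."""
    name: str
    merged: set[int] = field(default_factory=set)


def pending_revisions(trunk_revisions: list[int], branch: Branch) -> list[int]:
    """Return the trunk revisions not yet merged into the branch, oldest first."""
    return sorted(r for r in trunk_revisions if r not in branch.merged)


def pull(trunk_revisions: list[int], branch: Branch) -> list[int]:
    """Merge everything pending and record it, so a repeated pull is harmless."""
    todo = pending_revisions(trunk_revisions, branch)
    branch.merged.update(todo)  # this is the record a human would otherwise keep by hand
    return todo


# Hypothetical usage: pull, let trunk move on, pull again.
trunk = [101, 102, 103]
feature = Branch("feature")
first = pull(trunk, feature)    # everything on trunk so far
trunk.append(104)
second = pull(trunk, feature)   # only what arrived since the last pull
third = pull(trunk, feature)    # nothing left to merge
```

The point of the sketch is the `merged` set: when the tool owns that record, pulling a second or tenth time only ever brings in what is new, which is exactly the part that gets messy when you have to track it yourself.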
In either world, the developer can work in their own sandbox and then, when they're ready, push their changes back into the trunk. In the traditional model, the sandbox is your checkout and the push is simply a commit. The trouble there is you can't commit until the whole task is done. If it's a large change, that means it enters trunk as one giant patch. Difficult to review.
In the distributed model, your branch is your sandbox. You can commit as often as you like, when you like. Do some work, test, log, commit, repeat. This lets you do the work in small, easily reviewed chunks showing the progression of what you're working on. Need to rename a subroutine? Make that change and commit it. Need to move some code? Make it, commit it. Fix a small bug you uncovered? Fix it, commit it. Then when you're all done your changes can be examined a piece at a time, easy to review.
This also allows the developer to work in small, iterative steps. Do a refactoring. Then run the tests, ok they all pass. Commit. Add a new function. Run the tests... wait, they failed. Well, you know everything was working at the last commit, so it must be something you did in the current diff! This vastly narrows down the possibilities when debugging.
Let's say you want someone to review your change before you commit. In a traditional system this means mailing a patch off to a mailing list, something YOU have to do manually. What if you update something? Mail off a new patch. What if you change something and forget to resend a new patch? Well, now everyone's looking at an old patch, a pretty common mistake. And, of course, if it's a large patch it's difficult to follow what's changed and why.
But if I'm working in a branch that's checked into the master repo (not purely distributed), anyone at any time can look at the work. They can see the complete history, logs and individual diffs. There's no need to manually mail out patches. You can even catch when some work was forgotten by looking to see what branches contain differences against trunk.
So that's what it boils down to. We've been trained by our tools for decades to fear branching. This has colored our development models and made them overly centralized. In just the last few years a whole new set of tools has made branching easier so that we can embrace branching. And now, finally, the limitations of our version control system need no longer dictate our development model.
Re:That's an interesting observation.
jdavidb on 2007-07-12T15:53:26
The thing is though, that I have no fear of branching because there's a "recipe" for merging and I don't have to think too much about it other than to record revision numbers in the log.
Bingo; I think you just identified a huge chunk of my fear: I barely had merging in CVS down, and I don't yet fully understand the "recipe" for it in Subversion. So, schwern (or anybody), should I spend time getting really comfortable with merging in Subversion and then move to SVK, or will SVK make all of that into time wasted?
Re:That's an interesting observation.
schwern on 2007-07-12T20:14:44
Merging in SVN is currently pretty dumb because you have to do all the bookkeeping yourself, recording what revisions have already been merged into what branch. It's tricky and it's time-consuming and it's easy to forget and you have better things to do. You should do it once or twice manually just to understand what's involved in keeping a branch up to date and so you can better appreciate never having to do that again. I had to write svn merge tools back when svnmerge sucked. It's not fun.
After that I'd say jump directly to SVK or just use svnmerge.
Re:That's an interesting observation.
jdavidb on 2007-07-12T21:01:19
Thanks for the advice!
:)
Re:That's an interesting observation.
schwern on 2007-07-12T19:59:27
there's a "recipe" for merging and I don't have to think too much about it other than to record revision numbers in the log.
You shouldn't have to think about it at all. It's a rote task. Bookkeeping. Monkey work. The sort of things computers are very good at. Humans are very bad at them and prone to make mistakes.
You might want to look at SVK or the Fisher Price version, svnmerge. svnmerge does for you what you're currently doing by hand.
Also though, I tend to not do any complicated merges because they are too much work (and large potential for me to screw up).
You're well on your way to conquering your fear, but this tells me you still fear branching. "Fear", in code terms, translates into "I don't want to do this because I might break something".
Re:That's an interesting observation.
Matts on 2007-07-13T21:15:59
You shouldn't have to think about it at all. It's a rote task. Book-keeping. Monkey work. The sort of things computers are very good at. Humans are very bad and prone to make mistakes.
Not quite. There will *always* be cases where a merge has to be manual. It's just not possible to 100% automate merging. SVN could do it better, but it'll never be Monkey Work.
Sounds wonderful, and the future is so bright we gotta wear shades. But I'm just barely getting the hang of Subversion now (even though I've been watching it since before it was a 0.1 release). I want to take the jump to SVK at some point. In fact I want my entire home directory on every box I use, at home and at work, to be versioned through SVK. But I'm not there yet, and by the time I am there will hopefully be lots of really good SVK tutorials to choose from to get me there.
I guess I do fear branching, and I'm realizing I have trouble feeling like moving onward to SVK will work because it sounds like it will get too complicated.
Re:Thanks for the writeup
sigzero on 2007-07-17T12:17:09
Tutorial:
http://svkbook.elixus.org/nightly/en/svk-book.html
Re:Thanks for the writeup
jdavidb on 2007-07-17T16:02:40
Thank you! I did not realize there was already an SVK book.
:)
Re:Should everyone use distributed?
schwern on 2007-07-12T20:11:01
It's more like this: Traditional VCS can support development models A, B and C. Distributed VCS can still support development models A, B and C but also D, E and F.
Just because you use a distributed VCS doesn't mean you have to be distributed; you can continue to have a trunk-centric project if you like. It allows you to choose your development model without being restrained by your tools. The choice is now yours, not your VCS's.
That's the essential pro. The con list boils down to "I have to learn a new interface". SVN wins out because so many people are familiar with it, or CVS or RCS, so nobody has to retrain. Most of the distributed VCS -- darcs, arch, git, hg, monotone -- do things rather differently from CVS and SVN so you have to retrain a little. SVK is the exception in that it retains the SVN interface, but it comes at a cost as any compromise will.
Other cons include the relative instability of the distributed version control market. Being so new, it's rapidly evolving and there's lots and lots of them out there. Which do you pick? If you pick the flavor-of-the-month will it still be popular next month? Currently git is the thing. Before that it was darcs. Before that it was arch. With traditional version control that risk doesn't exist. You use SVN because it's pretty much killed everything else off and nobody's bothering to write new non-distributed VCS anymore. SVK allows you to hedge your bets since it can be used as a client to an existing SVN repository, have your cake and eat it, too.
Re:Should everyone use distributed?
jdavidb on 2007-07-12T23:58:58
SVK is the exception in that it retains the SVN interface, but it comes at a cost as any compromise will.
If you have time to elaborate on that it would be most appreciated.
:)
SVK allows you to hedge your bets since it can be used as a client to an existing SVN repository, have your cake and eat it, too.
Since I'm not yet familiar with the cons I certainly can't say for sure, but it sounds to me like that's going to outweigh any of them.
:)
... not because merging is painful, but because reviewing and integrating anything more than you can produce in a few hours of work is painful.
Many thanks.