another unproductive complaint about subversion

rjbs on 2008-10-15T02:18:33

I remember in 2005 or so when I first started using Subversion, I liked it so much. It was much easier to use than CVS. Everyone said it would make tagging and branching easier than CVS. In CVS, tagging was fine, but branching was such a pain that I never bothered.

Eventually, I found out that branching and merging were much easier, but still a real pain. Tagging, though, was completely insane. Tags were implemented as copies (just like branches). This sort of made sense as a cheap way for branches to work, but it made no sense for tags. Tags are labels for points in time in a repository. They shouldn't be mutable, unless maybe to let you remove a label from rev 1 and put it on rev 2.

Because they're implemented as copies, you can actually go in and alter the state of a tag, meaning that tags are useless as ... tags. It also means that if you have a standard Subversion repository with trunk, branches, and tags directories, and you check out the whole thing, you check out absolutely every file in every tag and branch. "Copies are cheap" was a big Subversion mantra back in the day, because in the repository, only files that changed were new files on disk -- but that only goes for what's in the repository, not your checkout. In your checkout, every copy of readme.txt is its own file -- and it has to be, because even the tags are mutable. You can't say that ./tags/1.000/readme.txt is the same file as ./tags/2.000/readme.txt just because there was no change between the two releases, because you could go change either of them, and if you did, you'd change both. Oops!

This came up today because of a piece of automated deployment code that did something like this:

$ mkdir TEMP
$ cd TEMP
$ svn co $REPO/project
$ cd project/trunk
$ bump-perl-version
$ cd ..
$ svn cp trunk tags/$NEW_VERSION
$ svn ci -m "bump and tag $NEW_VERSION"

Checking out requires a whole lot of space, because it has to check out every single tag's copy of every file. Tagging the new release is also fairly space hungry. How hungry? Well... the project I'm working on right now is a web application. Let's call it New-Webapp.

If I export a copy of trunk from Subversion, getting just the files that make up the latest version of the application, it's 1.9 megabytes.

If I check out a copy of trunk, so that now there are all the extra working copy files, it's 5.2 megabytes.

If I check out the whole repo, getting every tag and branch (for your information, there is exactly one branch), it's 207 megabytes.

Now, keep in mind that this gives me every file from every tagged release (there are 40 releases). This does not give me the entire revision history. There are many, many revisions missing. After all, what I have is basically 42 revisions: 40 releases, trunk, and one branch. That's it.

If I use git-svn to build a git repository of the project, meaning that I have absolutely every revision, every tag, and every branch, it's 249 megabytes. That's all 1149 revisions.
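
For anyone who wants to repeat the comparison, here is a rough sketch of how the four numbers above could be measured. The repository URL passed in, the project path, and the output directory names are placeholders, not the real ones, and size_kb and compare_sizes are names made up for this sketch.

```shell
#!/bin/sh
# Sketch only: the repository URL and directory names are placeholders.

# Print the size of a directory in kilobytes (first field of `du -sk`).
size_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Fetch the project four ways and report how much disk each one uses.
# $1 = base URL of the Subversion repository (hypothetical).
compare_sizes() {
    repo=$1
    svn export -q "$repo/project/trunk" export-trunk       # files only
    svn checkout -q "$repo/project/trunk" checkout-trunk   # files plus working copy metadata
    svn checkout -q "$repo/project" checkout-all           # trunk, every branch, every tag
    git svn clone -s "$repo/project" git-all               # full history, standard layout
    for dir in export-trunk checkout-trunk checkout-all git-all; do
        printf '%8s KB  %s\n' "$(size_kb "$dir")" "$dir"
    done
}
```

Running compare_sizes with the repository URL in an empty scratch directory should print one line per variant. The exact figures will depend on the project and on the client versions.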

I am so ready to be done with Subversion.


Is it really a problem?

autarch on 2008-10-15T03:25:13

I agree, allowing tags to be mutable is silly, but has that actually been a problem for you in practice? I don't recall ever actually modifying the contents of stuff under a tag, and you can always revert it.

As far as the checkout thing goes, that'd be annoying, but I've never done that either. When I want to make a tag I always operate directly on the repo URI:

svn cp http://.../svn/Foo/trunk http://.../svn/Foo/tags/2.0

Yeah, distributed VCS is probably better in lots of ways, but Subversion works well enough in most cases, and many of the faults people point out seem to be conceptual rather than real.

For me, the biggest thing I've always hated about svn was its lack of merge tracking, which made using more than trivial branches (ones that might need to merge multiple times) impossible. That's why I first started using svk.

Nowadays, I'm playing with Mercurial, and I like it. I'm still not sure how it'd play out for a project where we wanted some sort of centralized-ish development (like say, commit emails even on branches, so we can do code review).

Re:Is it really a problem?

rjbs on 2008-10-15T03:58:01

I updated the code to tag directly in the repo. The obnoxious thing is that now there are two operations: update the trunk with the new release information, then tag. It is not atomic. I have to specify what changeset to copy into the tag.
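
A minimal sketch of what that two-step version might look like, assuming the revision number is taken from the "Committed revision N." line that svn commit prints. This is not the actual deployment code: committed_rev and bump_and_tag are made-up names, the trunk-wc directory is a placeholder, and bump-perl-version is the tool from the original script.

```shell
#!/bin/sh
# Sketch only: function names and the trunk-wc directory are hypothetical.

# Extract N from the "Committed revision N." line that `svn commit` prints.
committed_rev() {
    sed -n 's/^Committed revision \([0-9][0-9]*\)\.$/\1/p'
}

# Bump the version on trunk, then tag exactly that changeset by URL.
# $1 = URL of the project in the repository, $2 = new version string.
bump_and_tag() {
    project_url=$1
    new_version=$2
    # Check out trunk only, so no tags or branches land on disk.
    svn checkout -q "$project_url/trunk" trunk-wc || return 1
    (cd trunk-wc && bump-perl-version) || return 1
    # First transaction: commit the bump and remember its revision.
    rev=$(cd trunk-wc && svn commit -m "bump to $new_version" | committed_rev)
    [ -n "$rev" ] || return 1   # nothing was committed, so nothing to tag
    # Second transaction: copy that exact revision of trunk into tags.
    svn copy -r "$rev" "$project_url/trunk" "$project_url/tags/$new_version" \
        -m "tag $new_version at r$rev"
}
```

Pinning the copy to the captured revision means a commit that sneaks in between the two steps does not end up in the tag, but it is still two transactions rather than one.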

Re:Is it really a problem?

autarch on 2008-10-15T04:25:22

Uh, shouldn't you check in the "bump version" change to trunk _before_ you tag said version?

Re:Is it really a problem?

perrin on 2008-10-15T04:44:58

I use svnmerge.py, which is a lot like the merge tracking in svk.

I don't really see the problem with tags. I don't edit them, and I don't ever check out from root, so I don't download any files I don't want.

My only svn problem that I haven't found a simple solution for is the slowdown when your repo gets really huge. I expect git would handle this better, but selective update/commit commands make it workable for me, just not as fast as I want it.

Re:Is it really a problem?

jplindstrom on 2008-10-15T10:12:49

Would you know whether that is "huge" in files, or in revisions, or both?

I'm arguing against putting many different (unrelated) projects in the same svn repo, and this would be another reason to not do that.

Re:Is it really a problem?

perrin on 2008-10-15T16:09:16

I think it's just a lot of files.

Re:Is it really a problem?

Shlomi Fish on 2008-10-15T17:07:22

I agree. The subversion tagging concept is perfectly acceptable because you can always revert them (using "svn revert" or "svn copy -r"), and changing the contents of the tags is unlikely. There are three similar issues with Perl:

  1. using my/our for constants - that's what I use most of the time, because it gives me most of the benefits of Readonly that Damian mentions in his book. Again, I might change them, but it's not very likely, so it doesn't matter.
  2. Using a leading underscore for private methods ("_my_private_method"). Again, the added benefit of having true encapsulation and permissions in Perl's OO is usually not worth it.
  3. Inside-out objects: again - protecting the programmer from himself. Too much trouble for too little (if not negative) gain.

So I say the Subversion tags concept is perfectly fine. You can always use repository-wide permissions to further protect it if you care about it so much.

And a question to rjbs - I still don't understand why you want to check out the entire Subversion repository. This is usually an indication of poor design. If you want to switch from one branch/tag/trunk to another you can always use "svn switch".

Re:Is it really a problem?

rjbs on 2008-10-15T17:28:01

And a question to rjbs - I still don't understand why you want to check out the entire Subversion repository. This is usually an indication of poor design. If you want to switch from one branch/tag/trunk to another you can always use "svn switch".

The code all gets checked out to get an atomic commit-and-tag. Without that, you have two transactions.

Yes, that's work-around-able.

It doesn't change the fact that Subversion is a pig or that Subversion's tags are, at best, tolerable. "a tag is a copy" is workable, sure, if you are not crazy and likely to change it. That doesn't change the fact that it's much more annoying than it needs to be. If a tag were just a name for a revision number, I could check out my "whole" project, which would be the heads of the branches and trunk, and I'd be able to list the tags for quick reference for diffing.

Subversion makes all this annoying and slow.

Re:Is it really a problem?

Aristotle on 2008-10-19T13:20:46

I might change them, but it’s not very likely

But what’s the failure mode? More than likely, if you do happen to change a constant, you will have mysterious bugs that do not obviously point to the culprit. It can potentially take a huge amount of time to track down the issue. The rationale is the same as with using strict, only more polarised. Therefore, for constants, I have gotten into the habit of doing the following:

our $ANSWER; BEGIN { *ANSWER = \42 }

Trying to mutate $ANSWER or assign to it will throw a “Modification of a read-only value attempted” error.

the added benefit of having true encapsulation and permissions in Perl’s OO is usually not worth it.

You know not what you speak of. The problem is that if I write a Foo::Bar subclass of the CPAN module Foo and I add a private method _baz for my purposes, then if Foo is updated and itself ships a _baz method that was not there before, then Foo::Bar objects will break because Foo methods expect to get Foo::_baz when they call $self->_baz, but they get Foo::Bar::_baz instead. This has nothing to do with enforced privacy. I would be happy to let anyone call my Foo::Bar::_baz if they were actually looking for it. So I want to be able to say “only dispatch here if it’s me asking for it or someone else comes looking for it.” Therefore, I have gotten into the habit of replacing the following:

sub _baz { ... }
sub quux { ... ; $self->_baz( ... ); ... }

with the following:

our $baz; BEGIN { $baz = sub { ... } }
sub quux { ...; $self->$baz( ... ); ... }

Re:Is it really a problem?

Aristotle on 2008-10-19T12:58:10

you can always revert it

Unless you’ve accidentally committed the changes (which is hardly unlikely in a scenario where you’ve changed the tag’s files while thinking you were in some other directory), in which case Subversion likes to make your life as miserable as possible in trying to get that commit to go away.

What I don't understand about svn tags

Alias on 2008-10-15T10:36:29

Since svn doesn't REALLY have tags or branches (it just has copies) what really confuses me about tags is that svn DOES actually have a completely suitable methodology it should be able to use.

Every combination of a URI path and a repository version number is unique.

So combine the two, and you could probably abuse the URI spec to do this.

http://svn.ali.as/cpan/trunk/Config-Tiny#1234

Let's call it a "pseudo tag" or ptag for short.

Given we have nice immutable paths, why not then just make a simple text file in the root of the svn repository that contains the tags as aliases to the immutable URIs?

product-beta-1 http://svn.ali.as/cpan/trunk/Config-Tiny#1234
product-beta-2 http://svn.ali.as/cpan/trunk/Config-Tiny#5312
product-release http://svn.ali.as/cpan/release-branches/Config-Tiny-1.0#6412
product-hotfix-1 http://svn.ali.as/cpan/release-branches/Config-Tiny-1.0#6419

Re:What I don't understand about svn tags

hdp on 2008-10-15T17:25:05

svn has a syntax for this already too!

product-beta-1 http://svn.ali.as/cpan/trunk/Config-Tiny@1234

product-beta-2 http://svn.ali.as/cpan/trunk/Config-Tiny@5312

product-release http://svn.ali.as/cpan/release-branches/Config-Tiny-1.0@6412

product-hotfix-1 http://svn.ali.as/cpan/release-branches/Config-Tiny-1.0@6419

the svn client already understands these; anything that needed to understand
the '#rev' syntax would have to do the same work anyway.
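
As a rough illustration of how little glue that would need, here is a sketch that looks a name up in such a file and prints the URL@rev for the svn client. The file format (one "name URL@rev" pair per line) follows the listing above; ptag_lookup and the file name ptags.txt are made up for this example.

```shell
#!/bin/sh
# Sketch only: ptag_lookup and ptags.txt are hypothetical names.

# Look up a tag name in a "name URL@rev" file and print its URL@rev.
# $1 = tag name, $2 = path to the tags file.
# Fields are whitespace-separated; the first matching name wins.
# Exits non-zero if the name is not in the file.
ptag_lookup() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END        { exit !found }
    ' "$2"
}
```

With that in place, something like svn export "$(ptag_lookup product-beta-1 ptags.txt)" Config-Tiny-beta-1 would fetch the files as they were at that revision, since the client resolves the @rev part itself.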

just a versioned filesystem

Eric Wilhelm on 2008-10-16T03:47:32

Where subversion works well is in providing a centralized versioned filesystem. For many people, that is enough to allow it to be used as a source code repository given a couple conventions (i.e. "trunk, branches, tags" and "don't mess with the tags".)

Having never been a CVS user, I find svn to be much more approachable and never felt like the tags weren't taggy enough.

Now, for merging and distributed revision control, git certainly has lots of benefits. But I think it also loses the ease-of-use of svn for browsability and quick checkouts -- at least in the few cases when I've needed to use it. Perhaps there is a way to fetch only a part of the history?

I would really like to have a mode for "just let me fetch your current code", possibly even using the svn client. Maybe that's easier than I think, but it seems that whenever someone talks about git they start lecturing about the thousands of ways to merge and I end up downloading a tarball and saving versions with `cp`.

I don't want a lecture -- I'm not having trouble understanding Git for Computer Scientists, but I still haven't seen a "git for those who are just passing through".

Re:just a versioned filesystem

rjbs on 2008-10-16T11:45:34

I totally agree that people explain git unproductively. I've written about that before. It's definitely possible to say "just give me the trunk head," because gitweb lets you click to get a tgz of it. I don't know what the corresponding git command is, or if (maybe) it isn't exposed with one -- but it is clearly possible and easy. That would be the equivalent to svn export, though, not svn checkout.

Re:just a versioned filesystem

Aristotle on 2008-10-19T13:16:32

git archive
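
A small sketch of how that might be used to get "just the current code", assuming a local clone already exists. snapshot_tgz and shallow_clone are made-up names, and the snapshot/ prefix is an arbitrary choice. The second function is one possible answer to the earlier question about fetching only part of the history.

```shell
#!/bin/sh
# Sketch only: function names and the snapshot/ prefix are hypothetical.

# Write a gzipped tarball of one commit's files: no history, no .git directory.
# $1 = path to a local git repository, $2 = output file, $3 = tree-ish (default HEAD).
snapshot_tgz() {
    (cd "$1" && git archive --format=tar --prefix=snapshot/ "${3:-HEAD}") | gzip > "$2"
}

# Clone only the most recent commit instead of the whole history.
# $1 = repository URL, $2 = destination directory.
shallow_clone() {
    git clone --depth 1 "$1" "$2"
}
```

The tarball is the rough equivalent of svn export. The shallow clone is closer to svn checkout, in that you still get a working repository, just without the old revisions.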

gc

Dom2 on 2008-10-16T14:36:05

Don't forget to run git gc after git svn clone. That's reduced space significantly for me in a few cases, and sped up git to boot.
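
To see how much it helps on a given repository, something along these lines should do. It is only a sketch: gc_report is a made-up name, and the numbers come from du, so they are approximate.

```shell
#!/bin/sh
# Sketch only: gc_report is a hypothetical helper name.

# Report the size of a repository's .git directory before and after `git gc`.
# $1 = path to a git repository with a .git directory.
gc_report() {
    repo=$1
    before=$(du -sk "$repo/.git" | awk '{ print $1 }')
    (cd "$repo" && git gc --quiet) || return 1
    after=$(du -sk "$repo/.git" | awk '{ print $1 }')
    echo "before: ${before} KB, after: ${after} KB"
}
```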

Re:gc

rjbs on 2008-10-16T15:39:29

Good catch. That shaves about 35 MB off my git repo. Of course, the space consumed by a git repo was already pretty reasonable...