Tight Coupling of Applications

Ovid on 2009-06-22T12:10:55

At the BBC, we have some mandated technology stacks. While this is almost galling to some people's interpretation of TIMTOWTDI, the reality is that in a large-scale environment, it's very helpful to admins to not have to worry about which VCS, Wiki or other tools teams are using when needing different teams to work in shared environments. Of course, when you get down to lower level stuff, you really don't want to tell a team they can't upgrade DBD::mysql because another team requires an older version, but compromises have to be made. Today, that compromise is becoming painful.

I recently tried to commit some changes to our architecture dependent CPAN modules (for example,modules in deps/lib/perl5/i486-linux-gnu-thread-multi/). However, subversion kept insisting that a bunch of files I was adding already existed in the repository, even though svn ls svn://path/to/archdeps did not list those files. After I and another developer kept digging around, we finally gave up and came up with a work-around. First, I did this:

find . -type d -name '.svn' --exec rm -fr {} \;

I then tarred up the deps directory, made a fresh checkout and untarred the files over the fresh checkout. The commit then worked just fine. Unfortunately, I then found out that somehow this corrupted several XS extensions, forcing me to reinstall them. Luckily this was trivial, but the last time this happened (yes, this has happened before!), I couldn't even run the cpan shell due to corrupted XS dependencies.

Having run into numerous issues with Subversion and having found git to be so pleasant to work with, I thought it was time for us to give up and switch to git, but we can't. Even if we get developer buy-in, we use Trac. Trac is tightly integrated with subversion but not with git. There are git plugins for Trac, but they're not ready for production use. Even if they were, it shouldn't be the case that two developers dealing with a frustration invest a lot of time and money into solving a problem whose solution potentially impacts a number of other teams.

The fact that Trac is so tightly integrated with our subversion use of Subversion is a huge frustration (I don't know enough about the internals of Trac to know if this is our fault or theirs). I'd love to find a better way of dealing with this and I'd be ecstatic to see some strategy for decoupling the two. Unfortunately, it's not going to happen.

There are some great benefits of working with larger organizations, but this ain't one of them.


how?

schelcj on 2009-06-22T12:32:43

how does switching to git from subversion fix this problem? it seems you brute forced a fix rather than finding the root cause.

Re:how?

Ovid on 2009-06-22T13:14:52

These are the sorts of problems we constantly run into with Subversion and trying to narrow down the root causes often involves something along the lines of "subversion is very picky about how you do things and throws a hissy fit when you don't play along." Sometimes it's been a matter of updating subversion, other times it's a bunch of developers throwing up their arms in dismay and giving up finding the actual problem (such as debugging a complicated branch merge which has, once again, gone awry). If this wasn't a frequent problem, we'd probably dig heavier. As it is, we've grown used to subversion problems and we've sort of "given up" on making it do what we want.

With git, I've never had any issues like this. I've not once been bitten by git getting confused over what I've done and this has been the experience of other developers I've chatted with. When you add in all of the benefits of git over subversion, trying to repeatedly debug the root causes of bugs in an inferior technology technology doesn't seem worth the trouble.

Re:how?

Aristotle on 2009-06-22T13:41:34

The thing with git is, at its core lies a very simple and easily understandable data model. Once you understand how that works and how git puts it to work, it’s basically impossible to dig yourself into a hole you can’t get out of.

With Subversion, the internals are a morass of complexity. I wrote a few scripts in my time, eg. to undo broken commits immediately after they were made, but they were trivial in their effects and still involved a lot of cargo cult because the Subversion data model is just such a hairy construction.

With git, I’ve done a bunch of pretty complicated ad hoc munge/extract jobs far beyond anything I’d’ve dared to do with Subversion – and it was easy to build them and easy to make them work right, working from first principles. In some cases I later ran across articles and posts explaining better ways of doing what I did; but I got the job done anyway, if badly – manipulexity and whipuptitude. And when I found out about better ways, I could immediately see why they were better.

I feel confident with git in a way that was unthinkable with Subversion. Sometimes it isn’t as much help as I might like, but I never feel I have to fight it.

Re:how?

schelcj on 2009-06-22T16:00:24

i think my problem is that i shouldn't have to dig into the internals of the scm regardless how easy/hard it is. if i do then either i did something wrong or the scm did. more often then not i would assume it was me that screwed up.

Re:how?

Aristotle on 2009-06-22T16:24:32

i shouldn’t have to dig into the internals of the scm

I spent 5 minutes trying to figure out how to phrase my sentiment about this. In the end, the best I can do is this: I prefer Unix over Windows.

more often then not i would assume it was me that screwed up.

So in that case, which is better: the VCS whose design is so simple that you can figure out how you dug yourself into the hole, and get back out; or the VCS whose design is so hairily complex that you have no option but to abandon ship?

Having asked that, let me ask next: will the user make mistakes or not?

But wait; it’s not that simple. Let me ask another question: in which system is it more likely that the user will dig themselves into a hole – either through their own fault or through the software’s –: one whose fundamental design is very simple, or one whose fundamental design is complex?

Tony Hoare once said that “There are two ways of constructing a software design…”

Re:how?

schelcj on 2009-06-22T17:30:09

well i've done a piss poor job explaining myself because i agree with everything you have said. i will come back when i can spit out whats in my head clearly. thanks for your time though.

Re:how?

Shlomi Fish on 2009-06-23T07:29:20

With Subversion, the internals are a morass of complexity. I wrote a few scripts in my time, eg. to undo broken commits immediately after they were made, but they were trivial in their effects and still involved a lot of cargo cult because the Subversion data model is just such a hairy construction.

I didn't find the Subversion internals that complex. I contributed some patches to Subversion back at the time, and found the code easy enough to understand and change. Subversion indeed uses a transactional database for storing the tree and the changesets, but it's not as complex as you describe.

With git, though, I still don't understand how exactly its data model works. After reading this document and asking some people, I can finally handle the day-to-day operation of git, but I still feel it does a lot of voodoo behind the scenes.

Re:how?

Aristotle on 2009-06-23T08:13:02

Git is trivially simple internally.

  • The ID of every distinct stored object is the SHA-1 hash of its contents.
  • File objects (which contain the contents of a file) are stored directly as binary blobs.
  • Tree objects (ie. directories) are simple plain text with each line containing the ID, type and file name of a contained object.
  • Commit objects are plain text; they contain a simple header listing the IDs of the parent commit(s) and the root tree object for that commit.
  • Tag objects are more or less the same as commit objects, though they can point to pretty much any kind of object. (In practice, they only ever point to commit objects.)

That’s it. That’s all. The loose-object repository format can be explored with a text editor and is immediately obvious.

The packed format is trickier, but ultimately it’s just a kind of tarball containing the loose-object format, metaphorically speaking.

The rest of the glue that holds a repository together is in the refs/ subdirectory, which contains various tiny text files containing either SHA-1 IDs, or names of other files within refs/. This is where branches and lightweight tags get their names.

It’s all very much in the spirit of Unix.

For how this model serves as the basis of a coherent system, I recommend that you watch Scott Chacon’s presentation if you haven’t already. His slides are excellent and he does a really good job of explaining it all.

Re:how?

schelcj on 2009-06-22T15:57:39

wouldn't the fact that these are frequent problems point to a problem with the process/procedure of subversion usage?

i still haven't seen how git makes these complicated issues any better(granted i haven't used git enough yet.) if git does correct these problems does that mean the SCM was the problem or the usage pattern of the dev team?

ps: not being snarky, i actually have ran into the same problems with trac/svn. its a love/hate thing with trac for me.

Interesting

Shlomi Fish on 2009-06-23T07:24:10

That's the second hard-to-reproduce bug I heard with Subversion the past few weeks (this one being the first). I've never ran into such problems with Subversion - it just works, so it seems very strange to me. If you can somehow reproduce the problematic behaviour on non-confidential, example repositories than I'm sure the Subversion developers will appreciate a bug report.

Complaining is easy, but actually fixing the problems is more worthwhile in the long run.

Re:Interesting

Ovid on 2009-06-23T07:46:47

Why should I put time and energy into fixing a bad technology that, if I have the chance, I'll never use again? OK, so that's a bit unfair, but frankly, I just don't care about Subversion any more. It's that simple. I know that in future positions I'll likely be working on projects where management/developers are too timid to switch to something better than Subversion, so yes, I should care how well it works, but if I could reproduce this problem in the first place, I might actually know what caused it and simply not do it again.

Subversion isn't broken because it has a few bugs. Subversion is like an old 386 machine I used to work on. It's quaint and I have some lingering fondness for it as it was the had the first GUI I ever used, but I wouldn't want to do serious work on it.