Ripping the Perl Perforce repository

brian_d_foy on 2007-09-22T05:19:00

Today I had my first successful run of git-p4raw, a program I have been working on recently that imports a Perforce repository from its raw back-end data files.

The program is actually quite simple, and its approach was inspired by a conversation between Gurusamy Sarathy and me. Perforce keeps "checkpoint" and "journal" files, which contain all of the metadata that it tracks. The file content itself is stored in RCS files, which are otherwise uninteresting. So the program loads this metadata into tables, puts a few constraints on the tables to confirm my suspicions about how the schema works, and then sets about mining changesets from the tables. As there is MD5 information in the database (for most files, anyway), integrity can be checked along the way, and as a result I am very confident that this will be the cleanest conversion yet.
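To make the "load, constrain, verify" idea concrete, here is a minimal sketch in Python. It is not git-p4raw's actual code: the table layout, the column names and the depot paths are all assumptions made up for illustration, and the real Perforce schema has many more tables and fields. It shows only the two tricks described above: a constraint that fails loudly if a guess about the schema is wrong, and an MD5 comparison that is skipped when no digest was recorded.

```python
import hashlib
import sqlite3


def load_revisions(rows):
    """Load (depot_path, rev_num, change_num, md5) tuples into a table.

    The schema is hypothetical. The PRIMARY KEY encodes a guess about the
    data ("a path has at most one row per revision number"); if the guess
    is wrong, the load raises sqlite3.IntegrityError instead of silently
    accepting the rows.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE rev ("
        " depot_path TEXT NOT NULL,"
        " rev_num INTEGER NOT NULL,"
        " change_num INTEGER NOT NULL,"
        " md5 TEXT,"  # NULL when no digest was recorded for the file
        " PRIMARY KEY (depot_path, rev_num))"
    )
    db.executemany("INSERT INTO rev VALUES (?, ?, ?, ?)", rows)
    return db


def check_content(db, depot_path, rev_num, content):
    """Compare file content against the recorded digest.

    Returns True or False when a digest exists, and None when the
    metadata has no digest for this revision.
    """
    row = db.execute(
        "SELECT md5 FROM rev WHERE depot_path = ? AND rev_num = ?",
        (depot_path, rev_num),
    ).fetchone()
    if row is None:
        raise KeyError((depot_path, rev_num))
    recorded = row[0]
    if recorded is None:
        return None
    # Compare case-insensitively, since hex digests may be stored in
    # either upper or lower case.
    return hashlib.md5(content).hexdigest().lower() == recorded.lower()
```

Running every extracted revision through a check like `check_content` during the import is what lets a conversion claim integrity rather than merely hope for it.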

I still have some work to do:

  1. It might be possible to represent some of the branch cross-merges in the history as real merges in git.
  2. The 5.004 to 5.004_04 series (Tim Bunce's early maintenance work) is currently represented in Perforce as only 5 changes. I'd like to expand that, as I have done with the other pre-Perforce series.
  3. Many changes have embedded author attribution that should be copied into the git "author" field of the commits, so that Ohloh etc. have the best information.
  4. Final rewriting to glue the old history onto the new.
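As a rough illustration of item 3, the sketch below pulls an embedded attribution out of a change description and falls back to the submitter when there is none. The "From: Name <email>" line format is purely an assumption for the example; the real change descriptions may use several different conventions, and the function and variable names are hypothetical.

```python
import re

# Hypothetical convention: a line of the form "From: Name <email>"
# somewhere in the change description.
ATTRIBUTION = re.compile(
    r"^\s*From:\s*(?P<name>[^<\n]+?)\s*<(?P<email>[^>\s]+)>\s*$",
    re.MULTILINE,
)


def pick_author(description, submitter_name, submitter_email):
    """Return the (name, email) pair to use as the git author.

    Uses the embedded attribution when one is found; otherwise the
    person who submitted the change is taken to be the author.
    """
    match = ATTRIBUTION.search(description)
    if match:
        return match.group("name"), match.group("email")
    return submitter_name, submitter_email
```

In git terms, the submitter would then stay in the "committer" field while the extracted name goes into "author", which is the distinction the two fields exist for.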

I look forward to announcing the completion of this work! It's been a long, hard slog! :)


Goals

kwilliams on 2007-09-23T13:12:59

Is there a summary of your goals somewhere? I'm wondering whether the main point is to convert the Perl repo, or to produce a general p4->git tool with the Perl repo as just a test case.

  -Ken

Re:Goals

mugwumpjism on 2007-09-24T11:23:30

Originally I was focused entirely on the Perl repository conversion. However, for this rewrite I decided to keep the pure Perforce export code separate from the subsequent Perl-specific rewriting.

So the tool should end up generally useful for other people. However, it doesn't try to do anything with various Perforce features like client specs (important for some people), and its branch auto-detection and creation algorithm may not be appropriate for all repository layouts.