Ripping the Perl Perforce repository

mugwumpjism on 2007-09-22T13:25:51

Today I had my first successful run of git-p4raw, a program I have been working on recently that imports a Perforce repository from its raw back-end data files.

The program is actually quite simple, and its approach was inspired by a conversation I had with Gurusamy Sarathy. Perforce keeps "checkpoint" and "journal" files that contain all of the metadata it tracks. The file content itself is stored in RCS files, which are otherwise uninteresting. So the program loads this metadata into tables, puts a few constraints on the tables to confirm my suspicions about how the schema works, and then sets about mining changesets from the tables. As there is MD5 information in the database (for most files, anyway), integrity can be checked along the way, and as a result I am very confident that this will be the cleanest conversion yet.
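
To make the load-then-verify idea concrete, here is a minimal sketch in Python. It is not the git-p4raw code and it does not parse real checkpoint files. The table name `rev`, its columns, and the helpers `load_revisions` and `verify_content` are all hypothetical, and an in-memory SQLite database stands in for whatever database is actually used. It only shows the two ideas above: a constraint that fails loudly if a guess about the schema is wrong, and an MD5 check for files that have a recorded digest.

```python
import hashlib
import sqlite3


def load_revisions(conn, rows):
    """Load (depot_path, revision, digest) rows into a hypothetical 'rev' table.

    The PRIMARY KEY and CHECK constraints encode guesses about the data;
    if a guess is wrong, the insert raises sqlite3.IntegrityError.
    """
    conn.execute(
        """
        CREATE TABLE rev (
            depot_path TEXT    NOT NULL,
            revision   INTEGER NOT NULL CHECK (revision > 0),
            digest     TEXT,            -- NULL when no MD5 was recorded
            PRIMARY KEY (depot_path, revision)
        )
        """
    )
    conn.executemany("INSERT INTO rev VALUES (?, ?, ?)", rows)


def verify_content(conn, depot_path, revision, content):
    """Compare file content (bytes) against the recorded MD5 digest.

    Returns True or False when a digest is recorded, and None when the
    row has no digest (the "for most files, anyway" case).
    """
    row = conn.execute(
        "SELECT digest FROM rev WHERE depot_path = ? AND revision = ?",
        (depot_path, revision),
    ).fetchone()
    if row is None:
        raise KeyError((depot_path, revision))
    digest = row[0]
    if digest is None:
        return None
    # Compare case-insensitively, in case the stored digest is uppercase hex.
    return hashlib.md5(content).hexdigest().lower() == digest.lower()
```

With something like this, every file revision extracted from the RCS files can be checked as it is written into git, and any revision without a digest is reported separately rather than silently trusted.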

I still have some work to do:

  1. It might be possible to represent some of the branch cross-merges in the history as real merges in git.
  2. The 5.004 to 5.004_04 series (Tim Bunce's early maintenance work) is currently represented in Perforce as only 5 changes. I'd like to expand that, as I have done with the other pre-Perforce series.
  3. Many changes have embedded author attribution that should be copied into the git "author" field of the commits, so that Ohloh and similar sites have the best information.
  4. Final rewriting to glue the old history onto the new.

I look forward to announcing the completion of this work! It's been a long, hard slog! :)