I guesstimated that it would take me 2 solid months (with interruptions) to rewrite 250,000 lines of really bad Perl code (with lots of embedded SQL and HTML) into something small, manageable and maintainable. Luckily for me, the rough count via wc was for the entire package, not just the back-end system I'm concerned with. It turns out that the back end uses roughly 50,000 LOC to do something reasonably trivial (extract a complex schema from a database and emit SGML).
As best I can count, only 60 or so of the 200-plus in-house modules are used in this part of the project. I found that number by removing all use lib statements (Yes, Virginia, there are use lib statements peppered about the code...), mucking with @INC so that it did not include any non-standard directories, and repeatedly running perl -c. I copied the missing modules into the current directory one at a time until the main program passed the syntax checker: 60ish modules found (from memory; I'm not at the office at the moment).
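The copy-one-module-and-retry loop described above could be automated along these lines. This is only a minimal sketch, not the procedure I actually ran: the names missing_module and collect_modules and the two command-line arguments (the main script and the directory holding the in-house modules) are hypothetical, and it assumes the use lib statements are already gone and that nothing else (PERL5LIB, for instance) adds directories to @INC. The -I. switch is there because newer perls no longer include the current directory in @INC.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(copy);
use File::Path qw(make_path);

# Pull the relative path of the missing module out of perl's
# "Can't locate Foo/Bar.pm in @INC ..." message; undef if there is none.
sub missing_module {
    my ($output) = @_;
    return $output =~ /Can't locate (\S+\.pm) in \@INC/ ? $1 : undef;
}

# Run "perl -c" on $script over and over, copying each missing module
# from $source_dir into the current directory, until perl stops
# complaining about missing modules. Returns the modules copied.
sub collect_modules {
    my ($script, $source_dir) = @_;
    my $perl = $^X;    # the perl binary running this script
    my (@copied, %seen);
    while (1) {
        my $output = qx{"$perl" -I. -c "$script" 2>&1};
        my $module = missing_module($output) or last;
        die "Still cannot load $module after copying it\n" if $seen{$module}++;
        my $from = "$source_dir/$module";
        die "Cannot find $module under $source_dir\n" unless -e $from;
        make_path(dirname($module));    # e.g. Foo/ for Foo/Bar.pm
        copy($from, $module) or die "Copying $from failed: $!\n";
        push @copied, $module;
    }
    return @copied;
}

# Hypothetical usage: perl collect.pl main.pl /path/to/inhouse/modules
if (@ARGV == 2) {
    my @modules = collect_modules(@ARGV);
    print "$_\n" for @modules;
    print scalar(@modules), " modules needed\n";
}
```

Note that the loop also stops when perl -c fails for some other reason (a genuine syntax error, say), so the final perl -c output is still worth reading by hand.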
Two of the many factors that allowed this code to get into this state are copy-and-paste programming and poor idiomatic usage. No surprises there. The immense boon to the project turned out to be the supposedly "generic" modules that were required just to get this program to compile: things like CGI and the modules for web-based user authentication and HTML generation are all counted in that 50KLOC figure, and they are completely unnecessary for my purposes.
For a project like this, PerlTidy is an incredible help. I find the native formatting style to be abysmal, but that's purely a matter of taste; nothing substantive wrong here. Running all the code through PerlTidy to produce a single, consistent style has been a great help. (I also told PerlTidy to remove all comments and POD; that removed around 10KLOC from examination. Woo Hoo!)
The project is turning the corner now. I've been in the zone for the past two weeks, to the point where I come home at night completely drained. Today, I started the familiar sprint to the end, where it's a simple case of iterative development and refinement until the new system agrees with the old system. Three weeks into the project, and it looks like I should have a new foundation and a completed prototype by the end of the week (barring major complications).
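The "refine until the two systems agree" loop boils down to comparing the output of the old and new code after every change. A minimal sketch of such a check, assuming both systems can write their SGML to a file; the names first_difference and slurp and the two file arguments are hypothetical, and plain diff does the same job:

```perl
use strict;
use warnings;

# Return the 1-based number of the first line at which the two texts
# differ, or 0 if they are identical.
sub first_difference {
    my ($old, $new) = @_;
    my @old = split /\n/, $old, -1;
    my @new = split /\n/, $new, -1;
    my $max = @old > @new ? @old : @new;
    for my $i (0 .. $max - 1) {
        my ($l, $r) = ($old[$i], $new[$i]);
        # A line missing from one side counts as a difference.
        return $i + 1 if !defined $l || !defined $r || $l ne $r;
    }
    return 0;
}

# Read a whole file into a string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    local $/;
    return scalar <$fh>;
}

# Hypothetical usage: perl agree.pl old-output.sgml new-output.sgml
if (@ARGV == 2) {
    my $line = first_difference(map { slurp($_) } @ARGV);
    print $line ? "Outputs first differ at line $line\n" : "Outputs agree\n";
}
```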
Finishing the prototype early is going to be a great help, since I'm not sure I can get the XSLT stylesheets done to my satisfaction within the one month I promised. I thought that generating PDF would be enough, but it looks like RTF generation is going to be a requirement as well...