XML: SAX and Twig
I have been reading about XML::SAX, and it is
starting to make sense. I am concerned about
memory usage as compared to XML::Twig. I need
to process an XML file that is at least 10MB
and possibly as large as 100MB. I would like
to limit the RAM usage to less than 256MB,
although 512MB might be okay.
I have a hard limit of 2GB of RAM,
since I am using a 32-bit perl.
It looks like XML::Twig can be set up to work
with SAX, so this might help to solve
the memory problem.
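
To make the memory argument concrete for myself, here is a minimal
sketch of the streaming idea. It uses Python's standard xml.sax
module as a stand-in, not XML::SAX or XML::Twig, and the RecordCounter
class, the count_elements helper, and the sample data are all
made up for illustration. The point is only that a SAX-style handler
sees one event at a time and keeps whatever state it chooses, so
memory use should depend on that state rather than on the file size.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts elements by name without ever building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Only a small dict of counters is kept; the element is not stored.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(source):
    """Stream-parse a file path or file-like object and return {tag: count}."""
    handler = RecordCounter()
    xml.sax.parse(source, handler)
    return handler.counts


# Hypothetical sample standing in for a much larger file.
sample = (
    b"<records>"
    b"<rec id='1'><name>a</name></rec>"
    b"<rec id='2'><name>b</name></rec>"
    b"</records>"
)
counts = count_elements(io.BytesIO(sample))
print(counts)
```

Passing a file name instead of the BytesIO object should work the
same way, since the parser reads the input incrementally.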
These file sizes are spooky.
I think of XML as
one-big-happy-file-that-describes-a-thing.
Perhaps "the-thing" is too complicated
for a single file.
If so, I will need a new approach. I may need
to learn about namespaces or some other
way to partition a large XML dataset.
I thought up a way to eliminate the redundancy
in the XML reader/writer for my flat/lumpy
files. I can have a data structure that
specifies the flat file in XML.
Redundant portions of the XML reader and writer
can be generated from this file.
It would be nice if someone had already
written this. There are
many tradeoffs in the design of such a thing,
and I don't want to get bogged down in it.
I will look at some of the SAX drivers for
non-XML data sources.
I think removing reader and writer
redundancy will be worthwhile,
since I have at least a dozen and perhaps
thirty of these file formats to translate
to and from XML.
As my buddy Steve says,
"Make things that are the same the same and
things that are different different."
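
Here is a rough sketch of the spec-driven reader/writer idea, just
to see whether it holds together. It is written in Python rather
than Perl, and everything in it is an assumption for illustration:
the fixed-width layout, the PART_SPEC structure, and the function
names flat_to_xml and xml_to_flat. The spec is a plain dict here,
though it could equally be stored in XML as described above. One
spec drives both directions, so the per-format code shrinks to the
spec itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical spec for one fixed-width format: (field name, column width).
PART_SPEC = {
    "record": "part",
    "fields": [("id", 6), ("name", 12), ("qty", 4)],
}


def flat_to_xml(lines, spec, root_tag="records"):
    """Generic reader: slice each line by the spec, one element per record."""
    root = ET.Element(root_tag)
    for line in lines:
        rec = ET.SubElement(root, spec["record"])
        pos = 0
        for name, width in spec["fields"]:
            ET.SubElement(rec, name).text = line[pos:pos + width].strip()
            pos += width
    return root


def xml_to_flat(root, spec):
    """Generic writer: pad each field back to its width using the same spec."""
    lines = []
    for rec in root.iter(spec["record"]):
        parts = []
        for name, width in spec["fields"]:
            value = rec.findtext(name) or ""
            parts.append(value.ljust(width)[:width])
        lines.append("".join(parts))
    return lines


# Made-up sample records, built with ljust so the widths match the spec.
flat = [
    "000001" + "widget".ljust(12) + "0010",
    "000002" + "gadget".ljust(12) + "0003",
]
root = flat_to_xml(flat, PART_SPEC)
round_trip = xml_to_flat(root, PART_SPEC)
print(ET.tostring(root, encoding="unicode"))
```

Adding another format would then mean writing another spec, not
another reader and writer. Real formats will have lumpier records
than this, which is where the design tradeoffs start.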
TWiki
One of the things I like about
PerlMonks
is that I get new ideas that have nothing to do with what
I am working on. Today, for example, I
downloaded, built, and ran TWiki. Suddenly I
get it, and I hope to use TWiki for something
that will be useful and yet disruptive.
At work there is a large dataset
of free-text startup content,
which is duct-taped to the side
of an exquisitely normalized database.
This text is the output from an extensive
ongoing collaboration. It looks like a great
opportunity for a wiki.
The main challenge will be scalability. I plan on evaluating this within the next few months.
New Modules
I am still trying to get new-user creation
working in TWiki. I didn't have any email set
up on the machine where I was running TWiki,
and that seemed to be a problem. I got the email
working, but I still have the same problem.
I rebuilt perl 5.8.0 in the process, and updated
a bunch of modules as recommended by the
output of the r command in the cpan shell.