Dear Log,
I needed a corpus of HTML files for testing the HTML::Formatter classes (which I'm tidying up). So I logged into a friend's .edu account, and with a few little Unix commands (including a one-liner involving the king of them all, Perl), I made a tar file of all the reasonably-sized ~user/public_html/index.html files on the system. All 2,908 of them. Excellent!
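A minimal sketch of that kind of filtering, not the actual one-liner: it assumes home directories sit directly under one root (here `/home`) and treats "reasonably-sized" as a byte range. The helper name `pick_index_files` and both size limits are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);   # core module; does not split patterns on spaces

# Hypothetical helper: return every <home_root>/*/public_html/index.html
# whose size in bytes falls within [$min, $max].
sub pick_index_files {
    my ($home_root, $min, $max) = @_;
    my @picked;
    for my $file (bsd_glob("$home_root/*/public_html/index.html")) {
        next unless -f $file;          # skip directories, dangling links, etc.
        my $size = (-s _) || 0;        # reuse the stat from -f; empty file => 0
        push @picked, $file if $size >= $min && $size <= $max;
    }
    return @picked;
}

# When run as a script, print one path per line.
# The root and the size limits are illustrative guesses.
unless (caller) {
    print "$_\n" for pick_index_files('/home', 500, 50_000);
}
```

Saved under a hypothetical name like `pick.pl`, the output can be piped to GNU tar with `perl pick.pl | tar cf corpus.tar -T -`.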
On the mp3-trola: Grieg, "Cradle Song"
So I'm trying my hand at writing an HTML::FormatRTF to go along with HTML::FormatPS and HTML::FormatText in the HTML-Format(ter?) dist. While writing pod2rtf was pretty straightforward, this is proving pretty tricky -- because the block-level content-models for HTML are so much trickier than the block-level content-models for Pod. If I could wave a magic wand and make all the HTML input be XHTML Strict, that'd be really handy. But for the moment, there are problems like "<blockquote>foo<p>bar</blockquote>", where "foo" parses as bare text directly inside the blockquote while "bar" lands in a paragraph of its own.
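To see the tree a parser really builds from that snippet, here is a minimal sketch using HTML::TreeBuilder. Choosing that module is an assumption on my part (it is the usual front end for the HTML::Format* classes, but the post doesn't name a parser), and the exact dump layout varies between HTML-Tree versions, so no output is shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # from the HTML-Tree dist on CPAN, not core Perl

my $html = '<blockquote>foo<p>bar</blockquote>';

# Parse the fragment; TreeBuilder supplies the implied html/head/body.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Print an indented outline of the element tree to STDOUT.
$tree->dump;

# Walk the blockquote's direct children: a plain string is bare text,
# a reference is a child element.
my $bq = $tree->look_down(_tag => 'blockquote');
for my $child ($bq->content_list) {
    my $label = ref($child) ? 'element: ' . $child->tag
                            : "bare text: $child";
    print "$label\n";
}

$tree->delete;   # break circular references in older HTML-Tree versions
```

Mixed children like these (loose text next to block elements) are what make a formatter's block-level handling awkward.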
Late news: For HTML::FormatRTF, I'm giving up on the approach that was meshing so badly with the HTML content-models, and doing something a bit stranger, sort of the way HTML::FormatPS does it.
Re:Content Supermodels
ziggy on 2002-10-25T21:43:22
"If I could wave a magic wand and make all the HTML input be XHTML Strict, that'd be really handy."
Have you tried:
    tidy -m -asxhtml foo.html