Remember the early days of XML? Remember how widespread adoption was key, and leveraging (ugh) the existing SGML infrastructure was important? Even in those areas where SGML tools seemed like a good bet (like converting XML to print), they really weren't as great as the apologists made out.
Duh.
It took forever to get XSL-FO out the door, and even now the tool support isn't that great (except through TeX; Thank You Sebastian!). Which leaves us with the old-model DSSSL stylesheet to produce print (typically RTF), or generating PostScript, PDF, or some TeX variant on your own (no thanks!).
I've worked with enough corner cases to say that DSSSL was a good average case solution, if you knew enough Scheme and DSSSL to be dangerous. But the flow object model supported by DSSSL and the quirks of the [Open]Jade back end lead me to say that this path was never as glorious as the apologists promised. Case in point: keep-with-next-paragraph support.
This week I had to craft a rather complicated output from a rather large document. Margins; gutters; colors (no big deal); tables, and lots of keeps. Either I don't understand what the rather dense DSSSL spec is telling me, or [Open]Jade doesn't really do what the spec says it should do (doubtful), or the [Open]Jade RTF backend doesn't manage keeps and other such properties very well. I'm going with option number 3 here (based on some other quirks I've grown to know and love...).
Eventually, there comes a time when you have to accept that DSSSL or OpenJade or the RTF backend cannot produce what you expect, and it's time to remediate the problem. The classic solution is to fire up Word as an OLE server, load the document and poke around to fix the formatting. Except when you have to remediate every single paragraph!
After lots of trial and error, I can say conclusively that the best tools for RTF remediation are (1) MS Word (to see what you should have generated in the first place), (2) diff, (3) Perl, and (4) a vague clue about RTF syntax.
Shoot me for not seeing this sooner.
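A minimal sketch of that kind of patching, written in Python rather than Perl and not taken from the actual remediation script. It leans on two facts about RTF: `\pard` resets paragraph properties, and `\keepn` is the keep-with-next control word. Everything else is an assumption made for illustration: the function name, and the idea that the paragraphs needing a keep can be recognized by a control word in their property run (here a hypothetical heading size, `\fs28`). It is a regex over text, not an RTF parser, so odd escapes can fool it.

```python
import re

# \pard followed by its run of control words, e.g. "\pard\plain \ql \fs28 ".
# (?<!\\) skips an escaped backslash that merely precedes the letters "pard";
# (?![a-z]) keeps us from matching a longer control word starting with "pard".
PARD_RUN = re.compile(r"(?<!\\)\\pard(?![a-z])((?:\\[a-z]+-?\d* ?)*)")
CONTROL_WORD = re.compile(r"\\[a-z]+-?\d*")


def add_keep_with_next(rtf, marker):
    # Insert \keepn right after \pard for every paragraph whose property run
    # contains `marker` (a hypothetical selector such as "\fs28").
    def patch(match):
        props = match.group(1)
        words = CONTROL_WORD.findall(props)
        if marker in words and "\\keepn" not in words:
            return "\\pard\\keepn" + props
        return match.group(0)  # leave every other paragraph untouched

    return PARD_RUN.sub(patch, rtf)


if __name__ == "__main__":
    sample = r"{\rtf1\pard\fs28 Heading\par\pard\fs20 Body\par}"
    print(add_keep_with_next(sample, r"\fs28"))
```

Running the patched file through Word and comparing against what Word itself writes for a kept paragraph is the sanity check; the selector will almost certainly need tuning for real OpenJade output.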
Here's "Hello World" as generated by WordPad:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl
{\f0\fnil\fcharset0 Times New Roman;}}
\viewkind4\uc1\pard\f0\fs20 Hello World\par
}
And as generated by MS Word 2000:
{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f27\froman\fcharset0\fprq2{\*\panose 02040502050405020303}Georgia;} {\f29\froman\fcharset238\fprq2 Times New Roman CE;}{\f30\froman\fcharset204\fprq2 Times New Roman Cyr;}{\f32\froman\fcharset161\fprq2 Times New Roman Greek;}{\f33\froman\fcharset162\fprq2 Times New Roman Tur;} {\f34\froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\f35\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f36\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f245\froman\fcharset238\fprq2 Georgia CE;} {\f246\froman\fcharset204\fprq2 Georgia Cyr;}{\f248\froman\fcharset161\fprq2 Georgia Greek;}{\f249\froman\fcharset162\fprq2 Georgia Tur;}{\f252\froman\fcharset186\fprq2 Georgia Baltic;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255; \red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0; \red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f27\fs22\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 \snext0 Normal;}{\*\cs10 \additive Default Paragraph Font;}}{\info{\title Hello World}{\author Sean M. Burke}{\operator Sean M. Burke}{\creatim\yr2002\mo2\dy22\hr21\min26}{\revtim\yr2002\mo2\dy22\hr21\min26}{\version1}{\edmins0}{\nofpages1}{\nofwords0}{\nofchars0} {\*\company }{\nofcharsws0}{\vern8247}}\widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\formshade\horzdoc\dgmargin\dghspace180\dgvspace180\dghorigin1800\dgvorigin1440\dghshow1\dgvshow1 \jexpand\viewkind1\viewscale100\pgbrdrhead\pgbrdrfoot\splytwnine\ftnlytwnine\htmautsp\nolnhtadjtbl\useltbaln\alntblind\lytcalctblwd\lyttblrtgr\lnbrkrule \fet0\sectd \linex0\endnhere\sectlinegrid360\sectdefaultcl {\*\pnseclvl1 \pnucrm\pnstart1\pnindent720\pnhang{\pntxta.}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5 \pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f27\fs22\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 {Hello World \par }}
Makes ya think, don't it!?
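Word's habit of writing everything on a handful of enormous lines is also why plain diff is less useful on RTF than it could be. One way around that, sketched here in Python under stated assumptions, is to split both files into one token per line (group braces, control words, control symbols, text runs) and diff the token streams instead. The names `rtf_tokens` and `rtf_diff` are made up for this example, and the tokenizer is deliberately naive: it ignores `\bin` data and drops the delimiter space after a control word.

```python
import difflib
import re

TOKEN = re.compile(
    r"""
    \\'[0-9a-fA-F]{2}      # hex-escaped character
  | \\[a-zA-Z]+-?\d*\ ?    # control word, optional number, optional delimiter space
  | \\[^a-zA-Z]            # control symbol (backslash plus one non-letter)
  | [{}]                   # group open / close
  | [^\\{}]+               # run of plain text
    """,
    re.VERBOSE,
)


def rtf_tokens(rtf):
    # Split RTF source into a flat list of tokens, one per eventual diff line.
    tokens = []
    for tok in TOKEN.findall(rtf):
        # Bare line breaks in the source carry no meaning in RTF.
        tok = tok.replace("\r", "").replace("\n", "")
        if not tok:
            continue
        if tok[0] == "\\" and tok[1:2].isalpha():
            tok = tok.rstrip(" ")  # drop the delimiter space after a control word
        tokens.append(tok)
    return tokens


def rtf_diff(old, new):
    # Unified diff of two RTF strings, compared token by token.
    return list(
        difflib.unified_diff(
            rtf_tokens(old), rtf_tokens(new), "old.rtf", "new.rtf", lineterm=""
        )
    )


if __name__ == "__main__":
    generated = r"{\rtf1\pard\fs20 Hello World\par}"
    from_word = r"{\rtf1\pard\keepn\fs20 Hello World\par}"
    print("\n".join(rtf_diff(generated, from_word)))
```

Diffing the generated file against a copy that Word has re-saved with the formatting fixed by hand then shows the missing control words as single added lines, which is exactly what the patching script needs to know.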
Re:RTF
ziggy on 2002-02-23T16:57:45
Have you read the most recent RTF spec? It reads like Microsoft is saying "this is how you should model your word processor internally, because that's how we did it; furthermore, you should understand how we built Word, because that will be a valid expression of RTF. Oh, and we'll deign to read whatever you generate, if it somehow matches this specification, but don't expect us to jump through hoops..."
Running OpenJade on a 22MB SGML source file produced a 4MB RTF output file, +/- a few KB after munging. After opening that file and repaginating it, the file size ballooned to over 10MB - a 150% increase in size!
Re:RTF
TorgoX on 2002-02-23T17:30:19
Yes, RTF can have no pretense of being an "open file format". The only things separating it from .doc are 1) that it doesn't convey macros and 2) that Microsoft didn't actively try to make it totally obfuscated and encrypted.