Dear All,
So a while back, I noticed that whenever I consulted the XML spec (then still at version 1.0), I only ever looked at the syntax parts, and rarely if ever (re-)read the prose paragraphs.
So I made this, which is basically just the XML spec minus all the paragraphs, leaving behind only the headings and the syntax parts. But over time, 1) I got more and more annoyed by the ironic nastiness of the spec's 1997-vintage HTML (tables inside tables, etc.); and 2) I got tired of looking up the production for Name, following it to the BaseChar production, and seeing rows and rows of goo like "[#x03A3-#x03CE] | [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3]".
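For anyone who hasn't stared at these productions: each alternative is either a single code point (#xNNNN) or an inclusive range ([#xLLLL-#xHHHH]), and a character matches the production if it falls in any of them. Here is a small Python sketch of that reading, using the fragment quoted above; parse_ranges and matches are hypothetical helper names, not anything defined by the spec.

```python
import re

# Matches either a bracketed range like [#x03A3-#x03CE] or a lone code point like #x03DA.
_TOKEN = re.compile(r"\[#x([0-9A-Fa-f]+)-#x([0-9A-Fa-f]+)\]|#x([0-9A-Fa-f]+)")


def parse_ranges(production: str) -> list[tuple[int, int]]:
    """Turn spec-style alternatives into inclusive (low, high) code point pairs."""
    ranges = []
    for m in _TOKEN.finditer(production):
        if m.group(3) is not None:
            # A single code point is just a one-element range.
            cp = int(m.group(3), 16)
            ranges.append((cp, cp))
        else:
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def matches(ch: str, ranges: list[tuple[int, int]]) -> bool:
    """True if the character's code point lies inside any of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


FRAGMENT = "[#x03A3-#x03CE] | [#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3]"
ranges = parse_ranges(FRAGMENT)
print(matches("\u03A9", ranges))  # U+03A9 GREEK CAPITAL LETTER OMEGA, inside #x03A3-#x03CE
print(matches("\u03D7", ranges))  # U+03D7 sits in the gap between #x03D6 and #x03DA
```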
So I did a bit of search-and-replace to get this prettied-up version of it all. Note that the funny characters (half of which probably won't display on any given person's screen) have passably informative title="..." attributes, which show up as tooltips when you mouse over them.
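One way such tooltips could be generated, sketched in Python with the standard unicodedata and html modules. This is an assumption, not a description of the actual search-and-replace used for the page; titled_char and titled_range are hypothetical names, and code points that have no Unicode name (control characters, for instance) get the placeholder "unnamed".

```python
import html
import unicodedata


def titled_char(cp: int) -> str:
    """Render one code point as an HTML <span> whose title gives its number and name."""
    ch = chr(cp)
    # Without a default, unicodedata.name raises ValueError for unnamed code points.
    name = unicodedata.name(ch, "unnamed")
    title = f"U+{cp:04X} {name}"
    # Escape both the attribute value and the character itself for safe HTML.
    return f'<span title="{html.escape(title)}">{html.escape(ch)}</span>'


def titled_range(lo: int, hi: int) -> str:
    """Render a [#xLO-#xHI] range as its two endpoints, each with its own tooltip."""
    return f"[{titled_char(lo)}-{titled_char(hi)}]"


print(titled_char(0x03A3))
print(titled_range(0x03A3, 0x03CE))
```

Escaping the title with html.escape matters little for Unicode names (they are plain uppercase ASCII), but it keeps the helper safe if the title text ever includes quotes or ampersands.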
At some point I may update this for the XML 1.1 spec. But feh (ف), I'm oldskool.
Re:Unicode character ranges
bart on 2005-08-07T18:43:54
I mean [#x80-#x9F], of course... damned hexadecimal.
Re:Unicode character ranges
TorgoX on 2005-08-19T06:54:49
Ya know, I was actually wondering about that too. The XML 1.1 spec somewhat clarifies this. But regardless, the strategy I've settled on for escaping arbitrary 8-bit data is to move anything fishy up into the E000-E0FF block, as in my ps2xml library.
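A minimal Python sketch of that kind of escaping, under stated assumptions: the bytes are read as Latin-1, and "fishy" is taken to mean anything other than tab, LF, printable ASCII, and A0-FF. That rule is a guess, not ps2xml's actual one. It treats #x7F-#x9F as fishy because XML 1.1 puts most of that range under its RestrictedChar production (allowed only as character references), and it treats CR as fishy because XML parsers normalize line ends to LF. U+E000..U+E0FF is Private Use Area and is allowed by XML's Char production, so the output is legal XML text. The names is_fishy, escape_bytes, and unescape_text are hypothetical.

```python
def is_fishy(b: int) -> bool:
    """Assumed rule: a byte is fishy unless, read as Latin-1, it is tab, LF,
    printable ASCII (20-7E), or in A0-FF. CR is fishy since parsers turn it into LF."""
    return not (b in (0x09, 0x0A) or 0x20 <= b <= 0x7E or 0xA0 <= b <= 0xFF)


def escape_bytes(data: bytes) -> str:
    """Read bytes as Latin-1, shifting fishy ones up into U+E000..U+E0FF."""
    return "".join(chr(0xE000 + b) if is_fishy(b) else chr(b) for b in data)


def unescape_text(text: str) -> bytes:
    """Invert escape_bytes: U+E0xx goes back to byte 0xxx, other chars are Latin-1."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xE000 <= cp <= 0xE0FF:
            out.append(cp - 0xE000)
        elif cp <= 0xFF:
            out.append(cp)
        else:
            raise ValueError(f"U+{cp:04X} cannot have come from escape_bytes")
    return bytes(out)


raw = b"caf\xe9\x00\x85\r\n"
escaped = escape_bytes(raw)
print([f"U+{ord(c):04X}" for c in escaped])
assert unescape_text(escaped) == raw
```

The round trip works because every character in the output is either a Latin-1 code point that was left alone (at most U+00FF) or U+E000 plus the original byte value, so the two cases can never collide.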