Plans

mdxi on 2003-11-15T06:40:39

Like some other people here, I'm planning to use this more as a development diary /cum/ announcement board for my Perl-related projects (which is to say: all of them) than as a general-purpose online journal.

This week I ceased to be among the unemployed and joined the ranks of the "underemployed". Today at my new crap job, I came up with what I think will be a good DB schema for a project I'm working on with my wife.

She's in her last year of university, majoring in Japanese and minoring in Chinese. Since there's a good deal of overlap between those languages in various ways, she's been maintaining a sort of concordance for herself. I won't bore you with the linguistic details, but the document (for hysterical raisins) is an AppleWorks spreadsheet containing 4 character sets and as many text encodings. It's in a bastardized kind of proto-Unicode called Apple International Text, which is actually nothing at all like Unicode. Structurally, it looks like an MPEG file, all the frames of which contain textual data of various encodings.

Long story short: she's finally figured out a toolchain to export this thing into a tab-separated UTF-8 file so that I can parse it up and shove it into an RDBMS, making it usable, searchable, and extensible. Spreadsheets are great financial analysis tools, but they were NOT meant as aids to lexicography.
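
For the curious, the parse-and-load step might look roughly like the sketch below. It is written in Python with an in-memory SQLite database standing in for Perl and Postgres, purely so that it runs on its own. The table name, the column names, and the two sample rows are all invented for illustration; the real spreadsheet's layout isn't described here.

```python
import csv
import io
import sqlite3

# Hypothetical column layout (an assumption, not the real spreadsheet's columns).
COLUMNS = ("headword", "japanese_reading", "chinese_reading", "gloss")


def load_tsv(conn, tsv_file):
    """Create the table if needed, insert one row per TSV line, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concordance ("
        "headword TEXT, japanese_reading TEXT, chinese_reading TEXT, gloss TEXT)"
    )
    # QUOTE_NONE: treat quote characters as ordinary text, split on tabs only.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Lines that do not have exactly the expected number of fields are skipped.
    rows = [tuple(r) for r in reader if len(r) == len(COLUMNS)]
    conn.executemany("INSERT INTO concordance VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Stand-in for open("export.tsv", encoding="utf-8", newline="").
sample = io.StringIO(
    "学\tがく\txué\tstudy, learning\n"
    "本\tほん\tběn\tbook; origin\n"
)
conn = sqlite3.connect(":memory:")
count = load_tsv(conn, sample)
```

Once the rows are in a table, "searchable" is just a matter of `SELECT ... WHERE headword = ?`, which is the whole point of getting the data out of a spreadsheet.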

Anyway, that's my project for the weekend. And let this be a lesson to all of us English-only people: Unicode Everywhere is a very worthy goal. If Perl and Postgres didn't both have Unicode internals, this would be a nightmare. As it is, I'm thinking it'll be a pretty straightforward job.

Except maybe for the auto-discriminating "what are you searching for?" feature I'm planning on hacking in :)
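
One plausible reading of "auto-discriminating" is guessing from the characters in the query which kind of thing was typed: kana, Han characters, or Latin text such as an English gloss or a romanized reading. The sketch below, again in Python, does that by looking at Unicode character names. The function name, the category labels, and the precedence rule (any kana wins, then Han, then Latin) are assumptions made for illustration, not a description of the actual feature.

```python
import unicodedata


def guess_query_type(query):
    """Return 'japanese', 'han', 'latin', or 'unknown' for a search string."""
    has_kana = has_han = has_latin = False
    for ch in query:
        # The second argument is the fallback for characters that have no name.
        name = unicodedata.name(ch, "")
        if name.startswith(("HIRAGANA", "KATAKANA")):
            has_kana = True
        elif name.startswith("CJK UNIFIED IDEOGRAPH"):
            has_han = True
        elif name.startswith("LATIN"):
            has_latin = True
    # Kana only occurs in Japanese, so it takes precedence over shared Han characters.
    if has_kana:
        return "japanese"
    if has_han:
        return "han"
    if has_latin:
        return "latin"
    return "unknown"
```

A query made only of Han characters stays ambiguous between Japanese and Chinese, which is exactly the overlap the concordance exists to capture, so a search in that case would presumably look in both columns.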