Strange... or obvious?

tinman on 2004-04-12T21:38:46

Has no one ever done this before? Or is it just that no one has bothered, or, even simpler, that it doesn't make sense?

I need to search through documents written in Asian scripts. It's part of my language-independent entity extraction thingamajig (to put it technically, heh heh). As I mentioned before, people seem to author content in a variety of custom-built TrueType fonts. I'm going to construct mapping tables for the more popular fonts so that I can easily convert text typed in them into Unicode. Voilà: I no longer need to worry about weird ASCII garbage. I can just run the text through a converter and expect the result to be readable by any application that understands ONLY Unicode.
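A minimal sketch of the table-driven conversion described above, in Python. The font, its byte values and the names `LEGACY_TO_UNICODE` and `legacy_to_unicode` are all hypothetical; a real table would be built by inspecting the actual font. Only the Unicode code points are real (standard Devanagari). Because one legacy glyph can stand for several code points (a conjunct, for instance), each entry maps to a string rather than to a single character.

```python
# Hypothetical table for an imaginary legacy font: byte value -> Unicode string.
# The byte values are invented for illustration; the code points are real Devanagari.
LEGACY_TO_UNICODE = {
    0x6B: "\u0915",              # 'k' slot holds the KA glyph
    0x4B: "\u0916",              # 'K' slot holds the KHA glyph
    0x61: "\u093E",              # 'a' slot holds vowel sign AA
    0x66: "\u093F",              # 'f' slot holds vowel sign I
    0x7C: "\u0915\u094D\u0937",  # '|' slot holds one conjunct glyph (KA + VIRAMA + SSA)
}


def legacy_to_unicode(data: bytes, table: dict[int, str]) -> str:
    """Convert text typed in a custom font into Unicode, one glyph (byte) at a time."""
    out = []
    for b in data:  # iterating over bytes yields ints
        # Unmapped bytes become U+FFFD so gaps in the table stay visible.
        out.append(table.get(b, "\uFFFD"))
    return "".join(out)
```

For example, `legacy_to_unicode(b"ka", LEGACY_TO_UNICODE)` gives "का" (U+0915 U+093E). Emitting U+FFFD for unknown bytes, rather than silently dropping them, makes holes in a table easy to spot while it is still being built.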

I see a couple of problems already, though.

  • There must be a language somewhere that breaks my converter: one with strange formatting or character-modification rules. I simply can't believe that a simple hash job like this (or HashMap, depending on the language) hasn't been done before.
  • I wonder whether someone will complain about my running their font through charmap to build a mapping table.
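One example of a character-modification rule that a per-glyph table cannot express on its own: many legacy Devanagari fonts store the short-i vowel sign (ि, U+093F) before the consonant, because that is where it is drawn, whereas Unicode stores it after the consonant or consonant cluster it belongs to. A post-processing pass can move it. The sketch below uses a hypothetical function `reorder_vowel_sign_i`, assumes the table lookup has already produced Unicode, and only handles plain consonants (U+0915 to U+0939) joined by virama (U+094D); nukta forms, reph and other rules are deliberately left out.

```python
import re

VOWEL_SIGN_I = "\u093F"
# A consonant cluster: consonant, optionally followed by (virama + consonant) repeats.
CLUSTER = "[\u0915-\u0939](?:\u094D[\u0915-\u0939])*"
PRE_BASE_I = re.compile(VOWEL_SIGN_I + "(" + CLUSTER + ")")


def reorder_vowel_sign_i(text: str) -> str:
    """Move vowel sign I from visual order (before the cluster) to Unicode
    logical order (after the cluster). Hypothetical helper for illustration."""
    return PRE_BASE_I.sub(r"\1" + VOWEL_SIGN_I, text)
```

So "ि" followed by "क" becomes "कि" (U+0915 U+093F), and text without a pre-base vowel sign passes through unchanged.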

For extra mod/brownie points, I could even build an editor thingummy that converts between custom fonts and Unicode. OK, THAT I'm certain has been done before. Do these Unicode-aware applications also understand custom TrueType fonts?
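Going the other way, for an editor that can save back into a custom font, roughly means inverting the mapping table. A sketch with a hypothetical three-entry `TABLE` and a hypothetical `unicode_to_legacy` function; the byte values are invented. Because some glyphs stand for several code points, it uses greedy longest match so that a conjunct glyph is preferred over its separate parts.

```python
# Hypothetical table: font byte -> Unicode string (byte values invented for illustration).
TABLE = {0x6B: "\u0915", 0x61: "\u093E", 0x7C: "\u0915\u094D\u0937"}


def unicode_to_legacy(text: str, table: dict[int, str]) -> bytes:
    """Encode Unicode text back into the custom font's byte values."""
    reverse = {v: k for k, v in table.items()}  # Unicode string -> font byte
    longest = max(len(v) for v in reverse)
    out = bytearray()
    i = 0
    while i < len(text):
        # Try the longest possible chunk first so a conjunct glyph wins over its parts.
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in reverse:
                out.append(reverse[chunk])
                i += n
                break
        else:
            raise ValueError(f"no glyph for {text[i]!r} at position {i}")
    return bytes(out)
```

If two font bytes map to the same Unicode string, the inversion keeps only one of them, and any logical-to-visual reordering (such as moving a vowel sign back in front of its consonant) would have to happen before this step.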

More investigation needed...