Lingua::

rjbs on 2004-04-28T20:18:51

Looking at my lame todo list, I found that due to the order of the letters in the Roman alphabet, Lingua::EO was next on my list. I've done a bunch of work on Games::Board -- which I'll get to later -- so I thought I'd do a bit of work on this.

The point of Lingua::EO is, eventually, to translate Esperanto to English. I'm not looking to translate full sentences, just words. For example, I want to break down "malsanulejo" into "mal-san/a-ul-ej-o" and from there into "place of not healthy people." Presumably there would be an exception, someday, to catch this and call it a hospital. Still, this would be useful even without that. It would certainly help me overcome my stupid problems with -igi- versus -igxi- and other silliness.
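To make the breakdown concrete, here is a rough sketch of the idea. Nothing in it exists yet: break_down and the four tiny tables are made up for illustration, and a real module would need a proper dictionary and much smarter handling of ambiguous splits. It strips the grammatical ending, peels prefixes off the front and suffixes off the back, and gives up unless what remains is a known root.

```perl
use strict;
use warnings;

# Hypothetical, hand-picked tables; just enough for the one example word.
my %prefix = (mal => 'opposite of');
my %root   = (san => 'healthy');
my %suffix = (ul => 'person', ej => 'place');
my %ending = (o => 'noun', a => 'adjective', e => 'adverb', i => 'infinitive');

# Build alternations, longest first, so longer affixes win over shorter ones.
my $prefix_re = join '|', map { quotemeta($_) }
                sort { length($b) <=> length($a) } keys %prefix;
my $suffix_re = join '|', map { quotemeta($_) }
                sort { length($b) <=> length($a) } keys %suffix;

# Returns the list of morphemes, or an empty list if the word can't be split.
sub break_down {
    my ($word) = @_;

    # The final vowel is the grammatical ending.
    return unless $word =~ s/([aeio])\z//;
    my $ending = $1;

    # Peel known prefixes off the front.
    my @front;
    while ($word =~ s/\A($prefix_re)//) {
        push @front, $1;
    }

    # Peel known suffixes off the back, stopping once a known root is left.
    my @back;
    while (!exists $root{$word} && $word =~ s/($suffix_re)\z//) {
        unshift @back, $1;
    }

    # Whatever remains has to be a root we know about.
    return unless exists $root{$word};

    return (@front, $word, @back, $ending);
}

print join('-', break_down('malsanulejo')), "\n";   # mal-san-ul-ej-o
```

Turning the pieces into English is then a lookup in the same tables, and the "call it a hospital" exception would be one more table consulted before any of this runs.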

So, I started looking for Lingua modules, just to see what the namespace was like, and I found a few things that were scary, a few that were interesting, and a few that were both. The most relevant thing looked to be Lingua::Stem, which seems to be designed to turn a declined word into a stem word. I need to reread the docs, but it seemed like the author assumed that everyone knows what a stemmer does. He doesn't have any concrete "word in, word out" examples or other solid demonstrations. Still, I think I'm making a safe assumption.

I saw another stemmer, and I think that unless I want to use the Lingua::Stem API (which I don't think I do, just yet), Lingua::EO::Stemmer, or something like it, will be born.

Seeing a bunch of stemmers for different languages, I moved on to search for Esperanto modules, guessing I might find something useful. I didn't, really. I saw that Juerd had a bit of code to transcode Esperanto text, fixing h-coding and x-coding. Other than that, I didn't see much related to Esperanto. I did see this, though: Lingua::Num2Word. It has the (maybe) noble goal of uniting all the number-to-prose modules through one tolerable interface. It has the (unquestionable) failing of being needlessly OO. The constructor is a vanilla bless of an empty hashref, and the methods never use the implementation object. Actually, the code is more complicated because it needs to decide whether it's being called as a function or as a method. (The interface is polymorphic!) In fact, have a look at its C method. If it's being called as a function, it creates a new Lingua::Num2Word object in the lexical scope, does nothing with it, and discards it at the end of the subroutine. Wuh!
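The pattern I'm complaining about looks roughly like the sketch below. This is not the module's actual source: Hypothetical::Num2Word, cardinal_plain, and the ten-word table are all made up to show the shape of the thing, with a plain function next to it for comparison.

```perl
use strict;
use warnings;

package Hypothetical::Num2Word {   # made-up name, not the CPAN module

    my @WORDS = qw(zero one two three four five six seven eight nine);

    # A constructor that blesses an empty hashref: the object has no state.
    sub new { return bless {}, shift }

    # Callable as a function or as a method. Either way the object is never
    # used; in the function case it is built only to be thrown away.
    sub cardinal {
        my $self = ref($_[0]) ? shift(@_) : __PACKAGE__->new;
        my ($number) = @_;
        return $WORDS[$number];
    }

    # The same behaviour as a plain function, with no object in sight.
    sub cardinal_plain {
        my ($number) = @_;
        return $WORDS[$number];
    }
}

print Hypothetical::Num2Word::cardinal(7), "\n";        # seven
print Hypothetical::Num2Word->new->cardinal(7), "\n";   # seven
print Hypothetical::Num2Word::cardinal_plain(7), "\n";  # seven
```

All three calls give the same answer, which is the point: the object buys nothing, and the function-or-method check is extra code to maintain.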

So, looking at that drove me to look at some of the other Lingua modules for turning numbers into strings. It's a real hodgepodge. I'd like to find out how well-used Lingua::EN::Numbers is, as it would be nice to replace it with something both simpler and more comprehensive.

And not OO.

Of course, either way, I'll need to write a Lingua::EO::Numbers. I think I'll do that before the stemmer. I don't know whether to write an EO number module as a model for EN or the reverse. Or maybe I should stay the hell away from EN number modules. I don't, after all, feel like descending on the problem en masse like DateTime.