Lingua::Identify

cog on 2004-07-28T10:50:59

I'd better get on with this, or else it will never be finished :-)

I think I've made up my mind about this:

Lingua::Identify
Lingua::Identify::Standard::Words
Lingua::Identify::Standard::Ngrams
Lingua::Identify::Standard::...

These will hold, respectively, the main program, the word information, the n-gram information, and so on.
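Here is a minimal sketch of this feature-major split. It is written in Python rather than Perl and uses made-up two-language data. Each table stands in for one hypothetical Lingua::Identify::Standard::* module and is keyed by language. The "main program" only knows how to extract each feature from the text and add up the hits. The names `WORDS`, `NGRAMS`, `FEATURES` and `identify` are illustrative assumptions, not the module's actual API.

```python
# Feature-major layout: one table per kind of information, each keyed by language.
# Hypothetical stand-ins for Lingua::Identify::Standard::Words / ::Ngrams (toy data).
WORDS = {
    "en": {"the", "and", "is", "of"},
    "pt": {"o", "a", "e", "de", "que"},
}
NGRAMS = {
    "en": {"th", "he", "in", "er"},
    "pt": {"ao", "os", "de", "ca"},
}

def words_of(text):
    """Lower-cased, whitespace-separated tokens."""
    return text.lower().split()

def bigrams_of(text):
    """Letter bigrams taken inside each word."""
    return [w[i:i + 2] for w in words_of(text) for i in range(len(w) - 1)]

# The main program only pairs each feature table with its extractor.
FEATURES = {
    "words": (WORDS, words_of),
    "ngrams": (NGRAMS, bigrams_of),
}

def identify(text):
    """Return the language whose tables match the most items extracted from text."""
    scores = {}
    for table, extract in FEATURES.values():
        items = extract(text)
        for lang, known in table.items():
            scores[lang] = scores.get(lang, 0) + sum(item in known for item in items)
    return max(scores, key=scores.get)
```

In this layout, adding a new kind of information (say, prefixes) means adding one new table and one entry in `FEATURES`. None of the existing tables have to change.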

It also leaves room for something such as

Lingua::Identify::Extended::FI
Lingua::Identify::Extended::KL

for new languages taught by the user :-)

I guess I'll do it this way :-)


Hmms.. I would change...

ambs on 2004-07-28T12:54:46

Lingua::Identify::EN
Lingua::Identify::PT
where the EN and PT modules contain the words, n-grams and so on for that language. New languages would appear as
Lingua::Identify::FI
and so on, following the same scheme as the built-in languages.

Those are my quick thoughts on the issue.

Re:Hmms.. I would change...

cog on 2004-07-28T13:07:45

I chose my approach so that the user wouldn't get 20 modules by default :-|

Re:Hmms.. I would change...

ambs on 2004-07-28T13:09:24

True, but on the other hand, this way you separate the languages completely. That can have some benefits, I think.

Re:Hmms.. I would change...

cog on 2004-07-28T13:10:11

Not when we talk about maintaining those modules... :-)

Re:Hmms.. I would change...

ambs on 2004-07-28T13:11:59

The idea is to keep ONLY the language information in those modules. That information has to be maintained anyway, and it is the same whether it lives in one, two or a hundred modules. The actual identification code should be generic and use the data from any of the other modules.
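Here is a minimal sketch of that separation, again in Python with toy data and hypothetical names. Each dict plays the role of one per-language module (Lingua::Identify::EN, ::PT) and holds only data. The single `identify` function knows nothing about specific languages; it scores whatever language tables it is given.

```python
# Language-major layout: each dict stands in for one per-language module
# (e.g. Lingua::Identify::EN) and contains data only; the values are toy examples.
EN = {"words": {"the", "and", "is", "of"}, "ngrams": {"th", "he", "in", "er"}}
PT = {"words": {"o", "a", "e", "de", "que"}, "ngrams": {"ao", "os", "de", "ca"}}

def extract(feature, text):
    """Generic feature extraction, shared by every language."""
    tokens = text.lower().split()
    if feature == "words":
        return tokens
    if feature == "ngrams":
        return [w[i:i + 2] for w in tokens for i in range(len(w) - 1)]
    raise ValueError(f"unknown feature: {feature}")

def identify(text, languages):
    """Score every language table with the same code and return the best name."""
    def score(data):
        return sum(
            item in known
            for feature, known in data.items()
            for item in extract(feature, text)
        )
    return max(languages, key=lambda name: score(languages[name]))
```

For example, `identify("the cat and the hat", {"en": EN, "pt": PT})` picks `"en"`. Adding a language here means adding one more data dict. Adding a feature means adding a key to every language dict, which is the cost raised in the next reply.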

Re:Hmms.. I would change...

cog on 2004-07-28T13:14:35

No, no, no.

Suppose I only have words and n-grams, and now I want to add prefixes... Instead of adding one new module, with your layout I'd have to wade through twenty-odd different ones :-| or even more, in the future! :-|

Re:Hmms.. I would change...

ambs on 2004-07-28T13:16:14

Lingua::Identify::EN::ngrams
Lingua::Identify::PT::words
...
??

Re:Hmms.. I would change...

cog on 2004-07-28T13:17:34

20 * 5, for instance, makes 100 modules... is that worth it? :-|
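The module-count arithmetic behind this exchange can be made explicit. This is a small Python sketch assuming L languages and F kinds of information; the thread's example is 20 and 5. For each layout proposed above, it counts how many modules exist and how many must be created or edited to add one new feature, such as prefixes. The function name `layout_costs` is hypothetical, and the main Lingua::Identify module is not counted.

```python
def layout_costs(languages, features):
    """Module counts for the three layouts discussed in the thread.

    Returns {layout: (modules_in_total, modules_touched_to_add_one_feature)},
    not counting the main Lingua::Identify module.
    """
    return {
        # Lingua::Identify::Standard::Words, ::Ngrams, ...: one module per feature;
        # a new feature is one new module.
        "per-feature": (features, 1),
        # Lingua::Identify::EN, ::PT, ...: one module per language;
        # a new feature means editing every language module.
        "per-language": (languages, languages),
        # Lingua::Identify::EN::ngrams, ...: one module per (language, feature) pair;
        # a new feature means one new module per language.
        "per-language-and-feature": (languages * features, languages),
    }

for layout, (total, touched) in layout_costs(20, 5).items():
    print(f"{layout}: {total} modules, {touched} touched per new feature")
```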

Re:Hmms.. I would change...

ambs on 2004-07-28T13:21:02

Not 5... you don't have all of them for every language. Not 20 either; you don't have that many languages. But if you do, why not? They are small modules, well organized in the directory hierarchy, and it will be very easy for a new user/hacker to look at it and find what he/she/it is looking for.

Re:Hmms.. I would change...

cog on 2004-07-28T13:25:06

  Not 5... you don't have them all for all languages.

Sure I do... For every single language in the tool, I have that information (words, n-grams, prefixes, suffixes, etc.).

  Not 20, you don't have all those languages.

But I will...

  But if you do, why not? They are small modules, quite well organized in the directory hierarchy, and it will be very easy for a new user/hacker to look at it and find what he/she/it is looking for.

That's true... but changes will be harder this way...

Well, I guess I can always pray that no ideas for changes occur to me O:-)