Regex for a word

cog on 2004-10-14T10:41:39

How would you define a regex for a word?

No, it wouldn't be [a-z]+, as that would get things such as "z", which I don't think is a word in any language (am I wrong?)

So... do all words in every language contain at least one vowel? I think it would be too simple if it were so, but I can't think of any example...
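For illustration only, here is roughly what the "at least one vowel" rule would look like as a Perl regex. This is a sketch of the heuristic being asked about, not a real definition of a word: the vowel set (a, e, i, o, u), the restriction to lowercase ASCII letters, and the helper name has_vowel are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s consists only of lowercase ASCII
# letters and at least one of them is a, e, i, o or u.
sub has_vowel {
    my ($s) = @_;
    # The lookahead demands a vowel somewhere; [a-z]+ then consumes the string.
    return $s =~ /\A(?=[a-z]*[aeiou])[a-z]+\z/ ? 1 : 0;
}

for my $candidate (qw(word z)) {
    printf "%-6s %s\n", $candidate, has_vowel($candidate) ? "accepted" : "rejected";
}
```

This rejects "z", but it also rejects anything written without one of those five letters, so the rule is only as good as the vowel list behind it.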

Any other rules?

Words

rafael on 2004-10-14T10:49:46

In French many single-syllable words can be elided before a vowel, so d', l', j', m', t' are valid words. t is a fake word used to adapt some ligatures (comment va-t-il ?). y is an adverb of location (j'y vais), it counts as a semi-vowel.

Re:Words

cog on 2004-10-14T10:52:21
Hmm... this is *very* useful information for what I'm doing :-) Thanks :-)

Would anyone else like to say something about his/her own language? :-) Or any other, for that matter :-)

Vowels

htoug on 2004-10-14T11:30:03
There are several "words" that do not contain what is normally called a vowel.
In English, y is not a vowel (I learnt that in school in Ireland, so I might be wrong), so the word rhythm does not contain a vowel.
I had a Yugoslav friend a long time ago, whose last name was Hrs, which he claimed was perfectly pronounceable (somewhat like 'hearse'), but I can't find any vowels there either.

Re:Vowels

domm on 2004-10-14T13:19:14
We were spending our holiday on the island of "Krk" in Croatia.

One of the things I remember from my Ancient Greek classes in school is that there is a family of consonants called something like "muta cum liquida" which can behave like vowels. I think they are:
r, m, n, l
You can say those consonants for a prolonged period of time ("mmmmmmmmmmmmmmmmmm") just like a vowel ("aaaaaaaaaaaaaa"), which is something you cannot do with 'proper' consonants ("t-t-t-t-t-t-t"). Hence they are called 'con-sonants'.

I guess this doesn't help with your problem, but it's an interesting factoid, IMO.

And probably it's all wrong, too, but IANAL (Linguist)

Easy enough

Juerd on 2004-10-14T12:21:14

use File::Slurp qw(read_file);
my @words = read_file '/usr/share/dict/words';
chomp @words;
my $regex = join '|', map quotemeta, @words;
$regex = qr/$regex/;
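Note that a pattern built this way is unanchored, so it matches a listed word anywhere inside a longer string. To test whether a whole string is a word, the alternation can be wrapped in \A ... \z. A minimal sketch, in which a short in-memory list stands in for the dictionary file; the sample words and the helper name is_word are assumptions for the example:

```perl
use strict;
use warnings;

# Stand-in for the lines of /usr/share/dict/words (assumed sample data).
my @words = qw(cat catalog dog);

# Escape each word, then join them into one alternation.
my $alternation = join '|', map quotemeta, @words;

# \A and \z force the whole string to be exactly one of the listed words.
my $regex = qr/\A(?:$alternation)\z/;

# Hypothetical helper: 1 if $_[0] is in the list, 0 otherwise.
sub is_word { return $_[0] =~ $regex ? 1 : 0 }

print "$_: ", (is_word($_) ? "yes" : "no"), "\n" for qw(cat cats dog z);
```

For a full dictionary a hash lookup would likely be faster than one large alternation, but the regex form keeps to the original question.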

Re:Easy enough

cog on 2004-10-14T12:42:30
OK, that was a good answer... what I forgot to say was: I don't know the language nor do I have a list of its words :-)

I'm doing this for something like fifteen different languages.