I'm looking for reasonable quantities of text in as many languages as I can get my hands on (note: I mean "text in English", "text in French", etc. I do not mean "text with as many languages as possible inside it").
Basically, I'm looking for better training text for my Lingua::Identify project.
If anyone has a couple of pointers (or even a corpus itself, even if it covers just one language), I'd really appreciate that :-)
Oh, one other thing: by "reasonable", I think I'm aiming for something like 10M... but right now I'd just like to get my hands on a corpus (hey, 1M today, 1M tomorrow...)
Google's advanced search allows you to limit the results to just one particular language; from Arabic to Turkish, you have 35 choices in all.
For example...
Google for Perl in French
Google for Perl in German
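If you want to script searches like the two above, the language restriction can go straight into the search URL. Here is a minimal sketch in Python. It assumes Google's `lr=lang_xx` query parameter, which is how the advanced search form encodes the restriction as far as I know. The helper name `language_search_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: Google restricts results to one language via "lr=lang_<code>",
# where <code> is a two-letter language code such as "fr" or "de".
GOOGLE_SEARCH = "https://www.google.com/search"


def language_search_url(query: str, lang_code: str) -> str:
    """Build a search URL limited to pages in a single language (hypothetical helper)."""
    params = {"q": query, "lr": f"lang_{lang_code}"}
    return f"{GOOGLE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # One URL per language you want training text for.
    for code in ("fr", "de"):
        print(language_search_url("Perl", code))
```

The pages behind the results would still need to be fetched and stripped of markup before they are usable as training text.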
More! They are parallel! (you know I like them this way)
http://people.csail.mit.edu/people/koehn/publications/europarl/
ABU makes French literary texts available for free (no copyright strings attached). These are classic works, so the French may be a little too classical for your needs. There's a lot of poetry in there as well.