\w in perl 5.11.0

kappa on 2009-10-06T12:53:25

There's an interesting and slightly disturbing change mentioned in the perldelta for 5.11:

The key change here is that \d will no longer match every digit in the unicode standard (there are thousands) nor will \w match every word character in the standard, instead they will match precisely their POSIX or Perl definition.

The interesting part is that, besides fixing the problems mentioned in the document, \w is going to match a very small and tight set of characters, [0-9A-Z_a-z], and may become faster on Unicode strings. The disturbing part is that it stops matching non-ASCII letters, which of course includes the Cyrillic letters I depend on.
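A quick way to see what a given perl actually does is to probe a few characters. This is only a sketch: the sample characters and the is_ascii_word helper are my own illustrative choices, and the \w and \d columns will differ depending on the perl version you run it under, so no expected output is shown for them.

```perl
use strict;
use warnings;

# Hypothetical helper: the explicit ASCII-only word class,
# which behaves the same no matter how \w is defined.
sub is_ascii_word { return $_[0] =~ /\A[0-9A-Z_a-z]\z/ ? 1 : 0 }

# 'a', '_', '7', CYRILLIC SMALL LETTER ZHE, ARABIC-INDIC DIGIT THREE
my @samples = ('a', '_', '7', "\x{436}", "\x{663}");

for my $ch (@samples) {
    printf "U+%04X  \\w: %-3s  \\d: %-3s  ASCII class: %s\n",
        ord($ch),
        ($ch =~ /\A\w\z/ ? 'yes' : 'no'),
        ($ch =~ /\A\d\z/ ? 'yes' : 'no'),
        (is_ascii_word($ch) ? 'yes' : 'no');
}
```

Only the last column is guaranteed: the explicit class accepts 'a', '_' and '7' and rejects the two non-ASCII characters.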

As it turns out, this change is not yet fully implemented; see the node I wrote yesterday on Perlmonks, and especially demerphq's comments.

I should get comfortable using the \p and \P escapes with Unicode properties, just in case. They were implemented almost 10 years ago, and I never trusted them enough to start using them. That's probably ridiculous.
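For what it's worth, here is a minimal sketch of what leaning on \p and \P might look like. The sample string and variable names are made up for illustration, and [\p{L}\p{N}_] is only a rough stand-in for a Unicode-aware \w, not an exact equivalent (it leaves out combining marks, for instance).

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# "privet" in Cyrillic, written with escapes so the source file stays ASCII.
my $text = "\x{43F}\x{440}\x{438}\x{432}\x{435}\x{442}, world_42";

# Letters, numbers and underscore, spelled out with Unicode properties.
my @words = $text =~ /([\p{L}\p{N}_]+)/g;

# \p{Cyrillic} matches any character in the Cyrillic script.
my $has_cyrillic = $text =~ /\p{Cyrillic}/ ? 1 : 0;

# \P{L} is the negation: everything that is not a letter.
(my $letters_only = $text) =~ s/\P{L}+//g;

print scalar(@words), " words: @words\n";
print "has Cyrillic: $has_cyrillic\n";
print "letters only: $letters_only\n";
```

Spelling the class out this way says exactly which characters are wanted, so the pattern does not depend on how \w is defined in a particular release.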