Java gravel of the day: regular expressions

jdavidb on 2008-11-11T16:29:45

I knew that Java's regular expressions would come with a horrible syntax I wouldn't want to use. I didn't know that the syntax would appear to be designed by someone who completely misses the point.

In order to use a precompiled regex pattern, you get it to produce a Matcher object ... which is constructed with the String you want to match against! Then you call matches() on the Matcher object.

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();

For what mind-numbing purpose was it decided that a Matcher object should have a String to match against as a piece of instance data? Are you going to construct one of these Matchers and pass it around so that different pieces of code can check over and over again if the same String matches?

This could have simply been implemented as Pattern.matches(String), or, if you simply must have a Matcher class, then get your Matcher from Pattern.matcher(), taking no parameters, and then require a String parameter to Matcher.matches(String).

What possible reason could there be for not following either of those approaches, instead of what they gave me?

As a sop, you get a static Pattern.matches(String regex, CharSequence input) method, but that takes the pattern as a string on every call, which completely prevents you from using precompiled patterns.
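For illustration, the one-call form asked for above can be approximated with a small wrapper around a precompiled Pattern. This is only a sketch: the class name RegexUtil and the helper matches(Pattern, CharSequence) are made up here and are not part of java.util.regex.

```java
import java.util.regex.Pattern;

class RegexUtil {
    // Hypothetical helper (not in the JDK): hides the intermediate Matcher.
    static boolean matches(Pattern p, CharSequence input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a*b"); // compiled once, reused below
        System.out.println(matches(p, "aaaaab"));
        System.out.println(matches(p, "aaaac"));
    }
}
```

The Matcher is still created underneath, so this only changes how the call site reads, not what the library does.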

Clueless, Sun. Just clueless. Or is it the Java community that I have to blame for this? I've seen so many features elegantly added into Java, and then I see stuff like this.


The design isn't THAT bad

btilly on 2008-11-11T22:00:18

First of all, Pattern has a static convenience method called matches.

boolean b = Pattern.matches("a*b", "aaaaaab");

That suffices for the common case. (Too bad they didn't think of allowing matches to pass in flags to compile with.)

Java does allow you to reuse the same pattern over and over again. It is more verbose than it would be if Pattern had a matches utility method that worked on an already compiled instance, but the functionality is there. Plus there is always the possibility that Java will be smart enough at some point to automatically optimize the naive code for you either at the compilation or the JIT level. (I don't know if that is built yet. But it seems doable. And that is in my eyes a better solution than extending the API more.)
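A rough sketch of that reuse, assuming a case-insensitive variant of the same a*b pattern. The class name PatternReuse and the method countMatches are invented for the example. Pattern.compile(String, int) is the place where flags go, and Matcher.reset(CharSequence) lets one Matcher be pointed at a new input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuse {
    // Compiled once, with a flag that the static Pattern.matches cannot take.
    static final Pattern AB = Pattern.compile("a*b", Pattern.CASE_INSENSITIVE);

    // Counts how many inputs match in full, reusing one Matcher via reset().
    static int countMatches(String... inputs) {
        Matcher m = AB.matcher("");
        int count = 0;
        for (String s : inputs) {
            if (m.reset(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("aaaaab", "AAB", "abc"));
    }
}
```

Calling AB.matcher(s) inside the loop would work just as well; reset() merely avoids allocating a new Matcher per input.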

Anyways the real point of the Matcher object is to allow people to loop over a match. In Perl you could write:

while ($string =~ /(\w+)/g) {
    my $word = $1; ...
}

In Java you would do that with something like this:

Pattern find_word = Pattern.compile("\\w+");
Matcher m = find_word.matcher(string);
while (m.find()) {
    // m.start() and m.end() tell me where the match is; m.group() is the matched text.
}

Which is a reasonable solution IMO.
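A fuller version of that loop might look like the following. It is a sketch only: the class name WordFinder and the "word@start-end" output format are arbitrary choices for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    static final Pattern FIND_WORD = Pattern.compile("\\w+");

    // Returns each word with its start (inclusive) and end (exclusive) offsets.
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = FIND_WORD.matcher(text);
        while (m.find()) {
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String w : words("hello, big world")) {
            System.out.println(w);
        }
    }
}
```

The Matcher is what remembers the current position between find() calls, which is the state Perl keeps implicitly on the string with /g.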

Re:The design isn't THAT bad

jdavidb on 2008-11-13T14:42:31

That does seem to show a use case for the Matcher object. For the record, I almost never loop over a match in Perl. I just don't seem to find it necessary. I know why you'd want to, though, so Java should make that possible.

Interestingly enough, that seems to be the only explicitly java.util.regex-related way to accomplish an s///g, as well, although there's a regex-related method in the String class that will do it.
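For comparison, here is a sketch of both routes. The class and method names (GlobalReplace, upcaseWords, maskWords) are hypothetical; the calls themselves (Matcher.appendReplacement, Matcher.appendTail, Matcher.quoteReplacement, String.replaceAll) are standard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlobalReplace {
    static final Pattern WORD = Pattern.compile("\\w+");

    // Roughly s/(\w+)/\U$1/g: upper-cases every word using the find() loop.
    static String upcaseWords(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' and '\' in computed text.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    // Roughly s/\w+/X/g with a fixed replacement, via the String method.
    static String maskWords(String text) {
        return text.replaceAll("\\w+", "X");
    }

    public static void main(String[] args) {
        System.out.println(upcaseWords("foo, bar!"));
        System.out.println(maskWords("foo, bar!"));
    }
}
```

The loop form earns its keep when the replacement is computed per match, as with Perl's /e. For a fixed replacement, Matcher also has its own replaceAll(String), so the simple case does not have to go through String.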

You just don't get it, do you :-)?

Ron Savage on 2008-11-11T23:47:23

Hi Folks

Java is designed to cripple your hardware. It has no other purpose.

Exactly the same goes for each and every piece of MS software.