Java gravel of the day: substring redux

jdavidb on 2009-05-19T16:25:37

On 2008-01-25 I reported that Java's substring() method was brain damaged. On 2008-02-22, I reported that apparently I had been brain-damaged and that Java seemed to be fine and work in a sane manner.

Today I'm looking at the documentation, and I've coded a test implementation, and as near as I can see the brain damage that I reported really is present. Give Java a second index that goes beyond the length of the String, and instead of just getting the remainder of the String regardless of its length, you get a StringIndexOutOfBoundsException (a RuntimeException). If you want to say "Give me 9 characters, or however many you've got," you get to do some math on that yourself. I hope you don't commit a fencepost error.

Here's my implementation that demonstrates this:

public class Substr {
  private String str;

  public Substr(String str) {
    this.str = str;
  }

  public String substr(int beginIndex, int endIndex) {
    return str.substring(beginIndex, endIndex);
  }

  public static void main(String[] args) {
    String str = args[0];
    int beginIndex = Integer.parseInt(args[1]);
    int endIndex = Integer.parseInt(args[2]);
    Substr substr = new Substr(str);
    System.out.println(substr.substr(beginIndex, endIndex));
  }
}

Run results look like this:

$ java Substr ABCDE 0 1
A
$ java Substr ABCDE 0 2
AB
$ java Substr ABCDE 0 9
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 9
        at java.lang.String.substring(String.java:1765)
        at Substr.substr(Substr.java:9)
        at Substr.main(Substr.java:17)

I really just don't think that this is good design. But maybe I'm prejudiced. When I'm coding in something besides Perl, I don't expect the benefit of things like negative indices, but I do expect some sanity. To my way of thinking, substr(pos, len), returning up to len characters but not necessarily that many if they are not available, is a pretty common way for languages to specify this call, and is objectively better than substr(start, end + 1) or substr(start, end), and definitely better than throwing an exception for a length too long. But is this really objectively so?
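For concreteness, the substr(pos, len) behavior described above can be sketched on top of the existing substring(). This is only a minimal sketch: the class name SafeSubstr and its method are hypothetical, not part of the JDK, and it assumes pos lies within the String and len is non-negative (anything else still throws).

```java
// Hypothetical helper for illustration; SafeSubstr is not a JDK class.
final class SafeSubstr {
  private SafeSubstr() {}

  // Returns up to len characters of s starting at pos, stopping at the end of the String.
  // Assumes 0 <= pos <= s.length() and len >= 0; other inputs still throw from substring().
  static String substr(String s, int pos, int len) {
    // Compare len with the characters remaining instead of computing pos + len first,
    // so a huge len cannot overflow int.
    int end = (len > s.length() - pos) ? s.length() : pos + len;
    return s.substring(pos, end);
  }

  public static void main(String[] args) {
    System.out.println(substr("ABCDE", 0, 2)); // AB
    System.out.println(substr("ABCDE", 0, 9)); // ABCDE
    System.out.println(substr("ABCDE", 3, 9)); // DE
  }
}
```

The clamping is the whole trick: the fencepost risk is confined to one line instead of being repeated at every call site.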

I've submitted the question for discussion on Stack Overflow, so we'll see how that goes.

I looked at the source for Java 5's String class and was impressed with the efficiency of the implementation of substring(). Internally a String keeps a char[] array and an offset telling where the String starts in that array; a private constructor permits substring() to return a new String that shares the char array with a new offset and length, which seems quite efficient and clever, to me. But I can see that a sane substr() would have been quite easy to implement using the same constructor; in fact, a new substr() could be grafted on with Perl's semantics (probably including negative indices) with hardly any trouble at all and no conflict with the existing substring() since the name is different.
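A rough sketch of what such a grafted-on substr() might look like follows. It is loosely modeled on Perl's semantics rather than a faithful port: the class name PerlishSubstr is hypothetical, it goes through the public substring() rather than the private constructor, and it returns an empty String where Perl would return undef for a range wholly outside the String.

```java
// Hypothetical helper for illustration; loosely follows Perl's substr, not an exact port.
final class PerlishSubstr {
  private PerlishSubstr() {}

  static String substr(String s, int offset, int length) {
    int n = s.length();
    // A negative offset counts back from the end of the String.
    int start = offset < 0 ? n + offset : offset;
    // A negative length leaves that many characters off the end.
    // long arithmetic keeps start + length from overflowing int.
    long end = length < 0 ? (long) n + length : (long) start + length;
    // Clamp both ends into [0, n] and never let the end precede the start.
    start = Math.max(0, Math.min(start, n));
    int stop = (int) Math.max(start, Math.min(end, n));
    return s.substring(start, stop);
  }

  public static void main(String[] args) {
    System.out.println(substr("ABCDE", 0, 9));  // ABCDE
    System.out.println(substr("ABCDE", -2, 9)); // DE
    System.out.println(substr("ABCDE", 1, -1)); // BCD
  }
}
```

Because the work still ends in a call to substring(), the result would share the underlying char array in the Java 5 implementation described above.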

I have only a slight idea as to why I rescinded my initial false report: I may have been looking at the other signature for substring(), which takes only one index and returns characters through to the end of the String. That's good behavior, as far as I'm concerned; I'd hate to have to work around it not being there.


Yes, you need an external library

drayneck on 2009-05-19T19:35:07

Yes, Java's substring is sub-optimal. You have to grab a library like Apache's Commons Lang to pick up a class called StringUtils, whose substring method works as you hope and also supports negative subscripting.

The StringUtils class contains a bunch of other methods that make working with Strings less painful as well.