Perl syntax for numbers

ChrisDolan on 2006-10-03T07:16:27

I've been working on implementing $token->literal() support for PPI::Token::Number subclasses. This method takes a string representing a number and tries to parse it the way Perl does. For example:

  '-.00_1' -> -0.001
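
For plain decimal literals, a minimal sketch of the idea (not PPI's actual implementation; decimal_literal is a made-up name) is to delete the underscores and let Perl numify what is left. It assumes the string is already known to be a decimal number. Hex, binary, octal and version strings would need separate handling.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PPI: turn a decimal literal string
# into its numeric value by dropping underscores and numifying.
sub decimal_literal {
    my ($str) = @_;
    (my $clean = $str) =~ tr/_//d;   # '-.00_1' becomes '-.001'
    return 0 + $clean;               # ordinary string-to-number conversion
}

print decimal_literal('-.00_1'), "\n";
```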


Egads, Perl has a lot of numeric formats with a lot of special parsing! Tonight, figuring out where Perl allows underscores in numbers is driving me a little batty. I've got PPI's tests working with valid underscore placement, but I need to write some tests for invalid underscore placement. For example:

  1_000          # valid
  1__000         # valid, but warns
  100_           # valid, but warns
  1_000.000_001  # valid
  1_.000         # valid, but warns
  1._000         # valid, but warns
  0xdead_beef    # valid
  0_xdeadbeef    # syntax error
  0_755          # syntax error
  0b1010_1010    # valid
  0b_10          # valid, but warns
  0_b10          # syntax error
  1e1_0          # valid
  1e_10          # valid, but warns
  1e_-10         # valid, but warns
  6_0.6_0.6_0    # valid
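
One way to double-check a table like this is to ask perl itself. The sketch below compiles each literal with a string eval and records whether it failed to compile or produced warnings. check_literal is a hypothetical helper name, and string eval should only ever be fed trusted input like these fixed test strings. Results may differ between perl versions.

```perl
use strict;
use warnings;

# Hypothetical helper: compile one numeric literal with string eval and
# report the value, any compile error, and any warnings perl emitted.
sub check_literal {
    my ($src) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    local $@;
    my $value = eval "use warnings; $src";
    my $error = $@;
    return { value => $value, error => $error, warnings => \@warnings };
}

for my $src (qw( 1_000 1__000 100_ 0xdead_beef 0_xdeadbeef 0b_10 1e_10 )) {
    my $r = check_literal($src);
    my $status = $r->{error}         ? 'syntax error'
               : @{ $r->{warnings} } ? 'valid, but warns'
               :                       'valid';
    printf "%-14s # %s\n", $src, $status;
}
```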


looks_like_number()

rafael on 2006-10-03T09:37:36

You might want to look at the perl API function looks_like_number(), accessible from XS. (See perldoc perlapi.)

Re:looks_like_number()

Aristotle on 2006-10-03T11:25:41

Or from Scalar::Util.

Re:looks_like_number()

ChrisDolan on 2006-10-03T13:24:18

Aristotle, Rafael,

Much appreciated! I was not aware of that function. In this particular case, PPI is designed to be more lenient than Perl (it must be round-trip-safe even on invalid syntax), so we'll stick to our custom tokenizing. That said, I'll probably look closely at looks_like_number() to see if I can find inconsistencies in our tokenizer.

Re:looks_like_number()

ChrisDolan on 2006-10-03T13:52:01

Hmm, I just looked at looks_like_number() in Scalar::Util and it's a different beast entirely. That's used for numification, not tokenization. looks_like_number() does not support '_', binary/octal/hex literals, or version strings. The internal grok_number() in numeric.c is similarly limited.
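
A quick way to see the difference is to feed a few source-code literals to looks_like_number() as strings. This is only an illustration; the sample strings are my own picks.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Strings that are fine as numeric literals in Perl source, checked
# against the numification rules instead of the tokenizer rules.
for my $str ('1000', '1_000', '0xdead_beef', '0b1010', '60.60.60') {
    printf "%-12s %s\n", $str,
        looks_like_number($str) ? 'looks like a number' : 'does not';
}
```

Only the first string passes. The underscore, the hex and binary prefixes, and the version string are all rejected.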

Instead, I've discovered scan_num() in toke.c. That's what I want to emulate (leniently).

5.6

jjore on 2006-10-03T19:22:18

Prior to 5.6 the placement of _ was restricted to something like every third character. Now it's completely open. Isn't it? I think so. Probably.

Underscore

bart on 2006-10-03T22:04:43

To summarize, am I right that an underscore is accepted without warning only if there is a digit (in the current base) on both sides? And no, the "0" prefix in "0x" or "0b" doesn't count. Ditto for the leading "0" for octal.

Re:Underscore

ChrisDolan on 2006-10-03T22:11:24

Yes, that is my conclusion as well.
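
That rule can be sketched as a small check. This is a rough illustration, not PPI code: has_misplaced_underscore is a made-up name, and it only flags underscores that lack a digit of the current base on both sides. It does not try to tell a warning apart from a syntax error.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any underscore in the literal lacks a
# digit (in the literal's base) immediately before or after it.
sub has_misplaced_underscore {
    my ($literal) = @_;

    # Strip the base prefix, which does not count as a digit, and pick
    # the digit class for that base.
    my ($body, $digit) =
        $literal =~ /\A0[xX](.*)\z/s   ? ($1, qr/[0-9a-fA-F]/)
      : $literal =~ /\A0[bB](.*)\z/s   ? ($1, qr/[01]/)
      : $literal =~ /\A0([0-7_]+)\z/   ? ($1, qr/[0-7]/)
      :                                  ($literal, qr/[0-9]/);

    # Count underscores with no digit before them or no digit after them.
    my $misplaced = () = $body =~ /(?<!$digit)_|_(?!$digit)/g;
    return $misplaced > 0;
}

for my $literal (qw( 1_000 1__000 0xdead_beef 0b_10 1e1_0 1e_10 )) {
    printf "%-12s %s\n", $literal,
        has_misplaced_underscore($literal) ? 'misplaced _' : 'ok';
}
```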