HAHA PHP

xsawyerx on 2009-06-10T09:30:48

We can't stop laughing at this at $work. Apparently PHP decided to add a goto control structure!

So you say "okay, on the plus side, at least they remembered a really basic control structure" but then you say "who the hell uses goto in anything that _isn't_ assembly or _perhaps_ condensed C [that isn't a really oldschool person]?!"

Some people mentioned a few good things about gotos, and here are my 2 cents on them:

  • Useful for error handling: have any of you heard of structured code? Perhaps heard of logging?
  • Useful for writing parsers: I would write a parser with better coding conventions, such as function references, dispatch tables and the like, but not goto. Writing parsers with goto is just abusing an arcane control structure for a job where more advanced ones do better at keeping the code readable and the flow clear. Besides, who the hell writes parsers in PHP?
  • Can be faster than other control structures: are you freaking kidding me? You're using a DYNAMIC LANGUAGE. You take a performance hit just echoing a hello world, but you're worried about a goto being a bit faster than a nested if()?

Remember when I started by asking "who the hell uses goto in anything that _isn't_ assembly or _perhaps_ condensed C?!"? After asking one of our PHP developers that question, I spit out another question without noticing: "Do they [the PHP developers themselves] have a lot of use for goto in PHP code they write?" and right after I thought "Do they write a lot of goto in C?"

Thus...

$ grep goto -r php-5.2.9 | wc -l
21848

But then I wondered...

$ grep goto -r perl-5.10.0 | wc -l
2513

Here's some more info:

(PHP also has extensions and a ton-load of documentation)
$ find php-5.2.9 -type f -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' -exec cat {} \; | wc -l
149352
$ find php-5.2.9/win32/ php-5.2.9/Zend/ php-5.2.9/main/ -type f -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' -exec cat {} \; | wc -l
48275
$ find perl-5.10.0 -type f -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' -exec cat {} \; | wc -l
63785
note: SLOC does not equal productivity or quality


Worse is better, obviously

nicholas on 2009-06-10T10:11:16

Sadly goto is a useful construction in C, because the flow control primitives are, well, far more primitive than Perl. For example, you can only break out of the innermost loop.

Perl's goto has something that isn't offered by C or PHP: goto &NAME. That form is really the only form that anyone should be using in Perl code, and then only when they know what they're doing and why. I don't know PHP, but either it has flow control structures as powerful as Perl's, in which case goto is not needed, or the flow control structures themselves should have been enhanced, rather than adding goto. Either way, what they did is daft.

Your stats don't line up. Your initial grep covers all files, not just C code, and I don't believe that your find commands do what you expect.
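To illustrate the point about find, here is a minimal sketch using a hypothetical scratch directory (the file names a.c and a.h are made up for the demo, and -or is the GNU/BSD spelling of POSIX -o). Without parentheses, -exec attaches only to the last -iname test, because the implicit "and" binds tighter than -or:

```shell
# Hypothetical scratch tree: one .c file and one .h file, one line each.
demo=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$demo/a.c"
printf '#define X 1\n' > "$demo/a.h"

# Ungrouped: parsed as (-type f AND *.c) OR (*.h AND -exec ...),
# so only a.h is ever passed to cat.
ungrouped=$(find "$demo" -type f -iname '*.c' -or -iname '*.h' -exec cat {} \; | wc -l)

# Grouped: the parentheses make -exec apply to files matching either pattern.
grouped=$(find "$demo" -type f '(' -iname '*.c' -or -iname '*.h' ')' -exec cat {} \; | wc -l)

echo "ungrouped: $ungrouped, grouped: $grouped"
rm -rf "$demo"
```

The ungrouped form should count one line and the grouped form two, which is why the grouped commands below report far larger totals than the originals.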

Perl

$ find perl-5.10.0 -name ext -prune -o '(' -type f -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' ')' -exec cat {} \; | wc -l
231417
$ find perl-5.10.0 -name ext -prune -o '(' -type f -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' ')' -exec cat {} \; | egrep '\bgoto\b' | wc -l
1216
$ find perl-5.10.0 '(' -type f -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' -or -iname '*.xs' ')' -exec cat {} \; | wc -l
293114
$ find perl-5.10.0 '(' -type f -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' -or -iname '*.xs' ')' -exec cat {} \; | egrep '\bgoto\b' | wc -l
1340

PHP

$ find php-5.2.9/win32/ php-5.2.9/Zend/ php-5.2.9/main/ -type f '(' -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' ')' -exec cat {} \; | wc -l
120833
$ find php-5.2.9/win32/ php-5.2.9/Zend/ php-5.2.9/main/ -type f '(' -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' ')' -exec cat {} \; | egrep '\bgoto\b' | wc -l
201
$ find php-5.2.9 -type f '(' -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' ')' -exec cat {} \; |  wc -l
783305
$ find php-5.2.9 -type f '(' -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' ')' -exec cat {} \; | egrep '\bgoto\b'|  wc -l
11699

So most of PHP's C gotos are in extensions, whereas most of Perl's are in the core. But what are all those other files in the PHP tree harbouring gotos?

$ find php-5.2.9 -type f '(' -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' ')' -o -print | xargs grep -c goto | grep -v :0
php-5.2.9/Zend/README.ZEND_VM:1
php-5.2.9/Zend/zend_language_scanner.l:1
php-5.2.9/Zend/flex.skl:5
php-5.2.9/Zend/zend_vm_gen.php:12
php-5.2.9/Zend/ChangeLog:3
php-5.2.9/ext/pcre/pcrelib/ChangeLog:1
php-5.2.9/ext/pcre/pcrelib/pcre_printint.src:1
php-5.2.9/ext/date/lib/parse_date.c.orig:9309
php-5.2.9/ext/date/lib/parse_date.re:3
php-5.2.9/ext/ming/tests/swfaction.phpt:1
php-5.2.9/ext/ming/tests/swfaction-new.phpt:1
php-5.2.9/ext/mbstring/oniguruma/HISTORY:1
php-5.2.9/ext/standard/url_scanner_ex.c.orig:103
php-5.2.9/ext/standard/url_scanner_ex.re:27
php-5.2.9/ext/standard/var_unserializer.c.orig:185
php-5.2.9/ext/pdo/pdo_sql_parser.c.orig:45
php-5.2.9/ext/pdo/pdo_sql_parser.re:12
php-5.2.9/ext/zlib/tests/bug.tar:187
php-5.2.9/sapi/thttpd/thttpd_patch:2

What are .re and .c.orig files?

Re:Worse is better, obviously

xsawyerx on 2009-06-10T11:56:33

Thanks for providing more accurate (or.. much more accurate) finds

I also liked the explanation you gave on the use of gotos. I had a discussion about that with a colleague (who is a Python programmer); we got around to labels, and I tried to explain that Perl has label-aware loop commands, so you can last to a label. In PHP, you can give break or continue a number of enclosing loops (break 3 exits three nested loops), while apparently in Python you can do neither, only break and continue on the innermost loop.

I guess what I'm saying is that there are a lot of ways to peel an onion (I know at least two!), and a lot of ways to build better flow control into dynamic programming languages, but the language developers have to anticipate that a programmer might want or need it and then provide it. Thus, as you put it (but I'll rephrase), either PHP's flow control isn't good enough to do without goto, or they've added something that isn't required for no good reason.

The only time I've personally ever used goto was in assembly programming.

Re:Worse is better, obviously

xsawyerx on 2009-06-10T11:57:17

Oh, and I have no idea what the other files are. Maybe files that should be cleaned?

A quick plug for ack

petdance on 2009-06-10T15:01:43

For those who don't use ack, this is exactly the sort of thing ack was designed to do easily.

find perl-5.10.0 -name ext -prune -o '(' -type f -iname '*.c' -or -iname '*.cpp' -or -iname '*.h' ')' -exec cat {} \; | egrep '\bgoto\b' | wc -l

is just

ack --cc --cpp -w goto perl-5.10.0 | wc -l

--cc means "C source and headers only", --cpp is "C++ source and headers only", -w means "word only".

Just install the CPAN package App::Ack.

Re:A quick plug for ack

xsawyerx on 2009-06-10T15:16:03

*beats head on wall*

Thanks Andy :)

stats - one third of perl C goto is toke

n1vux on 2009-06-10T21:40:44

in the 5.8.9-rc1 that i had handy from a regression test

of 1270 C-ish file hits using Andy's ack invocation

nearly one third are in one file alone
  403 toke.c

most of which and one quarter of the total being exactly
  343 toke.c: goto unknown;

the long tail continues over 65 files, but a mere 5 files (6 with the tie for 5th) cover 50%
  403 toke.c
   95 regexec.c
   54 pp_sys.c
   53 sv.c
   39 opmini.c
   39 op.c # tied for 5th
(subtotal 683 53%)

and covers 80% at 19 files (17th-21st having 18 each, subtotal 1059 83%)

(That's a pretty good Pareto approximation!)

16 are in comments
17 are in quotes
      including two in comments above
      and 8 are die messages
16 are #define yacc actions in four files

Top 30 goto-bearing Cish files -

  403 toke.c
   95 regexec.c
   54 pp_sys.c
   53 sv.c
   39 opmini.c
   39 op.c
   36 util.c
   36 regcomp.c
   30 os2/OS2/Process/Process.xs
   30 gv.c
   29 pp_hot.c
   28 x2p/walk.c
   27 os2/os2.c
   26 perl.c
   25 doio.c
   19 pp_pack.c
   18 x2p/a2p.c
   18 win32/win32.c
   18 utf8.c
   18 pp_ctl.c
   18 perly.c
   17 vms/vms.c
   16 ext/Storable/Storable.xs
   15 ext/SDBM_File/sdbm/dbe.c
   12 pp.c
   12 dump.c
   11 jpl/JNI/JNI.xs
   10 mg.c
    9 ext/Encode/Encode.xs
    8 doop.c

(30th subtotal 1169 92%)

(ack did find a few .h as well but they had few each.)
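For anyone who wants to reproduce a tally like this without ack, here is a rough sketch. It is not the pipeline used for the numbers above; it assumes a grep that supports -r and --include (GNU and BSD grep do), and the scratch tree and its file contents are made up for the demo:

```shell
# Hypothetical scratch tree standing in for a real source checkout.
src=$(mktemp -d)
printf 'goto a;\ngoto b;\nint gotos;\n' > "$src/toke.c"
printf 'goto c;\n' > "$src/sv.c"
printf 'no jumps here\n' > "$src/util.h"

# Count lines containing the whole word "goto" per file, drop zero counts,
# sort by count (descending), then print count, file and running percentage.
tally=$(grep -rwc --include='*.c' --include='*.h' goto "$src" \
  | grep -v ':0$' \
  | sort -t: -k2,2nr \
  | awk -F: '{ n[NR] = $2; f[NR] = $1; total += $2 }
             END { for (i = 1; i <= NR; i++) {
                     run += n[i]
                     printf "%5d %s (%d%%)\n", n[i], f[i], 100 * run / total } }')
echo "$tally"
rm -rf "$src"
```

Note that this counts matching lines rather than individual occurrences, and that -w keeps identifiers such as gotos out of the count. Pointing the grep at a real tree such as perl-5.10.0 instead of the scratch directory gives the same kind of ranked list with cumulative coverage.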

Oh my god, the statistics

xsawyerx on 2009-06-11T07:38:42

Very nice, thanks for sharing. :)