oplines - win-win memory AND speed

rurban on 2008-09-21T20:10:57

I already wrote about this some time ago, I forget where, probably on p5p, but got no responses. Today I tried again on IRC #p5p and ended up writing a simple statistics script and the beginning of the OPLINES branch.

The patch is working, but the test results look unreal. Uploaded to http://rurban.xarch.at/software/perl/oplines1.tar.gz
My first test script is poor and fails most of the time, but I will fix it and run it over more files to get better stats. My second is better:

    #! perl
    =pod

    =head1 NAME

    oplines - ops per line + nextstate win stats for TRY_OPLINES patch

    =head1 SYNOPSIS

      oplines.pl

    =head1 DESCRIPTION

    Theory:
    Move cop_line from COP to BASEOP, and reduce the need for nextstate ops,
    which will be an overall win in memory and speed for typical undense code,
    less than 4 ops per line.

    A cop has 5 ptrs more than a BASEOP, so the memory win will be like following:
    4 ops/line avg. 90% nextstate COP win per lines
      => on 32bit: 4*4=16 byte per line. for 10k src => 200-160k=40k memory win.
         + 4k runtime win (need less nextstate cops)
    on 64bit you try.

    The unknown factors:
    a) typical # of ops per line
    b) nextstate win:
       * typical # of nextstate cops per 1000-line file.
         minus # of really needed nextstate cops (lexcops) per 1000-line file.

    2008-09-21 21:41:06 rurban

    =cut

    use Config;
    use lib ".";
    my ($sumfiles, $sumlines, $sumops, $sumnextstates, $sumlexstates);
    my ($files, $lines, $ops, $nextstates, $lexstates);

    # Write the helper module stored after __DATA__ to B_Stats.pm
    open PM, "> B_Stats.pm";
    while (<DATA>) { print PM "$_"; };
    close PM;

    # Compile each file with the helper loaded and add up its counters
    for my $file (@ARGV) {
      $s = `$^X -c -MB_Stats $file`;
      my @s = split /\t/, $s;
      if (@s > 4 and $s[0] =~ /\d+/) {
        ($files, $lines, $ops, $nextstates, $lexstates) = @s;
        $sumfiles += $files;
        $sumlines += $lines;
        $sumops += $ops;
        $sumnextstates += $nextstates;
        $sumlexstates += $lexstates;
      }
    }
    print "files: $sumfiles\n";
    print "lines: $sumlines\n";
    print "ops: $sumops\n";
    my $opsratio = $sumlines ? $sumops/$sumlines : 0;
    my $neededcops = $sumfiles+$sumlexstates;
    my $copratio = $neededcops ? $sumnextstates/$neededcops : 0;
    print "ops/line: ",sprintf("%0.2f",$opsratio),"\n";
    print "cops: ",sprintf("%0.2f%%",$copratio),
      " (lex+filecops=", $neededcops," / nextstates=$sumnextstates)\n";
    my $runtimewin = $sumnextstates - $neededcops;
    my $opsize = 3*$Config{ptrsize}+4+$Config{intsize};
    my $copsize = $opsize + 4*$Config{ptrsize} + 8;
    my $memwin = ($Config{ptrsize} * $copsize * $runtimewin) # win the cops
      - ($sumops * $Config{intsize}); # minus the added line_t cop_line
    print "memory win: $memwin byte (", ($Config{ptrsize} * $copsize * $runtimewin),
      " - ",($sumops * $Config{intsize}),")\n";
    print "runtime win: $runtimewin ops ",
      sprintf("%0.2f%%", $sumops ? $runtimewin*100/$sumops : 0),
      " ($sumnextstates - ",$neededcops,") \n";

    __DATA__
    use B::Utils qw(walkallops_simple);
    use B qw(OPf_PARENS);
    my ($files, $lines, $ops, $nextstates, $lexstates);

    sub count_ops {
      my $op = shift;
      $ops++; # count also null ops
      if ($op->isa('B::COP')) {
        $nextstates++;
        $lexstates++ if ($$op and (($op->flags != 1) or $op->label));
      }
    }

    CHECK {
      ($files, $lines, $ops, $nextstates, $lexstates) = (0,0,0,0,0);
      ($oldfile, $oldlines) = ("",0);
      walkallops_simple(\&count_ops);
      $files = scalar keys %INC;
      for (values %INC) {
        open IN, "<", "$_";
        while (<IN>) { $lines++; };
        close IN;
      }
      print "$files\t$lines\t$ops\t$nextstates\t$lexstates\n";
    }
    1;

So how does it look in practice?

It looks like an average sample has 0.5-2 ops/line (but pod needs to be skipped), and about 100-200 times more pure line cops than really needed cops (block entry, new files). So the win seems to be dramatic (8% speed win for the op traversal). But I have to inspect the cops more closely now.
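Since the line counts above include pod, a pod-skipping counter would give a fairer ops/line figure. The following is only a minimal sketch: count_code_lines is a hypothetical helper that is not part of the uploaded patch, and it uses a simple heuristic (any line starting with "=" plus a letter opens pod, "=cut" closes it, "__END__" or "__DATA__" ends the code) that will miscount such lines inside heredocs.

```perl
use strict;
use warnings;

# Count the lines of a source text, ignoring pod blocks and everything
# after __END__ / __DATA__. Blank code lines are still counted, as in
# the plain $lines++ loop of B_Stats.
sub count_code_lines {
    my ($text) = @_;
    my ($count, $in_pod) = (0, 0);
    for my $line (split /\n/, $text) {
        last if $line =~ /^__(?:END|DATA)__\s*$/;
        if ($line =~ /^=cut\b/)    { $in_pod = 0; next; }  # pod ends
        if ($line =~ /^=[a-zA-Z]/) { $in_pod = 1; next; }  # pod starts
        next if $in_pod;
        $count++;
    }
    return $count;
}

my $sample = join "\n", 'my $x = 1;', '=pod', 'docs', '=cut', 'print $x;';
print count_code_lines($sample), "\n";   # prints 2
```

In B_Stats the same loop could replace the plain $lines++ counter, at the cost of reading each file into memory or carrying the pod state across lines.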

    $ ./oplines *.pl
    files: 245
    lines: 85756
    ops: 59263
    ops/line: 0.69
    cops: 11.23% (lex+filecops=494 / nextstates=5549)
    memory win: 652628 byte (889680 - 237052)
    runtime win: 5055 ops 8.53% (5549 - 494)

    $ ./oplines.pl $(find lib ext -name \*.pm)
    files: 12051
    lines: 5042732
    ops: 6629648
    ops/line: 1.31
    cops: 39.15% (lex+filecops=15334 / nextstates=600266)
    memory win: 76429440 byte (102948032 - 26518592)
    runtime win: 584932 ops 8.82% (600266 - 15334)
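To make the arithmetic behind these figures easier to follow, here is a small sketch that recomputes them from the raw counts. oplines_estimate is a hypothetical helper, and ptrsize and intsize of 4 are assumptions inferred from the printed numbers (889680 / 5055 = 176 = 4 * 44), not something stated in the post. It reproduces the script's expression as-is, including the multiplication of the cop size by ptrsize.

```perl
use strict;
use warnings;

# Recompute the estimates printed by oplines.pl from its raw counts.
# "needed" is lex+filecops, i.e. the cops that have to stay.
sub oplines_estimate {
    my (%c) = @_;
    my $runtimewin = $c{nextstates} - $c{needed};
    my $opsize     = 3 * $c{ptrsize} + 4 + $c{intsize};
    my $copsize    = $opsize + 4 * $c{ptrsize} + 8;
    # Same expression as the script, including its extra ptrsize factor.
    my $saved      = $c{ptrsize} * $copsize * $runtimewin;
    my $added      = $c{ops} * $c{intsize};   # one added line_t per op
    return (
        runtimewin => $runtimewin,
        memwin     => $saved - $added,
        pct        => $c{ops} ? 100 * $runtimewin / $c{ops} : 0,
    );
}

my %run1 = oplines_estimate(
    ops => 59263, nextstates => 5549, needed => 494,
    ptrsize => 4, intsize => 4,               # assumed, see above
);
printf "memory win: %d byte, runtime win: %d ops (%.2f%%)\n",
    @run1{qw(memwin runtimewin pct)};
# prints: memory win: 652628 byte, runtime win: 5055 ops (8.53%)
```

Passing ptrsize => 8 gives the 64-bit estimate under the same size model. That model is the script's approximation, not a measured struct size.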

IMPLEMENTATION

The parser emits lots of nextstate cops just to track line numbers. These cops can be omitted and the line stored in the current op, PL_op, instead of PL_curcop. The parser can be simplified a lot for the PL_curcop cases. find_cops() is also no longer needed in most cases where just the line number is wanted, but it is still needed to get the current filename.
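To see where the line numbers live today, one can walk a compiled sub with the core B module and look at which ops are COPs. This is a rough sketch, not part of the patch: cop_stats and sample are made-up names, and the walk follows only op_next, so it does not descend into the branches of logical ops. It is meant for straight-line subs, and the seen-guard only protects against cycles from loops.

```perl
use strict;
use warnings;
use B ();

# Walk a sub's ops in execution order and tally COPs against all ops.
# Returns (total ops, cops, number of distinct line numbers on the cops).
sub cop_stats {
    my ($code) = @_;
    my ($ops, $cops) = (0, 0);
    my (%lines, %seen);
    my $op = B::svref_2object($code)->START;
    while ($$op and not $seen{$$op}++) {   # B::NULL wraps address 0
        $ops++;
        if ($op->isa('B::COP')) {          # today only COPs carry a line
            $cops++;
            $lines{ $op->line }++;
        }
        $op = $op->next;
    }
    return ($ops, $cops, scalar keys %lines);
}

sub sample {
    my $x = 1;
    my $y = $x + 2;
    return $x * $y;
}

my ($ops, $cops, $nlines) = cop_stats(\&sample);
print "ops=$ops cops=$cops distinct lines=$nlines\n";
```

On a stock perl each statement of sample should be preceded by its own nextstate cop. Under the proposal, most of those would go away and the line number would be read from the op itself.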

Relative speed win?

Alias on 2008-09-22T00:53:00

What sort of relative wall-clock speed win is this going to result in for typical non-io-bound code?

0.01% ?

0.1% ?

1% ?

Re:Relative speed win?

chromatic on 2008-09-22T06:37:14

It's ten percent fewer ops, but the nextstate op is reasonably slim on its own -- about the same size as the and and or ops, slightly smaller than enter and leave, and significantly smaller than everything else in pp_hot.c. By rough estimate, without accounting for memory use, cache flushing, and other external features, a 3% improvement would thrill me.

corehackers

chromatic on 2009-07-17T07:17:23

Someone put this on the corehackers wiki; any interest in creating a GitHub branch to continue this work?

Re:corehackers

rurban on 2010-08-28T12:16:10

I believe it was in my GitHub perl repo as the oplines branch, but it got lost. Only my local-op branch is there.