I wrote about this some time ago, I forget where, probably on p5p, but got no responses.
Today I brought it up again on irc #p5p and ended up writing a simple statistics script and the beginning of the OPLINES branch.
The patch works, but the test results look unreal. Uploaded to http://rurban.xarch.at/software/perl/oplines1.tar.gz
My first test script is poor and mostly fails, but I will fix it and run it over more files to get better stats.
My second is better:
#! perl
=pod
=head1 NAME
oplines - ops per line + nextstate win stats for TRY_OPLINES patch
=head1 SYNOPSIS
oplines.pl
=head1 DESCRIPTION
Theory:
Move cop_line from COP to BASEOP, and reduce the need for nextstate
ops, which will be an overall win in memory and speed for typical
non-dense code with fewer than 4 ops per line.
A cop has 5 ptrs more than a BASEOP, so the memory win will be as
follows:
4 ops/line avg.
90% nextstate COP win per lines
=> on 32bit: 4*4=16 byte per line. for 10k src => 200-160k=40k memory win.
+ 4k runtime win (need less nextstate cops)
on 64bit you try.
The unknown factors:
a) typical # of ops per line
b) nextstate win:
* typical # of nextstate cops per 1000-line file.
minus # of really needed nextstate cops (lexcops) per 1000-line file.
2008-09-21 21:41:06 rurban
=cut
use Config;
use lib ".";
my ($sumfiles, $sumlines, $sumops, $sumnextstates, $sumlexstates);
my ($files, $lines, $ops, $nextstates, $lexstates);
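# Write the helper module stored after __DATA__ to disk.
open PM, ">", "B_Stats.pm" or die "Cannot write B_Stats.pm: $!";
while (<DATA>) { print PM $_; }
close PM;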
for my $file (@ARGV) {
    $s = `$^X -c -MB_Stats $file`;
    my @s = split /\t/, $s;
    if (@s > 4 and $s[0] =~ /\d+/) {
        ($files, $lines, $ops, $nextstates, $lexstates) = @s;
        $sumfiles += $files;
        $sumlines += $lines;
        $sumops += $ops;
        $sumnextstates += $nextstates;
        $sumlexstates += $lexstates;
    }
}
print "files: $sumfiles\n";
print "lines: $sumlines\n";
print "ops: $sumops\n";
my $opsratio = $sumlines ? $sumops/$sumlines : 0;
my $copratio = ($sumfiles+$sumlexstates) ? $sumnextstates/($sumfiles+$sumlexstates) : 0;
print "ops/line: ",sprintf("%0.2f",$opsratio),"\n";
print "cops: ",sprintf("%0.2f%%",$copratio), " (lex+filecops=",
$sumfiles+$sumlexstates," / nextstates=$sumnextstates)\n";
my $runtimewin = $sumnextstates - ($sumfiles+$sumlexstates);
my $opsize = 3*$Config{ptrsize}+4+$Config{intsize};
my $copsize = $opsize + 4*$Config{ptrsize} + 8;
my $memwin = ($Config{ptrsize} * $copsize * $runtimewin) # win the cops
- ($sumops * $Config{intsize}); # minus the added line_t cop_line
print "memory win: $memwin byte (",
($Config{ptrsize} * $copsize * $runtimewin)," - ",($sumops * $Config{intsize}),")\n";
print "runtime win: $runtimewin ops ",sprintf("%0.2f%%",($sumops ? $runtimewin*100/$sumops : 0))," ($sumnextstates - ",$sumfiles+$sumlexstates,") \n";
__DATA__
use B::Utils qw(walkallops_simple);
use B qw(OPf_PARENS);
my ($files, $lines, $ops, $nextstates, $lexstates);
sub count_ops {
    my $op = shift;
    $ops++; # count also null ops
    if ($op->isa('B::COP')) {
        $nextstates++;
        $lexstates++ if ($$op and (($op->flags != 1)
                                   or $op->label));
    }
}
CHECK {
    ($files, $lines, $ops, $nextstates, $lexstates) = (0,0,0,0,0);
    ($oldfile, $oldlines) = ("",0);
    walkallops_simple(\&count_ops);
    $files = scalar keys %INC;
    # Count the source lines of every loaded file.
    for my $inc (values %INC) {
        open IN, "<", $inc or next;
        while (<IN>) { $lines++; }
        close IN;
    }
    print "$files\t$lines\t$ops\t$nextstates\t$lexstates\n";
}
1;
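The back-of-the-envelope estimate in the DESCRIPTION of the script can be spelled out as a small calculation. This is only a sketch of one reading of that arithmetic: it takes "10k src" to mean 10,000 source lines, assumes one nextstate COP is saved per line, and ignores the 90% factor. The function name `estimate_win` and its parameter names are made up for illustration.

```perl
use strict;
use warnings;

# Rough estimate only; every input is an assumption taken from the
# DESCRIPTION above (32-bit: 4-byte pointers, 4-byte line_t).
sub estimate_win {
    my (%a) = @_;
    # Bytes a COP carries beyond a BASEOP (5 extra pointers).
    my $cop_extra = $a{extra_ptrs} * $a{ptrsize};
    # Assumed: one nextstate COP can be dropped per source line.
    my $saved = $a{lines} * $cop_extra;
    # Cost: every op on every line grows by one line_t field.
    my $added = $a{lines} * $a{ops_per_line} * $a{line_t_size};
    return ($saved, $added, $saved - $added);
}

my ($saved, $added, $net) = estimate_win(
    lines        => 10_000,
    ops_per_line => 4,
    extra_ptrs   => 5,
    ptrsize      => 4,
    line_t_size  => 4,
);
printf "saved %d, added %d, net %d bytes\n", $saved, $added, $net;
```

With these inputs the sketch gives the same 200k - 160k = 40k bytes as the estimate above; with 8-byte pointers the saved side doubles while the added side stays the same if line_t remains 4 bytes.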
So how does it look in practice?
It looks like an average sample has 0.5-2 ops/line (but pod needs to be skipped when counting lines), and about 10-40 times more pure line cops than really needed cops (block entry, new files) in the two runs below. So the win seems to be dramatic (about 8% fewer ops for the op traversal). But I have to inspect the cops more closely now.
$ ./oplines.pl *.pl
files: 245
lines: 85756
ops: 59263
ops/line: 0.69
cops: 11.23% (lex+filecops=494 / nextstates=5549)
memory win: 652628 byte (889680 - 237052)
runtime win: 5055 ops 8.53% (5549 - 494)
$ ./oplines.pl $(find lib ext -name \*.pm)
files: 12051
lines: 5042732
ops: 6629648
ops/line: 1.31
cops: 39.15% (lex+filecops=15334 / nextstates=600266)
memory win: 76429440 byte (102948032 - 26518592)
runtime win: 584932 ops 8.82% (600266 - 15334)
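As a cross-check, the derived figures of the first run can be recomputed from its raw counts. The sketch below copies the formulas from the script as written, including the leading `ptrsize` factor in the memory term. Two inputs are assumptions: `lexstates => 249` is inferred from 494 - 245 because the script does not print it, and `ptrsize` and `intsize` are set to 4 because those are the values that reproduce the reported byte counts. The function name `derive` is hypothetical.

```perl
use strict;
use warnings;

# Recompute the derived statistics from raw counts, using the same
# formulas as oplines.pl.
sub derive {
    my (%c) = @_;
    my $needed  = $c{files} + $c{lexstates};        # cops that must stay
    my $win     = $c{nextstates} - $needed;         # cops that could go
    my $opsize  = 3 * $c{ptrsize} + 4 + $c{intsize};
    my $copsize = $opsize + 4 * $c{ptrsize} + 8;
    my $saved   = $c{ptrsize} * $copsize * $win;    # as in the script
    my $added   = $c{ops} * $c{intsize};            # line field per op
    return {
        ops_per_line => sprintf("%.2f", $c{ops} / $c{lines}),
        cop_ratio    => sprintf("%.2f", $c{nextstates} / $needed),
        runtime_win  => $win,
        runtime_pct  => sprintf("%.2f", 100 * $win / $c{ops}),
        mem_win      => $saved - $added,
    };
}

# Counts from the *.pl run; lexstates is inferred, not printed.
my $r = derive(
    files => 245, lines => 85756, ops => 59263,
    nextstates => 5549, lexstates => 249,
    ptrsize => 4, intsize => 4,
);
print "$_: $r->{$_}\n" for sort keys %$r;
```

Note that the "cops" figure is a plain ratio of nextstates to needed cops, even though the script prints it with a percent sign.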
IMPLEMENTATION
The parser emits lots of nextstate cops just to track line numbers. These cops can be omitted and the line stored in the current op, PL_op, instead of PL_curcop.
The parser can be simplified a lot for the PL_curcop cases.
Also, find_cops() is no longer needed in most cases where just the line number is wanted, but it is still needed to get the current filename.
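To see how many cops the parser emits purely for line tracking, one can walk the op tree of a small sub with the core B module and list the line each COP records. This is a minimal sketch, separate from the OPLINES patch; the sub `sample` and the method name `collect_cop_line` are made up for illustration, and the exact number of COPs found may vary between perl versions.

```perl
use strict;
use warnings;
use B qw(svref_2object walkoptree);

# Hypothetical sample: three statements on three separate lines.
sub sample {
    my $x = 1;
    my $y = 2;
    return $x + $y;
}

my @cop_lines;

# walkoptree() calls the named method on every op it visits,
# so the callback is defined as a method on B::OP.
sub B::OP::collect_cop_line {
    my $op = shift;
    push @cop_lines, $op->line if $op->isa('B::COP');
}

walkoptree(svref_2object(\&sample)->ROOT, 'collect_cop_line');
printf "%d COPs, lines: %s\n", scalar @cop_lines, join(',', @cop_lines);
```

Each statement in `sample` normally gets its own COP, which is the per-line overhead the proposal wants to move into the ops themselves.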
Relative speed win?
Alias on 2008-09-22T00:53:00
What sort of relative wall-clock speed win is this going to result in for typical non-io-bound code?
0.01% ?
0.1% ?
1% ?
Re:Relative speed win?
chromatic on 2008-09-22T06:37:14
It's ten percent fewer ops, but the nextstate op is reasonably slim on its own: about the same size as the and and or ops, slightly smaller than enter and leave, and significantly smaller than everything else in pp_hot.c. By rough estimate, without accounting for memory use, cache flushing, and other external features, a 3% improvement would thrill me.
corehackers
chromatic on 2009-07-17T07:17:23
Someone put this on the corehackers wiki; any interest in creating a GitHub branch to continue this work?
Re:corehackers
rurban on 2010-08-28T12:16:10
I believe it was in my GitHub perl repository as the oplines branch, but it got lost.
Only my local-op branch is there.