a harmless sort?

LTjake on 2005-04-01T16:03:10

tholbroo had mentioned to me that if I'm doing a regular sort on Class::DBI objects, I can save a lot of time by doing a Schwartzian transform. I didn't think that there would be much of a penalty on small sets, but I decided to do some benchmarking (using the CPANTS DB):

package CPANTS::Distro::Kwalitee;

use strict;
use warnings;

use base qw( Class::DBI );

__PACKAGE__->connection( 'dbi:SQLite:dbname=cpants.db' );
__PACKAGE__->table( 'kwalitee' );
__PACKAGE__->columns(
	All => qw(
		distid
		kwalitee
		extractable
		extracts_nicely
		has_buildtool
		has_manifest
		has_meta_yml
		has_proper_version
		has_readme
		has_test_pod
		has_test_pod_coverage
		has_tests
		has_version
		is_prereq
		no_cpants_errors
		no_pod_errors
		no_symlinks
		proper_libs
		use_strict
	)
);

package main;

use strict;
use warnings;

use Benchmark qw( cmpthese );

my @distros = CPANTS::Distro::Kwalitee->retrieve_all;
@distros    = @distros[ 0..99 ];

printf( "Sorting %d rows...\n", scalar @distros );

cmpthese(
	1000,
	{
		regular_sort  => \®ular_sort,
		schwartz_sort => \&schwartz_sort
	}
);

sub regular_sort {
	@distros = sort {
		$a->kwalitee <=> $b->kwalitee
	} @distros;
}

sub schwartz_sort {
	@distros = map {
		$_->[ 0 ]
	} sort {
		$a->[ 1 ] <=> $b->[ 1 ]
	} map {
		[ $_, $_->kwalitee ]
	} @distros;
}

__END__
C:\cdbibench>perl bench.pl
Sorting 100 rows...
               Rate  regular_sort schwartz_sort
regular_sort  102/s            --          -47%
schwartz_sort 195/s           90%            --

Interesting. tholbroo++


That's not surprising

merlyn on 2005-04-01T16:12:14

If the expense of re-computing the sort key exceeds the expense of caching the sort key, then the ST wins. You merely need a "long enough" list and an "expensive enough" sort key computation.

Sort in the database

autarch on 2005-04-01T17:05:56

Sort your data in the database. This will be faster. It also scales better when you just want part of the result set, because you can use LIMIT (or whatever your DB supports).

Class::DBI considered a hinderance

Aristotle on 2005-04-03T00:38:28

Yeah, what autarch said.

This is why gave up Class::DBI – it very nearly forces you to do on the Perl side what should be done at the SQL level because it makes it way too difficult to do the latter. First you run into a wall of no documentation, then you often have to put up with annoying verbiage. After reading the source for 15 minutes, I think the following will do what you need:

CPANTS::Distro::Kwalitee->set_sql(
    all_by_kwalitee => 'SELECT __ESSENTIAL__ FROM __TABLE__ ORDER BY kwalitee',
);
my @distros = CPANTS::Distro::Kwalitee->sth_to_objects( 'all_by_kwalitee' );

Note that sth_to_objects() is not documented anywhere. The code should work, but I cannot tell since I’m not going to spend another 15 minutes setting up all the knobs and handles that testing Class::DBI code requires. If it does not work, the least-effort approach will require more red tape. Compare to plain DBI to get an AoH, which is clearly documented and took me 40 seconds to write:

my $distros = $dbh->selectall_arrayref( 'SELECT * FROM kwalitee ORDER BY kwalitee', { Slice => {} } );

Plain DBI is at times crufty, but at least it doesn’t get in your way. As a bonus, it has much less overhead. Class::DBI is seductive, and will whisper in your ear that you that you don’t need to get your hands dirty with all that SQL, but in practice I’ve found it fails to deliver. Last not least, Class::DBI’s dependencies are legion.

Try Sort::Key

salva on 2005-04-09T16:58:52

Hi,

You can try Sort::Key that I released some days ago on CPAN, its like the Schwartzian transform, but implemented in C so a bit faster.