Computers are getting faster, that's for sure. However, they are also getting more cores: new laptops these days are dual-core and servers have four or more cores. "Core" is a fancy word for something a bit like another processor. I happen to have lots of things I want to process independently, but if I only use a single process I'll only use one core, a quarter of those available on my server. That's wasting CPU power. The solution is to do more than one thing at a time, and the common ways of doing that are threading and forking. I've found a particularly neat solution which is a very nice idiom too: a parallel foreach with Proc::ParallelLoop. Something along the lines of the following will parallelise the loop by forking up to 4 workers at a time:
pareach \@todo, \&generate, { Max_Workers => 4 };
The nice thing is that Linux balances each long-lived process onto a core, and suddenly my program runs about four times faster! (Okay, so cores aren't complete CPUs - they tend to have dismal floating point performance - but all I am doing is integer calculations, so it is mostly the same in my case.)
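For the curious, here is roughly what a parallel foreach has to do under the hood. This is a minimal sketch using only core Perl, not Proc::ParallelLoop's actual implementation: fork_each is a hypothetical name of my own, and the summing loop is just a stand-in for whatever integer work your generate routine does.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered output, so children don't re-flush the parent's buffer

# Hypothetical stand-in for a parallel foreach: run $work on every item of
# @$items, keeping at most $max child processes alive at any one time.
# Returns the number of workers that failed.
sub fork_each {
    my ($items, $work, $max) = @_;
    my $running  = 0;
    my $failures = 0;

    for my $item (@$items) {
        if ($running >= $max) {        # pool is full: wait for one child to exit
            wait();
            $failures++ if $? != 0;
            $running--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {               # child: do one unit of work, then exit
            my $ok = eval { $work->($item); 1 };
            exit($ok ? 0 : 1);
        }
        $running++;                    # parent: one more worker in flight
    }

    while ($running > 0) {             # reap the workers that are still going
        wait();
        $failures++ if $? != 0;
        $running--;
    }
    return $failures;
}

my @todo = (1 .. 8);
my $failed = fork_each(\@todo, sub {
    my $n   = shift;
    my $sum = 0;
    $sum += $_ for 1 .. $n * 1000;     # pretend this is the expensive bit
}, 4);
print "failed workers: $failed\n";
```

Each worker is a separate process, so anything it computes stays in the child unless you write it to a file or a pipe. That suits jobs like mine, where each item generates its own output.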
There are many modules on the CPAN which do something similar, but I particularly like the fact that this is a cute idiom: exactly like a foreach, but slightly parallelised. Neat!
See you at the London.pm social meeting tonight!
That may depend upon brand and architecture. But really, parallel-each dispatch of floating point operations would be the wrong approach, whether on multi-core or on single-core SMP.
The issue with the Turion 64 and the Intel Core Duo / Xeon is largely that they're based on the lower-power mobile versions of the core. This apparently is because power and heat are the limiting design factors today. Heat and power, plus cores sharing the FSB, are also why the clock speeds aren't as fast as those of the previous single-core server chips.
But more to the point, if you have floating-point arrays to parallel-each over, you want to do it with native arrays and hardware dispatch, not a Perl list structure and a loop in a Perl XS module. This applies to the FPU vector mode of any serious Fortran engine as well as to IA32 MMX SIMD. PDL already has a native representation and parallel SMT threaded multiple dispatch, and has a TODO to use the native FP vectors. Even so, it should keep the FP pipeline much more active than a Perl-list-based parallel-each, plus its CPAN packages interface to scientific libraries that can use vector-mode FP if acquired or built that way (e.g., pick a BLAS lib for the GNU Scientific Library).
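To make the native-array point concrete, here is a small sketch, assuming PDL is installed from CPAN. The arithmetic is an arbitrary example of mine, not anything from the post. The PDL lines work on one contiguous block of doubles in compiled code, while the plain Perl loop touches one scalar at a time.

```perl
use strict;
use warnings;
use PDL;    # assumption: PDL is installed from CPAN

my $n = 100_000;

# Native representation: 0, 1, 2, ... held as one block of doubles
my $x = sequence($n);

# One vectorised pass over the whole array, with no Perl-level loop
my $y = $x * 0.5 + 1;

# Reduce to a single value inside PDL, then pull it out as a Perl scalar
my $total = $y->sumover->sclr;

# The same calculation over an ordinary Perl list, one element at a time
my $check = 0;
$check += $_ * 0.5 + 1 for 0 .. $n - 1;

printf "PDL: %.1f  Perl loop: %.1f\n", $total, $check;
```

Both lines of arithmetic give the same answer. The difference is where the loop runs, which is what decides how busy the FP pipeline stays.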