Wow, C++ can be fast

btilly on 2009-03-13T07:11:20

I have long repeated the maxim that you write in a high-level language, and only after it proves too slow do you consider dropping to a low-level language. I have also found that the occasions where it is appropriate to go low-level are few and far between.

In the past I never felt the need. Most of what I work on is I/O bound, and for the rest it is cheaper to throw another server at the problem than to spend programmer time on it.

But that changed. While working on a contract, I ran into a case that needed it. I had specced out an algorithm for a particular task, optimized it somewhat, and implemented it as a series of MySQL queries. With some tweaking I got MySQL to run the job with the query plans I wanted, but it was still going to take 5 days. I could extract some naive parallelism, but that would only bring it down to a day or so. Since the client wants the job to run regularly, I needed to do better than that, so I decided to go with C++.

The computation dropped from 5 days to 10 minutes! Apparently all of the following are good for speed: manipulating smaller data structures, fitting the key data structures into L1 and L2 cache rather than struggling to keep them in RAM, running directly compiled code rather than microcode, avoiding sorts by keeping things in the order I need, and doing direct array lookups rather than searching B-tree indexes. I made some other minor tweaks, and could make more, but on the whole I am quite satisfied with the result. :-)
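One way to get several of the effects listed above in C++ is sketched below, assuming the records can be given dense integer ids (the post does not say how the real data was keyed). Storing each record at the array position equal to its id makes every lookup a direct index, keeps the per-record struct small, and means a plain loop already visits records in id order, so no sort is needed. `Item`, `ItemTable`, and its methods are hypothetical names for illustration, not the author's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: only the fields the computation needs, kept small
// so that more of them fit in L1/L2 cache at once.
struct Item {
    std::uint32_t weight = 0;
    std::uint32_t count = 0;
};

// Assumes ids are dense integers in [0, n). Item `id` lives at index `id`,
// so lookups are a single array access and iteration is already ordered.
class ItemTable {
public:
    explicit ItemTable(std::size_t n) : items_(n) {}  // n zeroed items

    void add(std::uint32_t id, std::uint32_t weight) {
        Item& it = items_[id];  // direct array lookup, no index search
        it.weight += weight;
        it.count += 1;
    }

    const Item& get(std::uint32_t id) const { return items_[id]; }

    // Ids that were seen at least once, in ascending order without sorting.
    // (Assumes n fits in 32 bits, matching the id type.)
    std::vector<std::uint32_t> seen_ids() const {
        std::vector<std::uint32_t> out;
        for (std::uint32_t id = 0; id < items_.size(); ++id) {
            if (items_[id].count > 0) out.push_back(id);
        }
        return out;
    }

private:
    std::vector<Item> items_;
};
```

The trade-off is memory: the array needs a slot for every possible id, so this layout only pays off when the ids are reasonably dense.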

(And, for C++, it is fairly readable.)

the devil is in the detail

systems on 2009-03-15T14:46:42

It would be nice if you could share more details about what those MySQL queries did.
I can't imagine how you could drop your processing time from 7200 minutes to 10 minutes just by using C++.
I doubt C++ had much to do with the speed increase. For example, how does the same algorithm perform in Perl?

Anyway, would you share more details?

Re: the devil is in the detail

btilly on 2009-09-12T20:25:17

I'm sorry for not getting back to you sooner. Here is the reason for the speedup.
With MySQL, here is the fastest access path to data in memory, given a unique identifier. You take the value and do a binary search on an index to find the rows you want and which pages of memory they are in. You then parse through that page to pull out the row, grab the fields you want, and put them into a temporary table. All of this is done by an interpreter following microcode.
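To make the steps of that access path concrete, here is a toy C++ model of them: a sorted index searched with `std::lower_bound`, rows stored as text inside "pages", and a parse-and-copy step into a temporary table. It is a hypothetical sketch, not how MySQL or its storage engines actually lay out or read data (real engines use B-tree indexes over fixed-size pages, and the query executor adds its own overhead). The names `IndexEntry`, `Row`, `Page`, `build`, and `fetch_row` are invented for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct IndexEntry {
    int id;              // the indexed unique identifier
    std::size_t page;    // which page holds the row
    std::size_t offset;  // where the row starts inside that page
};

struct Row { int id; std::string name; int score; };

// A page holds rows serialized as "id,name,score;" back to back.
// (Toy format: names must not contain ',' or ';'.)
using Page = std::string;

// Pack rows into pages and build an index sorted by id.
void build(const std::vector<Row>& rows, std::size_t rows_per_page,
           std::vector<Page>& pages, std::vector<IndexEntry>& index) {
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i % rows_per_page == 0) pages.emplace_back();
        Page& p = pages.back();
        index.push_back({rows[i].id, pages.size() - 1, p.size()});
        p += std::to_string(rows[i].id) + "," + rows[i].name + "," +
             std::to_string(rows[i].score) + ";";
    }
    std::sort(index.begin(), index.end(),
              [](const IndexEntry& a, const IndexEntry& b) { return a.id < b.id; });
}

// Step 1: binary-search the index. Step 2: parse the row out of its page.
// Step 3: copy the fields into the "temporary table".
bool fetch_row(const std::vector<IndexEntry>& index, const std::vector<Page>& pages,
               int id, std::vector<Row>& temp_table) {
    auto it = std::lower_bound(index.begin(), index.end(), id,
        [](const IndexEntry& e, int key) { return e.id < key; });
    if (it == index.end() || it->id != id) return false;

    const Page& page = pages[it->page];
    std::size_t end = page.find(';', it->offset);
    std::istringstream in(page.substr(it->offset, end - it->offset));

    Row row;
    std::string field;
    std::getline(in, field, ',');
    row.id = std::stoi(field);
    std::getline(in, row.name, ',');
    std::getline(in, field, ',');
    row.score = std::stoi(field);
    temp_table.push_back(row);
    return true;
}
```

Even in this simplified form, one lookup involves a logarithmic search, string scanning, integer parsing, and a copy, all of which the array version in the next paragraph skips.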
Here is the equivalent access path for the C++ code. Access an array by index.
That is a difference of several orders of magnitude in how much work it takes to access a piece of data, and it accounts for the bulk of the performance difference.
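For scale, the end-to-end figures reported in the original post work out as follows: the whole job became about 720 times faster, just under three orders of magnitude. Note that this is the overall job time, not the per-access cost discussed above, so the two numbers measure different things.

```latex
\[
5\ \text{days} \times 24\ \frac{\text{h}}{\text{day}} \times 60\ \frac{\text{min}}{\text{h}} = 7200\ \text{min}
\]
\[
\frac{7200\ \text{min}}{10\ \text{min}} = 720 \approx 10^{2.86}
\]
```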