dubious quality of GNU source-code

ethan on 2003-05-27T16:55:30

Yesterday within half an hour I hacked together C that will soon hit the CPAN. I simply put all the relevant C-functions from the findutils-package and butchered them into Locate.xs, massaged the whole thing a little to return the data to the caller and that was it essentially.

However, one thing is odd. There is not a single call to C within locate.c and related C files. So it allocates all the memory for the strings it extracts from the locate-db files (more than 12meg for a search for 'mp3' on my computer) but never frees them, benefiting from the fact that locate(2L) is shut-down afterwards anyway. Naturally, when turning that into a module, memory must be cleaned up properly and this turns out to be not that easy. It uses statically allocated memory at some places and my attempts to replace that with dynamic memory allocation have resulted in segfaults so far. Strange. I always though that it would be good style to always release the memory one allocated no matter whether this is necessary in a narrower sense.

Other than that, the style of the code is pretty ancient. It still uses K&R function-declaration style for instance and some other baddies and anachronisms.


free() ain't free

samtregar on 2003-05-27T17:14:04

Calling free() takes time. Perl itself declines to actually free() allocated memory at program destruction as an optimization. Maybe it's your knowledge of C which is of dubious quality!

Perhaps your module should just wrap system("locate") instead of reinventing this wheel.

-sam

Re:free() ain't free

ethan on 2003-05-27T17:44:50

Calling free() takes time. Perl itself declines to actually free() allocated memory at program destruction as an optimization.

And you think that this analogy makes any sense? Perl has different destruct_levels that do different things depending on whether perl was compiled with multiplicity or not. We are talking here about a potentially threading application where not freeing memory can be a valid decision.

You can't apply that to locate. You run it once and it quits. When you start it again, the not-freed memory has in the meantime been claimed back by the OS. If you start the next instance of locate, all memory allocations have to be done once again. There is no optimization involved.

Maybe it's your knowledge of C which is of dubious quality!

Yeah, it must be that.

Perhaps your module should just wrap system("locate") instead of reinventing this wheel.

So spawning off a new process using system() or whatsoever is more efficient than a piece of compiled C code that is dynamically linked into perl? Makes sense. Also, system("locate ...") is known to work beautifully under any operating system around (especially windows).

You'll also have to explain to me why File::Find was a reinvention of the find-program. Following your logic, system qw(find ...) would do just as well.

Re:free() ain't free

samtregar on 2003-05-27T17:55:55

You can't apply that to locate. You run it once and it quits.

Sure I can. Why would I want to wait for all the various free() calls to return before I get my shell prompt back? exit() is much faster! If you're going to call someone's code quality "dubious" then you'll have to consider why they made the decisions they did. If you don't then someone else should point them out to you, like I just did.

So spawning off a new process using system() or whatsoever is more efficient than a piece of compiled C code that is dynamically linked into perl?

No, not in terms of runtime costs. In terms of programming time, though, it wins hands down. You get to decide which is more important but I thought I'd toss out the option. I've seen people spend a lot of time failing to adapt command-line C programs into C libraries.

Also, system("locate ...") is known to work beautifully under any operating system around (especially windows).
Are you going to port updatedb too? Your locate won't work too well without it! I'd think it would be smarter to use the native locate-equivalent on Windows. Recent Windows (XP at least) have a built-in indexer for fast file finding. Of course, with Cygwin even system("locate") isn't out of the question!

-sam

Re:free() ain't free

ethan on 2003-05-27T18:30:24

If you're going to call someone's code quality "dubious" then you'll have to consider why they made the decisions they did. If you don't then someone else should point them out to you, like I just did.

I might come to the conclusion that there has been no such considerate decision. :-) The findutils-4.1 package stills contains a bug that causes segfaults on every run of locate so various patches exist to address this. However, there is no official GNU patch yet. Instead several non-GNU fixes float around mailing-lists etc (from which I have pulled my patch actually). I wouldn't call this a very decisive policy. No, not everything from GNU is shiny...as no thing in life has only good sides.

[system() more efficient than linked-in C code?]
No, not in terms of runtime costs. In terms of programming time, though, it wins hands down. You get to decide which is more important but I thought I'd toss out the option. I've seen people spend a lot of time failing to adapt command-line C programs into C libraries.

But mine is already done and did not fail (in my estimation). I am not reimplementing anything, I just copy and change the source-code a little to make it library-ish. Sure, the effort is only mine but I guess any module-author invests more time into his modules than he is personally going to get back from it. Others might save some time, though.

Are you going to port updatedb too?

This idea occured to me directly after I finished the module. What I'd like to have eventually is:


        use File::Locate;
        my $loc = File::Locate->new;
        $loc->add("some_file"); ...
        $loc->write_db("/path/to/locatedb");


Right now there is only a locate() function that returns all found entries in list context and a boolean value in scalar context (true, after a first matching entry has been found by return()ing immediately from the XSUB).

Naturally, in the current form it is of limited use under operating system not having a locate. However, even a windows-user can already query an existing locate-db.

I'd think it would be smarter to use the native locate-equivalent on Windows.

This is cumbersome to control from Perl. Just a while back I wrapped libdbx that allows to read Outlook Express mailboxes across different platforms from Perl. Yet, without it you'd need OLE to do the same from Perl under windows. Sure, it works, but it's not the way you'd usually want to access a mailbox.

Another idea

djberg96 on 2003-05-27T19:59:05

Scrap the whole GNU/locate/locate-db/findutils issue entirely. Write your own from scratch. Use SQLite. Do it the way that *you* think it should be done. Worked for me with "whereis" (although I realize the effort for that is nowhere near what it would be for locate).

Just a thought.

Re:Another idea

ethan on 2003-05-29T06:32:36

Scrap the whole GNU/locate/locate-db/findutils issue entirely. Write your own from scratch. Use SQLite. Do it the way that *you* think it should be done. Worked for me with "whereis" (although I realize the effort for that is nowhere near what it would be for locate).

Indeed, it'd be more work but with the benefit of making something that is to my own liking.

Personally, I do like locate a lot and there were a number of reasons why I wanted to stick to it: First of all there was already a codebase in C (I am constantly looking out for things that could be implemented in XS)...also it's tiny and wont take weeks to understand how it works. Secondly, I wanted an interface to the existing locatedb infrastructure. findutils also contain a little tool called frcode, that implements a clever incremential compress algorithm that could be of use separatedly (just 3K or so in source-code). If I eventually write the routines for creating locatedbs this algorithm will need to be incorporated, too. Perhaps even as Algorithm::Frcode or so.

Anyway, it's time to add the tests (sigh) to File::Locate and upload it to the CPAN.