So there was a London.pm technical meeting last night, earlier today, something like that. Leon gave a short talk about some modules he particularly liked and some he didn't. On his naughty list was File::Find.
Personally, I don't have a problem with File::Find, but I know from my time on irc that people whine about it a lot. My hunch as to why people don't like it is that it makes you use callbacks, which can confuse people. To add insult to confusion, once you're in the callback you have to gyrate oddly with package variables.
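For anyone who hasn't met it, this is roughly what the callback style looks like with the stock File::Find. It is only a minimal sketch: the temporary directory, the file names and `@mp3s` are made up for illustration, while `$_`, `$File::Find::dir` and `$File::Find::name` are the variables File::Find really sets.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway directory so the example is self-contained (hypothetical names).
my $root = tempdir( CLEANUP => 1 );
for my $name ( 'a.mp3', 'b.ogg' ) {
    open my $fh, '>', "$root/$name" or die "Cannot create $name: $!";
    close $fh;
}

my @mp3s;
find( sub {
    # Inside the callback File::Find has chdir'd into $File::Find::dir,
    # $_ holds the bare filename and $File::Find::name the full path.
    return unless -f $_ && /\.mp3\z/;
    push @mp3s, $File::Find::name;
}, $root );

print "$_\n" for @mp3s;
```

The results have to be smuggled out through a variable the callback closes over, which is a fair part of what people find awkward.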
That started me wondering: if so many people don't like the thing, why isn't there an alternative? Of course this took me nowhere, so I just wrote it off as insufficient JFDI in the world.
So now, until I either crack it or get bored, I'm trying to come up with a better way to do what File::Find gives you, and it's not proving terribly easy.
What I'm currently thinking of is something like:
my @found = NewFind->name( '*.mp3' )->type( 'file' )->size( '<1000K' )->find( '.' ); # finds all smallish mp3s
Each method returns an object apart from the find method, which returns a list of things that matched your specification. An or-like operation could look like this:
my $f = NewFind->type( 'file' )->or( NewFind->name( '*.mp3' ), NewFind->name( '*.ogg' ) ); # search for oggs and mp3s
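One way the chaining could hang together is sketched below. Everything here is an assumption for illustration: the `Sketch::Finder` package name is hypothetical, `name` takes a compiled regex rather than a glob, and the search itself is simply delegated to File::Find.

```perl
use strict;
use warnings;

package Sketch::Finder;    # hypothetical name, not a real CPAN module
use File::Find ();

# Called on the class, start a fresh object; called on an object, reuse it.
sub _self {
    my $proto = shift;
    return ref($proto) ? $proto : bless( { rules => [] }, $proto );
}

# Rule-adding methods return the object so that calls can be chained.
sub name {
    my $self  = _self(shift);
    my $regex = shift;
    push @{ $self->{rules} }, sub { $_ =~ $regex };
    return $self;
}

sub file {
    my $self = _self(shift);
    push @{ $self->{rules} }, sub { -f $_ };
    return $self;
}

# find() is the one method that returns a list rather than the object.
sub find {
    my $self = _self(shift);
    my @dirs = @_;
    my @found;
    File::Find::find( sub {
        # $_ is the bare filename; every rule must pass (implicit AND).
        for my $rule ( @{ $self->{rules} } ) { return unless $rule->() }
        push @found, $File::Find::name;
    }, @dirs );
    return @found;
}

package main;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
for my $name (qw( song.mp3 song.ogg notes.txt )) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

my @found = Sketch::Finder->file->name( qr/\.mp3\z/ )->find($dir);
print "$_\n" for @found;
```

Keeping the rules as a plain array of closures means an or-like rule would just be one more closure that wraps others.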
How does that hit people - rampant wheel reinvention, or a good start? Comments and alternative api suggestions welcome, hopefully we can get to something that will suit how people want to find files.
Looks good. Grouping in the search clause might be awkward syntactically, but for fairly simple stuff, I like it. You might also provide a way to supply a code block that returns a boolean for custom stuff. So, adding to your example, try looking for smallish mp3s that have some specific ID3 tag:
my @found = NewFind->name( '*.mp3' )->type( 'file' )->size( '<1000K' )->custom( sub { ...search for id3 tags here... } )->find( '.' );
"custom" is probably the wrong name for it, but (hopefully) you understand what I'm getting at.
Come to think of it, the search for mp3s OR oggs would be easy with Q::S:
my $f = NewFind->type( 'file' )->name( any('*.mp3', '*.ogg') );
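Going back to the code-block hook suggested in this comment, here is a rough sketch of how a user-supplied check might slot in next to the built-in ones. `find_matching` and `has_id3v2` are hypothetical helpers invented for the example, and the ID3 test is deliberately crude: it only looks for the three bytes "ID3" that open an ID3v2 header.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: return every path under $dir for which all
# predicates return true. Each predicate receives the full path.
sub find_matching {
    my ( $dir, @predicates ) = @_;
    my @found;
    find( { no_chdir => 1, wanted => sub {
        my $path = $File::Find::name;
        for my $p (@predicates) { return unless $p->($path) }
        push @found, $path;
    } }, $dir );
    return @found;
}

# The user-supplied check: does the file begin with an ID3v2 header?
sub has_id3v2 {
    my $path = shift;
    open my $fh, '<:raw', $path or return 0;
    my $got = read( $fh, my $magic, 3 );
    close $fh;
    return ( defined $got && $got == 3 && $magic eq 'ID3' ) ? 1 : 0;
}

# Two fake files: one starts with an ID3v2 header, one does not.
my $dir = tempdir( CLEANUP => 1 );
my %content = ( 'tagged.mp3' => "ID3\x03\x00", 'bare.mp3' => "\xFF\xFB" );
for my $name ( keys %content ) {
    open my $fh, '>:raw', "$dir/$name" or die "Cannot create $name: $!";
    print {$fh} $content{$name};
    close $fh;
}

my @found = find_matching(
    $dir,
    sub { -f $_[0] },
    sub { $_[0] =~ /\.mp3\z/ },
    \&has_id3v2,
);
print "$_\n" for @found;
```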
NewFind->file() instead of ->type('file')
NewFind->name( '*.mp3', '*.ogg' ) to get a shorter 'or' condition (in this case, can be written as NewFind->name( qr/\.(mp3|ogg)$/ ))
Provide an ->exec( \&command ) hook, similar to the -exec option to find(1): i.e., it gets the pathname as its only parameter and returns true or false.
Think about -prune and finddepth.
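For reference, this is how those two features look in plain File::Find today, which is the behaviour any replacement would have to cover. A small sketch on a throwaway directory: the layout is invented, while `$File::Find::prune` and `finddepth` are the real File::Find interface.

```perl
use strict;
use warnings;
use File::Find qw(find finddepth);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Invented layout: one CVS directory to skip, one source directory to keep.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/CVS", "$root/src" );
for my $file ( "$root/CVS/Entries", "$root/src/main.c" ) {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh;
}

# -prune: setting $File::Find::prune stops descent into the current directory.
my @kept;
find( sub {
    if ( -d $_ && $_ eq 'CVS' ) { $File::Find::prune = 1; return }
    push @kept, $File::Find::name if -f $_;
}, $root );

# finddepth: a directory's contents are visited before the directory itself.
my @order;
finddepth( sub { push @order, $File::Find::name }, $root );

print "kept: @kept\n";
print "visit order:\n", map { "  $_\n" } @order;
```

Note that prune has no effect under finddepth, since by the time the directory is seen its contents have already been visited.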
Re:Syntactic sugar for less verbosity
richardc on 2002-07-19T09:01:43
NewFind->file() instead of ->type('file')
Yes, I was being very literal in a transliteration of a find(1) example, apart from making it longer of course.
NewFind->name( '*.mp3', '*.ogg' ) to get a shorter 'or' condition (in this case, can be written as NewFind->name( qr/\.(mp3|ogg)$/ ))
I like both of those.
I still think there's need for a form of or. I just can't think of a good example right now.
Good example for 'or'
rafael on 2002-07-19T09:16:54
my $finder = NewFind->or(
    NewFind->name( '*.pl' ),
    NewFind->exec( sub {
        my $file = shift;
        # Read only the first line and test it for a perl shebang.
        if (open my $fh, '<', $file) {
            my $shebang = <$fh>;
            close $fh;
            return defined $shebang && $shebang =~ /^#!.*\bperl/;
        }
        return 0;
    } ),
);
Re:Syntactic sugar for less verbosity
jdavidb on 2002-07-19T17:01:52
You still need an explicit or when your conditions are not on the same variable. The form shown here works for var == val1 or val2 but not var1 == val1 or var2 == val2. For example, file is greater than 500M or older than 3 days.
Re:Syntactic sugar for less verbosity
richardc on 2002-07-21T11:04:59
I'm sorry, I don't follow. var == val1 or val2 seems to be like name('*.mp3', '*.ogg') and var1 == val1 or var2 == val2 is:
# files greater than 500M or older than 3 days
F->or( F->size( '>500M' ),
       F->age ( '> 3 days' )
);
As in rafael's "Good example for 'or'" post
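Underneath, an explicit or only needs each rule to be something callable. The sketch below shows the idea with plain closures; `size_over`, `older_than` and `any_of` are hypothetical names made up for the example, not part of any module discussed here.

```perl
use strict;
use warnings;

# Each rule is a closure taking a path and returning true or false.
sub size_over {
    my $bytes = shift;
    return sub { my $size = -s $_[0]; return defined $size && $size > $bytes };
}

sub older_than {
    my $days = shift;
    # -M gives the age in days, measured from when the script started.
    return sub { my $age = -M $_[0]; return defined $age && $age > $days };
}

# any_of() succeeds as soon as one rule matches, so later rules are skipped.
sub any_of {
    my @rules = @_;
    return sub {
        my $path = shift;
        for my $rule (@rules) { return 1 if $rule->($path) }
        return 0;
    };
}

# Files greater than 500M or older than 3 days, among the paths in @ARGV.
my $big_or_old = any_of( size_over( 500 * 1024 * 1024 ), older_than(3) );
print "$_\n" for grep { $big_or_old->($_) } @ARGV;
```

Because the two conditions test different properties of the file, there is no way to fold them into a single name()-style list, which is the case being made above.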
Re:Syntactic sugar for less verbosity
jdavidb on 2002-07-29T13:54:01
Yes, you followed what I was saying. I was giving an example where you have to have an explicit F->or method.
Re:Syntactic sugar for less verbosity
richardc on 2002-07-22T19:17:25
My thoughts about finddepth are to ignore it, as it confuses me. Can I get a quick show of hands as to whether people will really miss this? If so then I'll find a way to make it work.
Yeah, I thought about doing a "nice" version of File::Find myself. I wanted to tie in a few things as well - the ability to get structured data as well as lists from it, and to cache data.
I'm not sure I like the stream-y interface you've got here - powerful, but violates the KISS principle which makes File::Find such a pain in the arse to use at the moment. I'd pictured more of a hash-based interface, but I hadn't thought of options so much (but I guess they could be done either by regexps or arrays).
my $fh = File::Find::Foo->new ( { start => '.', name => ['*.ogg', '*.mp3'], maxdepth => 2 } );
Note the use of an array for alternation, and the way the parameters are the same as you pass to find(1). I suspect '.' would be a sensible default for the start attribute. The new() command should do the file search and keep it in an internal data structure.
$fh->gettree(); should return a tree-based data structure, while $fh->getlist(); will get a list of filenames relative to start. There should be some kind of refresh() method to run the search again if files have changed.
Re:Interface
richardc on 2002-07-19T11:15:24
I'm not sure I like the stream-y interface you've got here - powerful,
I wasn't either, hence the posting, but then I saw Rafael's beautiful "Good example for 'or'" for which powerful is certainly one of the words I'd use.
By stream-y I assume you mean the chaining of method calls? If you don't like it you can always do it in longhand:
my $f = NewFind->new();
$f->name( '*.mp3', '*.ogg' );
$f->size( '<10000' );
my @potayto = $f->find('.');
my @potahto = NewFind->name( '*.mp3', '*.ogg' )
                     ->size( '<10000' )
                     ->find('.');
but violates the KISS principle
I'm with Jarkko on this one
More seriously, I don't see what's not simple in calling methods on an object.
$fh->gettree(); should return a tree-based data structure, while $fh->getlist(); will get a list of filenames relative to start. There should be some kind of refresh() method to run the search again if files have changed.
Your getlist seems to be my find, your refresh seems to be my find called a second time. gettree is interesting, could you provide a simulated dump of what it would hold?
Re:Interface
djberg96 on 2002-07-19T12:12:53
Whenever I see method chaining, I take the opportunity to point out Robin Houston's Want module. Take a look. Maybe you could use it.
Re:Interface
richardc on 2002-07-19T12:52:39
It's an interesting module, sure enough. I don't really imagine needing to bring the big guns in.
My current plan is to call the module File::Find::Ruleset. This seems to do some of the hard explaining for me: there are two types of methods, those that add rules to the ruleset, and those that ask questions of it.
The methods that add new rules (name, size, exec, or) return the ruleset object, which makes chaining easier. Those that ask questions (find, find_as_superbly_complex_hash) will return what is expected of them. I don't imagine someone would really want to write:
my @files = File::Find::Ruleset->name( '*.mp3', '*.ogg' )
                               ->exec( sub { artist( $_ ) eq 'Black Lace' } )
                               ->find( $HOME )
                               ->find( '/mnt/mp3' );
And after their first run-time error they shouldn't really do it again. Actually, having said that, Want is a fine way to stop that exploding at the user, or at least to throw a more reasonable error message.
btw. I'm really liking the ->exec rule now.
Re:Interface
PerfDave on 2002-07-19T12:13:52
The notation of chaining method calls like that is an unfamiliar one to me at least (see, told you I was crap), and probably to the target audience of the module (viz. those who find all those icky callbacks in File::Find too tricky).
As for the tree, I envisaged some kind of structure depending on what parameters you pass to the find involving some quantities of stat()ing etc., thusly:
$VAR1 = {
    name => '.',
    path => '.',
    abspath => '/home/richardc',
    type => 'directory',
    children => [
        {
            name => 'music',
            path => './music',
            abspath => '/home/richardc/music',
            type => 'directory',
            children => [
                {
                    name => 'S Club 7 - Party.mp3',
                    path => './music/S Club 7 - Party.mp3',
                    abspath => '/home/richardc/music/S Club 7 - Party.mp3',
                    type => 'file'
                }
            ]
        },
        {
            name => 'Procul Harum - A Whiter Shade of Pale.ogg',
            path => './Procul Harum - A Whiter Shade of Pale.ogg',
            abspath => '/home/richardc/Procul Harum - A Whiter Shade of Pale.ogg',
            type => 'file'
        }
    ]
}
Basically enough information to reconstruct the file list but letting you walk the tree structure.
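A structure of that shape is not hard to produce with a small recursive walk. The following is only a sketch: `build_tree` is a hypothetical function, it uses the same keys as the dump above, and it does not follow symlinked directories so that it cannot loop.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical tree builder: returns a hash ref describing $path, with a
# 'children' list for real (non-symlink) directories.
sub build_tree {
    my ($path) = @_;
    my %node = (
        name    => basename($path),
        path    => $path,
        abspath => File::Spec->rel2abs($path),
        type    => ( -d $path && !-l $path ? 'directory' : 'file' ),
    );
    if ( $node{type} eq 'directory' ) {
        opendir my $dh, $path or die "Cannot open $path: $!";
        my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
        $node{children} = [ map { build_tree("$path/$_") } @entries ];
    }
    return \%node;
}

# Invented layout for the demonstration.
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/music" or die "Cannot create directory: $!";
open my $fh, '>', "$root/music/party.mp3" or die "Cannot create file: $!";
close $fh;

my $tree = build_tree($root);
print "$_->{type}: $_->{path}\n" for @{ $tree->{children} };
```

A flat getlist() could then be derived by walking the tree, so only one of the two needs to touch the filesystem.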
Re:Interface
Thomas on 2002-07-19T15:10:12
It was almost the same thing I came up with when thinking about it.
I'd like to see something like this:
my @files = find('/tmp', { maxdepth => 10, mindepth => 5, name => qr/\.*\.c$/ });
(Which I btw already have working in a small example I hacked together.) I really like your idea of using arrays for alternation, but how do you decide whether to AND or OR? (AND doesn't make much sense in your example though).
Instead of returning the files I'd also consider an 'exec' like option that took a sub ref or function and just passed the filename to it as the first argument.
That is about all I'd need from a Perl find.
Re:Interface
Thomas on 2002-07-19T16:42:14
I've made a quick implementation of what I'd like to see File::Find offer instead. I think it might be a good idea to do a real print function and instead call the one I've used 'return'.
It is of course not nearly done, but if anyone feels they like the concept and wants to use it, please feel free to do so.
package Find;
use strict;
use vars qw($VERSION @EXPORT @EXPORT_OK %EXPORT_TAGS @ISA);
require Exporter;
@ISA = qw(Exporter);
@EXPORT = qw(find);
@EXPORT_OK = qw(find);
$VERSION = '0.1';

sub find {
    my $dir = shift || die "Need a directory to start at\n";
    die "Not a directory\n" unless -d $dir || ref($dir) eq 'ARRAY';
    my $opts = shift || {};
    # 'print' (collect and return the matching files) defaults to on.
    $opts->{print} = defined($opts->{print}) ? $opts->{print} : 1;
    my $depth = 1;
    my @files = ();
    if (ref($dir) eq 'ARRAY') {
        traverse_dir($_, $depth, $opts, \@files) for @{$dir};
    } else {
        traverse_dir($dir, $depth, $opts, \@files);
    }
    return @files;
}

sub traverse_dir {
    my $dir_name = shift;
    my $depth = shift;
    my $opts = shift;
    my $files_ref = shift;
    # Stop once we are deeper than maxdepth.
    if (defined($opts->{maxdepth}) && $opts->{maxdepth} < $depth) {
        return;
    }
    # A lexical handle keeps recursive calls from sharing one directory handle.
    opendir(my $dh, $dir_name) || die "Cannot open $dir_name: $!\n";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    for my $file (@entries) {
        $opts->{_ok} = 1;
        if (defined($opts->{name})) {
            if ($file !~ /$opts->{name}/) {
                $opts->{_ok} = 0;
            }
        }
        if (defined($opts->{mindepth})) {
            if ($depth < $opts->{mindepth}) {
                $opts->{_ok} = 0;
            }
        }
        if ($opts->{_ok}) {
            if ($opts->{print}) {
                push @{$files_ref}, $dir_name . "/" . $file;
            }
            if (defined($opts->{'exec'})) {
                # 'exec' holds a reference to a code reference (see the POD).
                ${ $opts->{'exec'} }->($dir_name . "/" . $file);
            }
        }
        traverse_dir($dir_name . "/" . $file, $depth + 1, $opts, $files_ref) if -d $dir_name . "/" . $file;
    }
}
1;
__END__
=head1 NAME

Find - find files in filesystem

=head1 SYNOPSIS

  use Find;
  find('/tmp', { 'exec' => \sub { print $_[0] . "\n"; }});

=head1 OPTIONS

  find('directory', {.. options .. });
  find(['directory1','directory2'], {.. options .. });

It currently has the following options

=head2 print

print in this context refers to whether or not to return an array of
the files found.

  for (find('/tmp', { 'print' => 1 })) {
      print $_ . "\n";
  }

Would print a list of files found in the '/tmp' directory.
print is set to return a list of files by default.

=head2 exec

exec is a reference to an anonymous function to call, which gets passed
as its only argument the filename matching any other conditions.

  find('/tmp', { 'exec' => \sub { print $_[0] . "\n"; } });

=head2 name

name is a regular expression to which the file is matched. If the file
in question matches the expression, the other tests on the file are performed.

  find('/tmp', { name => qr/\.c$/, 'exec' => \sub { print $_[0] . "\n";} });

=head2 maxdepth

maxdepth is the maximum number of directories to descend into.

=head2 mindepth

mindepth is the minimum number of directories where matching starts.

=cut
Re:Interface
richardc on 2002-07-21T10:55:29
my @files = find('/tmp', { maxdepth => 10, mindepth => 5, name => qr/\.*\.c$/ });
Warning: hashes found not to have a predictable order. Evaluating the rules in a known order can have a huge effect on efficiency. Please look at rafael's "Good example for 'or'" comment: if the exec happened first you'd really feel the hit, because you'd lose the shortcutting that happens at name('*.pl').
(Which I btw already have working in a small example I hacked together.)
Please, don't let me stop you releasing your code to CPAN. Some choice in this area is good, certainly considering that right now people just complain about the current state of affairs without writing an alternative.
For fun, mine's here; note that it still needs a buttload of work on the documentation. Later today I was planning on adding some code to make this work:
my @mp3s = find_luddite( name => '*.mp3', file )->('.');
But maybe find_luddite isn't the best name :)
I really like your idea of using arrays for alternation, but how do you decide whether to AND or OR? (AND doesn't make much sense in your example though).
AND is the default.
Finder->files
      ->name( '*foo*' );
Means they have to be files, AND they have to have names that match *foo*.
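The AND default and the earlier point about rule order go together: if the rules live in an array, they run in the order they were added and the first failure ends the evaluation. A small sketch, with a counter standing in for an expensive check; `matches_all` and the sample names are invented for the illustration.

```perl
use strict;
use warnings;

my $expensive_calls = 0;

# Rules are kept in an array, so they run in the order they were added.
my @rules = (
    sub { $_[0] =~ /\.pl\z/ },                           # cheap: regex on the name
    sub { $expensive_calls++; length( $_[0] ) > 3 },     # stand-in for a costly check
);

# AND semantics: the first failing rule ends the evaluation.
sub matches_all {
    my ( $name, @checks ) = @_;
    for my $check (@checks) { return 0 unless $check->($name) }
    return 1;
}

my @names   = qw( a.pl notes.txt build.pl README );
my @matched = grep { matches_all( $_, @rules ) } @names;
print "matched: @matched; expensive rule ran $expensive_calls times\n";
# prints "matched: a.pl build.pl; expensive rule ran 2 times"
```

With the two rules swapped the costly check would run for all four names, which is the efficiency difference a hash of options cannot control.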
Instead of returning the files I'd also consider an 'exec' like option that took a sub ref or function and just passed the filename to it as the first argument.
Check. Well actually I pass the filename, the directory, and the full path, but that's because you never know what'll really be useful.
That is about all I'd need from a Perl find.
Bzzt! About all I need from a find interface is File::Find. This is a deeply bogus argument, and it's left us where we are since the dawn of time.
In writing this code I'm trying to anticipate what other users will want. You're one of them, but you're not the only one, otherwise I'd not even have thought about making the code usable as an iterator, so you'll be able to write:
$finder->find_iteratively('.');
while (my $file = $finder->iterate) {
    ...
}
Warning: routines not really called find_iteratively and iterate. They don't have names right now.
Currently the code is a wrapper around File::Find, so the iterator behavior is faked, but if I ever get brave enough to implement all the cross-platform stuff for myself then that'll be a proper iterator. And then there will be rejoicing in the streets. Oh yes.
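A faked iterator of that kind can be as simple as running File::Find to completion and handing the results out one at a time. This is a sketch of the general idea only, not the actual code being described; `make_iterator` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: run File::Find to completion up front, then hand
# back a closure that returns one path per call and undef when exhausted.
sub make_iterator {
    my @dirs = @_;
    my @queue;
    find( sub { push @queue, $File::Find::name if -f $_ }, @dirs );
    return sub { return shift @queue };
}

# Invented files for the demonstration.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw( one.txt two.txt )) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

my $next = make_iterator($dir);
while ( defined( my $file = $next->() ) ) {
    print "$file\n";
}
```

The loop tests with defined so that a file literally named "0" would not end it early. The whole tree is still walked before the first result comes back, which is exactly what a proper iterator would avoid.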
Re:Interface
richardc on 2002-07-21T16:41:18
I thunk it out a bit more, and now it's possible to do this:
# extract from the test suite
# procedural form of the prune CVS demo
use File::Find::Ruleset 'find_ruleset';
$f = find_ruleset(or => [ find_ruleset( directory =>
                                        name      => 'CVS',
                                        prune     =>
                                        discard   => ),
                          find_ruleset() ]);
my @things = $f->find('.');
Yes, it's still calling a method but if you don't ever want to see an arrow you can do that too, at a slight performance cost:
my @mp3s = find_ruleset( file => name => '*.mp3', find => '.');
File::Find replacement in his Programming with Iterators and Generators tutorial (at least he did when he gave his practice session at phl.pm). Perhaps his approach is worthy of investigating as well.
Re:How to find real Link names with File::Find::Ru
runrig on 2007-04-09T21:13:02
You need to include the File::Find 'follow' option, which you can do in FFR with the extras() method.
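To show what the follow option changes, here is a sketch using only core File::Find on two throwaway directories joined by a symlink (so it assumes a platform with symlinks). With File::Find::Rule installed, the same option would presumably be passed as `File::Find::Rule->extras({ follow => 1 })`; that line is left out of the code so the example runs without it.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

my $real  = tempdir( CLEANUP => 1 );    # where the file really lives
my $start = tempdir( CLEANUP => 1 );    # where the search starts
open my $fh, '>', "$real/target.txt" or die "Cannot create file: $!";
close $fh;
symlink $real, "$start/link" or die "Cannot create symlink: $!";

my %real_name;
find( {
    follow => 1,    # descend through symlinks
    wanted => sub {
        return unless -f $_;
        # With follow set, $File::Find::fullname holds the path with
        # symbolic links resolved.
        $real_name{$File::Find::name} = $File::Find::fullname;
    },
}, $start );

print "$_ -> $real_name{$_}\n" for sort keys %real_name;
```

Without follow, the symlinked directory would be reported as a single entry and target.txt would never be visited.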