Automatic license detection in Perl distributions

brian_d_foy on 2008-09-02T18:21:38

Before I reinvent the wheel..

So, given an arbitrary Perl distribution (anything in BackPAN), does anyone have code to guess the license for the code? I know that I can look in META.yml if someone typed the right things somewhere, but that's not enough. Many distributions aren't well-formed, and there are a lot of different ways to license code.

And does anyone know of Perl distributions that contain files under different licenses, so that the distribution isn't covered by a single license? File A is under License Foo but File B is under License Quux, or something like that.


See Module::Install ?

oliver on 2008-09-02T19:06:25

When I used Module::Install it (a) griped about me citing GPL then "All Rights Reserved", then also (b) detected a "same terms as Perl itself" statement, and finally (c) wrote the META.yml for me.

That suggests some smarts are in there somewhere, so my suggestion would be to check that dist out.

...specifically sub license_from() in Module/Install/Metadata.pm, now I've looked!

Re:See Module::Install ?

brian_d_foy on 2008-09-02T19:21:19

That's not smart enough. It's just looking for strings in files without knowing anything about their context. It doesn't discover license versions either.

Thanks though :)
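To make the objection concrete, here is a minimal sketch of a purely string-based guesser, with one small step toward context: it looks for a "version N" mention only in a short window after the matched phrase. This is not Module::Install's code. The phrase table, the 80-character window, and the function name `guess_licenses` are all assumptions made for illustration.

```python
import re

# Hypothetical phrase table (pattern, label); not taken from any CPAN module.
PATTERNS = [
    (re.compile(r"same\s+terms\s+as\s+perl(?:\s+itself)?", re.I), "perl"),
    (re.compile(r"GNU\s+General\s+Public\s+License|\bGPL\b", re.I), "gpl"),
    (re.compile(r"Artistic\s+License", re.I), "artistic"),
]

# Matches "version 2", "Version 2.0", and so on.
VERSION = re.compile(r"version\s+(\d+(?:\.\d+)?)", re.I)

# Assumed size of the window searched for a version after a phrase.
WINDOW = 80


def guess_licenses(text):
    """Return a list of (label, version_or_None) for every phrase found."""
    found = []
    for pattern, label in PATTERNS:
        match = pattern.search(text)
        if not match:
            continue
        # Only trust a version number that appears right after the phrase.
        nearby = text[match.end():match.end() + WINDOW]
        version = VERSION.search(nearby)
        found.append((label, version.group(1) if version else None))
    return found
```

Even with the version window, this still knows nothing about where in the file the phrase appears, so a phrase quoted in documentation or a test fixture would count as a hit.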

Software::License fwiw

rjbs on 2008-09-02T22:16:45

Software::License is an extended, refactored version of what's in M:I. I'm not sure exactly what you want beyond what M:I does, though, so I have no idea if S:L is more useful at all.

Re:Software::License fwiw

brian_d_foy on 2008-09-02T22:48:17

I looked at Software::License, but it looked like something that went the other way. All I saw was stuff to write the correct text somewhere.

I'm indexing BackPAN, so I want to look at a bunch of files I can't change and get an answer back about what license is present, no matter what they have done.

For instance, some of the files might not have any license notice, but the distro has a COPYING file.

I'm not concerned about any legal stuff and what people should do, just what they have done (that I can't change).

Making people do the right thing is not what I care about for this problem. :)

Re:Software::License fwiw

rjbs on 2008-09-02T22:54:10

Software::LicenseUtils has methods for looking at POD or META.yml (which isn't very good for getting specifics). It doesn't try to look at anything else, but could.

It's just, you know, heuristics.

Debian's dh-make-perl...

gwolf on 2008-09-02T22:57:44

In the Debian pkg-perl group, we extensively use a tool we developed called dh-make-perl, which creates the basic skeleton of a Perl package ready to be assimilated. One of our core requirements is to have the proper copyright information. As often as possible, we take the information from META.yml, but it's often not specified or, even worse, is completely bogus (i.e. META declares one license but a different license is explicitly mentioned elsewhere).
You can take a look at our (quite ugly, yes) dh-make-perl. It tries to get this information from META.yml, then from several popular files (e.g. LICENSE, COPYING, etc.), and from the module's POD. And, of course, it goes on to create a machine-parsable debian/copyright file (machine-parsable according to our sick and twisted standards, of course).
I looked a bit into CPAN modules offering this functionality, but found nothing. If you think it's worth it, I can weed it out of dh-make-perl and make it into its own module.
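The lookup order described above (META.yml, then well-known license files, then the module's POD) can be sketched as a simple fallback chain. This is not dh-make-perl's code. The list of file names, the regex used in place of a real YAML parser, the decision to treat an "unknown" value as unspecified, and the function name `find_license_evidence` are all assumptions. The distribution is passed in as a dict of path to text so the sketch does not touch the filesystem.

```python
import re

# Assumed candidate file names, checked in this order.
LICENSE_FILES = ("LICENSE", "LICENCE", "COPYING", "COPYRIGHT")

# Crude stand-in for a YAML parser: a top-level "license: value" line.
META_LICENSE = re.compile(
    r"^license:[ \t]*['\"]?([^'\"\n#]+?)['\"]?[ \t]*$", re.M
)

# Body of a "=head1 LICENSE" or "=head1 COPYRIGHT ..." POD section.
POD_SECTION = re.compile(
    r"^=head1\s+(?:COPYRIGHT|LICEN[SC]E)[^\n]*\n(.*?)(?=^=head1|^=cut|\Z)",
    re.M | re.S | re.I,
)


def find_license_evidence(files):
    """files: dict mapping a path inside the distribution to its text.

    Returns (source, text) from the first place that yields something,
    or None when nothing is found.
    """
    meta = files.get("META.yml")
    if meta:
        match = META_LICENSE.search(meta)
        # Assumption: an "unknown" value counts as unspecified.
        if match and match.group(1).strip().lower() != "unknown":
            return ("META.yml", match.group(1).strip())

    for name in LICENSE_FILES:
        if files.get(name, "").strip():
            return (name, files[name])

    for path in sorted(files):
        if path.endswith((".pm", ".pod")):
            match = POD_SECTION.search(files[path])
            if match and match.group(1).strip():
                return (path, match.group(1).strip())

    return None
```

Because it stops at the first hit, this sketch cannot notice the "completely bogus" case where META.yml and the other files disagree. Collecting evidence from every source and comparing would be needed for that.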

Re:Debian's dh-make-perl...

brian_d_foy on 2008-09-02T23:13:56

I skimmed the code, and maybe it's what I need. If you guys are using it to guess the license, you're probably doing the same thing I am. I'll look more closely in a few days.

Did the Debian people steal ideas from anyone else? Do the other Linux distros do this too?

Thanks,

CPANTS has_license

gabor on 2008-09-02T23:16:16

CPANTS has a broken version of what you are looking for.

It is broken partly because it is not yet clear exactly what it needs to look for.

Multiple licences

drhyde on 2008-09-03T11:41:22

CPU::Emulator::Z80 has three different licences. Most of it is "GPL2 or Artistic" and some of the tests are GPL2 only. That's because I took the tests from a GPL2 project, but I want the bits that are all my own work to be even more free.

Files that are just documentation are a variation on the theme of CC.
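A distribution like this means an indexer has to record licenses per file, not per distribution. A minimal sketch of that bookkeeping follows, assuming some earlier step has already produced a label (or None) for each file. The function names, the labels, and the rule that files without a notice inherit a distribution-wide default (for example from a COPYING file) are all assumptions. The paths in the usage example are made up and are not the real layout of CPU::Emulator::Z80.

```python
from collections import defaultdict


def licenses_by_file(per_file, dist_default=None):
    """Group paths by license label.

    per_file: dict mapping path to a label, or None if the file carries
    no notice of its own. Such files fall back to dist_default, and to
    "unknown" if there is no default either.
    """
    groups = defaultdict(list)
    for path, label in per_file.items():
        groups[label or dist_default or "unknown"].append(path)
    return {label: sorted(paths) for label, paths in groups.items()}


def is_single_license(groups):
    """True when every file ended up under the same label."""
    return len(groups) == 1


# Hypothetical input: paths and labels are illustrative only.
example = licenses_by_file(
    {
        "lib/Foo.pm": "gpl2-or-artistic",
        "t/borrowed.t": "gpl2",
        "docs/guide.pod": "cc",
        "lib/Foo/Bar.pm": None,
    },
    dist_default="gpl2-or-artistic",
)
```

Here `example` has three groups, so `is_single_license(example)` is false and the index would need to store the whole mapping.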