Question: Is it possible to annotate/tag each CPAN module update so that we can figure out if the update contains "security fix", "minor bug fix" or "major API change" etc.?
Context: At work we have a repository of third-party CPAN modules that we use on Vox or TypePad. Once a module is added to the list, we manually follow the changes of each module to figure out if we need to upgrade (à la fixes for major bugs, security issues, memory leaks etc.) or not to upgrade (à la backward-incompatible API changes etc.).
It generally works well but sometimes we upgrade a module without knowing that it might break our code. In that case we take a look at how hard it is to update our code to follow the module change, and if it's not that easy, we simply revert the upgrade.
So, I think it would be nice if we could automatically or even semi-automatically know, given module XXX-YYY version M to N, what kind of changes the upgrade will contain, without manually looking at Changes and diffing its source code. Note that I'm not saying these audit processes are worthless, but if we know how much change the upgrade introduces, it makes the work a bit easier.
Here are two possible solutions:
1) Having a rough standard to indicate these "minor bug fix", "security fix" or "major API change" type of thing in Changes file.
I know CPAN is not a place where we can force all module authors to follow one giant "standard", but we already have some kind of standardization on CPAN module versioning: if the release is a developer release that "normal" users shouldn't upgrade to, we add "_" in the version number so the CPAN ecosystem will ignore it. Could we introduce more things similar to this, to tag each module update?
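As a rough illustration of the "_" convention mentioned above, here is a minimal Python sketch. The function names are made up for this example and the check is deliberately naive (it only looks for an underscore in the version string); it is not how any CPAN tool actually implements the rule.

```python
def is_developer_release(version: str) -> bool:
    """Naive check: treat any version containing "_" as a developer release."""
    return "_" in version


def stable_only(versions):
    """Drop developer releases from a list of version strings, keeping order."""
    return [v for v in versions if not is_developer_release(v)]


if __name__ == "__main__":
    # Example version strings are illustrative only.
    print(stable_only(["0.45", "0.46_01", "0.46"]))
```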
I realize that it's not easy because most authors write the Changes file in a free text format. Some authors use more structured formats like YAML, POD or n3/RDF(!), but I myself don't like to do that. Hm, maybe YAML is acceptable.
Anyway, if that doesn't sound realistic, I have another solution in my mind, 2) to have a Wiki/del.icio.us-like website where anyone can tag any module release. It might sound a bit more Web 2.0 way to accomplish the original purpose :)
We probably want to integrate the user authentication with PAUSE/BitCard so that we can say "this release is tagged 'minor bug fix' by the author."
Thoughts?
I'd like an RSS feed (or whatnot) that keeps me up to date on changes to modules on my watchlist (likely, modules I have installed or I have as a prerequisite to my releases), so a machine parser for Changes files is something I'm interested in. I'm definitely in the "free format text file" camp. I'm not really against POD, but I'm against human-unreadable formats like RDF, because IMO the Changes file is still mostly for humans to consume and mostly written by humans too. At least my releases "should" be conveniently parseable with some heuristics:
0.46 20071003
+ Bump version because of borked CPAN upload, retrying
* No need to upgrade
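A heuristic parser for an entry like the one above might look like the following Python sketch. It assumes a header line of the form "version YYYYMMDD" and entry lines starting with "+", "*" or "-"; those shapes are guessed from this single sample, and the function name parse_changes is invented for the example.

```python
import re
from datetime import datetime

# Assumed header shape: "<version> <YYYYMMDD>", as in the sample entry.
HEADER = re.compile(r"^(?P<version>\d+(?:\.\d+)*(?:_\d+)?)\s+(?P<date>\d{8})\s*$")
# Assumed entry shape: a leading "+", "*" or "-" marker followed by free text.
ENTRY = re.compile(r"^\s*[+*-]\s+(?P<text>.+?)\s*$")


def parse_changes(text):
    """Return a list of releases, each with its version, date and entry lines."""
    releases = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            releases.append({
                "version": header["version"],
                "date": datetime.strptime(header["date"], "%Y%m%d").date(),
                "entries": [],
            })
            continue
        entry = ENTRY.match(line)
        # Entries seen before any header are ignored rather than guessed at.
        if entry and releases:
            releases[-1]["entries"].append(entry["text"])
    return releases
```

Lines that match neither pattern are skipped, so free-form prose in the file does not break the parse; it simply isn't captured.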
Maybe simply leading by example and/or having modules that guess the changes made without needing more fluff as YAML does would push for a change. For example, I could see the phrase
Having a premade test that checks whether the Changes file is "recent", contains the date/timestamp of the distribution release and something machine parseable could help the module authors to keep their Changes file in machine-parseable format.
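Such a premade check could be sketched roughly as below, in Python for illustration. It assumes the newest release sits on the first non-blank line of the Changes file and that the date is written as YYYYMMDD; both are assumptions, and changes_mentions_release is a hypothetical name.

```python
def changes_mentions_release(changes_text, version, release_date):
    """True if the first non-blank line contains both the version and the date.

    Assumes the newest release is listed first and the date is YYYYMMDD.
    """
    for line in changes_text.splitlines():
        if line.strip():
            tokens = line.split()
            return version in tokens and release_date in tokens
    # An empty Changes file never passes.
    return False
```

A distribution's test suite could call this with its own version and release date and fail if the Changes file was not updated.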
Having the authors manually/separately tag their releases will likely not work better than having a well-formatted Changes file. Having others tag releases could help and be web-2.0ish. Storing the information in META.yml or whatnot would be another approach, but that relies on humans doing the work, and then they could have their Changes file in the "right" format.
Re: Tagging CPAN changes
miyagawa on 2007-11-07T09:11:29
I agree with you on most points. Actually I'm currently working on a tool to subscribe to RSS feeds of all modules (thanks to CPAN Recent Changes) we use on TypePad/Vox, and then run a quick regular expression like /api change/ or /security|vulnerability|serious bug/ etc. to quickly find out if we should upgrade or not.
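The regular-expression pass described here could look something like this Python sketch. The two patterns come from the comment above; the tag names, the case-insensitive matching and the function name machine_tags are assumptions added for the example, not part of the actual tool.

```python
import re

# Patterns taken from the comment; tag names are hypothetical labels.
TAG_PATTERNS = {
    "api change": re.compile(r"api change", re.IGNORECASE),
    "security": re.compile(r"security|vulnerability|serious bug", re.IGNORECASE),
}


def machine_tags(changes_text):
    """Return the sorted list of tags whose pattern occurs in the text."""
    return sorted(
        tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(changes_text)
    )
```

This only tells you that a phrase appears somewhere in the text, so a line such as "no security impact" would still be tagged; a human still needs to look at the hits.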
Re: Tagging CPAN changes
miyagawa on 2007-11-07T09:12:51
Oops, the closing tag was misplaced, and also the URL was wrong. Correct Link.
Re: Tagging CPAN changes
miyagawa on 2007-11-07T09:54:18
So maybe the combination of these:
Having a new site to tag each release of CPAN modules. The app by default automatically runs some ad-hoc regular expressions to extract changes like /API change/ or /security fix/ from the Changes file, and displays these tags as "machine parsed tags", probably in a gray color.
The site also works with PAUSE/BitCard authentication so anyone with these IDs can tag releases. In that case we display the tag as "human tags" in a black color.
Finally, if the user is authenticated as an author of the module and tags the release or approves the existing machine/human tags, then the tag becomes an "author tag" and is displayed in either bold or red.
There should also be a greasemonkey script so you can tag directly on search.cpan.org release pages.
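To make the three tag levels above concrete, here is a small Python sketch of one possible data model. The class and function names are invented, the ordering machine < human < author is my reading of the proposal, and "bold" is picked arbitrarily from the "bold or red" option.

```python
from dataclasses import dataclass
from enum import IntEnum


class Source(IntEnum):
    """Who produced a tag, ordered from least to most trusted (assumed order)."""
    MACHINE = 1
    HUMAN = 2
    AUTHOR = 3


# Display styles from the proposal; "bold" chosen over "red" for the example.
STYLE = {Source.MACHINE: "gray", Source.HUMAN: "black", Source.AUTHOR: "bold"}


@dataclass(frozen=True)
class Tag:
    label: str
    source: Source


def merge_tags(tags):
    """Keep one Tag per label, preferring the most trusted source."""
    best = {}
    for tag in tags:
        current = best.get(tag.label)
        if current is None or tag.source > current.source:
            best[tag.label] = tag
    return sorted(best.values(), key=lambda t: t.label)
```

With this model, an author approving a machine-parsed tag is just the same label arriving again with Source.AUTHOR, which then wins in merge_tags.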
Sounds like an interesting project for a hackathon :)
Re: Tagging CPAN changes
Alias on 2007-11-07T12:31:25
META.yml is probably a bad idea, since mostly it is generated from Makefile.PL and Build.PL, and I'd hate to have stuff accumulating in Makefile.PL every release in addition to in Changes.
I'd also like a machine-readable Changes file.
There's been talk about a Changes.yml, or maybe Changes (which has no defined format anyway) could be in YAML itself.
The obvious question is which YAML schema to use.
Re:Changes.yml
miyagawa on 2007-11-07T16:27:27
Thanks for the link... I saw this article but totally forgot about it.
Well, even though a machine-readable Changes file is definitely a good step, it still won't fix this exact problem, since even if you use YAML as a format, the content of the changes file (discussed in the link) is still free text.
Re:Changes.yml
hex on 2007-11-09T13:07:32
I've come up with an idea, your comments are welcome.
The use of RDF as a change format has numerous benefits. A large amount of work has already been put into constructing metadata vocabularies in it, which saves us from reinventing a big wheel with some crufty format based on YAML (as acme pointed out, YAML is a failed format). RDF is also expressible in different forms: you can take the N3 as linked above and transform it to XML, and then almost everything can read it; or you can apply XSLT to the XML and transform it into something else (like the traditional "Changes" format). It's flexible and powerful.
Re:RDF is a superior format
miyagawa on 2007-11-08T18:01:58
I agree that it's a great format, but for most people it's painful to write.
I'm also interested in the way microformats solve the RDF pain in XHTML: they use a rough standard in CSS class names or link@rel etc. so that we can automatically translate this XHTML into RDF later. Can we take a similar approach to that?
Re:Requirement changes
miyagawa on 2007-11-08T17:59:00
Yeah, but isn't it something we can programmatically generate based on META.yml changes?
Re:Requirement changes
srezic on 2007-11-08T19:53:48
Yes, you're right. Unless the author forgot to put the requirement into the META.yml in time. And changing an existing distribution is not possible.
Re:Requirement changes
miyagawa on 2007-11-08T21:36:22
s/META.yml/Makefile.PL/ maybe, because META.yml is (and can be re-) generated from Makefile.PL REQUIRES parameters.
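Generating requirement changes programmatically, as discussed in this sub-thread, boils down to diffing two prerequisite lists. A minimal Python sketch follows; it assumes the prerequisites of each release have already been extracted into plain module-to-version dicts (how to extract them is left out), and diff_requires is a hypothetical name.

```python
def diff_requires(old, new):
    """Compare two {module: minimum_version} dicts from consecutive releases."""
    added = {m: v for m, v in new.items() if m not in old}
    removed = {m: v for m, v in old.items() if m not in new}
    changed = {
        m: (old[m], new[m]) for m in old if m in new and old[m] != new[m]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Module names and versions below are made up for illustration.
    before = {"Foo::Bar": "1.00", "Baz": "0.10"}
    after = {"Foo::Bar": "1.20", "Qux": "2.00"}
    print(diff_requires(before, after))
```

As noted above, this only works if the author declared the requirement in the first place; a prerequisite that was forgotten in both releases produces no diff.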
Re:Module-Changes
miyagawa on 2007-11-08T22:16:24
Cool stuff, but it misses one thing I want: Parser::Free!:)