I finally realized why I was being stupid about how I was testing HTML and XML. The realization was so blindingly obvious that, in retrospect, I'm embarrassed that I didn't appreciate it before. You see, if you squint, HTML and XML documents are instances of objects. I don't care about the whitespace, the attribute order, or whether the nav links are at the top, bottom, left, etc. What I care about is the information these documents present and the things I can do with them.
When I test an object's API, I shouldn't care whether it's implemented with a hashref, inside-out objects or RPC calls. I should only care whether it does what it promises. The same goes for a page: Can I click a link and go to the correct page? Does it even have that link? Is the title correct? Those are the things I care about. I'm embarrassed that I've wasted so much time testing rapidly changing internals when the things I really needed to test weren't changing that much.
Realizations like that really hammer home how much I have to learn.
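Here is a rough sketch of the "page as an object" idea, written in Python with only the standard library rather than any of the Perl modules. The `PageFacts` class and the `page_facts` helper are names made up for illustration, and the two sample pages are invented. The point is only that two documents with different whitespace, attribute order and link placement expose the same title and the same links.

```python
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Hypothetical helper: collects the title and link targets, ignoring layout."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = {}          # link text -> href
        self._in_title = False
        self._href = None        # href of the <a> currently open, if any
        self._text = []          # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # dict() makes attribute order irrelevant
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            # collapse runs of whitespace so formatting does not matter
            self.title = " ".join(self.title.split())
        elif tag == "a" and self._href is not None:
            label = " ".join("".join(self._text).split())
            self.links[label] = self._href
            self._href = None


def page_facts(html):
    """Parse an HTML string and return the collected facts."""
    parser = PageFacts()
    parser.feed(html)
    parser.close()
    return parser
```

A test then asks only the questions that matter, for example `page_facts(html).title == "My Site"` or `"About" in page_facts(html).links`, and keeps passing when the nav links move from the top of the page to the bottom.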
You use test specs to tell it what you want:
http://yahoo.com/
Run that through simple_scan and it generates a nice Perl test that uses Test::WWW::Simple to see if the regex matches the page. Dead simple.
It actually turns out that a lot of what we're worried about is "is the content there, and is it in the right place on the page?" And simple_scan, along with a little debugging code that demarcates the different sections of the page, lets us test boatloads of stuff without a line of code being written.
This is important because our developers are harried PHP'ers who want to get the work out and not switch channels between two languages *just* different enough to cause problems, and because our project managers are generally people who know what the product is supposed to do, but aren't really programmers.
So we have a nice tool that works for everybody. I'm not going into plugins and so on because I'm planning a presentation for next year's conferences on how this dead-simple approach is paying off big.
See App::SimpleScan on CPAN if you want to play with it. More plugins on the way.
Re:simple_scan
pemungkah on 2005-11-18T02:59:03
I should break down that test spec:
- URL to access
- regex to match against it
- Y - match it
- N - don't match it
- TY - TODO, should eventually match
- TN - TODO, should eventually not match
- comment
And it's all TAP, so you can use Test::Harness to write tools to summarize. Should have lots more by CFP time.
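To make the breakdown above concrete, here is a small Python sketch that parses one spec line and prints a TAP result for it. It is not simple_scan's code. The field layout it assumes (whitespace-separated fields, with the regex wrapped in slashes and a required comment) is a guess for illustration, so check the App::SimpleScan docs for the real syntax. The names `Spec`, `parse_spec` and `tap_line` are invented, and the page body is passed in as a string instead of being fetched.

```python
import re
from typing import NamedTuple


class Spec(NamedTuple):
    url: str       # URL to access
    pattern: str   # regex to match against the page
    kind: str      # Y, N, TY or TN
    comment: str   # used as the test description


# Assumed layout: URL, /regex/, kind, comment, separated by whitespace.
SPEC_RE = re.compile(r"^(\S+)\s+/(.*)/\s+(TY|TN|Y|N)\s+(.*)$")


def parse_spec(line: str) -> Spec:
    """Split one spec line into its four fields."""
    m = SPEC_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a test spec: {line!r}")
    return Spec(*m.groups())


def tap_line(number: int, spec: Spec, body: str) -> str:
    """Return one TAP result line for this spec against a page body."""
    found = re.search(spec.pattern, body) is not None
    want_match = spec.kind in ("Y", "TY")    # Y/TY expect a match, N/TN expect none
    status = "ok" if found == want_match else "not ok"
    line = f"{status} {number} - {spec.comment}"
    if spec.kind.startswith("T"):
        line += " # TODO"                    # TAP directive: a failure here is expected for now
    return line
```

With a made-up spec such as `http://example.com/ /Example/ Y title present`, `tap_line(1, parse_spec(...), body)` gives `ok 1 - title present` when the body contains "Example". Because the TY and TN cases carry the `# TODO` directive, a harness reports their failures as expected rather than as breakage.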
Sounds like you want something like Test::WWW::Mechanize, but without the Mechanize part.