Violating HTML Objects

Ovid on 2005-11-17T21:12:41

I finally realized why I was being stupid about how I was testing HTML and XML. The realization was so blindingly obvious that, in retrospect, I'm embarrassed that I didn't appreciate it before. You see, if you squint, HTML and XML documents are instances of objects. I don't care about the whitespace, the attribute order, or whether the nav links are at the top, bottom, left, etc. What I care about is the information these documents present and the things I can do with them.

When I test an object's API, I shouldn't care if it's implemented with a hashref, inside-out objects or RPC calls. I should only care if it does what it promises. The same goes for a page: Can I click a link and go to the correct page? Does it even have that link? Is the title correct? Those are the things I care about. I'm embarrassed that I've wasted so much time testing rapidly changing internals when the things I really needed to test weren't changing that much.
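
To make the idea concrete, here is a rough sketch in Python using only the standard library. It is not the code from the post: the `PageFacts` class, the `page_facts` helper and both sample pages are made up for illustration. It reduces a page to the "API" described above (its title and its links) and checks that two pages with different whitespace, attribute order and link placement expose the same facts.

```python
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collects a page's title and links, ignoring layout details."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = {}         # link text -> href
        self._in_title = False
        self._href = None       # href of the <a> currently open, if any
        self._text = []         # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            # Collapse runs of whitespace so formatting does not matter.
            self.title = " ".join(self.title.split())
        elif tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links[text] = self._href
            self._href = None


def page_facts(html):
    """Parse an HTML string and return the collected facts."""
    parser = PageFacts()
    parser.feed(html)
    parser.close()
    return parser


# Two hypothetical pages: same information, different markup details.
page_a = """<html><head><title>My  Journal</title></head>
<body><a class="nav" href="/about">About</a><p>Hello</p></body></html>"""

page_b = """<html><head>
<title>
  My Journal
</title></head><body><p>Hello</p>
<div id="footer"><a href="/about" class="nav">  About </a></div></body></html>"""

a, b = page_facts(page_a), page_facts(page_b)
assert a.title == b.title == "My Journal"
assert a.links == b.links == {"About": "/about"}
```

A test written this way keeps passing when the nav links move to the footer or the markup is reindented, and fails only when the title or a link actually changes.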

Realizations like that really hammer home how much I have to learn.


simple_scan

pemungkah on 2005-11-18T02:55:18

We're getting a disgusting amount of functionality out of this teeny-tiny program I wrote called simple_scan. It basically lets nearly anybody write TAP-based tests against anything you can talk to over HTTP.

You use test specs to tell it what you want:
http://yahoo.com/ /Yahoo!/ Y branded properly
Run that through simple_scan and it generates a nice Perl test that uses Test::WWW::Simple to see if the regex matches the page. Dead simple.

It actually turns out that a lot of what we're worried about is "is the content there and in the right place on the page?" And simple_scan, along with a little debugging code that demarcates the different sections of the page, lets us test boatloads of stuff without a line of code being written.

This is important because our developers are harried PHP'ers who want to get the work out and not switch channels between two languages *just* different enough to cause problems, and because our project managers are generally people who know what the product is supposed to do, but aren't really programmers.

So we have a nice tool that works for everybody. I'm not going into plugins and so on because I'm planning a presentation for next year's conferences on how this dead-simple approach is paying off big.

See App::SimpleScan on CPAN if you want to play with it. More plugins on the way.

Re:simple_scan

pemungkah on 2005-11-18T02:59:03

I should break down that test spec:
  1. URL to access
  2. regex to match against it
    • Y - match it
    • N - don't match it
    • TY - TODO, should eventually match
    • TN - TODO, should eventually not match
  3. comment
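
The fields above can be sketched as a tiny spec runner. This is only an illustration in Python, not how simple_scan itself is implemented: the `SPEC` pattern, the `run_specs` function and the `fetch` callable are hypothetical, Python's `re` stands in for Perl regexes, and the second and third spec lines are invented examples. Pages come from a canned dictionary so the sketch runs without a network.

```python
import re
from typing import Callable, Iterable, List

# URL, /regex/, one of Y N TY TN, then a free-form comment.
SPEC = re.compile(r"^(\S+)\s+/(.*)/\s+(TY|TN|Y|N)\s+(.*)$")


def run_specs(specs: Iterable[str], fetch: Callable[[str], str]) -> List[str]:
    """Return TAP lines for the given spec lines.

    `fetch` is a hypothetical callable mapping a URL to page text.
    """
    results = []
    for line in specs:
        m = SPEC.match(line.strip())
        if m is None:
            raise ValueError(f"unparseable spec: {line!r}")
        url, pattern, kind, comment = m.groups()
        found = re.search(pattern, fetch(url)) is not None
        should_match = kind.endswith("Y")   # Y/TY expect a match, N/TN expect none
        status = "ok" if found == should_match else "not ok"
        todo = " # TODO" if kind.startswith("T") else ""
        results.append(f"{status} {len(results) + 1} - {comment}{todo}")
    # TAP plan line first, then one line per test.
    return [f"1..{len(results)}"] + results


# Canned page content standing in for a real HTTP fetch.
pages = {"http://yahoo.com/": "<title>Yahoo!</title>"}
specs = [
    "http://yahoo.com/ /Yahoo!/ Y branded properly",
    "http://yahoo.com/ /Google/ N no competitor branding",
    "http://yahoo.com/ /New logo/ TY new logo not shipped yet",
]
tap = run_specs(specs, pages.__getitem__)
print("\n".join(tap))
```

With the canned page this prints a plan line `1..3`, two `ok` lines, and a `not ok 3 - new logo not shipped yet # TODO` line, which a TAP harness treats as an expected failure rather than a broken test.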

And it's all TAP, so you can use Test::Harness to write tools to summarize. Should have lots more by CFP time.

Re:

Aristotle on 2005-11-19T18:20:02

Sounds like you want something like Test::WWW::Mechanize, but without the Mechanize part.

Please clarify...

pop on 2005-12-30T14:36:48

What are you doing now? Which tools are helping you (Test::WWW::Mechanize, no more XPath-based tests, etc.)?

By the way, were you testing the HTML page itself or the behaviour of the application generating the HTML page?

Thanks!