Some call it the evil empire; I just find it persistently annoying. Thanks to MSN's ignorant robots (they ignore the robots.txt file), dead nodes have been continually appearing on Birmingham OpenGuides. So far I've wasted a lot of my time deleting them as soon as they're created. As a consequence I started noting which IPs have been creating nodes, then doing a few reverse DNS lookups. MSN have not been the only culprit, but they have been the most persistent, and by ignoring the robots.txt file they have forced me to write code that blocks the creation of specific nodes (requests for them auto-redirect to the home node). I shouldn't have to write that code in the first place.
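For the record, MSN's crawler identifies itself as msnbot, so even the bluntest possible robots.txt entry should have kept it off the guide entirely:

    User-agent: msnbot
    Disallow: /

That it carried on creating nodes anyway is what makes the "ignorant" label stick.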
It is generally known that some web crawlers (especially spam harvesters) ignore robots.txt and follow links they should not follow. The best way to resolve this is to make sure that such operations can only be performed by submitting a form, not by following a link.
Why is that not the case in the Birmingham OpenGuides?
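As a minimal sketch of that idea, assuming a plain CGI entry point (this is not the actual OpenGuides code; the parameter name and redirect URL are made up): anything state-changing that arrives as a GET gets bounced, so a link-following crawler can never trigger it.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;

    my $q = CGI->new;

    # State-changing actions must arrive as a form POST; a crawler
    # following a plain link sends GET and is redirected instead.
    if ( defined $q->param('action') && $q->request_method ne 'POST' ) {
        print $q->redirect('http://example.org/guide/');
        exit;
    }

    # ... safe to handle the edit/create action from here ...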
Re:Hide such operations with form submissions
barbie on 2006-03-08T12:21:18
Because the OpenGuides software automatically creates Category and Locale nodes. The intention was for it to be helpful, and when used correctly it is. However, when abused it auto-creates all sorts of junk.
I'd now consider it a serious flaw in the OpenGuides code, as it allows anyone with malicious intent to create nodes that have no relation to the OpenGuides site and virtually hijack it. The clean-up process becomes time consuming and extremely irritating.
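The stop-gap described in the post above, redirecting requests for specific junk nodes back to the home node, might look roughly like this (a sketch only; the node names, parameter name, and URL are all hypothetical):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;

    my $q    = CGI->new;
    my $node = $q->param('id') || '';

    # Hypothetical blocklist of junk node names the crawlers keep recreating
    my %blocked = map { $_ => 1 } ( 'Category Junk', 'Locale Nowhere' );

    # Requests for a blocked node go to the home node instead of
    # being allowed to recreate the page.
    if ( $blocked{$node} ) {
        print $q->redirect('http://example.org/guide/');
        exit;
    }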
Re:Hide such operations with form submissions
Aristotle on 2006-03-21T18:34:23
Ah, the perils of violating the idempotency of GET…