I spent most of today tracking a very elusive bug. Basically, another developer reported that when he tried to publish a story through our CMS, the CGI went into an infinite loop that lasted until he killed the server. He described the story he was trying to publish and even sent me a dump. Try as I might, I couldn't get anything but success on my machine.
Finally, in desperation, I took over his machine while he went to lunch. I inserted some debugging code and tailed all the logs. That's when I found it. Here's how it works:
The reason I couldn't trigger the problem is that my machine is too fast. I never got the timeout from Apache. Also, it seems that my version of Mozilla (1.3) doesn't retry requests after getting an empty response. At this point I was totally stunned. It never occurred to me that the problem could be in the browser!
My solution was to modify the long-running CGI to produce a progress bar as it runs. This ensures that Apache never times out and also keeps the user from getting too bored. I'd planned to do this anyway but it was low on the todo list.
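For the curious, here is a rough sketch of the idea, not the actual CMS code. It's written in Python purely for illustration, and the names (`progress_chunks`, `run`, the list of `steps`) are made up for this example. The assumption is that the long job can be broken into steps, and that writing and flushing a little output after each step is enough to keep the server from deciding the request has gone idle.

```python
import sys
import time
from typing import Callable, Iterator, List, TextIO


def progress_chunks(steps: List[Callable[[], None]]) -> Iterator[str]:
    """Run each step in order, yielding a bit of output after each one."""
    # CGI header block, terminated by a blank line.
    yield "Content-Type: text/html\n\n"
    yield "<html><body><p>Publishing...</p>\n"
    total = len(steps)
    for done, step in enumerate(steps, start=1):
        step()  # one slice of the long-running work (hypothetical)
        percent = done * 100 // total
        yield f"<p>{percent}% complete</p>\n"
    yield "<p>Done.</p></body></html>\n"


def run(steps: List[Callable[[], None]], out: TextIO = sys.stdout) -> None:
    """Write each chunk and flush right away so it isn't held in a buffer."""
    for chunk in progress_chunks(steps):
        out.write(chunk)
        out.flush()


if __name__ == "__main__":
    # Stand-in for the real publish job: five short pauses.
    run([lambda: time.sleep(0.1) for _ in range(5)])
```

The flush after every chunk is the important part. Without it the output may sit in a buffer until the script exits, and the server and browser would see the same long silence as before.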
Bug hunting is tiring, but you just can't beat the exhilaration of killing the tough ones!
-sam