Australia gets the DMCA
Last night, the US-Australia Free Trade Agreement went through. Along with many other things, this will include "harmonization" with US intellectual property laws, including trademarks, copyright, and patents.
Somehow I fail to see this being of benefit to Australia.
Sick hardware
A rather important machine of ours went through several unexplained reboots a few months ago. We thought we had diagnosed it as a problem with the UPS, as removing it from the system resulted in stable operation.
Last night, the same machine went through another set of reboots for no apparent reason. This morning, however, I found the culprit after disabling the watchdog that was running. It appears that the IDE controller (or the kernel, talking to the IDE controller) completely dies, with many 'lost interrupt' messages on the console for three of the four drives in the machine.
Looks like I'll need to schedule downtime and replace its guts. Dealing with hardware is probably my second least favourite of all sysadmin tasks. (Restoring from tape is my least favourite.)
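For anyone wanting to see which drives are complaining, here is a rough sketch that tallies 'lost interrupt' messages per drive from saved console or log text. It assumes the messages look like "hda: lost interrupt"; the sample text and the function name are made up for illustration, so adjust the pattern to whatever your kernel actually prints.

```python
import re
from collections import Counter

# Assumed message shape: "<drive>: lost interrupt", e.g. "hda: lost interrupt".
# Adjust the pattern if your kernel logs these differently.
LOST_IRQ = re.compile(r"\b(hd[a-z]): lost interrupt")


def count_lost_interrupts(log_text):
    """Return a Counter mapping drive name to its number of 'lost interrupt' lines."""
    return Counter(match.group(1) for match in LOST_IRQ.finditer(log_text))


# Hypothetical sample, not taken from the real machine.
sample = """\
hda: lost interrupt
hdc: lost interrupt
hda: lost interrupt
hdd: lost interrupt
"""

counts = count_lost_interrupts(sample)
for drive, n in sorted(counts.items()):
    print(f"{drive}: {n} lost interrupt message(s)")
```

If a single drive dominates the tally, that drive (rather than the controller) may be the better first suspect.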
Sick hardware
phillup on 2004-08-04T17:53:39
"It appears that the IDE controller (or the kernel, talking to the IDE controller) completely dies, with many 'lost interrupt' messages on the console for three of the four drives in the machine."
I saw this once. So, I bought a PCI controller to replace the one built in on the mobo. Didn't help, because it was a bad hard drive. It gets worse, much worse, if you have the hard drive as part of a RAID.
What is happening is that some blocks on the drive are failing, and it takes the drive so long to remap them that it doesn't answer dma requests in time. (Or... something like that ;-))
If you have the time/hardware... you can run hdparm on the drive to make sure it is getting the proper transfer rates... then run badblocks across the entire drive (don't waste time writing a log file) and wait for the messages to appear. If they do, you found the drive.
Also, check w/ hdparm afterwards to make sure the transfer rate is still where it should be. I've seen some instances where the drive would pass the badblocks test... but the dma got turned off and the transfer rate went from 66 MB/s to 3 MB/s. This was a result of the drive taking so long to remap the bad blocks.
This causes all holy hell to break out when the drive is part of a RAID and the sibling(s) are still running w/ dma, and at a much faster transfer rate.
Also, running the vendor's disk diagnostics did very little good, since the drive could remap the bad blocks. It wasn't until I left badblocks running for a week(!) that I was able to consume all of the spare blocks and make it fail the vendor diagnostics. But the drive would consistently drop its dma settings when you ran badblocks on it and it hit the trouble spot on the platter. (I did this because the vendor insisted that they needed the code from the diagnostics disk before they would replace the drive. I had already replaced it in the system with a spare.)
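If you want to script the before/after hdparm check described above, something like this Python sketch might do. It assumes that `hdparm -t` prints a line ending in "= NN.NN MB/sec" and that `hdparm -d` prints "using_dma = 1 (on)" or "= 0 (off)"; the function names and the 50% threshold are my own inventions, so treat it as a starting point rather than a tested tool.

```python
import re
import subprocess

# Assumed output shapes (check against your hdparm version):
#   hdparm -t:  "Timing buffered disk reads: ... = 66.00 MB/sec"
#   hdparm -d:  "using_dma = 1 (on)"
RATE = re.compile(r"=\s*([\d.]+)\s*MB/sec")
DMA = re.compile(r"using_dma\s*=\s*(\d)")


def parse_rate(hdparm_t_output):
    """Extract the MB/sec figure from `hdparm -t` output, or None if absent."""
    match = RATE.search(hdparm_t_output)
    return float(match.group(1)) if match else None


def parse_dma(hdparm_d_output):
    """Return True/False for the using_dma flag, or None if it is not reported."""
    match = DMA.search(hdparm_d_output)
    return match.group(1) == "1" if match else None


def snapshot(device):
    """Run hdparm against the device (needs root) and return (dma_on, rate)."""
    dma_out = subprocess.run(["hdparm", "-d", device],
                             capture_output=True, text=True).stdout
    rate_out = subprocess.run(["hdparm", "-t", device],
                              capture_output=True, text=True).stdout
    return parse_dma(dma_out), parse_rate(rate_out)


def looks_degraded(before, after, min_ratio=0.5):
    """Flag a drive whose dma got switched off or whose rate fell sharply.

    min_ratio is an arbitrary threshold chosen for illustration.
    """
    dma_before, rate_before = before
    dma_after, rate_after = after
    if dma_before and dma_after is False:
        return True
    if rate_before and rate_after is not None:
        return rate_after < rate_before * min_ratio
    return False
```

The idea is to call `snapshot("/dev/hda")` (or whichever device you suspect) before starting badblocks, call it again once badblocks has finished, and pass both results to `looks_degraded`.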