Phew...

Purdy on 2001-11-20T19:48:03

Well, you can prolly tell that I've had quite a wild 30 hours, sleeping about 6 of those. Trial by fire is prolly the best way to put it and leave it at that. Thankfully we were not hacked (though our hosting provider was quick to jump on that one & put the blame on me) - on Saturday, the kernel spit out these fine messages:

Nov 17 05:56:07 www kernel: Unable to handle kernel paging request at virtual address 81a40369
Nov 17 05:56:07 www kernel: current->tss.cr3 = 02ef9000, %%cr3 = 02ef9000
Nov 17 05:56:07 www kernel: *pde = 00000000
Nov 17 05:56:07 www kernel: Oops: 0000
Nov 17 05:56:07 www kernel: CPU: 0
Nov 17 05:56:07 www kernel: EIP: 0010:[find_inode+26/56]
Nov 17 05:56:07 www kernel: EFLAGS: 00010297
Nov 17 05:56:07 www kernel: eax: 81a4030! ebx: 00000006 ecx: c02682c8 edx: 81a40301
Nov 17 05:56:07 www kernel: esi: c9fdda00 edi: 00000006 ebp: c9fdda00 esp: c6fe1ecc
... And So On ...

The hosting folks (after I finally got ahold of them on Monday @ 9:15am) mentioned that the CPU's cooling fan wasn't working, so my best guess is that the computer overheated.

Since then, I have got the database server re-installed and re-loaded with backup data, got the Web serverS going (with SSL, mod_perl and PHP - which is interesting when you're dealing with a Cobalt RaQ), installed the required Perl modules, and so on. I was also lucky enough to get our old SSL certificates back, so we don't have to regenerate (& thus, pay for) them!

So I'm slowly getting out of shock mode and will soon get back on track with regular development. Thanksgiving break is coming up and I am thankful I won't be stuck here @ the office still trying to get things up & going (though there are still a few minor bugs to work out, I'm sure). I'm also going to use the break time to sleep and think out redundancy/backup solutions. Cobalt does have a built-in backup system, but we weren't using it at the time (had a homegrown solution). With it in place, a restore would be a piece of cake if this happens again. Also gotta think about a redundancy setup, where we'd have a backup server that could be brought to the frontline if something ever goes wrong again.

I'm also going to have to have a talk with someone in management over at our hosting provider about SLAs and procedures. Vastnet's support e-mail address was forwarded to some guy who was on vacation, of all things.

For you Americans out there, I hope everyone has a great vacation!

Jason

overheating CPUs

hfb on 2001-11-20T20:06:06

The all-time-best kernel panic message on Solaris was when the fans on the first revision of Ultra 1/2 boxes would fail, causing the CPU to overheat and the kernel to panic, writing "CPU Meltdown!" before crashing. It's good to see a sense of humour :)