When good servers go bad...
Can't fix the truck today. No parts. The parts will be in tomorrow. Ugh. So I start working on fixing up my journal. Then I get a call from Chris telling me that his server, hunter, is down. Ugh, hunter is such a one-off, pain-in-the-butt server to deal with. So I tell him to pick me up, since my truck is damaged. He does, and we go to the colo facility to take a look. Basically, an all-night ordeal of headaches, computer-induced torture, and more things that make you go bla...
Get to hunter at the colo. It's beeping. One of the fans seems to be malfunctioning. Oh, well. The server is sitting at a maintenance prompt, so the filesystem needs to be checked. I run the check and reboot. It doesn't boot. Kernel panic. Bad. The NOC guy there builds me a Linux boot disk so I can troubleshoot. No good: the disk has no megaraid drivers, so I can't mount the filesystems. He says he'll build a new one. After more than an hour of waiting, I decide we should just go back to my place to get my Gentoo CDs.
We go. Chris gets pulled over for speeding. Chris has a suspended license and lapsed insurance. The cops took Chris's license and plate, so I had to complete the drive back to my place, about a mile away. Fun. Then I get home, collect the CDs I may need, and have a little chat online (sorry I couldn't stay...). I drove my limp truck back to the colo, and Mark joined us there.
Finally mounted the filesystems to take a look and assess the corruption, or see if it had been hacked and wiped. Looks like corruption. Another pass of fsck showed a LOT of corruption and fixed things up as best it could. A lot of data was missing from the root partition. So I tar up the works to another partition and install a fresh system on the root partition. Configured it with the IPs, tested sshd, and also dropped the old /var in place. Now I am back home and get to work on it even more, attempting to resurrect the system. Joy. I'd better get busy...
Oh, and I almost forgot to mention, while rebuilding the system, one of the drives in the RAID array failed! ARGH! So, I left it for another day. He's running on 2 disks with no protection, and a server that is beeping like mad. Fix later. Computers can really be a pain sometimes.
Archive Entry 24: When good servers go bad...
I had a feeling the disks were failing. I'm willing to bet the corruption was due to two disks having simultaneous read errors. After all those years, the drives need to be replaced. High-speed disks wear out faster.
Comment by: Mark Grosberg
How can his server go bad? He barely uses the darned thing. Sigh.
Comment by: Sean Conner