Tuesday, 5 April 2022

Disk failure

I got my first warning when part of an important text-only file became corrupt and had binary stuff in it. I was able to replace the corrupted part from a backup.

Then it happened again. And again. I'm so glad that I use a flat-file text-only database.
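
For anyone who wants to catch this sort of damage early, here is a rough sketch of a check for binary junk in a text-only file. It is not what I ran at the time; it assumes the file is plain ASCII or UTF-8, and the function name find_binary_lines is made up for illustration.

```python
def find_binary_lines(data: bytes) -> list[int]:
    """Return 1-based numbers of lines that do not look like plain text.

    Assumption: the file is ASCII or UTF-8, so control characters
    (other than tab and carriage return) or invalid UTF-8 mean damage.
    """
    allowed_controls = {0x09, 0x0D}  # tab, CR (LF is consumed by the split)
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        has_control = any(
            (b < 0x20 and b not in allowed_controls) or b == 0x7F
            for b in line
        )
        if has_control:
            bad.append(number)
            continue
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad
```

Reading the file with open(path, "rb").read() and passing the bytes in gives a list of suspect line numbers, which can then be compared against the backup copy.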

This is an important computer: it's the one I do billings from, and it is therefore inside the innermost bastion of my firewall. That turned out to be a big problem later.

I used the SMART system, which tells me how many bad sectors there are on the drive. The count was large, and growing fast. This disk was on the verge of failing.
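
As an illustration of watching that count, here is a minimal sketch that pulls the raw attribute values out of a SMART attribute table. It assumes the smartctl tool from smartmontools and its usual ten-column "smartctl -A" table layout; attribute names such as Reallocated_Sector_Ct vary by drive vendor, and the function names are hypothetical.

```python
import subprocess


def parse_smart_attributes(report: str) -> dict[str, int]:
    """Map attribute name to raw value from 'smartctl -A' style text.

    Assumption: each attribute row has at least ten whitespace-separated
    columns, starting with a numeric ID and with the raw value in the tenth.
    Rows that do not fit (headers, non-numeric raw values) are skipped.
    """
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def read_smart_report(device: str) -> str:
    """Run 'smartctl -A' on a device; usually needs root privileges."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout
```

Running parse_smart_attributes(read_smart_report("/dev/sda")) once a day and logging the reallocated and pending sector counts would show whether the numbers are climbing; the device path here is only an example.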

So, first I built a replacement computer. I used an Intel motherboard (I bought a job lot of these a few years ago, and they're great) with an E7500 dual-core CPU running at 2.93 GHz. I put in 8 GB of memory and an 80 GB 2.5-inch drive.

Next, to configure it, I used an IP address in the range I use for "inside", the innermost bastion. That meant that I couldn't connect it to the usual 10.x.x.x range; it had to connect to the "inside". But the switch that is fed by the "inside" is way across the room, so I cleverly put a new switch between the firewall "inside" and the switch across the room. Everything still worked. So I took a feed from the new switch.

That should work, right?

But it didn't. And I still don't know why.

I tried lots of things, such as replacing the new switch. None of them worked.

Eventually, I did a "Hail Mary" (an American football term for a desperation move) and slung a 10 m cable from the old switch right across the room to the new server. To my utter astonishment, it worked!

So I loaded up the new server, partly from the failing old one and partly from the backup, and installed Apache (the latest version; configuring that led to much grief).

Eventually, I got everything working, put the new server in the place where the failing server was, and everything is OK now.
