
Friday 3 June 2016

How to identify a bad drive

There are lots of ways, but I'm going to explain how I do it.

I use a lot of drives, so it's inevitable that some of them fail. The best thing would be to predict a drive failure before it happens. And actually, that is possible.

Modern drives have a thing called "SMART" (Self-Monitoring, Analysis and Reporting Technology). In Linux, you interrogate the drive with:

/usr/sbin/smartctl -a /dev/sda

There are programs that you can get for Windows that display the same information.

smartctl comes back with tons of information. The one I use is "Reallocated_Sector_Ct". When a drive finds a sector going bad, it marks it as bad and allocates a replacement from one of its spares. This figure tells you how many times that has happened.

Zero is the ideal here, of course, and for a 1 terabyte Seagate drive, 2047 is the maximum (I don't know if that's documented anywhere, it's just my experience). I look at these numbers now and then, and if a drive's count is getting up into the high hundreds, it's time to replace that drive.
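
Here's a minimal sketch of how that check could be scripted. It assumes the usual smartctl attribute table, with the attribute name in column 2 and the raw value in column 10. The function names and the limit of 500 are my own illustrative choices, not anything smartctl defines.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw Reallocated_Sector_Ct.
# Assumption: attribute name is in column 2, raw value in column 10.
reallocated_count() {
  awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Compare a count against a limit (hypothetical default: 500).
# Prints OK, REPLACE, or UNKNOWN if the count is missing or not a number.
check_reallocated() {
  count=$1
  limit=${2:-500}
  case $count in
    ''|*[!0-9]*) echo "UNKNOWN"; return 0 ;;
  esac
  if [ "$count" -ge "$limit" ]; then
    echo "REPLACE"
  else
    echo "OK"
  fi
}

# Usage (needs root and smartmontools):
#   n=$(/usr/sbin/smartctl -A /dev/sda | reallocated_count)
#   echo "sda: $n reallocated sectors -> $(check_reallocated "$n")"
```

Pick the limit to suit your own drives; the right number probably depends on the make and size.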

Another way you can get a warning is via the system logs. I have a program running on each computer that parses the system logs once per day, looking for drive warnings, and if it sees any, it emails me. One problem here is that the log doesn't tell you the drive letter (sda, sdb etc.). It uses an ata number. Here's how you translate ata numbers to drive letters:

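for d in /sys/block/sd*
do
  s=$(basename "$d")                                       # drive letter, e.g. sda
  p=$(readlink -f "$d")                                    # full sysfs path behind the symlink
  h=$(echo "$p" | grep -Eo "host[0-9]+")                   # SCSI host, e.g. host0
  t=$(echo "$p" | grep -Eo "target[0-9:]*")                # SCSI target, e.g. target0:0:0
  a2=$(echo "$t" | grep -Eo "[0-9]:[0-9]$" | sed 's/://')  # last two fields, e.g. 00
  a=$(cat "/sys/class/scsi_host/$h/unique_id")             # the ata port number
  echo "$s -> ata$a.$a2"
done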
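
The daily log check I mentioned above could be sketched roughly like this. It's not my actual program, just a minimal illustration. It assumes that drive trouble shows up as kernel messages tagged with an ata number and containing "exception", "error" or "failed"; the function names are made up for this example.

```shell
#!/bin/sh
# Keep only the log lines on stdin that look like ATA drive trouble.
# Assumption: such lines contain "ataN:" or "ataN.MM:" followed by one of
# the words exception, error or failed.
ata_warnings() {
  grep -E 'ata[0-9]+(\.[0-9]+)?: .*(exception|error|failed)'
}

# Reduce warning lines on stdin to the distinct ata numbers involved.
ata_ids() {
  grep -Eo 'ata[0-9]+\.[0-9]+' | sort -u
}

# Usage, e.g. from a daily cron job (the log path and address are placeholders):
#   w=$(ata_warnings < /var/log/syslog)
#   [ -n "$w" ] && echo "$w" | mail -s "drive warnings on $(hostname)" me@example.com
```

The ata numbers it picks out are the ones that the loop above translates to drive letters.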


So now that I know which drive is failing (or has failed), it's time to open up the computer. But the way the drives sit in the box, all I can see is the side of each one. Which one is the one I'm looking for?

That's why, when I install a drive, I write its serial number on a label that I stick to the side of the drive.

This means that I can take the box down off its shelf, open it up, and I immediately know which drive needs to be replaced.
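
To match a drive letter to the label, you need the serial number of the failing drive. Here's a small sketch that pulls it out of the identity information from smartctl. It assumes the output contains a line starting with "Serial Number:", and the function name is just for illustration.

```shell
#!/bin/sh
# Read `smartctl -i` output on stdin and print the drive's serial number.
# Assumption: the output has a line of the form "Serial Number:    XXXXXXXX".
drive_serial() {
  sed -n 's/^Serial Number: *//p'
}

# Usage (needs root and smartmontools):
#   for dev in /dev/sd?
#   do
#     echo "$dev -> $(/usr/sbin/smartctl -i "$dev" | drive_serial)"
#   done
```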

You're probably wondering what I do with all those drives that are close to the end of their useful life ... I put them in geocaches, so that people can either dismantle them to get the powerful magnets inside, or else use them for storage.
