Tuesday, 12 July 2016

SMART behind 3ware

I SMART-check my drives from time to time. In particular, I'm looking at "Reallocated_Sector_Ct". When a drive realises that one of its sectors is failing, it reallocates a spare from its pool of spares. A few reallocations are no big deal, but eventually the count climbs into the hundreds, which is not good, then the thousands, which is bad, and eventually the tens of thousands, which is very bad.

It's an easy thing to do. Here's my program that checks all the drives on a server.

# run smartctl and summarise the info
use strict;
use warnings;
use Sys::Hostname;

# localtime returns (sec, min, hour, mday, mon, year, ...); only the date is needed
my (undef, undef, undef, $mday, $mon, $year) = localtime(time);
$mon++;                    # localtime months run 0-11
my $year4 = $year + 1900;  # localtime years count from 1900
my $datestr4 = sprintf("%4.4d-%2.2d-%2.2d", $year4, $mon, $mday); # yyyy-mm-dd
my $hostname = hostname();
my $hostnickname = substr($hostname, 0, 5);
print "Summary of disk smart stats for $hostnickname on $datestr4\n";
foreach my $device ('sda','sdb','sdc','sdd','sde','sdf','sdg','sdh','sdi','sdj','sdk','sdl','sdm','sdn','sdo') {
  my $result = `/usr/sbin/smartctl -a /dev/$device` // '';
  # skip device names that don't exist on this server
  next if $result =~ /No such file or directory|No such device/;
  my ($model)  = $result =~ /.*Device Model:\s*(.*)$/mi;
  my ($serial) = $result =~ /.*Serial Number:\s*(.*)$/mi;
  # the raw value is the last number on each attribute line
  my ($readerror)   = $result =~ /.*Raw_Read_Error_Rate.* (\d+)$/mi;
  my ($reallocated) = $result =~ /.*Reallocated_Sector_Ct.* (\d+)$/mi;
  my ($seekerror)   = $result =~ /.*Seek_Error_Rate.* (\d+)$/mi;
  my ($ecc)         = $result =~ /.*Hardware_ECC_Recovered.* (\d+)$/mi;
  # anything the drive doesn't report prints as a blank
  $_ //= '' for ($model, $serial, $reallocated);
  print "Reallocated_Sector_Ct $reallocated /dev/$device $model $serial \n";
}

This gives me an output that looks like this:

Summary of disk smart stats for bigbe on 2016-07-12
Reallocated_Sector_Ct 0 /dev/sda Maxtor 96147U8 N803S94C
Reallocated_Sector_Ct 0 /dev/sdb ST3750840AS 3QD0BPRT
Reallocated_Sector_Ct 856 /dev/sdc ST2000DL003-9VT166 5YD60Y6W
Reallocated_Sector_Ct 0 /dev/sdd ST4000DM000-1F2168 S301335G
Reallocated_Sector_Ct 294 /dev/sde MAXTOR STM31000340AS 9QJ2MLZW
Reallocated_Sector_Ct 31 /dev/sdf ST32000542AS 6XW1QNGH
Reallocated_Sector_Ct 1882 /dev/sdg Hitachi HDS5C3020ALA632 ML0220F30MAVXD

So then I know where each drive is, and how bad it is. In the example above, drive sdg is looking well dodgy, and sdc not much better. Except that I did a scan three months ago and got the same numbers, so it isn't failing just yet.
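That comparison against an earlier scan could be automated. What follows is only a minimal sketch, assuming each scan's output has been saved as text in the format shown above. The sub names (parse_summary, severity, compare_scans) are made up for illustration, and the cut-offs in severity() are just the rough rule of thumb from the start of this post turned into numbers. Drives are matched by serial number rather than by sdX name, since the names can move around between boots.

```perl
use strict;
use warnings;

# Parse summary lines of the form
#   Reallocated_Sector_Ct <count> /dev/<dev> <model...> <serial>
# into a hash of serial => count. Lines with no count are skipped.
sub parse_summary {
    my ($text) = @_;
    my %count_for;
    for my $line (split /\n/, $text) {
        next unless $line =~ m{^Reallocated_Sector_Ct (\d+) /dev/\S+\s.*?(\S+)\s*$};
        $count_for{$2} = $1;
    }
    return \%count_for;
}

# Rough severity bands (assumed cut-offs, not a manufacturer's figures).
sub severity {
    my ($count) = @_;
    return 'very bad' if $count >= 10_000;
    return 'bad'      if $count >= 1_000;
    return 'not good' if $count >= 100;
    return 'ok';
}

# One report line per drive in the new scan, saying whether its count rose.
sub compare_scans {
    my ($old_text, $new_text) = @_;
    my $old = parse_summary($old_text);
    my $new = parse_summary($new_text);
    my @report;
    for my $serial (sort keys %$new) {
        my $now = $new->{$serial};
        my $trend;
        if    (!exists $old->{$serial})  { $trend = 'new drive' }
        elsif ($now > $old->{$serial})   { $trend = 'rising' }
        else                             { $trend = 'no increase' }
        push @report, "$serial $now " . severity($now) . ", $trend";
    }
    return @report;
}

my $old_scan = <<'END';
Reallocated_Sector_Ct 856 /dev/sdc ST2000DL003-9VT166 5YD60Y6W
Reallocated_Sector_Ct 1882 /dev/sdg Hitachi HDS5C3020ALA632 ML0220F30MAVXD
END
my $new_scan = $old_scan;   # same numbers three months later
print "$_\n" for compare_scans($old_scan, $new_scan);
```

A drive whose count is high but not rising is the sdg situation above: worth watching, not yet worth replacing.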

I've just upgraded Lenny to be one of my "big servers". Lenny is a Dell PowerEdge 1550: a 1U box with 2GB of memory and twin Pentium 1266 processors, running Fedora 24. So, not really big like the 64GB twin-Xeon machine, but I only want it for a backup. For that to work, I needed to put in nine hard drives to bring it up to 18TB (I could do that with a couple of 8TBs and a 4, but I've got a load of 2TBs sitting doing nothing, so why waste them?). The problem was, Lenny only has two PCI slots, and my usual 4-port SATA cards would therefore only support 8 drives.

So I dug into my "box of bits" and got out an 8-port 3ware card that I last used a *long* time ago, and replaced one of the 4-ports with that. I put on the nine drives.

How do you put nine drives in a 1U box designed to take at most two? You make up another 1U box containing a power supply, the nine drives and a plethora of fans, and lead the SATA cables out the back and into Lenny, using SATA extension cables (£1 each on eBay) to give both the requisite length and an easy way to separate the boxes when I need to move them. That worked; Linux recognised the nine drives, and it's all being loaded up while I write this (over a gigabit ethernet link). I'm getting 42GB/hour.
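To put that rate in context, here is the arithmetic as a small sketch. It assumes decimal units (1GB = 1000MB, 1TB = 1000GB) and a constant rate, and the figure for filling all 18TB is only an upper bound, since the backup may not need the whole capacity.

```perl
use strict;
use warnings;

my $gb_per_hour = 42;   # rate quoted above, read as decimal gigabytes
my $capacity_tb = 18;   # total capacity quoted above

my $mb_per_sec    = $gb_per_hour * 1000 / 3600;        # GB/hour -> MB/s
my $mbit_per_sec  = $mb_per_sec * 8;                   # MB/s -> Mbit/s
my $hours_to_fill = $capacity_tb * 1000 / $gb_per_hour;

printf "%.1f MB/s, or about %.0f Mbit/s on the wire\n", $mb_per_sec, $mbit_per_sec;
printf "%.0f hours (%.1f days) to write %d TB at that rate\n",
    $hours_to_fill, $hours_to_fill / 24, $capacity_tb;
```

So the load is using roughly a tenth of what a gigabit link can nominally carry.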

But when I ran my smart program, it didn't get any data from the drives on the 3ware controller.

Here's what you do. Look at /dev, and you'll find a bunch of devices starting with tw. On Lenny the first of those is twe0, and the drives behind it are the ones that show up as sdg, sdh, sdi, sdj and sdk.

So for those devices, instead of 

$result = `/usr/sbin/smartctl -a /dev/$device`;

I do

$result = `/usr/sbin/smartctl -a -d 3ware,N /dev/twe0`;

where N is the port number on the card: 0, 1, 2, 3 or 4. And that gives me the output I want.
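One way to fold that into the loop is a lookup table from device name to controller node and port. This is only a sketch: %behind_3ware and smartctl_command are made-up names, and the pairing of sdg with port 0 through to sdk with port 4 is an assumption that should be checked against the serial numbers on the actual machine.

```perl
use strict;
use warnings;

# Assumed mapping for Lenny: drives behind the 3ware card, keyed by the
# sdX name Linux gives them, with the card's device node and port number.
my %behind_3ware = (
    sdg => [ '/dev/twe0', 0 ],
    sdh => [ '/dev/twe0', 1 ],
    sdi => [ '/dev/twe0', 2 ],
    sdj => [ '/dev/twe0', 3 ],
    sdk => [ '/dev/twe0', 4 ],
);

# Return the smartctl command line for a device name such as 'sda'.
sub smartctl_command {
    my ($device) = @_;
    if (my $tw = $behind_3ware{$device}) {
        my ($node, $port) = @$tw;
        return "/usr/sbin/smartctl -a -d 3ware,$port $node";
    }
    return "/usr/sbin/smartctl -a /dev/$device";
}

# Show the command that would be run for a few devices.
print smartctl_command($_), "\n" for qw(sdf sdg sdk);
```

In the main loop, the backtick line would then run whatever smartctl_command($device) returns, and the rest of the parsing stays the same.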

Summary of disk smart stats for lenny on 2016-07-12
Reallocated_Sector_Ct  /dev/sda  UFA3P2509NLF
Reallocated_Sector_Ct  /dev/sdb  UFA3P250925L
Reallocated_Sector_Ct 0 /dev/sdc ST4000DM000-1F2168 S30135K7
Reallocated_Sector_Ct 0 /dev/sdd ST4000DM000-1F2168 S301382Q
Reallocated_Sector_Ct 95 /dev/sde MAXTOR STM31000340AS 9QJ32RLN
Reallocated_Sector_Ct 0 /dev/sdf ST2000DM001-9YN164 Z1E0YGVQ
Reallocated_Sector_Ct 0 /dev/sdg ST31000340AS 9QJ00SJW
Reallocated_Sector_Ct 22 /dev/sdh MAXTOR STM31000340AS 9QJ304XG
Reallocated_Sector_Ct 17 /dev/sdi MAXTOR STM31000340AS 9QJ32RKS
Reallocated_Sector_Ct 0 /dev/sdj ST2000DM001-1CH164 Z340BKPV
Reallocated_Sector_Ct 0 /dev/sdk ST32000542AS 5XW1MFQ3

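Drives sda and sdb are the SCSI drives that came with Lenny, which I'm using as the system disks. SCSI drives don't report the ATA attribute table, which is why their count and model come out blank.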

And then I found another way to do all this. It's called a SATA port multiplier, and it turns one port into five! They cost £13, although having one means I don't need the five M-F extension cables (£1 each), so the effective cost is £8. I've ordered one, to see how well it works.

