Finding out why a hard drive in a software RAID 1 array failed
Today we received the following message from mdadm monitoring on one of our servers:
This is an automatically generated mail message from mdadm running on host.website.tld
A Fail event had been detected on md device /dev/md126.
It could be related to component device /dev/sda.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1]
md126 : active raid1 sda[1](F) sdb[0]
      1953511424 blocks super external:/md0/0 [2/1] [_U]

md0 : inactive sda[1](S) sdb[0](S)
      6056 blocks super external:imsm

unused devices: <none>
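In this output, `(F)` marks a member the kernel has flagged as faulty, `[2/1]` means the array expects two devices but only one is working, and in `[_U]` an underscore marks the missing slot while `U` marks the one still up. To spot this from a script rather than by eye, here is a minimal sketch. The function name `failed_members` is hypothetical (it is not part of mdadm), and it assumes the array line layout shown above ("mdN : state level dev[n](flag) ..."), which may not cover every kernel's output.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): print "array device" for every member
# flagged (F) in an mdstat-format file. Defaults to the live /proc/mdstat;
# pass another path to check a saved copy.
failed_members() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        # Array lines look like: "md126 : active raid1 sda[1](F) sdb[0]"
        /^md[^ ]* :/ {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\[[0-9]+\]\(F\)$/) {
                    dev = $i
                    sub(/\[.*$/, "", dev)   # strip "[1](F)", leaving "sda"
                    print $1, dev
                }
            }
        }
    ' "$mdstat"
}
```

Sourced into a shell and run against the mdstat shown above, `failed_members` would print `md126 sda`; on a healthy system it prints nothing, so it can be dropped into a cron job or monitoring check.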
Since the machine was still running on /dev/sdb, I could SSH in without issues and run the following:
yum search smart
which gave me:
gdm-plugin-smartcard.x86_64 : GDM smartcard plugin
libatasmart-devel.i686 : Development Files for libatasmart Client Development
libatasmart-devel.x86_64 : Development Files for libatasmart Client Development
smartmontools.x86_64 : Tools for monitoring SMART capable hard disks
ccid.x86_64 : Generic USB CCID smart card reader driver
esc.x86_64 : Enterprise Security Client Smart Card Client
gnupg2-smime.x86_64 : CMS encryption and signing tool and smart card support for GnuPG
libatasmart.i686 : ATA S.M.A.R.T. Disk Health Monitoring Library
libatasmart.x86_64 : ATA S.M.A.R.T. Disk Health Monitoring Library
openct.i686 : Middleware framework for smart card terminals
openct.x86_64 : Middleware framework for smart card terminals
pcsc-lite.x86_64 : PC/SC Lite smart card framework and applications
So I installed the needed package:
yum install smartmontools.x86_64
and ran
smartctl --all /dev/sda -T permissive
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-431.29.2.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
Short INQUIRY response, skip product id
SMART Health Status: OK
Read defect list: asked for grown list but didn't get it
Error Counter logging not supported
Device does not support Self Test logging
Hmm, that's not very clear. So I tried forcing the ATA device type:
smartctl -a -d ata /dev/sda
to get at the reason, which returned:
Smartctl: Device Read Identity Failed: Input/output error
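smartctl also reports what went wrong through its exit status, which is a bitmask. According to the smartctl(8) man page, bit 1 covers a failed device open or a device that did not return its IDENTIFY data, which is the kind of failure shown above. The sketch below turns that number into text; the function `decode_smartctl_status` is hypothetical, and the bit descriptions are paraphrased from the man page of current smartmontools releases, so check `man smartctl` for the installed 5.43 before relying on them.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): decode smartctl's exit status bitmask.
# Bit meanings paraphrased from the RETURN VALUES section of smartctl(8).
decode_smartctl_status() {
    status=$1
    bit=0
    while [ "$bit" -le 7 ]; do
        # Test whether this bit is set in the status value
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) echo "bit 0: command line did not parse" ;;
                1) echo "bit 1: device open failed, IDENTIFY failed, or device in low-power mode" ;;
                2) echo "bit 2: a SMART or other ATA command failed, or SMART data checksum error" ;;
                3) echo "bit 3: SMART status check returned DISK FAILING" ;;
                4) echo "bit 4: prefail attributes at or below threshold" ;;
                5) echo "bit 5: some attributes were at or below threshold in the past" ;;
                6) echo "bit 6: device error log contains errors" ;;
                7) echo "bit 7: device self-test log contains errors" ;;
            esac
        fi
        bit=$((bit + 1))
    done
}
```

Call it right after smartctl, e.g. `smartctl -a -d ata /dev/sda; decode_smartctl_status $?`. An IDENTIFY failure like the one above would be expected to set bit 1 (exit status 2), possibly together with other bits.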