Finding out why a hard drive in a software RAID 1 array failed

Today we received the following message from mdadm monitoring on one of our servers:

This is an automatically generated mail message from mdadm running on host.website.tld

A Fail event had been detected on md device /dev/md126.

It could be related to component device /dev/sda.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md126 : active raid1 sda[1](F) sdb[0]
1953511424 blocks super external:/md0/0 [2/1] [_U]

md0 : inactive sda[1](S) sdb[0](S)
6056 blocks super external:imsm

unused devices: <none>
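
In the mdstat output above, the (F) flag next to sda[1] marks the failed member, and [2/1] [_U] shows that only one of the two mirror halves is still active. As a minimal sketch, the failed members could be pulled out of /proc/mdstat with a small awk filter. The function name failed_members is made up for this illustration, and the filter assumes the line layout shown in the mail above.

```shell
#!/bin/sh
# Hypothetical helper: read mdstat-formatted text on stdin and print
# "<array> /dev/<member>" for every member flagged as failed, i.e. "(F)".
failed_members() {
    awk '$2 == ":" && $1 ~ /^md/ {
        # Array lines look like: md126 : active raid1 sda[1](F) sdb[0]
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\(F\)/) {
                name = $i
                sub(/\[.*/, "", name)   # drop "[1](F)" and keep "sda"
                print $1, "/dev/" name
            }
        }
    }'
}
```

On a live system it would be called as failed_members < /proc/mdstat; for the state shown in the mail it prints md126 /dev/sda.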

Since the machine was still up and running on the remaining disk, /dev/sdb, I was able to SSH in without issues and search for a SMART tool:


yum search smart

which gave me:

gdm-plugin-smartcard.x86_64 : GDM smartcard plugin
libatasmart-devel.i686 : Development Files for libatasmart Client Development
libatasmart-devel.x86_64 : Development Files for libatasmart Client Development
smartmontools.x86_64 : Tools for monitoring SMART capable hard disks
ccid.x86_64 : Generic USB CCID smart card reader driver
esc.x86_64 : Enterprise Security Client Smart Card Client
gnupg2-smime.x86_64 : CMS encryption and signing tool and smart card support for GnuPG
libatasmart.i686 : ATA S.M.A.R.T. Disk Health Monitoring Library
libatasmart.x86_64 : ATA S.M.A.R.T. Disk Health Monitoring Library
openct.i686 : Middleware framework for smart card terminals
openct.x86_64 : Middleware framework for smart card terminals
pcsc-lite.x86_64 : PC/SC Lite smart card framework and applications

So I installed the needed package:


yum install smartmontools.x86_64

and ran

smartctl --all /dev/sda -T permissive
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-431.29.2.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Short INQUIRY response, skip product id
SMART Health Status: OK
Read defect list: asked for grown list but didn't get it

Error Counter logging not supported
Device does not support Self Test logging

Hmm, that does not tell us much. So what about


smartctl -a -d ata /dev/sda

to find out the reason:


Smartctl: Device Read Identity Failed: Input/output error
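
The error above means the drive no longer answers even the basic identify request, so smartctl has nothing to report on. smartctl also signals this kind of failure through its exit status, which is a bit mask. As a rough sketch, the mask could be decoded as below. The function name explain_smartctl_status is made up for this illustration, the bit meanings are paraphrased from the smartctl man page, and I am assuming (I did not record it) that the failed identify shows up as bit 1.

```shell
#!/bin/sh
# Hypothetical helper: print one line per bit set in a smartctl exit status.
explain_smartctl_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no error bits set"
        return 0
    fi
    if [ $((code & 1)) -ne 0 ]; then echo "bit 0: command line did not parse"; fi
    if [ $((code & 2)) -ne 0 ]; then echo "bit 1: device open failed or no IDENTIFY DEVICE data returned"; fi
    if [ $((code & 4)) -ne 0 ]; then echo "bit 2: a SMART or other ATA command failed"; fi
    if [ $((code & 8)) -ne 0 ]; then echo "bit 3: SMART status check returned DISK FAILING"; fi
    if [ $((code & 16)) -ne 0 ]; then echo "bit 4: prefail attributes at or below threshold"; fi
    if [ $((code & 32)) -ne 0 ]; then echo "bit 5: attributes at or below threshold at some time in the past"; fi
    if [ $((code & 64)) -ne 0 ]; then echo "bit 6: device error log contains records of errors"; fi
    if [ $((code & 128)) -ne 0 ]; then echo "bit 7: self-test log contains records of errors"; fi
    return 0
}
```

It would be used right after the smartctl call, for example: smartctl -a -d ata /dev/sda; explain_smartctl_status $?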
