
I have two drives in an Intel ICH10 RAID 1. They are not enterprise-level drives; just regular WD Caviar Black drives.

Recently, reading from and writing to the mirrored volume has become extremely slow, and the HDD light is on constantly. I suspect that one of the disks is nearing failure and is attempting sector remapping. (See also What is the fastest way to force hdd to reallocate bad sectors and discard the data?) If this were an enterprise drive, it would fail quickly and cleanly, but this behavior is typical of consumer drives. Hence, it's not immediately clear which drive is bad.

Neither of the drives shows problematic SMART data (this is from the Intel SSD Toolbox, which seems to be one of the few options for reading SMART data off an Intel firmware RAID):

[Screenshot: SMART data for the first drive]

[Screenshot: SMART data for the second drive]

Unfortunately, the WD Data Lifeguard Diagnostic tool which is able to run SMART tests is completely confused by the Intel ICH10 RAID:

[Screenshot: WD Data Lifeguard Diagnostic output]

How can I tell which drive is the problematic one and swap it out?

Andrew Mao

1 Answer


From what you describe, the first drive is defective: its Read Error Rate and Re-allocated Sector Count are non-zero. Re-allocating sectors is exactly what happens when the drive cannot read a sector. The drive then re-allocates that sector on the next write operation to it.
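If you can get the raw SMART table as text, a check like this can be scripted. The sketch below assumes smartmontools' `smartctl -A` output format (columns `ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE`). It also assumes the usual SMART IDs for the attributes named in this answer. The `WATCHED` table and both function names are hypothetical helpers, not part of any tool. Be aware that the meaning of the Raw_Read_Error_Rate raw value varies by vendor, so a non-zero value there is only a hint.

```python
# Assumed watch list: the usual SMART IDs for the attributes named in the answer.
WATCHED = {
    1: "Raw_Read_Error_Rate",
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
}


def parse_smartctl_attributes(text):
    """Map attribute ID -> raw value (int if purely numeric) from `smartctl -A` text."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # headers and other lines (e.g. "ID# ...", "SMART ...") are skipped.
        if len(parts) >= 10 and parts[0].isdigit():
            raw = parts[9]  # first token of RAW_VALUE, e.g. "35" in "35 (Min/Max 20/45)"
            attrs[int(parts[0])] = int(raw) if raw.isdigit() else raw
    return attrs


def suspicious(attrs):
    """Return {name: raw} for watched attributes whose raw value is a non-zero integer."""
    return {
        WATCHED[attr_id]: value
        for attr_id, value in attrs.items()
        if attr_id in WATCHED and isinstance(value, int) and value != 0
    }
```

Run it once per drive. The drive that returns a non-empty result is the first candidate for replacement.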

You can do several things to confirm this diagnosis:

Simple but uncertain: use a tool like HDD Scan to scan your disk, i.e., read every sector from it. You can also do this on your RAID 1 array, but then it is up to the RAID firmware to decide whether it reads the data from disk 1 or disk 2. Therefore this method will not check every sector on both disks. Still, if disk 1 is about to fail, it is quite probable (but not guaranteed) that its SMART values will worsen.
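For illustration, a read-every-sector pass works roughly like the minimal Python sketch below, which is similar in spirit to the default read-only mode of `badblocks`. It assumes Linux, root access, and a device path such as `/dev/sdb`. That path is hypothetical, so check which device node your disk actually has. The function name `read_only_scan` is also hypothetical. The sketch opens the target read-only, never writes, and only records the offset of each unreadable chunk. A real scanning tool would usually re-read a failing chunk in smaller pieces to narrow it down.

```python
import os


def read_only_scan(path, chunk_size=1024 * 1024):
    """Read every byte of `path` once; return (bytes_read, offsets_of_unreadable_chunks)."""
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)  # opened read-only: nothing is ever written
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # on Linux this also gives a block device's size
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                bytes_read += len(os.pread(fd, length, offset))
            except OSError:
                # Typically EIO on a failing disk: note the chunk and keep going.
                bad_offsets.append(offset)
            offset += length
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Usage (as root, hypothetical device): `read_only_scan("/dev/sdb")`. Pointed at the RAID volume instead of a single disk, it has the same limitation described above, because the firmware chooses which disk serves each read.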

Keep an eye on Re-allocated Sector Count, Reallocation Event Count and Current Pending Sector Count. If these values go up, your drive is likely to fail soon.
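To "keep an eye" on these values, you can save the raw values from two readings taken some time apart and compare them. The sketch below is a minimal version of that idea. It assumes you have already extracted the raw values into a dict keyed by the usual SMART IDs (5, 196 and 197 for the three attributes above). The helper name `worsened` is hypothetical.

```python
# Assumed SMART IDs for the three attributes mentioned above.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
}


def worsened(before, after):
    """Return {attr_id: (old, new)} for watched attributes whose raw value went up."""
    changes = {}
    for attr_id in WATCHED:
        old = before.get(attr_id, 0)  # a missing attribute is treated as 0
        new = after.get(attr_id, 0)
        if new > old:
            changes[attr_id] = (old, new)
    return changes
```

A non-empty result for one drive, repeated over several readings, is the "values go up" signal described above.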

Complicated but gives more certainty:

  1. Mount your drives in a different pc/usb-enclosure/different SATA-port.
  2. Boot from a Live CD (e.g. Ubuntu or Knoppix).
  3. Perform a read-only test of your drives. You can do this with SMART commands and/or with tools like dd or badblocks.
    • do NOT attempt to mount the filesystem
    • do NOT write anything to the drive
    • as long as you perform only read-only operations, you can re-assemble the RAID afterwards without it being marked as faulty/inconsistent.
  4. Keep an eye on the same values as mentioned above. Now you should also be able to read the SMART values properly. SMART usually also keeps a log of previous errors, and drive 1 has at least two of them. The timestamp is usually expressed in power-on hours, so you will have to calculate back from the current power-on hours and see whether this correlates with the time you experienced the problems.
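The back-calculation in step 4 is simple arithmetic. The sketch below assumes you have read the current power-on hours and the power-on hour at which an error was logged. With smartmontools, `smartctl -A` and `smartctl -l error` show these. The function name `latest_error_time` is hypothetical. Power-on hours only advance while the drive is powered, so if the machine was ever switched off, the real error happened earlier than this estimate. The result is therefore the latest possible time.

```python
from datetime import datetime, timedelta


def latest_error_time(current_poh, error_poh, now):
    """Latest wall-clock time at which an error logged at `error_poh` can have happened.

    Exact only if the drive was powered on continuously since the error; any
    powered-off time means the error actually happened earlier.
    """
    if error_poh > current_poh:
        raise ValueError("error_poh cannot exceed the current power-on hours")
    return now - timedelta(hours=current_poh - error_poh)
```

For example, with hypothetical readings of 10000 current power-on hours and an error at hour 9976, `latest_error_time(10000, 9976, datetime(2014, 1, 2, 12, 0))` gives noon on 1 January 2014 or earlier. Compare that with when the slowdown started.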
masgo