RAID recovery


Identify the bad disk

To identify the physical disk, you may have to read the serial number of the failing device. If the device is completely dead, read the serial number of the disk that still works and tell support to replace the other one. The smartctl command from the smartmontools package prints this information:

smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD5000AADS-00S9B0
Serial Number:    WD-WCAV93159999
...
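
When several disks are installed, it can help to print only the serial number of each device. The following is a minimal sketch and not part of the original procedure: the helper name serial_from_smartctl is made up for illustration, and it assumes the "Serial Number:" line format shown in the output above.

```shell
# Extract the serial number from smartctl output read on stdin.
# Assumes a line of the form "Serial Number:    <value>".
serial_from_smartctl() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}
```

With root privileges and smartmontools installed, it could be used as: smartctl -i /dev/sda | serial_from_smartctl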

After replacing the hardware

To resync a software RAID mirror after replacing the bad disk /dev/sdb:

First print the partitioning data of the still running /dev/sda disk:

parted -l
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2097kB  1049kB                     bios_grub
 2      2097kB  514MB   512MB                      raid
 3      514MB   8706MB  8193MB                     raid
 4      8706MB  500GB   491GB                      raid

and manually recreate that layout on the new disk:

parted /dev/sdb
(parted) mklabel gpt

(parted) mkpart                                                           
Partition name?  []?                                                      
File system type?  [ext2]?                                                
Start? 1049kB
End? 2097kB
(parted) set 1 bios_grub on

(parted) mkpart                                                           
Partition name?  []?                                                      
File system type?  [ext2]?                                                
Start? 2097kB                                                             
End? 514MB
(parted) set 2 raid on

...

(parted) quit
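
The interactive session can also be written as non-interactive parted calls. The sketch below only prints the commands so they can be reviewed before anything is written to disk. The function name gen_parted_cmds and its input format are assumptions made for this example, the start and end values are copied from the parted -l listing of /dev/sda, and the partition name "primary" is an arbitrary choice (the original partitions have an empty name). It also assumes that partitions are created in order, so the n-th mkpart yields partition number n.

```shell
# Print (do not run) parted commands that recreate a GPT layout on a disk.
# $1: target disk, e.g. /dev/sdb
# stdin: one partition per line: "<number> <start> <end> <flag>"
gen_parted_cmds() {
    disk=$1
    printf 'parted -s %s mklabel gpt\n' "$disk"
    while read -r num start end flag; do
        [ -n "$num" ] || continue    # skip empty lines
        # On GPT the first mkpart argument becomes the partition name.
        printf 'parted -s %s mkpart primary %s %s\n' "$disk" "$start" "$end"
        printf 'parted -s %s set %s %s on\n' "$disk" "$num" "$flag"
    done
}

# Layout taken from the parted -l listing of /dev/sda.
gen_parted_cmds /dev/sdb <<'EOF'
1 1049kB 2097kB bios_grub
2 2097kB 514MB raid
3 514MB 8706MB raid
4 8706MB 500GB raid
EOF
```

The values in the listing are rounded, so the resulting partitions may differ slightly in size from the originals. Printing the source layout in sectors (parted /dev/sda unit s print) and using those numbers instead is one way to get exact boundaries.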

Once the partition layouts of both disks are identical, add the RAID partitions back into their arrays:

mdadm --manage /dev/md0 -a /dev/sdb2
cat /proc/mdstat

Start with the smallest partition and repeat the cat command to view any progress.
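
To see at a glance which arrays still lack a member, the status line in /proc/mdstat can be filtered. This is a rough sketch: the helper name degraded_arrays is hypothetical, and it assumes the usual mdstat layout in which a status such as [U_] marks a missing member with an underscore. An array that is still rebuilding is reported as well, because its status stays at [U_] until the recovery has finished.

```shell
# Print the names of md arrays that are missing a member.
# Reads /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/      { name = $1 }    # remember the current array
        /\[[U_]*_[U_]*\]/  { print name }   # "_" marks a missing member
    '
}
```

On a live system it would be called as degraded_arrays < /proc/mdstat. Instead of repeating the cat command by hand, watch -n 5 cat /proc/mdstat refreshes the view every five seconds.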

Finally, install GRUB on the new disk so that the system can also boot from it:

grub-install /dev/sdb
Installation finished. No error reported.

Lazy resync?

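Even after days, the array holding the swap partition might still show the state resync=PENDING. This is not an error: an array in auto-read-only mode postpones the resync until it receives its first write.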

cat /proc/mdstat

md0 : active (auto-read-only) raid1 sda1[0] sdb1[1]
      16768896 blocks super 1.2 [2/2] [UU]
        resync=PENDING

You might want to force the resync anyway, for cosmetic reasons, by switching the array to read-write:

mdadm --readwrite /dev/md0
cat /proc/mdstat

md0 : active raid1 sda1[0] sdb1[1]
      16768896 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.8% (142720/16768896) finish=3.8min speed=71360K/sec
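
On a machine with several arrays, the ones waiting in this state can be found by filtering /proc/mdstat. The following is only a sketch under the assumption that the output looks like the listing above. The helper name pending_arrays is invented for this example.

```shell
# Print the names of md arrays whose resync is still pending.
# Reads /proc/mdstat-style text on stdin.
pending_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }    # remember the current array
        /resync=PENDING/  { print name }   # resync has not started yet
    '
}
```

Each reported array could then be switched to read-write as root, for example: for md in $(pending_arrays < /proc/mdstat); do mdadm --readwrite "/dev/$md"; done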