Helpful Notes: Recovering from a failed RAID drive

Created: 2024-12-02, Last Updated: 2024-12-13

This is a work in progress, as I'm currently recovering the RAID array on my custom-built, 10+ year old NAS.

Context

A DIY Linux RAID server has a failed drive. What steps should be taken, and what tools are helpful in fixing the problem?

BACKUP FIRST!

Before you do anything else, back up the RAID array, or make sure the existing backup is current and complete. Relying on the remaining drives in the array is risky and can end in data loss. The drives are most likely the same model, bought at the same time from the same vendor. If one has failed, there is a good chance another will fail soon, since they have all been subject to the same conditions.

Incremental approach via rsync

The goal is to capture as many files as possible before the array fails completely. An approach that has worked for me is to copy the smaller files first and then the bigger ones. This can be done with rsync as follows:

rsync -avr --max-size=<size threshold> <source> <destination>

Here is an example, run from inside the directory being backed up:

rsync -avr --max-size=25m . /media/backup_usb/

This is helpful when the larger files would be nice to have but it matters more to save as many of the smaller files as possible, e.g. get all the pictures before fetching the larger videos. In this scenario the first run may have a --max-size of 25m, which is then raised to 50m, 75m, 100m, 150m, 200m, etc. on subsequent runs. Files copied in an earlier run are skipped, so each run only transfers what is new.
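The escalating thresholds described above can be sketched as a small loop. This is only an illustration: the function name staged_rsync and the RSYNC override variable are made up for this example, and the paths and size list are placeholders to adjust.

```shell
#!/bin/sh
# Sketch: run rsync once per size threshold so the smallest files land first.
# staged_rsync and RSYNC are hypothetical names used only in this example.

staged_rsync() {
    src=$1
    dest=$2
    shift 2
    for limit in "$@"; do
        echo "== pass with --max-size=$limit =="
        # -a already implies -r, so -av is enough here.
        # A failing pass (e.g. unreadable files) is reported but does not
        # stop the later passes.
        ${RSYNC:-rsync} -av --max-size="$limit" "$src" "$dest" ||
            echo "pass $limit exited with status $?" >&2
    done
}

# Example call (not run automatically):
#   staged_rsync . /media/backup_usb/ 25m 50m 75m 100m 150m 200m
```

A final run without --max-size would pick up whatever is larger than the last threshold.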

This approach is not always best, since the larger files might be the more valuable ones, e.g. a full database backup vs. transactional backups. Use good judgement about which files to recover first.

Tools

mdadm

A command for building and managing RAID arrays. The --detail option is what clues us in to the problem. mdadm is also used to add drives to and remove drives from the array.

clarkd@indigo:/media/backup_usb$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sat Mar  9 13:31:58 2013
     Raid Level : raid1
     Array Size : 976630336 (931.39 GiB 1000.07 GB)
  Used Dev Size : 976630336 (931.39 GiB 1000.07 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Sun Dec  1 20:41:40 2024
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : Indigo:0
           UUID : c50f9458:d26d54d9:cada3995:4629b965
         Events : 20578

    Number   Major   Minor   RaidDevice State
       2       8       17        0      active sync   /dev/sdb1
       2       0        0        2      removed

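This shows which drives are part of the RAID array and what state each one is in. In the output above the State is clean, degraded: only /dev/sdb1 is still active sync, and the array's other slot is listed as removed.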
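For scripted checks, the State line can be pulled out of that output. The following is a minimal sketch, assuming the output format shown above; the function names array_state and is_degraded are invented for this example.

```shell
#!/bin/sh
# Sketch: read `mdadm --detail` output on stdin and inspect the State line.
# array_state and is_degraded are hypothetical helper names.

array_state() {
    # Matches the "State : ..." line only; the device table header ends in
    # "State" without a colon, so it is not picked up.
    sed -n 's/^ *State : *//p' | sed 's/ *$//'
}

is_degraded() {
    array_state | grep -q 'degraded'
}

# Example call (needs root, not run automatically):
#   sudo mdadm --detail /dev/md0 | is_degraded && echo "array is degraded"
```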

parted

Used to inspect disks and create partition tables.

* Easy to see which device is which via print devices
* Create the partition table (disk label) via mklabel msdos. I'm using msdos as I'm going to use samba to share the drive. I'm not sure if this is necessary, but it's how the old drive was set up.

fdisk

A command to display or manipulate disk partitions. This is needed to identify our drives: run it with the -l option to list every disk and its partitions.

This will also be used to create the partition on new drives via the new partition wizard (n).

df

A command that reports how much space is used and available on each mounted file system.
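One use for it during a recovery is checking that the backup destination has room left. A minimal sketch, assuming a mount point without spaces in its name; free_kib is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the free space (in KiB) of the file system holding a path.
# free_kib is a hypothetical helper name.

free_kib() {
    # -P forces the POSIX one-line-per-file-system layout,
    # -k reports sizes in 1024-byte blocks; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example call (not run automatically):
#   free_kib /media/backup_usb
```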

du

A command to display disk usage. This is helpful in checking how much data there is and how much was backed up.

By default this will output a lot of information, but we probably don't need that much for our needs. Some good options to use are
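A minimal sketch, assuming the goal is to compare the total size of the source with the total size of the backup: -s collapses the output to one summary line per argument, and -k reports sizes in KiB (-h gives human-readable sizes instead, which is nicer when reading by eye). The function name tree_kib is invented for this example.

```shell
#!/bin/sh
# Sketch: total size of a directory tree in KiB.
# tree_kib is a hypothetical helper name.

tree_kib() {
    # -s prints a single summary line, -k uses 1024-byte units.
    du -sk "$1" | awk '{ print $1 }'
}

# Example calls (not run automatically):
#   tree_kib /path/to/array/data
#   tree_kib /media/backup_usb
```

The two numbers will rarely match exactly, since different file systems allocate space differently, but a large gap suggests the backup is incomplete.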

rsync

You're already using this, right? It's a program capable of efficiently copying data from one place to another. It's great for incrementally backing things up, but it has many options. As such, it has its own section above under "BACKUP FIRST!".

Recovery

Once it's confirmed that there is a working backup, the next step is to acquire new drives. After that, install one of the new drives.

Create the Partition Table with parted

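Pull up the new drive with sudo parted /dev/sdc. The print command is handy for checking the current state of the disk. Then use mklabel msdos to create a new partition table (disk label). This replaces any existing partition table on that disk, so double-check the device name first. I'm using msdos because I think I might need it for samba, but I'm probably wrong. I'll need to experiment with this later.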
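The same steps can also be run non-interactively with parted's --script option. The sketch below only prints the commands unless DRY_RUN=0 is set; the run and prepare_label helpers, the DRY_RUN variable and the /dev/sdX placeholder are all invented for this example.

```shell
#!/bin/sh
# Sketch: label a new disk with parted in script mode.
# run, prepare_label and DRY_RUN are hypothetical names; /dev/sdX is a placeholder.

run() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" -eq 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_label() {
    disk=$1
    run sudo parted --script "$disk" print          # state before
    # In script mode parted does not ask for confirmation,
    # so this wipes the existing partition table immediately.
    run sudo parted --script "$disk" mklabel msdos
    run sudo parted --script "$disk" print          # state after
}

# Example calls (not run automatically):
#   prepare_label /dev/sdX              # dry run, prints the commands
#   DRY_RUN=0 prepare_label /dev/sdX    # actually relabels the disk
```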

Partition the Drive with fdisk

Pull up the drive with fdisk like so: sudo fdisk /dev/sdc. Here's what to use step by step:

The default partition type is Linux (ID 83). We need to change this to Linux raid autodetect (ID fd). You can check the current type by pressing p (as in print), but that's optional. To change the type:
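As a checklist, the interactive keystrokes could look like the following. This is a sketch under assumptions: an msdos (MBR) label, a single primary partition spanning the whole disk, and fdisk's usual prompts. The function name fdisk_keys is invented, and fdisk may ask extra questions depending on its version and what is already on the disk, so treat this as something to type by hand rather than pipe in blindly.

```shell
#!/bin/sh
# Sketch: the fdisk keystrokes for one full-disk partition of type fd.
# fdisk_keys is a hypothetical helper name; it only prints the keys.

fdisk_keys() {
    # n  = new partition
    # p  = primary
    # 1  = partition number 1
    # '' = accept default first sector
    # '' = accept default last sector (use the whole disk)
    # t  = change partition type (partition 1 is selected automatically
    #      when it is the only one)
    # fd = Linux raid autodetect
    # w  = write the table to disk and exit
    printf '%s\n' n p 1 '' '' t fd w
}

# Example call (prints the checklist, touches no disk):
#   fdisk_keys
```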

Add the drive to the Array

This is done with sudo mdadm /dev/md0 --add /dev/sdc1. Once the partition is added, the array will start rebuilding onto it.

Monitor the Rebuild

This can be done with watch cat /proc/mdstat, which refreshes the rebuild status every two seconds.
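To log just the percentage instead of watching the whole file, the progress figure can be extracted from /proc/mdstat. A minimal sketch, assuming the kernel's usual "recovery = NN.N%" (or "resync = NN.N%") progress line; rebuild_progress is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the rebuild percentage found in /proc/mdstat-style text.
# rebuild_progress is a hypothetical helper name; it reads from stdin.

rebuild_progress() {
    # Prints e.g. "12.6%" for each array that is rebuilding,
    # and nothing when no rebuild is in progress.
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+%).*/\2/p'
}

# Example call (not run automatically):
#   rebuild_progress < /proc/mdstat
```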
