This is a work in progress, as I'm currently recovering the RAID array on my custom-built, 10+ year old NAS.
The scenario: a DIY Linux RAID server has a failed drive. What steps should be taken, and what tools are helpful in fixing the problem?
Before you do anything else, back up that RAID array or make sure that the existing backup is current and complete. Relying on the remaining drives in the array risks data loss. The drives are most likely the same model, bought at the same time from the same vendor. If one has failed, there is a good chance another will too, since they have all been subject to the same conditions.
rsync
The goal is to capture as many files as possible before the array degrades completely. An approach that has worked for me is to get the smaller files first and then the bigger ones. This can be done with rsync as follows:
rsync -avr --max-size=<size threshold> <source> <destination>
Here is an example:
rsync -avr --max-size=25m . /media/backup_usb/
This is helpful when the larger files would be nice to have but it matters more to save as many of the smaller files as possible, e.g. get all the pictures before fetching the larger videos. In this scenario the first run may have a --max-size of 25m, which is then raised to 50m, 75m, 100m, 150m, 200m, etc. on subsequent runs.
This approach may not always be best, since the larger files might be more valuable, e.g. a full database backup vs. transactional backups. Good judgement is needed here to maximize what gets recovered.
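The staged runs described above can be sketched as a loop. This is only a rough illustration: staged_rsync and run are hypothetical helper names, the thresholds are the ones mentioned in the text, and the DRY_RUN switch is an addition so the commands can be previewed before touching a failing drive.

```shell
#!/usr/bin/env bash
# Sketch: copy small files first by re-running rsync with a growing --max-size.
# Note: -a already implies -r, so -av behaves the same as -avr.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: staged_rsync <source> <destination>
staged_rsync() {
  local src=$1 dest=$2 size
  for size in 25m 50m 75m 100m 150m 200m; do
    # A failing drive can make rsync exit non-zero; keep going to the next pass.
    run rsync -av --max-size="$size" "$src" "$dest" \
      || echo "rsync reported errors at $size; continuing" >&2
  done
  # Final pass without a size limit picks up everything that is left.
  run rsync -av "$src" "$dest"
}
```

Running `DRY_RUN=1 staged_rsync . /media/backup_usb/` prints the seven rsync commands without copying anything. Because rsync skips files that already exist unchanged at the destination, each later pass only transfers the files newly allowed by the larger threshold.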
mdadm
A command used for building and managing RAID arrays. The --detail option is what clues us in to the problem. mdadm will also be used to add and remove drives from the array.
clarkd@indigo:/media/backup_usb$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sat Mar 9 13:31:58 2013
     Raid Level : raid1
     Array Size : 976630336 (931.39 GiB 1000.07 GB)
  Used Dev Size : 976630336 (931.39 GiB 1000.07 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Sun Dec 1 20:41:40 2024
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : Indigo:0
           UUID : c50f9458:d26d54d9:cada3995:4629b965
         Events : 20578

    Number   Major   Minor   RaidDevice State
       2       8       17        0      active sync   /dev/sdb1
       2       0        0        2      removed
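This shows us which drives are part of the RAID array. Here the state is "clean, degraded": only /dev/sdb1 is still active, and the second slot is listed as removed.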
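If the check needs to be scripted, the same two signals (the State line and the member table) can be pulled out of the --detail output. A minimal sketch, assuming the output format shown above; array_is_degraded and active_members are hypothetical helper names, and both read the mdadm output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: inspect `mdadm --detail` output piped in on stdin.

# Succeeds (exit 0) when the State line mentions "degraded".
array_is_degraded() {
  grep -Eq '^[[:space:]]*State[[:space:]]*:.*degraded'
}

# Prints the device path of every member still marked "active sync".
active_members() {
  awk '/active sync/ { print $NF }'
}
```

For example, `sudo mdadm --detail /dev/md0 | array_is_degraded && echo "md0 is degraded"` prints a warning only when the array is missing a member.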
parted
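Used to create partition tables and partitions.
* Easy to see what devices are what via print devices
* Create a partition table (disk label) via mklabel msdos. Note that mklabel sets up the partition table, not a partition. I'm using msdos as I'm going to use samba to share the drive. The partition table type shouldn't matter to samba, which shares files at the file system level, but it's how the old drive was set up.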
fdisk
A command to display or manipulate disk partitions. This is needed to identify our drives, using the -l option.
This will also be used to create the partition on new drives via the new partition command (n).
df
A command used to show info about the file system, such as how much space is used and available on each mounted file system.
du
A command to display disk usage. This is helpful in checking how much data there is and how much was backed up.
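As a rough way of checking how much was backed up, the totals of the source and the backup can be compared. This is a sketch only: dir_kib and compare_sizes are hypothetical helper names, and the totals will not match exactly when the two file systems use different block sizes.

```shell
#!/usr/bin/env bash
# Sketch: compare the total size of a source directory and its backup.

# Total size of a directory in KiB (-s: summary only, -k: 1 KiB units).
dir_kib() {
  du -sk "$1" | cut -f1
}

# Hypothetical helper: compare_sizes <source dir> <backup dir>
compare_sizes() {
  local src_kib dst_kib
  src_kib=$(dir_kib "$1")
  dst_kib=$(dir_kib "$2")
  echo "source: ${src_kib} KiB, backup: ${dst_kib} KiB"
  # Avoid dividing by zero on an empty source.
  if [ "$src_kib" -gt 0 ]; then
    echo "roughly $(( dst_kib * 100 / src_kib ))% copied"
  fi
}
```

Something like `compare_sizes /mnt/raid /media/backup_usb` would then give a quick progress figure between rsync passes; /mnt/raid is a placeholder for wherever the array is mounted.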
By default this will output a lot of information, more than we need here. Some good options to use are:
* -h: print sizes in human readable formats (e.g. 123M, 2G, etc). This should almost always be included.
* -d: limits output to directories that are at most N levels below the target directory. When dealing with backups a value of -d 2 is reasonable.
* -t: limits output to entries that are bigger than the specified threshold. This filters out the smaller entries so that the larger ones are more obvious.
rsync
You're already using this, right? It's a program capable of efficiently copying data from one place to another. It's great for incrementally backing things up, but it has many options. As such it has its own section above, under "BACKUP FIRST!".
Once it's confirmed that there is a working backup, the next step is to acquire new drives. After that, install one of the drives.
parted
Pull up the new drive with sudo parted /dev/sdc. print will be handy to see the status of things. Then use mklabel to create the partition table. I'm using msdos because I think I might need it for samba, but I'm probably wrong. I'll need to experiment with this later.
fdisk
Pull up the drive with fdisk like so: sudo fdisk /dev/sdc
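Here's what to use step by step: press n to start a new partition, choose p for a primary partition, pick partition number 1, and accept the default first and last sectors so the partition spans the whole drive.
The default partition type is Linux (ID 83). We need to change this to Linux raid autodetect (ID fd). You can check the current type by pressing p (as in print), but that's optional. To change the type, press t and enter fd, then press w to write the changes to the drive.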
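The same layout can also be scripted instead of typed interactively. The sketch below swaps in sfdisk (from util-linux) rather than fdisk, and assumes its script format with a "label: dos" header and a named type field; raid_partition_script is a hypothetical helper name. It only prints the script, so nothing is written until the output is piped into sfdisk.

```shell
#!/usr/bin/env bash
# Sketch: emit an sfdisk script describing a single partition that spans
# the drive, with MBR type fd (Linux raid autodetect).
# Assumption: util-linux sfdisk script syntax; verify against `man sfdisk`.
raid_partition_script() {
  printf '%s\n' 'label: dos' 'type=fd'
}
```

It would be applied with `raid_partition_script | sudo sfdisk /dev/sdc`. That replaces the partition table on /dev/sdc, so double-check the device name with fdisk -l first.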
Adding the new partition to the array is done with sudo mdadm /dev/md0 --add /dev/sdc1. Once it's added, the array will start rebuilding onto it.
Monitoring the rebuild can be done with watch cat /proc/mdstat
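For a terser readout than the full /proc/mdstat, the percentage can be pulled out of the progress line. A minimal sketch, assuming the kernel reports progress in the usual "recovery = N%" (or "resync = N%") form; rebuild_progress is a hypothetical helper name and reads the mdstat text on stdin.

```shell
#!/usr/bin/env bash
# Sketch: print the rebuild percentage found in /proc/mdstat-style text.
# Prints nothing (and exits non-zero) when no rebuild is in progress.
rebuild_progress() {
  grep -Eo '(recovery|resync) *= *[0-9.]+%' | grep -Eo '[0-9.]+%'
}
```

Usage would be `rebuild_progress < /proc/mdstat`, which can also be wrapped in watch to refresh periodically.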