What is mdadm and why does RAID recovery fail?
Short answer: mdadm (multiple device admin) is the userspace tool for managing Linux software RAID, which the kernel implements in its md driver. Unlike hardware RAID cards, mdadm stores the RAID configuration in a superblock on each member disk, so the array can be reassembled on any Linux system with the right command (typically mdadm --assemble --scan), even if the server's motherboard fails. A degraded mdadm RAID array is accessible and recoverable in most cases. The critical mistake people make is acting too fast without diagnosing which drive actually failed, then accidentally removing a healthy drive from an already-degraded array.
Step-by-step mdadm RAID rebuild guide
Step 1: Identify the failed member and check array status
Run cat /proc/mdstat to see the array status. A degraded array shows total versus active members in brackets, followed by a per-member map: a four-disk RAID 5 with one failed disk reads [4/3] [UUU_], where each U is a working member and each _ is a missing one. A failed member is also tagged (F) on the array line. Then run mdadm --detail /dev/md0 (use your array device name) to see which specific drive is marked faulty or removed. Cross-reference with dmesg | grep -i 'error\|fail' to see recent kernel error messages about specific drives. Identify the failing drive by device path before physically touching anything.
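The status check above can be scripted for monitoring. The following is a minimal sketch, assuming the common /proc/mdstat layout in which each array's block-count line ends with a [total/active] counter and a [UU_] member map. The function name list_degraded and its optional file argument are illustrative and not part of mdadm.

```shell
# list_degraded [FILE]: print "<array> <active>/<total>" for every array
# whose member map contains "_" (a missing member).
# FILE defaults to /proc/mdstat; passing a file makes the function testable.
list_degraded() {
    awk '
        # Array header line, e.g. "md0 : active raid5 sdc1[2] sdb1[1] ..."
        /^md[^ ]* *:/ { name = $1 }

        # Status line: last field is the member map, e.g. "[UUU_]",
        # and the field before it is the counter, e.g. "[4/3]".
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) {
                counts = $(NF - 1)
                counts = substr(counts, 2, length(counts) - 2)   # strip [ ]
                split(counts, n, "/")                            # n[1]=total, n[2]=active
                printf "%s %s/%s\n", name, n[2], n[1]
            }
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling list_degraded with no argument reads the live /proc/mdstat; no output means no array reported a missing member.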
Step 2: Image the failing drive before removal
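If the failed drive still spins up and is partially readable, image it with ddrescue before removing it from the array. This sounds counterintuitive: the drive is already marked failed, so why image it? Because md sometimes kicks a drive out of the array after a temporary connection issue (a loose cable, a controller reset) rather than a true failure, and the "failed" drive still contains valid data that can matter if a second drive dies during the rebuild. The command is ddrescue -n /dev/sdX /path/to/drive-image.img /path/to/drive-image.log, where /dev/sdX is the failing drive and the last argument is the mapfile that lets ddrescue resume an interrupted run. The -n flag skips the slow scraping phase so the readable areas are copied first; the -f flag is only needed when the output is a device rather than an image file. This takes time but is the safest approach.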
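A common way to run the imaging step is in two passes against the same mapfile: a fast pass that grabs everything readable, then a slower pass that retries the bad areas. This is a sketch of that pattern rather than a procedure the steps above prescribe. The function name image_drive and the DRY_RUN switch are hypothetical helpers; -n, -d and -r are standard GNU ddrescue options.

```shell
# image_drive SRC IMG MAP: two ddrescue passes sharing one mapfile.
# Set DRY_RUN=1 to print the commands instead of executing them.
image_drive() {
    src=$1
    img=$2
    map=$3

    # Helper: echo the command in dry-run mode, otherwise execute it.
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
    }

    # Pass 1: copy all cleanly readable areas, skipping the scraping phase.
    run ddrescue -n "$src" "$img" "$map" || return 1

    # Pass 2: revisit the bad areas with direct disc access and 3 retry passes.
    run ddrescue -d -r3 "$src" "$img" "$map"
}
```

Running it first with DRY_RUN=1 is a cheap way to confirm that the source is the failing drive and that the image path is on a different, healthy disk with enough free space.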
Step 3: Add a replacement drive and initiate rebuild
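After physically installing the new drive (same or larger capacity), add it to the degraded array: mdadm /dev/md0 --add /dev/sdX (replace sdX with the new drive, or with the matching partition if the other members are partitions such as /dev/sdb1). If the failed member is still listed in mdadm --detail, remove it first with mdadm /dev/md0 --remove /dev/sdY. mdadm will immediately begin rebuilding: md reconstructs the missing member's contents from the surviving drives and writes them to the new one. Monitor rebuild progress with watch cat /proc/mdstat. Rebuild time varies from a few hours to over a day depending on array size and drive speed. Do not power off the server during rebuild. md can usually resume an interrupted recovery, but the array runs with reduced or no redundancy until the rebuild completes, and an unclean shutdown in that window risks leaving it in an inconsistent state.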
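For unattended monitoring, the progress line in /proc/mdstat can be parsed instead of watched. The sketch below assumes the usual progress format ("recovery = 12.6% (...) finish=127.5min speed=..."); the function name rebuild_progress and its optional file argument are illustrative.

```shell
# rebuild_progress [FILE]: print "<array> <percent> <estimated-finish>"
# for each array that is currently recovering or resyncing.
# FILE defaults to /proc/mdstat.
rebuild_progress() {
    awk '
        # Remember which array the following lines belong to.
        /^md[^ ]* *:/ { name = $1 }

        # Progress line, e.g.
        #   [==>......]  recovery = 12.6% (a/b) finish=127.5min speed=33440K/sec
        /(recovery|resync) *= *[0-9.]+%/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = substr($i, 8)   # text after "finish="
            }
            printf "%s %s %s\n", name, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means no array is rebuilding, which after a replacement should coincide with the member map returning to all U.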
Step 4: The India angle — hardware vs software RAID cost gap
Indian SME server setups frequently mix hardware RAID cards (Dell PERC, HP Smart Array, or LSI/Broadcom controllers) with software RAID. The recovery cost difference is significant: software RAID (mdadm) arrays can be recovered by any competent Linux engineer because the configuration is readable from the disks themselves. Hardware RAID arrays require either a matching or compatible controller, or a specialist with proprietary firmware tools to reconstruct the RAID metadata. A dead LSI RAID controller on a server running in a Hyderabad IT park can cost ₹20,000–₹60,000 just for a replacement card, plus labour, versus software RAID, where the disks can be moved to another Linux machine and the same scenario costs a fraction of that. For new SME server deployments, we recommend mdadm over budget hardware RAID cards specifically because recovery is far simpler.
When to call a recovery service (and what it costs in India)
When DIY ends
Stop and call a professional if: two or more drives are marked failed in a RAID 5 (beyond one-drive tolerance), the array was created on a hardware RAID controller that has failed, mdadm --assemble --scan finds no arrays (superblocks may be corrupted), or you accidentally removed the wrong drive and the array is now showing inactive.
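The tolerance rule behind the first condition can be written down as a small check. This is a simplified sketch: the function names are hypothetical, RAID 1 is left out because its tolerance depends on the number of mirrors, and the RAID 10 figure is only the guaranteed minimum for the default two-copy layout.

```shell
# max_failures LEVEL: number of member failures the level is guaranteed
# to survive. Returns non-zero for levels this sketch does not cover.
max_failures() {
    case $1 in
        raid0|linear) echo 0 ;;
        raid4|raid5)  echo 1 ;;
        raid6)        echo 2 ;;
        raid10)       echo 1 ;;   # guaranteed minimum with two copies
        *)            return 1 ;;
    esac
}

# within_tolerance LEVEL FAILED: succeed if FAILED members is survivable.
# Exit status 2 means "unknown level, decide manually".
within_tolerance() {
    max=$(max_failures "$1") || return 2
    [ "$2" -le "$max" ]
}
```

For example, within_tolerance raid5 2 fails, which corresponds to the "two or more drives failed in a RAID 5" case where DIY should stop.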
Typical cost in India
Software RAID (mdadm) recovery assistance for degraded array with one failed drive: ₹3,000–₹8,000. RAID recovery after dual-disk failure (RAID 5) or multi-disk failure (RAID 6): ₹10,000–₹30,000. Hardware RAID controller replacement and data recovery: ₹20,000–₹80,000. See our RAID data recovery cost guide and our data recovery service page.
A note from the LRW Engineer Team
The most expensive RAID mistake we see is the premature drive pull: a server admin sees a degraded array, pulls what they think is the bad drive, picks the wrong one, and turns a recoverable degraded RAID 5 into an unrecoverable total failure. Always check mdadm --detail twice. Identify the device path, cross-reference it with the physical drive label using the drive's serial number (for example from smartctl -i /dev/sdX), and only then remove the drive. The two minutes of verification have saved more data than all the recovery work combined.
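One way to do the cross-reference is through the persistent names under /dev/disk/by-id, which on typical udev-based systems embed the drive's model and serial number. The sketch below assumes that directory is populated; the function name ids_for_device and its optional directory argument are illustrative.

```shell
# ids_for_device DEV [DIR]: list the persistent names in DIR that point
# at DEV. DIR defaults to /dev/disk/by-id.
ids_for_device() {
    dev=$(readlink -f "$1") || return 1
    for link in "${2:-/dev/disk/by-id}"/*; do
        [ -L "$link" ] || continue                 # skip non-symlinks and empty globs
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            basename "$link"
        fi
    done
    return 0
}
```

Running ids_for_device /dev/sdX for the member that mdadm --detail reports as faulty gives names whose serial portion can be matched against the sticker on the physical drive before anything is pulled.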