Saving Your Data: A Pro's Guide to GNU ddrescue for Failing Linux Drives

When the I/O Buffer Becomes a Black Hole

You’re running a routine rsync to back up an old 2TB SATA drive, and suddenly the terminal stops responding. You check dmesg, only to find a wall of messages like I/O error, dev sdb, sector 123456. Standard Linux tools are built for healthy filesystems. When they hit a physical bad sector, they often stall in long retry loops or abort outright, leaving you with a corrupt, incomplete backup and a drive that is one step closer to the recycling bin.

Hardware failure doesn’t wait for a convenient maintenance window. Last year, I had a production storage disk start the dreaded “click of death” just as a sync started. In these moments, every second the platters spin is a gamble. You need a tool that doesn’t just copy data, but maps the damage and prioritizes the easy-to-reach, healthy blocks before the hardware gives up the ghost.

The Fatal Flaw of Standard Commands

Standard utilities like cp or dd handle disk errors poorly because they lack a “big picture” view of the drive’s health. When the kernel requests a block and the drive fails to read it, the hardware retries several times before reporting a failure. This creates two major problems:

cp / rsync: These operate at the filesystem level. If a 1GB video file sits on a single bad sector, the command might hang for minutes. If you force it to skip, you often lose the entire file, even though 99.9% of its data is perfectly readable.
dd: Standard dd is strictly linear. By default it stops at the first read error; with conv=noerror,sync it carries on and pads the unreadable block with zeros. The real issue is efficiency. It can sit for 30 seconds on one dead sector while thousands of healthy blocks further down the platter remain uncopied. On a drive with 500 bad sectors, dd could waste hours grinding the mechanical heads into dust.
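To put rough numbers on that last point, here is a back-of-the-envelope sketch. The 30-second stall and the 500 bad sectors are the illustrative figures from the text, not measurements, and retry_cost is a hypothetical helper name.

```shell
# Estimate the time a strictly linear copy spends stalled on bad sectors.
# Both inputs are illustrative assumptions, not measurements.
retry_cost() {
    bad_sectors=$1
    seconds_per_sector=$2
    total=$((bad_sectors * seconds_per_sector))
    printf '%dh %dm\n' $((total / 3600)) $((total % 3600 / 60))
}

retry_cost 500 30   # prints "4h 10m"
```

Over four hours of pure waiting, before a single healthy block beyond the damage has been copied.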

GNU ddrescue flips this script. It uses a sophisticated multi-pass algorithm that grabs the “low-hanging fruit” first. It copies the healthy areas at maximum speed, then circles back to “trim” and “scrape” the difficult spots. If the drive disconnects mid-process, the mapfile ensures you can resume exactly where you left off.

The Strategy: Treat the Mapfile as Your Savior

Never run ddrescue without a mapfile. This file tracks which blocks are finished, which are pending, and which are confirmed dead. It’s what allows you to interrupt the process, switch cables, or even combine data from two different failing drives into one healthy image.
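The mapfile is plain text, so you can inspect it yourself. The sketch below assumes the documented layout: comment lines start with #, the first data line holds the current position, and every later line is a position, a size, and a status character, with numbers in hexadecimal. The sample file is hand-written with made-up values, and summarize_mapfile is a hypothetical helper, not part of ddrescue.

```shell
# Sum the bytes recorded under each status in a ddrescue mapfile.
# Statuses: '+' finished, '-' bad sector; '?', '*' and '/' are still pending.
summarize_mapfile() {
    finished=0 bad=0 pending=0 seen_position_line=0
    while read -r pos size status rest; do
        case "$pos" in
            '' | '#'*) continue ;;           # skip blank lines and comments
        esac
        if [ "$seen_position_line" -eq 0 ]; then
            seen_position_line=1             # first data line is the current position
            continue
        fi
        case "$status" in
            '+') finished=$((finished + $size)) ;;
            '-') bad=$((bad + $size)) ;;
            *)   pending=$((pending + $size)) ;;
        esac
    done < "$1"
    echo "finished=$finished bad=$bad pending=$pending"
}

# Hand-written sample mapfile (hypothetical values) describing a 16 MiB device.
sample_map=$(mktemp)
cat > "$sample_map" <<'EOF'
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00120000     ?               1
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00020000  -
0x00120000  0x00EE0000  ?
EOF

summarize_mapfile "$sample_map"   # prints finished=1048576 bad=131072 pending=15597568
```

For real work, the ddrescuelog tool that ships with ddrescue is the better choice; this only shows how little magic the file contains.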

1. Pinpoint the Failing Drive

Start by identifying your source and destination. Never attempt to recover data onto the same physical drive.

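# List block devices with size and model to tell source from destination
lsblk -o NAME,SIZE,MODEL,SERIAL,MOUNTPOINT
# Look for kernel I/O errors (dmesg may require root on some distributions)
sudo dmesg | grep -i error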

For this example, we’ll assume /dev/sdb is the dying disk and we’re saving an image to a healthy 4TB drive mounted at /mnt/recovery.
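Before imaging, two quick checks are worth scripting: that the source is not mounted, and that the destination has room for the full image. This is a minimal sketch; is_mounted and has_room are hypothetical helper names, and the device and mount point are the ones assumed in this example.

```shell
# Succeeds if the device (or one of its partitions) appears in the mount table.
# The second argument defaults to /proc/mounts and exists mainly for testing.
is_mounted() {
    awk -v dev="$1" 'index($1, dev) == 1 { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Succeeds if the filesystem holding directory $2 has at least $1 bytes free.
has_room() {
    free_kib=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ $((free_kib * 1024)) -ge "$1" ]
}

# Example usage (needs root and the real devices, so it is left commented out):
#   is_mounted /dev/sdb && echo "Unmount the source before imaging it"
#   has_room "$(sudo blockdev --getsize64 /dev/sdb)" /mnt/recovery || echo "Not enough space"
```

The prefix match in is_mounted is deliberately loose, so a mounted /dev/sdb1 counts as a hit for /dev/sdb.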

2. Phase 1: The High-Speed Grab

Your first pass should be as non-intrusive as possible. We want ddrescue to move past any area that returns a read error instead of lingering on it. This secures the bulk of your data before the drive’s motor or controller fails entirely.

# Install the tool
sudo apt update && sudo apt install gddrescue

# Run the initial fast pass
sudo ddrescue -n -b 512 /dev/sdb /mnt/recovery/disk_image.img /mnt/recovery/rescue.map

Key flags explained:

-n: Skips the “scraping” phase to avoid wasting time on tough sectors early on.
-b 512: Sets the sector size of the input device. 512 bytes is the default and matches most drives, including “Advanced Format” models that emulate 512-byte sectors; native 4K drives need 4096.
rescue.map: This is your progress log. Protect it.

3. Phase 2: Targeted Retries

Once the easy data is safe, tell ddrescue to get aggressive with the remaining gaps. We’ll use direct I/O to bypass the kernel cache, which often provides more reliable results on failing hardware.

sudo ddrescue -d -r3 /dev/sdb /mnt/recovery/disk_image.img /mnt/recovery/rescue.map

-d: Direct disc access. It’s slower but bypasses kernel-level interference.
-r3: Makes up to 3 retry passes over the sectors that are still marked bad, then exits.

Reading the Vital Signs

Watch the errsize field in the live status view (recent ddrescue releases label it bad-sector). Once non-tried has dropped to zero, this is your most important metric. If you see an errsize of 512KB on a 1TB drive, you’ve recovered more than 99.9999% of your data. At that point, further grinding might not be worth the risk of total mechanical collapse.
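As a quick check of that kind of estimate, the percentage is simply one minus the unreadable bytes divided by the drive size. The sketch below assumes nothing is left untried and uses decimal units (512 kB out of 1 TB); pct_rescued is a hypothetical helper name.

```shell
# Percentage rescued, given unreadable bytes and total device size in bytes.
# LC_ALL=C keeps the decimal point independent of the system locale.
pct_rescued() {
    LC_ALL=C awk -v bad="$1" -v total="$2" \
        'BEGIN { printf "%.5f\n", 100 * (1 - bad / total) }'
}

pct_rescued 512000 1000000000000   # 512 kB bad on a 1 TB drive: prints 99.99995
```

Whether the missing fraction matters depends on where it sits: a few bad sectors inside filesystem metadata can hurt more than the raw percentage suggests.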

After the process completes, you’re left with a raw .img file. Don’t try to open it like a folder; it’s a bit-for-bit clone of the whole drive, partition table included.

Mounting the Result

Use losetup to map the partitions inside your image file so you can mount them like real disks:

sudo losetup -fP /mnt/recovery/disk_image.img
# Identify the new loop device (likely /dev/loop0)
lsblk

# Create the mount point, then mount the first partition read-only
sudo mkdir -p /mnt/data_recovered
sudo mount -o ro /dev/loop0p1 /mnt/data_recovered

Pro tip: Always use the ro (read-only) flag. If the filesystem inside the image is damaged, mounting it read-write lets the kernel replay journals and rewrite metadata, which can cause more corruption.
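Read-only mounting has one wrinkle on journaling filesystems: a plain ro mount can still replay the journal and so write to the image. The sketch below picks options that skip replay. It assumes only ext3/ext4 and XFS need special handling, which is a simplification, and ro_mount_opts is a hypothetical helper name.

```shell
# Print mount options that avoid journal/log replay for a given filesystem type.
# Only ext3/ext4 and XFS are special-cased here; everything else gets plain "ro".
ro_mount_opts() {
    case "$1" in
        ext3 | ext4) echo "ro,noload" ;;
        xfs)         echo "ro,norecovery" ;;
        *)           echo "ro" ;;
    esac
}

# Example usage (needs root and an attached loop device, so it is commented out):
#   fstype=$(sudo blkid -o value -s TYPE /dev/loop0p1)
#   sudo mount -o "$(ro_mount_opts "$fstype")" /dev/loop0p1 /mnt/data_recovered
```

Skipping replay means recent, uncommitted changes may look inconsistent, which is usually the right trade when the goal is to leave the image untouched.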

Feature Comparison: Why ddrescue Wins

Feature	dd / cp	GNU ddrescue
Error Handling	Hangs or aborts	Skips and logs
Resumability	None	Native (via mapfile)
Data Protection	Low (High stress)	High (Prioritizes healthy blocks)

Execution Summary

The moment you suspect hardware failure, stop all writes to that drive. Don’t browse the folders. Don’t run fsck. Physical damage is a hardware problem that software can only mitigate by being smart.

My standard workflow is now three steps: image the drive with ddrescue, store the dying hardware in an anti-static bag, and perform all data extraction on the image file. This approach gives you as many attempts as you need to fix the filesystem or run tools like testdisk without the ticking clock of a failing motor hanging over your head.
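Repair tools write to whatever you point them at, so repeated attempts are only safe if the original image stays pristine. One way to arrange that, assuming you have space for a second copy, is sketched below; make_working_copy and the work.img name are hypothetical.

```shell
# Copy an image, verify the copy byte-for-byte, then mark the original read-only.
make_working_copy() {
    cp -- "$1" "$2" || return 1
    cmp -s -- "$1" "$2" || return 1
    chmod a-w -- "$1"
}

# Example usage (paths follow the article's example; work.img is a hypothetical name):
#   make_working_copy /mnt/recovery/disk_image.img /mnt/recovery/work.img
#   testdisk /mnt/recovery/work.img
```

If a repair attempt goes wrong, delete the working copy and make a fresh one from the untouched image.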