A lesson in recovery techniques

I recently got this message from fsck.jfs:

Unrecoverable error writing M to /dev/sdb3. CANNOT CONTINUE.

Okay, so this is an error that can be ignored – right? I can just force mount the partition and extract the data with superblock marked as dirty … right?!

krc@X61s % mount -o ro -f /dev/sdb3 /mnt/rec_mount
krc@X61s % ls /mnt/rec_mount
krc@X61s %

Damn it! This was an 1,4 TB parition with 900GB of data including home videos and .mkv rips of my dvd’s. Most of data could be restored, but a lot work would be lost.

I am running JFS on all my storage drives, as I have found this a good all-round file system especially in smaller devices with limited resources. Unfortunately this a kind of niche file system that does not have a broad variety of recovery tools.
I found jfsrec as the only (non commercial) tool. Unfortunately this tool was unable to read from the partition directly and stopped with an early EOF marker error.

Jfsrec pointed me in the direction of the dd_rhelp tool. This tool turned out to be a life saver. There was just one thing. I needed a disk big enough to hold a complete dump of the partition.

A few days later, armed with a new disk, I was able to continue. I used this guide at debianadmin.com to get started. The command could not be simpler to use:

krc@X61s % dd_rhelp /dev/sdb3 /mnt/rec_target/bad_disk.img

And it started copying data! Yay!
After some time, it settled on a transfer rate of 2500 … KBps! … Wow… This is rather slow…
Quick calculation: (((1400000000)/2500)/3600)/24 = 6.48 days.

One week later:

krc@X61s % ssh atom1
ssh: connect to host atom1 port 22: No route to host

Hmm… I had done this periodically over the last week

krc@X61s % ping atom1
PING atom1 ( 56(84) bytes of data.
From atom1 ( icmp_seq=1 Destination Host Unreachable
From atom1 ( icmp_seq=2 Destination Host Unreachable
From atom1 ( icmp_seq=3 Destination Host Unreachable
From atom1 ( icmp_seq=4 Destination Host Unreachable
--- atom1 ping statistics ---
6 packets transmitted, 0 received, +4 errors, 100% packet loss, time 5059ms

Hmm… Thats odd. I didn’t remember putting a ; halt -p after the dd_rhelp command.

A few pings and some reflections later I acutally got up and checked the room where the recovery setup is located.

This was what I found:


To quote Freddie Frinton;

I’ll kill that cat!

Notice the dangling sata power cables in the top of the photo… I have always found Linux a stable operating system, but a system disk physically disappearing is valid excuse for a crash!

Fortunately, dd_rhelp got to finish the disk dump – which was very lucky because after the fall, the damaged disk is now officially dead. It no longer spins up, and is not recognised by bios.

I tried running a fsck.jfs directly on the disk image, and it managed to fix the errors in the partition. Now i could mount the disk image like so:

krc@X61s % sudo mount -o loop /mnt/rec_target/bad_disk.img /mnt/rec_target

And copy the files from /mnt/rec_target.