The Lone Coder Reflections for the Unsung Linux Saviours
by Ken O. Burtch
Bad Docs or Adventures in Linux RAID-land
Ink is better than the best memory.
-- Chinese proverb
Last year, my father came by my house and dropped off a vintage Honda
motorcycle
(Honda Home Page)
that he bought new in the 1980s, saying he had no room for it in his garage.
Motorcycles are one of the simplest machines: they require little maintenance to
keep them running. With some carburetor cleaner, a couple of tanks of gas, and
a little black and silver paint, it was looking and running like new. Still, the
headlight was burned out and I thought it wouldn't hurt to get it lubed and checked
over by a professional. My father had purchased the motorcycle from a place that
used to be the number one motorcycle retailer in Niagara. I gave them a call.
The guy at the service desk told me flat out that their shop would never service
a bike of that age. The reason? To have a bike that old around their shop
would reduce the selling price of the new models. In fact, he explained,
their store refused to service any motorcycle that was 10 years or older...which,
for low-maintenance machines, was nothing.
I didn't know whether to be insulted because he said vintage
motorcycles lessened the value of things that they touched, or because he
implied that I was a liar when I said it was running fine, or because the store
didn't service what they sold. He told me to get rid of the old Honda and he
would be happy to sell me a NEW bike. But one thing I was certain of: my next
motorcycle would come from a place that provided reliable, quality service.
With the continual growth of technology in society, things
keep becoming more fragile and shorter lived. It took centuries before
the barter system was fully replaced by metal coins for the exchange of goods
and services. Coins were replaced by paper money. Paper money by plastic
cards. Plastic cards by exchanges of electrons on-line. Each becomes more
fragile and has a shorter lifespan of dominance. But the barter system
is still used on a daily basis today.
(Currency, Wikipedia)
My Sega Genesis video game console
(Sega Genesis, Wikipedia)
is 20 years old and still plays its cartridges and the controllers still work.
But an Xbox 360 or PlayStation 3 uses a hard drive, which has a shorter lifespan
than a ROM cartridge. It's unlikely that these newer game systems will make it
to a 20th anniversary. A $500 camera bought in the 1970s might still take
good photos today, but a $500 digital camera bought today may be useless in a
few years as USB, flash memory or FireWire become obsolete. A TV today
can play music files and photos, but how long until those file formats are
no longer used?
The more people rely on technology, the more fragile and
uncertain the lifespan of their work. I've still got my old
Apple IIgs
(Apple IIgs, Wikipedia)
sitting on my desk, the one I used for years to create shareware
games. But I've been unable to boot the hard drive and extract my old source
files. If I could, could I rig up a YModem connection
(YModem, Wikipedia)
to transfer the files
to a new machine? Years of work may be gone forever. Most people don't make
backups of their home computers, and when they do, they don't use disaster recovery
methods to check the usefulness or integrity of the data they've saved.
Though DVD manufacturers claim a 30-year or longer lifespan
(DVD, Wikipedia),
a single bad disk block may corrupt a file stored on backup disks, or the
backup disk may not last long if exposed to adverse conditions like heat and
sunlight. That's why it's important not only to back up data but also to test it and
migrate it to newer media. And, like the barter system, sometimes the best
backup is to simply print out your source code on high-endurance everyday
paper--paper that comes from uncool, low-tech trees.
I purchased a couple of 1.5 terabyte hard drives and I decided to
set them up during my January vacation. Due to the reputation of some of
these high-capacity drives, I was reminded of how fragile data has become and I
decided I had better protect my personal data in my Linux /home directory.
If you have a pair of hard drives, the best method of doing this is using
Linux's software emulated RAID 1.
RAID is a standard for using multiple hard drives as if
they were a single drive
(RAID, Wikipedia).
(This is why many of the RAID tools on Linux begin
with "md", short for "multiple devices".) Although RAID traditionally uses entire hard drives,
it can also work with disk partitions, though it is best if each of those partitions
is located on a different hard drive. The RAID 1 standard, also called "mirroring",
uses two identically sized disk partitions to hold your files and if one partition
gets corrupted, it will automatically use the other partition to recover the missing
pieces. Mirroring is not very economical--it duplicates absolutely everything--but
with two large capacity drives on a home system, RAID 1 is a sensible choice
(RAID 1 failure
rate and performance, Wikipedia).
Many Linux installation programs support RAID partitions. I was using OpenSuSE 11.1
which supports RAID 0, 1 and 5 on installation. So I created two 1 terabyte partitions,
/dev/sda6 and /dev/sdb3, on my new drives and combined them into a RAID 1 multiple disk
mirrored partition, /dev/md1, and mounted /dev/md1 as /home. (You can
read more of my installation notes at
Install OpenSuSE 11.1 in the documentation
section of this site.)
The install went well and I started using OpenSuSE. But as I
used it, I noticed a strange behaviour: every few minutes the desktop would
lock up for several seconds. Running the "top" command, I saw the disk I/O taking
up all the CPU capacity and the RAID 1 process using most of the CPU. What was
going on?
It was very difficult to find documentation on the Linux software RAID
emulation. The Linux RAID How-To is unmaintained and many years
out of date, and doing web searches for answers may lead you to ancient comments from
a time when Linux RAID worked in totally different ways. My search
eventually led me to the
Linux RAID Wiki--basically
a copy of the How-To document that somebody posted as a Wiki in the hopes that the
Linux community would update the documentation. But even this site was woefully
incomplete and obsolete, with broken links and old instructions.
For a simplified view of what your RAID disks are doing,
you can examine the /sys filesystem. Here's a shell script:
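#!/bin/bash

# Print the array state (and the sync action, if any) for one md device.
function show_status {
  RAID="$1"
  STATE=`cat /sys/block/$RAID/md/array_state`
  ACTION=""
  if test -f /sys/block/$RAID/md/sync_action ; then
    ACTION="/"`cat /sys/block/$RAID/md/sync_action`
  fi
  echo "$RAID: $STATE""$ACTION"
}
readonly -f show_status

# List the mounted md devices: "/dev/md1 on /home ..." becomes "md1".
MOUNTED_RAIDS=`mount | grep -F "/dev/md" | cut -d/ -f3 | cut -d' ' -f1`
echo "$MOUNTED_RAIDS" | ( while read RAID ; do
  # Skip the empty line produced when no RAID device is mounted.
  test -n "$RAID" && show_status "$RAID"
done )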
This will give a short summary such as:
md0: clean
md1: active/recover
For more details, read the mdstat file in the proc file system, /proc/mdstat, for example with "cat /proc/mdstat".
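As a minimal sketch of how that file can be condensed, the function below prints one line per array with its RAID level and member map. The function name mdstat_summary is hypothetical, and the layout it relies on (array name in the first field, RAID level in the fourth field of an active array, a member map such as [UU] or [U_] on a following line) is an assumption about the usual /proc/mdstat format, which can vary between kernel versions.

```shell
#!/bin/bash
# Summarize an mdstat-format file: one line per array.
# Usage: mdstat_summary [file]   (defaults to /proc/mdstat)
# Assumption: arrays are listed as "md1 : active raid1 sdb3[2] sda6[0]"
# followed by a line containing a member map such as [UU] or [U_].
mdstat_summary() {
    local file="${1:-/proc/mdstat}"
    awk '
        # Header line of an array: remember its name and RAID level.
        /^md[0-9]+ :/ { name = $1; level = $4 }

        # First member map after the header: "_" marks a missing member.
        name != "" && match($0, /\[[U_]+\]/) {
            flags = substr($0, RSTART, RLENGTH)
            state = (index(flags, "_") > 0) ? "degraded" : "ok"
            print name, level, flags, state
            name = ""
        }
    ' "$file"
}
```

Calling mdstat_summary with no argument reads /proc/mdstat; passing a file name lets you try it on a saved copy. A "_" in the member map means that slot of the mirror is currently missing or still rebuilding.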
If you have root access, the "mdadm" command can give you a
complete overview, pulling information from several sources:
# mdadm --detail /dev/md1
/dev/md1:
Version : 0.90
Creation Time : Fri Mar 13 00:04:07 2009
Raid Level : raid1
Array Size : 1241599488 (1184.08 GiB 1271.40 GB)
Used Dev Size : 1241599488 (1184.08 GiB 1271.40 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Tue Mar 17 11:30:26 2009
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 18% complete
UUID : 1a77c9a1:57364931:3d186b3c:53958f34
Events : 0.113850
    Number   Major   Minor   RaidDevice   State
       0       8        6        0        active sync        /dev/sda6
       2       8       19        1        spare rebuilding   /dev/sdb3
In these examples, my /home directory was rebuilding itself after
/dev/sdb3 was added to /dev/md1. I'll explain why this is happening in a moment.
It turns out that the OpenSuSE installation tool is obsolete
and has bugs. First, the current Linux software RAID supports far
more than RAID 0, 1 and 5: it even supports oddball formats like RAID 4 or
RAID 1+0 (or "RAID 10") used in high performance systems.
Second, when you select RAID 1 in OpenSuSE's tool, no matter what chunk size
you select, OpenSuSE will always assign you a giant chunk size. This giant
chunk size was the source of my desktop locking up: every time the computer
went to update the drives, it was writing megabytes of data for even small
file changes. As one person said to me, "If mdstat reports that your
terabyte partitions are made up of seven chunks, you know you've got a problem."
So that left me no choice but to back up /home and
rebuild the partition with a reasonable chunk size. Doing this without
proper documentation was risky at best. Following the instructions on the
Linux RAID Wiki, I rebuilt my RAID partition, mounted it, restored my files and
everything worked fine...until I rebooted. The kernel was unable to find
my /home partition. Why? After extensive searching, I found an obscure
article that said you have to zero the superblock prior to reformatting an
old RAID partition. Once I zeroed the superblock, my /home partition was
recognized. So here are the proper steps to re-RAID a RAID 1 mirror partition.
Switch to root and:
# umount /home                        # make sure /home is not mounted
# mdadm -S /dev/mdZ                   # stop the array
# mdadm --zero-superblock /dev/sdaX   # clear the old RAID setup
# mdadm --zero-superblock /dev/sdbY
# mdadm --create --verbose /dev/mdZ --level=raid1 --raid-devices=2 /dev/sdaX /dev/sdbY
                                      # recreate with default settings for RAID 1
# mkfs -t ext3 /dev/mdZ               # format the device with an appropriate file system
# tune2fs -i 30 -c 10 /dev/mdZ        # disk check every 30 days / 10 mounts
# mount /home
where X, Y and Z are the appropriate device numbers: in my case,
/dev/sda6, /dev/sdb3 and /dev/md1. The tune2fs parameters can be changed
to your liking, but in my case, I preferred more frequent testing to protect my data.
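Because several of these commands destroy data, it can help to print the exact command list for your own device names and read it over before typing anything. The following is a minimal sketch that only prints the steps listed above and never runs them. The function name print_reraid_steps and its three arguments are hypothetical, chosen for illustration.

```shell
#!/bin/bash
# Print (do not run) the re-RAID steps for one array and its two members.
# Usage: print_reraid_steps mdZ sdaX sdbY    e.g. print_reraid_steps md1 sda6 sdb3
# Assumption: /home is the file system on the array and has an /etc/fstab entry.
print_reraid_steps() {
    if [ "$#" -ne 3 ]; then
        echo "usage: print_reraid_steps mdZ sdaX sdbY" >&2
        return 1
    fi
    local md="/dev/$1" a="/dev/$2" b="/dev/$3"
    # The here-document expands the three variables into the command list.
    cat <<EOF
umount /home
mdadm -S $md
mdadm --zero-superblock $a
mdadm --zero-superblock $b
mdadm --create --verbose $md --level=raid1 --raid-devices=2 $a $b
mkfs -t ext3 $md
tune2fs -i 30 -c 10 $md
mount /home
EOF
}
```

For example, print_reraid_steps md1 sda6 sdb3 prints the eight commands for the devices used in this article. Check each device name against the output of "mdadm --detail" before running any of them as root.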
I've got my /home directory back up. It's in RAID 1.
It doesn't lock up the computer anymore. But a new problem appeared: the
RAID software keeps removing the second partition during periods of high activity,
with the message
"super_written gets error=-5". It sounds like the error occurs when trying to
update the superblock on the partition, but what is a -5 error anyway? I
followed the
directions given to turn off "swncq" on sata_nv drivers from a discussion at
Red Hat but
it had no effect. Is it a hardware problem? There have been reports of this
problem since Linux kernel 2.6.20
(List
Discussion), although a hardware problem can be ruled out by forcing a
full self-test on your SATA drive with "smartctl".
# smartctl -t long /dev/sdb
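As for the "-5" in the message quoted above: the Linux kernel reports failures as negative errno values, and errno 5 is EIO, a generic input/output error. It says the write of the superblock failed somewhere below the RAID layer, but not whether the drive, the cable, the controller or the driver was at fault. The lookup below is a small sketch for decoding such codes. The function name errno_name is hypothetical, and the table covers only a handful of common Linux values, not the full list.

```shell
#!/bin/bash
# Translate a kernel-style error code (e.g. -5) into its Linux errno name.
# Only a few common values are listed; anything else is reported as unknown.
errno_name() {
    local code="${1#-}"        # strip the leading minus sign, if any
    case "$code" in
        1)  echo "EPERM (operation not permitted)" ;;
        2)  echo "ENOENT (no such file or directory)" ;;
        5)  echo "EIO (input/output error)" ;;
        12) echo "ENOMEM (out of memory)" ;;
        16) echo "EBUSY (device or resource busy)" ;;
        19) echo "ENODEV (no such device)" ;;
        28) echo "ENOSPC (no space left on device)" ;;
        30) echo "EROFS (read-only file system)" ;;
        *)  echo "unknown ($1)" ;;
    esac
}
```

Running errno_name -5 prints the EIO entry. The complete table lives in the kernel's errno header files.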
The missing partition can be re-added with mdadm. This will
force a complete rebuild of that partition, but it will only keep functioning
until the next period of high activity, when the entire partition is dropped
again.
# mdadm --add /dev/mdZ /dev/sdbY
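A rebuild of a partition this size takes hours, so it is handy to see how far along it is. The sketch below pulls the progress line for each array out of an mdstat-format file. The function name rebuild_progress is hypothetical, and it assumes the usual progress layout of /proc/mdstat (an operation name such as "recovery", a percentage, and a "finish=" estimate), which may differ between kernel versions.

```shell
#!/bin/bash
# Show rebuild progress per array from an mdstat-format file.
# Usage: rebuild_progress [file]   (defaults to /proc/mdstat)
# Prints: <array> <operation> <percent done> <estimated time to finish>
rebuild_progress() {
    local file="${1:-/proc/mdstat}"
    awk '
        # Header line of an array: remember which array we are inside.
        /^md[0-9]+ :/ { name = $1 }

        # Progress line, assumed to look like "recovery = NN.N% (...) finish=..."
        name != "" && /(recovery|resync|check|reshape) *=/ {
            op = ""; pct = ""; eta = ""
            if (match($0, /(recovery|resync|check|reshape)/))
                op = substr($0, RSTART, RLENGTH)
            if (match($0, /[0-9.]+%/))
                pct = substr($0, RSTART, RLENGTH)
            if (match($0, /finish=[^ ]+/))
                eta = substr($0, RSTART + 7, RLENGTH - 7)   # drop "finish="
            print name, op, pct, eta
        }
    ' "$file"
}
```

An array that is not rebuilding produces no output. Running the function every minute or so, for example under "watch", gives a rough idea of when the mirror will be whole again.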
It would be nice if I could begin using my new terabyte
drives for more than debugging Linux system software.
Protecting your data integrity is very important today, and
that means considering options like RAID 1 on today's high capacity hard drives.
Getting RAID working on Linux is no easy feat, and it's made
that much tougher by a lack of up-to-date instructions. What good is RAID 1
if there's no accurate documentation on how to set it up, no examples of
typical configurations, no descriptions of error messages and what to do
about them? Even the error messages are unreadable. It's as if the software
RAID programmers expect the users to read the source code and figure out what to do.
Meanwhile, I found a cycle shop that will fix the headlight
on my Honda motorcycle, so I needed to buy a trailer hitch for my car to bring
the cycle into the shop. I went to U-Haul St. Catharines to order one. "It'll
be in a couple of weeks," the guy at the counter said. Nine months later,
still no trailer hitch. Maybe Niagara needs a Honda motorcycle How-To
document as well. Just a thought.
PegaSoft Canada - A Linux Association Since 1994