The Lone Coder
Reflections for the Unsung Linux Saviours
by Ken O. Burtch
 
 

 Bad Docs or Adventures in Linux RAID-land

Ink is better than the best memory.
-- Chinese proverb

Last year, my father came by my house and dropped off a vintage Honda motorcycle (Honda Home Page) that he had bought new in the 1980s, saying he had no room for it in his garage. Motorcycles are among the simplest machines: they require little maintenance to keep them running. With some carburetor cleaner, a couple of tanks of gas, and a little black and silver paint, it was looking and running like new. Still, the headlight was burned out, and I thought it wouldn't hurt to get it lubed and checked over by a professional. My father had purchased the motorcycle from a place that used to be the number one motorcycle retailer in Niagara, so I gave them a call. The guy at the service desk told me flat out that their shop would never service a bike of that age. The reason? Having a bike that old around their shop would reduce the selling price of the new models. In fact, he explained, their store refused to service any motorcycle that was 10 years old or older...which, for low-maintenance machines, is nothing.

I didn't know whether to be insulted because he said vintage motorcycles lessened the value of things that they touched, or because he implied that I was a liar when I said it was running fine, or because the store didn't service what they sold. He told me to get rid of the old Honda and he would be happy to sell me a NEW bike. But one thing I was certain of: my next motorcycle would come from a place that provided reliable, quality service.

With the continual growth of technology in society, things keep becoming more fragile and shorter-lived. It took centuries before the barter system was largely replaced by metal coins for the exchange of goods and services. Coins were then replaced by paper money, paper money by plastic cards, and plastic cards by exchanges of electrons online. Each becomes more fragile and has a shorter lifespan of dominance. But the barter system is still used on a daily basis today (Currency, Wikipedia).

My Sega Genesis video game console (Sega Genesis, Wikipedia) is 20 years old, and it still plays its cartridges and the controllers still work. But an Xbox 360 or PlayStation 3 uses a hard drive, which has a shorter lifespan than ROM cartridges. It's unlikely that these newer game systems will make it to a 20th anniversary. Similarly, a $500 camera bought in the 1970s might still take good photos today, but a $500 digital camera bought today may be made obsolete in a few years as USB, flash memory or FireWire become obsolete. And a TV today can play music files and photos, but how long until those file formats are no longer used?

The more people rely on technology, the more fragile and uncertain the lifespan of their work. I've still got my old Apple IIgs (Apple IIgs, Wikipedia) sitting on my desk, the one I used for years to create shareware games. But I've been unable to boot the hard drive and extract my old source files. Even if I could, would I be able to rig up a YModem connection (YModem, Wikipedia) to transfer the files to a new machine? Years of work may be gone forever. Most people don't make backups on their home computers, and when they do, they don't use disaster recovery methods to check the usefulness or integrity of the data they've saved. Though DVD manufacturers claim a 30-year or longer lifespan (DVD, Wikipedia), a single bad disk block may corrupt a file stored on a backup disk, or the disk may not last long if exposed to adverse conditions like heat and sunlight. That's why it's important not only to back up data but to test it and migrate it to newer media. And, like the barter system, sometimes the best backup is simply to print out your source code on high-endurance everyday paper that comes from uncool, low-tech trees.

I purchased a couple of 1.5 terabyte hard drives and decided to set them up during my January vacation. Given the reputation of some of these high-capacity drives, I was reminded of how fragile data has become, and I decided I had better protect my personal data in my Linux /home directory. If you have a pair of hard drives, the best way of doing this is Linux's software RAID 1.

RAID is a standard for using multiple hard drives as if they were a single drive (RAID, Wikipedia). (This is why many of the RAID tools on Linux begin with "md", after the kernel's "multiple device" driver.) Although RAID traditionally uses entire hard drives, it can also work with disk partitions, though it is best if each of those partitions is located on a different hard drive. The RAID 1 standard, also called "mirroring", uses two identically sized disk partitions and writes every change to both of them. If one partition fails, the array keeps running from the other, and the failed half can be rebuilt from the surviving copy. Mirroring is not very economical, since it duplicates absolutely everything, but with two large-capacity drives on a home system, RAID 1 is a sensible choice (RAID 1 failure rate and performance, Wikipedia). Many Linux installation programs support RAID partitions. I was using OpenSuSE 11.1, which supports RAID 0, 1 and 5 on installation. So I created two 1 terabyte partitions, /dev/sda6 and /dev/sdb3, on my new drives. I combined them into a RAID 1 mirrored device, /dev/md1, and mounted /dev/md1 as /home. (You can read more of my installation notes at Install OpenSuSE 11.1 in the documentation section of this site.)
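Because RAID 1 keeps a full copy on every member, the arithmetic of that trade-off is simple. The array is only as large as its smallest member, no matter how many raw gigabytes you put in. Below is a minimal sketch of that calculation. It assumes sizes in whole GiB and ignores the small amount of space md reserves for its superblock. The function name raid1_summary is hypothetical; it is not an mdadm command.

```shell
#!/bin/bash
# Back-of-the-envelope capacity check for RAID 1 (mirroring).
# Sizes are whole GiB; md superblock/metadata overhead is ignored.
# raid1_summary is a hypothetical helper, not part of mdadm.
raid1_summary() {
  if [ $# -eq 0 ]; then
    echo "usage: raid1_summary SIZE_GIB..." >&2
    return 1
  fi
  local raw=0 usable="$1" s
  for s in "$@"; do
    raw=$(( raw + s ))
    # every member holds a full copy, so the array can be
    # no bigger than its smallest member
    if (( s < usable )); then
      usable=$s
    fi
  done
  echo "raw=${raw}GiB usable=${usable}GiB copies=$#"
}
```

For example, `raid1_summary 1000 1000` prints `raw=2000GiB usable=1000GiB copies=2`: half the raw space goes to the mirror.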

The install went well and I started using OpenSuSE. But as I used it, I noticed a strange behaviour: every few minutes the desktop would lock up for several seconds. Running the "top" command, I saw disk I/O wait taking up all the CPU capacity, with the RAID 1 process using most of the CPU. What was going on?

It was very difficult to find documentation on Linux software RAID. The Linux RAID How-To is unmaintained and many years out of date, and web searches may lead you to ancient comments from a time when Linux RAID worked in totally different ways. My search eventually led me to the Linux RAID Wiki. It is basically a copy of the How-To document that somebody posted as a wiki in the hope that the Linux community would update it. But even this site was woefully incomplete and obsolete, with broken links and old instructions.

For a simplified view of what your RAID disks are doing, you can examine the /sys filesystem. Here's a shell script:

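#!/bin/bash
# Print a one-line state summary for every mounted md RAID array.

# show_status ARRAY: prints e.g. "md1: active/recover"
function show_status {
  local RAID="$1"
  local STATE ACTION=""
  STATE=$(cat "/sys/block/$RAID/md/array_state")
  # sync_action is absent for non-redundant arrays such as RAID 0
  if test -f "/sys/block/$RAID/md/sync_action" ; then
     ACTION="/$(cat "/sys/block/$RAID/md/sync_action")"
  fi
  echo "$RAID: $STATE$ACTION"
}
readonly -f show_status

# Array names (md0, md1, ...) from mount lines like "/dev/md1 on /home type ext3 ..."
MOUNTED_RAIDS=$(mount | grep -F "/dev/md" | cut -d/ -f3 | cut -d' ' -f1)
echo "$MOUNTED_RAIDS" | while read -r RAID ; do
   # skip the single empty line produced when no arrays are mounted
   if [ -n "$RAID" ] ; then
      show_status "$RAID"
   fi
done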

This will give a short summary such as:

md0: clean
md1: active/recover

For more details, read the mdstat file in the proc file system:

$ cat /proc/mdstat

Personalities : [raid0] [raid1] 
md1 : active raid1 sdb3[2] sda6[0]
      1241599488 blocks [2/1] [U_]
      [===>.................]  recovery = 17.2% (213764288/1241599488) finish=181.0min speed=94595K/sec
      
md0 : active raid0 sda5[0] sdb2[1]
      62910432 blocks super 1.0 32k chunks
      
unused devices: <none>
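The "[2/1] [U_]" on the md1 line is the quickest health check in this file. The first pair gives how many members the array should have and how many are working. In the second, each "U" is an up member and "_" marks one that is missing or failed. Here md1 is degraded because sdb3 is still being rebuilt. Below is a rough sketch of a check that flags such arrays. It assumes the status string appears as its own bracketed field, as it does above. mdstat_degraded is a hypothetical helper name.

```shell
#!/bin/bash
# Report md arrays whose member-status field in mdstat contains "_"
# (a missing or failed member), e.g. "[U_]" or "[_U]".
# The file argument defaults to /proc/mdstat, so a saved copy can be checked too.
# mdstat_degraded is a hypothetical helper, not an mdadm command.
mdstat_degraded() {
  local file="${1:-/proc/mdstat}"
  awk '
    /^md[0-9]+ *:/ { array = $1; next }        # first line of an array stanza
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^\[[U_]+\]$/ && $i ~ /_/)    # status field with a gap
          print array ": degraded " $i
    }
  ' "$file"
}
```

Run on the output above, it prints `md1: degraded [U_]`. RAID 0 arrays like md0 have no such status field, so they are never reported.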

If you have root access, the "mdadm" command can give you a complete overview, pulling information from several sources:

# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Fri Mar 13 00:04:07 2009
     Raid Level : raid1
     Array Size : 1241599488 (1184.08 GiB 1271.40 GB)
  Used Dev Size : 1241599488 (1184.08 GiB 1271.40 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Tue Mar 17 11:30:26 2009
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 18% complete

           UUID : 1a77c9a1:57364931:3d186b3c:53958f34
         Events : 0.113850

    Number   Major   Minor   RaidDevice State
       0       8        6        0      active sync   /dev/sda6
       2       8       19        1      spare rebuilding   /dev/sdb3

In these examples, my /home directory was rebuilding itself after /dev/sdb3 was added to /dev/md1. I'll explain why this is happening in a moment.
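The "blocks" figure is the same number in both tools. /proc/mdstat and mdadm report array size in 1 KiB blocks, so 1241599488 blocks is about 1.27 decimal terabytes. That is where mdadm's "(1184.08 GiB 1271.40 GB)" comes from. Below is a small sketch of that conversion, using awk for the floating-point division. blocks_to_sizes is a hypothetical name.

```shell
#!/bin/bash
# Convert the 1 KiB "blocks" count shown by /proc/mdstat and mdadm
# into binary GiB and decimal GB, as in mdadm's "Array Size" line.
# blocks_to_sizes is a hypothetical helper name.
blocks_to_sizes() {
  awk -v b="$1" 'BEGIN {
    bytes = b * 1024                      # each block is 1 KiB
    printf "%.2f GiB %.2f GB\n", bytes / (1024 ^ 3), bytes / 1e9
  }'
}
```

`blocks_to_sizes 1241599488` gives `1184.08 GiB 1271.40 GB`, matching the mdadm output above.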

It turns out that the OpenSuSE installation tool is obsolete and has bugs. First, the current Linux software RAID supports far more than RAID 0, 1 and 5: it even supports oddball formats like RAID 4 or RAID 1+0 (or "RAID 10") used in high performance systems. Second, when you select RAID 1 in OpenSuSE's tool, no matter what chunk size you select, OpenSuSE will always assign you a giant chunk size. This giant chunk size was the source of my desktop locking up: every time the computer went to update the drives, it was writing megabytes of data for even small file changes. As one person said to me, "If mdstat reports that your terabyte partitions are made up of seven chunks, you know you've got a problem."

So that left me no choice but to back up /home and rebuild the partition with a reasonable chunk size. Doing this without proper documentation was risky at best. Following the instructions on the Linux RAID Wiki, I rebuilt my RAID partition, mounted it, restored my files, and everything worked fine...until I rebooted. The kernel was unable to find my /home partition. Why? After extensive searching, I found an obscure article that said you have to zero the superblock before reformatting an old RAID partition. Once I zeroed the superblock, my /home partition was recognized. So here are the steps to re-RAID a RAID 1 mirror partition.

Switch to root and:
# umount /home                        # make sure /home is not mounted
# mdadm -S /dev/mdZ                   # stop the array (-S is short for --stop)
# mdadm --zero-superblock /dev/sdaX   # clear the old RAID metadata
# mdadm --zero-superblock /dev/sdbY
# mdadm --create --verbose /dev/mdZ --level=raid1 --raid-devices=2 /dev/sdaX /dev/sdbY
                                      # recreate with default settings for RAID 1
# mkfs -t ext3 /dev/mdZ               # format the device with an appropriate file system
# tune2fs -i 30 -c 10 /dev/mdZ        # disk check every 30 days or 10 mounts
# mount /home

where X, Y and Z are the appropriate partition and array numbers. In my case, those were /dev/md1, /dev/sda6 and /dev/sdb3. The tune2fs parameters can be changed to your liking, but I preferred more frequent checks to protect my data.
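Because --zero-superblock and mkfs destroy whatever is on the devices you name, it is worth seeing the commands with your real X, Y and Z filled in before running any of them. Below is a minimal sketch that only prints the sequence above; it runs nothing itself. It also refuses obvious mistakes, such as naming the same partition twice. print_rebuild_plan is a hypothetical helper. The ext3 file system and the tune2fs intervals are simply the choices made above.

```shell
#!/bin/bash
# Print (but do NOT run) the re-RAID command sequence for one array and
# two partitions, so the device names can be checked before anything is wiped.
# print_rebuild_plan is a hypothetical helper; ext3 and "-i 30 -c 10"
# are the article's own choices.
print_rebuild_plan() {
  local md="$1" part_a="$2" part_b="$3" p
  for p in "$md" "$part_a" "$part_b"; do
    case "$p" in
      /dev/*) ;;
      *) echo "not a /dev path: '$p'" >&2; return 1 ;;
    esac
  done
  if [ "$part_a" = "$part_b" ]; then
    echo "both mirror halves are the same partition: $part_a" >&2
    return 1
  fi
  cat <<EOF
umount /home
mdadm -S $md
mdadm --zero-superblock $part_a
mdadm --zero-superblock $part_b
mdadm --create --verbose $md --level=raid1 --raid-devices=2 $part_a $part_b
mkfs -t ext3 $md
tune2fs -i 30 -c 10 $md
mount /home
EOF
}
```

For my layout, `print_rebuild_plan /dev/md1 /dev/sda6 /dev/sdb3` prints the eight commands. After checking them, you could paste them into a root shell one at a time.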

I've got my /home directory back up. It's in RAID 1. It doesn't lock up the computer anymore. But a new problem appeared: the RAID software keeps removing the second partition during periods of high activity, with the message "super_written gets error=-5". It sounds like the error occurs when trying to update the superblock on the partition, but what is a -5 error anyway? I followed the directions from a discussion at Red Hat to turn off "swncq" in the sata_nv driver, but it had no effect. Is it a hardware problem? There have been reports of this problem since Linux kernel 2.6.20 (List Discussion). A failing drive can be checked for by forcing a full self-test on your SATA drive with "smartctl".

# smartctl -t long /dev/sdb
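As for the -5: kernel messages like this one report a negated errno value, and on Linux errno 5 is EIO, "Input/output error". So the superblock write to that member came back from the lower layers with an I/O error. To see how often it happens, you could count the messages in a saved kernel log, for example dmesg output redirected to a file. Below is a rough sketch with a hypothetical helper name. It assumes the message text matches the one quoted above.

```shell
#!/bin/bash
# Count "super_written gets error=-N" messages in a saved kernel log and
# name the errno. On Linux, errno 5 is EIO ("Input/output error").
# count_super_written_errors is a hypothetical helper name.
count_super_written_errors() {
  local log="$1"
  awk '
    match($0, /super_written gets error=-[0-9]+/) {
      # "super_written gets error=-" is 26 characters; keep the digits after it
      code = substr($0, RSTART + 26, RLENGTH - 26)
      n[code]++
    }
    END {
      for (c in n)
        printf "errno %s (%s): %d\n", c, (c + 0 == 5 ? "EIO" : "see errno.h"), n[c]
    }
  ' "$log"
}
```

On a log containing two such lines, it prints `errno 5 (EIO): 2`. If there are no matches, it prints nothing.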

The missing partition can be re-added with mdadm. This forces a complete rebuild of that partition. However, the partition only keeps working until the next period of high activity, when it is dropped again.

# mdadm --add /dev/mdZ /dev/sdbY
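A re-add kicks off a long rebuild; the mdstat example earlier estimated about three hours. It can be handy to have a script wait until the rebuild is done. Below is a minimal sketch that polls the same sync_action file the status script reads, until md reports "idle". The function name wait_for_idle is hypothetical. So is the SYSFS_ROOT override, which exists only so the loop can be tried against a fake directory.

```shell
#!/bin/bash
# Poll /sys/block/ARRAY/md/sync_action until md reports "idle".
# wait_for_idle and SYSFS_ROOT are hypothetical; SYSFS_ROOT defaults to /sys.
wait_for_idle() {
  local md="$1" interval="${2:-60}"          # seconds between checks
  local f="${SYSFS_ROOT:-/sys}/block/$md/md/sync_action"
  local action
  if [ ! -r "$f" ]; then
    echo "no sync_action file for $md" >&2
    return 1
  fi
  while action=$(cat "$f"); [ "$action" != "idle" ]; do
    echo "$md: $action"                      # e.g. "md1: recover"
    sleep "$interval"
  done
  echo "$md: idle"
}
```

Usage: `wait_for_idle md1 300`. Note that "idle" only means no sync is running. If the partition is dropped again mid-rebuild, the array may also go idle, so check /proc/mdstat afterwards.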

It would be nice if I could begin using my new terabyte drives for more than debugging Linux system software.

Protecting your data integrity is very important today, and that means considering options like RAID 1 on today's high capacity hard drives. Getting RAID working on Linux is no easy feat, and it's made that much tougher by a lack of up-to-date instructions. What good is RAID 1 if there's no accurate documentation on how to set it up, no examples of typical configurations, no descriptions of error messages and what to do about them? Even the error messages are unreadable. It's as if the software RAID programmers expect the users to read the source code and figure out what to do.

Meanwhile, I found a cycle shop that will fix the headlight on my Honda motorcycle, so I need to buy a trailer hitch for my car to bring the cycle to the shop. I went to U-Haul St. Catharines to order one. "It'll be in a couple of weeks," the guy at the counter said. Nine months later, still no trailer hitch. Maybe Niagara needs a Honda motorcycle How-To document as well. Just a thought.

March 21, 2009 

PegaSoft Canada - A Linux Association Since 1994