LinuxFocus 1999: Installation and Configuration of a RAID System

Antonio Castro

ACKNOWLEDGEMENTS
I will always be grateful to the whole LinuxFocus team, who have often worked behind the scenes to improve the appearance of the articles, and to the translators. On this occasion I must mention one person in particular: this article would not have been possible without the help of Luis Colorado, who, in email after email, shared with me his knowledge of RAID systems. Thanks, Luis.

Contents:
Introduction
Selecting Disks for a RAID
Characteristics of a SCSI System
Types of RAID
How to Install RAID0

Abstract:

RAID (Redundant Array of Inexpensive Disks) is a family of schemes for organizing several disk drives into a single entity that behaves as one virtual drive while making the disks work in parallel, thus improving access performance and, in the redundant modes, protecting the stored information from disk crashes.

Introduction

There are a number of hardware solutions on the market, usually very expensive and generally based on special controller cards.

There are also RAID implementations based on cards that let the user manage several identical disk devices as a RAID by means of a simple Z80 chip and on-board software. Given these specifications, there is no reason to expect such a solution to be more efficient than a Linux-based one.

Implementations based on controller cards are expensive and also force the user to buy identical disk devices. Linux, given the appropriate device drivers, could use some of these cards, but that is not an interesting option: Linux offers a solution based on free software that is equally efficient and avoids the expensive hardware alternatives.

The RAID system for Linux that we will discuss is implemented at the kernel level and allows us to use disks of different types. The disks can be a mixture of IDE and SCSI disks. Even disks of different sizes can be used, but in that case partitions of identical size must be assigned on each disk. The most common solution is to use several disks of the same size, but it is worth mentioning that Linux allows much more flexibility. For example, part of a disk can be used for a RAID and another part as an independent partition. This is often not a good idea, because activity on the independent partition can reduce the access speed of the RAID system. In other words, even though Linux makes it possible to use any kind of disk device, it is always better, if possible, to use disks of the same capacity and characteristics. Another important consideration is that SCSI technology permits concurrent access to the various devices connected to the bus.

By contrast, multiple disk devices on the same IDE controller card can never be accessed simultaneously. It is a pity that SCSI disks are still much more expensive than their IDE counterparts. The software solution for a Linux RAID system is at least as efficient as those based on special cards, and of course it is cheaper and more flexible in terms of the disk devices permitted.

On a SCSI bus one device can be placing data on the bus while another is retrieving it, whereas on an IDE interface one disk is accessed first and the other afterwards.

Selection of Disks for a RAID

The use of very fast disks in a RAID is often not justified, since they are more expensive. Disks are fast because their heads position themselves over the appropriate sector more quickly. Moving between sectors is the operation that consumes the most time on a hard disk, but Linux, unlike MS-DOS for example, optimizes it: requests are not served in the order in which they arrive but, as in an intelligent elevator, are remembered and attended to in the most efficient order. Other strategies, such as the memory cache, increase performance by minimizing the number of disk accesses. Rotation speeds usually do not differ much between disks, but differences in density and in the number of heads can affect the transfer rate significantly, and this parameter must be taken into account. In summary, our recommendation is to use SCSI devices of similar characteristics if possible, and not necessarily expensive ones. The speed of the RAID system comes from using the disks concurrently, not from their individual speed.
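The elevator idea described above can be illustrated with a small sketch. It is not the kernel's actual scheduler, only a simplified model: the sector numbers and the starting head position are hypothetical, and the head is assumed to sweep upward first and then back down.

```python
def elevator_order(head, requests):
    """Return the requests in the order a simple elevator sweep serves them.

    Assumption: the head first sweeps upward from its current position,
    then reverses and serves the remaining requests on the way down.
    """
    upward = sorted(r for r in requests if r >= head)
    downward = sorted((r for r in requests if r < head), reverse=True)
    return upward + downward


def head_travel(head, order):
    """Total distance (in sectors) the head moves when serving `order`."""
    distance = 0
    for sector in order:
        distance += abs(sector - head)
        head = sector
    return distance


requests = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical sector numbers
start = 53                                      # hypothetical head position

print(head_travel(start, requests))                         # arrival order: 640
print(head_travel(start, elevator_order(start, requests)))  # elevator order: 299
```

In this toy case the head travels less than half the distance when the requests are reordered, which is why raw seek speed matters less under Linux than one might expect.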

It must also be taken into account that the Linux system has to boot from a non-RAID disk device, and that this device should be small so that the root partition stays relatively free.

Characteristics of a SCSI system

When the time comes to buy hard disks, many questions arise, so it is worth discussing the main characteristics to look for.


Name	Bus width (bits)	Max devices	MB/s	Connector	Max cable length
SCSI-1	8	7	5	50-pin low density	6 m
SCSI-2 (alias Fast SCSI or Narrow SCSI)	8	7	10	50-pin high density	3 m
SCSI-3 (alias Ultra or Fast20)	8	7	20	50-pin high density	3 m
Ultra Wide (alias Fast SCSI-3)	16	15	40	68-pin high density	1.5 m
Ultra2	16	15	80	68-pin high density	12 m

A RAID can be built from several disk partitions, but the final result is a single logical partition on which no further partitions can be made. The name of this logical device is metadisk.

Under Linux, IDE devices have device files named /dev/hd..., SCSI devices have /dev/sd..., and metadisks will have /dev/md... once the kernel has been compiled with the options specified later. Four such devices should be present:

brw-rw----   1 root     disk       9,   0 may 28  1997 md0
brw-rw----   1 root     disk       9,   1 may 28  1997 md1
brw-rw----   1 root     disk       9,   2 may 28  1997 md2
brw-rw----   1 root     disk       9,   3 may 28  1997 md3

Our first goal should be to make swap access as fast as possible. For that purpose it is best either to use a small metadisk on the RAID or to spread the swap in the traditional fashion among all the physical disks. If several swap partitions are used, each on a different physical disk, the Linux swap subsystem takes care of balancing the load among them, so a RAID is unnecessary in this scenario.

Types of RAID

RAID0 (striping mode): In this mode the disk devices are interleaved so that blocks are taken from all disks in turn, in order to reach higher efficiency. Since a given block of a file is equally likely to be on any of the disks, they are forced to work simultaneously, making the performance of the metadisk almost N times that of a single disk.
RAID1: In this mode the goal is the highest data security. Blocks of data are duplicated on all physical disks (each block of the `virtual' disk has a copy on each of the physical disks). This configuration provides up to N times the read performance of a single device, but it degrades write operations: reads can be organized to fetch N blocks simultaneously, one from each device, whereas writing one block means writing it N times, once for each physical device. This configuration offers no advantage in storage capacity.
RAID4 (note: the RAID2 and RAID3 types are obsolete): Data blocks are striped across N-1 disks, as in RAID0, and the remaining disk stores the parity of those blocks, so that the contents of any single failed disk can be rebuilt from the others.
RAID5: This type is similar to RAID4, except that the parity information is spread over all the hard disks (there is no dedicated parity disk). This reduces the workload of the parity disk, which in RAID4 had to be accessed for every write operation (now the disk that stores the parity information for a track differs from track to track).
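The parity used by RAID4 and RAID5 is a bytewise XOR of the data blocks, which is what makes it possible to rebuild a lost block. The following is a minimal sketch of that idea, not the kernel's implementation; the block contents and function names are invented for illustration.

```python
from functools import reduce


def parity_block(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return parity_block(list(surviving_blocks) + [parity])


data = [b"disk", b"one!", b"two?"]  # hypothetical blocks on three data disks
parity = parity_block(data)         # what the parity disk would hold

# Pretend the second disk failed and rebuild its block from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])         # True
```

Because XOR is its own inverse, the same routine computes the parity and reconstructs the missing block. Only one failed disk at a time can be recovered this way.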

There are other, mixed types of RAID based on RAID1 and some other RAID type. There are also attempts to enable disk compression on the physical hard disks, although not without controversy, because it is not clear what the advantage of compression would be. Almost certainly more proposals will flourish in the near future. For the moment we are going to concentrate on RAID0 because it is the most efficient, despite its lack of redundancy to protect the user from disk failures. When the RAID consists of a few disks (3 or 4), redundancy has an excessive cost (a third or a quarter of the capacity is lost). Redundancy on a RAID protects our data from disk errors but not from accidental deletion of information, so having a redundant RAID does not save us from making backups. On the other hand, if more disks are used (5 or more), the waste of disk capacity is smaller and redundancy has a lower cost. Some 16-bit SCSI cards allow up to 15 devices; in this case RAID5 would be highly recommended.

If the reader cannot use identical disks, bear in mind that RAID systems always work with blocks of identical size. The slower hard disks may be forced to work harder, but the RAID configuration will still yield better performance. The increase in performance on a properly configured RAID system is truly spectacular: it is almost true to say that performance increases linearly with the number of hard disks in the RAID.
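The capacity cost of redundancy quoted above follows from giving up one disk's worth of space to parity. A short sketch of the arithmetic, assuming identical disks and a single disk's worth of parity as in RAID4 or RAID5; the 2 GB disk size is only an example.

```python
from fractions import Fraction


def parity_overhead(num_disks):
    """Fraction of the total capacity given up to parity (one disk's worth)."""
    if num_disks < 2:
        raise ValueError("need at least two disks")
    return Fraction(1, num_disks)


def usable_capacity(num_disks, disk_size_gb):
    """Space left for data when one disk's worth of space holds parity."""
    return (num_disks - 1) * disk_size_gb


for n in (3, 4, 5, 10, 15):
    # Columns: number of disks, fraction lost to parity, usable GB with 2 GB disks
    print(n, parity_overhead(n), usable_capacity(n, 2))
```

With 3 disks a third of the space goes to parity, while with 15 disks the loss drops to one fifteenth.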

How to Install a RAID0

Next we will describe how to install a RAID0. If the reader wishes to build a different type of RAID on a 2.0.xx kernel, a special patch is needed.

RAID0 has no redundancy, but consider that redundancy calls for a large number of disks if too much capacity is not to be wasted. Giving up a whole disk when we only have three is wasteful. Furthermore, redundancy does not cover every possible cause of information loss, only those due to physical deterioration of the hard disks, a very uncommon event. If 10 hard disks were available, using one for parity would not be so much of a waste. On a RAID0, a failure of any one disk means losing all the information stored on all the physical disks, so we recommend an appropriate backup policy.
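To see why backups matter more as a RAID0 grows: if each disk fails independently with probability p over some period, the array loses its data with probability 1 - (1 - p)^N. The sketch below evaluates this. The 2% figure is purely hypothetical, and independent failures are an assumption.

```python
def raid0_loss_probability(num_disks, disk_failure_probability):
    """Probability that at least one disk fails, assuming independent failures.

    On a RAID0 a single failed disk means the whole metadisk is lost.
    """
    return 1.0 - (1.0 - disk_failure_probability) ** num_disks


p = 0.02  # hypothetical chance that one disk fails during the period considered
for n in (1, 2, 3, 10):
    print(n, round(raid0_loss_probability(n, p), 4))
```

The risk grows almost in proportion to the number of disks while p is small.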

The first step to take is adding the appropriate drivers to the kernel. For Linux 2.0.xx RAID the options are:

   Multiple devices driver support (CONFIG_BLK_DEV_MD) [Y/n/?] Y
      Linear (append) mode (CONFIG_MD_LINEAR) [Y/m/n/?] Y
      RAID-0 (striping) mode (CONFIG_MD_STRIPED) [Y/m/n/?] Y

After booting the system with the new kernel, the /proc directory will contain the entry mdstat, showing the status of the four (the default number) newly created devices md0, md1, md2 and md3. Since none of them has been initialized yet, they should all appear inactive and will not be usable yet.
The four new devices are managed using the following 'mdutils' tools:

        -mdadd
        -mdrun
        -mdstop
        -mdop

They can be downloaded from sweet-smoke.ufr-info-p7.ibp.fr under /pub/Linux, but they are included in most distributions.

For kernels 2.1.62 and higher there is a different package called 'RAIDtools' that makes it possible to use RAID0, RAID4 or RAID5.

In the following example we illustrate how to define a RAID0 metadisk that uses two hard disks, more specifically /dev/sdb1 and /dev/sdc1.

meta-device	RAID Mode	Disk Partition 1	Disk Partition 2
/dev/md0	linear	/dev/sdb1	/dev/sdc1

More partitions could be added.

Once the metadisk has been formatted it should not be altered under any circumstances, or all the information on it will be lost. To activate it:

mdadd -a
mdrun -a

At this moment md0 should appear initialized already. To format it:

mke2fs /dev/md0

And to mount it:

mkdir /mount/md0
mount /dev/md0 /mount/md0

If everything has worked so far, the reader can now include these commands in the boot scripts so that the RAID0 metadisk is mounted automatically the next time the system reboots. To mount the RAID0 system automatically, it is first necessary to add an entry to the /etc/fstab file and to run the commands 'mdadd -a' and 'mdrun -a' from a script executed before mounting. On a Debian distribution, a good place for these commands is the /etc/init.d/checkroot.sh script, just before the root filesystem is remounted in read/write mode, that is, just before the "mount -n -o remount,rw /" line.

For example:

This is the configuration I am using right now. I have a 6.3 GB IDE disk, a 4.2 GB SCSI disk and another SCSI disk of 2 GB.

HD 6.3 GB IDE	/bigTemp + /incoming	swap	2 GB (RAID) hda4
HD 4.2 GB SCSI	C:	D:	swap	2 GB (RAID) sda4
HD 2 GB SCSI	swap	2 GB (RAID) sdb2

#######</etc/fstab>################################################
# <file system> <mount point>  <type>  <options>     <dump>  <pass>
/dev/hda1       /               ext2    defaults       0       1
/dev/hda2       /mnt/hda2       ext2    defaults       0       2
/dev/md0        /mnt/md0        ext2    defaults       0       2
proc            /proc           proc    defaults       0       2
/dev/hda3        none           swap    sw,pri=10 
/dev/sdb1        none           swap    sw,pri=10 
/dev/sda3        none           swap    sw,pri=10

#########</etc/mdtab>#######################################
# <meta-device> <RAID-mode> <DskPart1> <DskPart2> <DskPart3>
/dev/md0         RAID0,8k    /dev/hda4  /dev/sda4 /dev/sdb2
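To make the striping concrete, the sketch below maps a byte offset on the metadisk to a device and an offset on that device. It assumes that the 8k in the mdtab entry is the chunk size and that all member partitions are the same size, so that chunks are dealt out round-robin across every device. The function name is invented for illustration and is not part of mdutils.

```python
CHUNK_SIZE = 8 * 1024  # assumed meaning of "8k" in the mdtab entry


def locate(offset, devices, chunk_size=CHUNK_SIZE):
    """Map a byte offset on the metadisk to (device, byte offset on that device).

    Assumes equal-sized member partitions, so every stripe spans all devices.
    """
    chunk_index, within_chunk = divmod(offset, chunk_size)
    stripe, device_index = divmod(chunk_index, len(devices))
    return devices[device_index], stripe * chunk_size + within_chunk


devices = ["/dev/hda4", "/dev/sda4", "/dev/sdb2"]
for offset in (0, 8192, 16384, 24576, 30000):
    print(offset, locate(offset, devices))
```

Consecutive 8k chunks land on different disks, so a large sequential read keeps all three disks busy at once.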

The root partition is located on the 6.3 GB disk as hda1, followed by a large partition used for downloads from the Internet, storage of CD images, etc. This partition does not generate much load because it is not used often. The 4.2 GB disk has no partitions that could penalize the efficiency of the RAID, because its other partitions are MS-DOS partitions that are hardly ever used from Linux. The 2 GB disk is almost fully dedicated to the RAID system. A small area on each disk is reserved as swap space.

We should try to make all the disks (partitions) in the RAID approximately the same size, because large differences will decrease the RAID performance. Small differences are not significant. All the available space is used: whatever can be interleaved across the disks is, and the rest remains free.

Combining several IDE disks in a single RAID is not very efficient, but combining one IDE disk with several SCSI disks works very well. IDE disks do not allow concurrent access, while SCSI disks do.

For more information:

mdutils comes with documentation
mini-howto Multiple-Disk
The Multiple Disk Layout mini-HOWTO Homepage www.nyx.net/~sgjoen/disk.html

Original in Spanish.

Reviewed by Javier Molero.

Translated by Miguel A Sepulveda and Jose Quesada