Rescue Mode for ClearOS 7

This guide is only for recovering ClearOS 7 installations. For ClearOS 6, go here.

If you are here because your system is not working, we are sorry, and we hope this page helps. There are a number of reasons why you may need the rescue image. It is a valuable tool for anyone supporting ClearOS and is therefore included on every ClearBOX appliance by default. Here are some of the reasons you may need the rescue image, which is contained on your ClearOS installation CD/DVD/USB/ISO:

Hard drive failure
RAID problems
GRUB problems
Kernel boot issues
Init boot issues

This guide is intended as an instructional howto. The commands listed here are not to be used verbatim; they are examples meant to illustrate the process. This guide is provided without warranty as to its accuracy or applicability to your specific situation. ClearCenter will not be liable for data lost as a result of following this guide. Please take all precautions necessary to preserve your own data.

DON'T PANIC

Your problem is likely causing you stress, and stress can lead to hasty reactions that destroy data. In this guide we will point out key validation points in the diagnosis, and ways to confirm that the changes you made actually took effect. Some of this process is complex and this guide will NOT necessarily meet the needs of your particular problem. That being said, if it doesn't fix the problem, you will at least know what your problem is NOT.

Establishing the point of failure

You may have multiple problems. For example, a failed disk will affect your RAID, GRUB, Kernel and init process. The boot of ClearOS goes through the following stages:

BIOS/POST
GRUB
Kernel
Init

Powering on your system starts a series of tests. If the Power On Self Test (POST) completes, the system will usually beep once and transition straight to the first cylinder on the first device listed in your BIOS. This first cylinder should contain boot code called GRUB (GRand Unified Bootloader). GRUB under ClearOS shows a menu item and a countdown timer. This will transition to a black screen which fills up quickly with text on 5.x, and to a graphical screen on 6.x. From here the kernel loads devices and hands the boot process over to the init scripts.

Where the process stops is the key to understanding where to start fixing the issue. Error messages are critical: write them down, and search for them online if they do not make it clear WHERE the boot is failing.

Rescue Image

Starting the Rescue Image

All ClearOS installations contain a rescue image. To start this image you need to tell your BIOS to boot from the installation media. This can require changing the boot order in your BIOS, or your BIOS may support a keystroke which allows you to select the boot device (often F12). You will need to use the same mechanisms that you used to install the system. In most cases this is straightforward. With systems that required disk drivers in order to see partitions, you will need to use those same methods to mount the disks.

At the start screen, navigate with the arrows to select 'Troubleshooting'

Press <ENTER>.

At the troubleshooting screen, select 'Rescue a ClearOS System' and press <ENTER>.

After a while a blue screen will come up.

Use the arrow or tab keys to select 'Continue'. Press <ENTER>.

The system will attempt to find your ClearOS partition. If this step was not successful, you may need to load special drivers or contact support for assistance. If it finds your partition, it will notify you that the partition was found and mounted under '/mnt/sysimage'. Press <ENTER>.

For good measure, a second screen will notify you that your partition is mounted under '/mnt/sysimage'. Press <ENTER>.

You will be dropped to a command prompt. Your prompt will look similar to the following:

sh-4.2#

Special circumstances

RAID issues

Checking partitions

You may be in this section because you have lost the first disk in your RAID array and the system is unbootable, or because you need to use rescue mode for the repair instead of the regular OS. If so, you may have already identified and replaced the bad disk, or perhaps you just need to look around and assess the damage. If you did add a new disk, the rescue CD will ask you to initialize that disk.

Survey the landscape by running the following and take an inventory of your physical disks and their partitions:

fdisk -l | less

Here is an example of what a RAID disk will look like from running that command:

Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16        7664    61440592+  fd  Linux raid autodetect
/dev/sda3            7665      243201  1891950952+  fd  Linux raid autodetect
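As a rough aid for this inventory, the sketch below filters `fdisk -l` output down to the partitions reported as type `fd` (Linux raid autodetect). The function name `list_raid_partitions` is made up for this guide, and it assumes the older MBR-style listing shown above; GPT disks and newer fdisk versions print different columns.

```shell
# Hypothetical helper (not part of ClearOS): print the device name of every
# partition that fdisk reports as "Linux raid autodetect".
# Reads fdisk -l output on standard input.
list_raid_partitions() {
    awk '/^\/dev\// && /Linux raid autodetect/ { print $1 }'
}

# Example use in the rescue shell (needs root):
#   fdisk -l | list_raid_partitions
```

A disk that is missing from this list but present in the full `fdisk -l` output is a likely candidate for the blank replacement disk.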

Making partitions on the new disk to match the old

If you've replaced the disk that failed, you will notice that the new disk does not have any partitions yet. Ideally, the replacement disk will have the same geometry as the surviving RAID member. If not, you will need to get a little more technical and ensure that each partition is either the same size as the original or larger:

255 heads, 63 sectors/track, 243201 cylinders

Write this information down. You will need it later so that you can lay out the new disk with the same geometry and partition information. Of particular note are the start and end numbers, the partition number, the type, and lastly which partition has the asterisk '*' character.

Locate your unformatted/unpartitioned disk and run the following (in this example our disk is /dev/sdb):

fdisk /dev/sdb

You will enter the fdisk menu system which will look like this:

[root@server ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 243201.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help):

There are several commands that you will use here. To familiarize yourself with the tool, type 'm' on the keyboard and press <ENTER>. This will show you a list of commands. Note the 'p' command, which shows you the proposed layout of the disk. Run this now by pressing 'p'.

On your blank disk it will not show any partitions, so let's make one. Type 'n' for new partition. It will ask you whether you want a primary or an extended partition. You can have up to 4 primary partitions, or 3 primary partitions plus one extended partition that holds many logical partitions. Typically the first partition will be primary. Type 'p' for primary. When it asks for the partition number, use '1' for the first. It will ask for the start cylinder and by default will offer [1]. If that matches your notes from the other drive, enter it. It will then ask for the end cylinder; supply that as well. When it is completed, type 'p' to view your partition. Repeat this process for each partition.

You will likely need to change the type of the partition from the default 83 (Linux) to something else, such as fd (Linux raid autodetect) for a RAID member. If this is the case then do the following and supply the correct hex code:

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd

Review your changes using the 'p' command.

You will likely need to set the active partition (the asterisk) on the correct partition. Do the following or similar:

Command (m for help): a
Partition number (1-4): 1

Review your changes and ensure that the information is correct. If you want to abort the proposed changes type 'q' to quit. If you want to write these changes and commit the partition proposal to disk, type 'w' for write.

Double-check your work by running fdisk -l or you can limit the results to just the disks that are part of your RAID by listing them in brackets like this:

fdisk -l /dev/sd[ab]

or this if you have 5 disks

fdisk -l /dev/sd[abcde] | less
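If the replacement disk is at least as large as the surviving one, an alternative to retyping each partition is to dump the layout of the good disk and replay it onto the new one with sfdisk. This is only a sketch under several assumptions: that `sfdisk` is available in your rescue environment, that both disks use MBR partition tables, and that you have identified the blank disk correctly. The helper name `copy_partition_table` and the `DRY_RUN` variable are invented here. Because the real command overwrites the partition table on the target, the helper only prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical helper: replay the partition table of SOURCE onto TARGET.
# Prints the command by default; set DRY_RUN=0 to really run it (destructive).
copy_partition_table() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: copy_partition_table SOURCE_DISK TARGET_DISK" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" != "0" ]; then
        # Dry run: show what would be executed and stop.
        echo "sfdisk -d $src | sfdisk $dst"
        return 0
    fi
    # sfdisk -d dumps the table in a format that sfdisk can read back.
    sfdisk -d "$src" | sfdisk "$dst"
}

# Example (prints the command only):
#   copy_partition_table /dev/sda /dev/sdb
```

Whichever method you use, compare the two disks afterwards with the fdisk -l commands shown above before touching the RAID.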

RAID with MultiDisk

Checking MultiDisk Status

Familiarize yourself with this command:

cat /proc/mdstat

This command is useful for watching what your MultiDisk RAID is doing RIGHT now. Here is an output that shows one RAID volume:

[root@gateway-utah ~]# cat /proc/mdstat 
Personalities : [raid1] 
md1 : active raid1 sda1[0] sdb1[1]
      120384 blocks [2/2] [UU]
      
unused devices: <none>

Let's take this apart.

This RAID is RAID 1:
- Personalities : [raid1]
This RAID has one block device that is working:
- /dev/md1
This RAID is made up of two partitions:
- /dev/sda1
  - raid member [0]
- /dev/sdb1
  - raid member [1]
There are two disks in this array
- [2/2]
There are two working disks in this array
- [2/2] (a failed member would report this: [2/1])
Both drives are up
- [UU] (failed members will look like underscores: [U_])
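The same reading of the status field can be sketched as a small filter that lists only the arrays with a missing member, that is, an underscore in the `[UU]` field. The function name `degraded_arrays` is hypothetical, and the sketch assumes the mdstat layout shown above, where the status is the last field on the "blocks" line.

```shell
# Hypothetical helper: print the name of each md array whose member status
# field (for example [U_]) contains an underscore.
# Reads the file given as $1, or /proc/mdstat when no argument is supplied.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        $NF ~ /^\[[U_]+\]$/ {                   # status field such as [UU]
            if ($NF ~ /_/) print name
        }
    ' "${1:-/proc/mdstat}"
}

# Example use in the rescue shell:
#   degraded_arrays
```

No output means every assembled array reports all members up; it says nothing about arrays that have not been assembled yet.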

Another useful trick is to watch this file so that the status refreshes continuously. This is especially handy during a rebuild, when the output includes a progress bar:

watch cat /proc/mdstat

Assembling your disks

Multidisk arrays are usually assembled according to the /etc/mdadm.conf file. However, you are likely in this section because your RAID is not assembling, and how can it if mdadm.conf does not exist? Moreover, you CANNOT assemble disks in the rescue CD using a typical command like:

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# ^^^^^^^^^^^^^^^ This won't work in Rescue Mode ^^^^^^^^^^^^^^^ #

Ok, so how do we assemble our disks? First, let's check our disk members (do this on all partitions which comprise your RAID):

mdadm --examine /dev/sda1

You should get results like this:

/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 03e965cf:42e2070c:eeb11af9:065b0b59
  Creation Time : Wed Aug  4 11:56:07 2010
     Raid Level : raid1
  Used Dev Size : 120384 (117.58 MiB 123.27 MB)
     Array Size : 120384 (117.58 MiB 123.27 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1

    Update Time : Thu Aug 30 10:40:57 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 819e9b9b - correct
         Events : 27850


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed

Check and make sure that the State is clean. If it is not clean you may have difficulties reassembling your array.
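Because the state has to be checked on every member, a small filter can pull just that field out of the `mdadm --examine` output. This is a sketch only; the name `member_state` is made up here and it assumes the output format shown above, where the line starts with "State :".

```shell
# Hypothetical helper: print the value of the "State :" line from
# mdadm --examine output supplied on standard input.
member_state() {
    awk -F' : ' '/^ *State :/ { print $2 }'
}

# Example use in the rescue shell (needs root); adjust the partition list:
#   for part in /dev/sda1 /dev/sdb1; do
#       printf '%s: ' "$part"
#       mdadm --examine "$part" | member_state
#   done
```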

Now let's probe our disks and see what arrays we can find. Start by creating the file /etc/mdadm.conf. In it you will tell mdadm which devices to scan:

DEVICE /dev/sd[abcd]1
DEVICE /dev/sd[abcd]2
DEVICE /dev/sd[abcd]3

In the above file, we will be scanning the first three partitions on four different drives for multidisk signatures. You will need to customize the above to suit your needs. Now, let's see what is there:

mdadm --examine --scan

This information is vital to assembling your array. If the output looks good, append this to your new /etc/mdadm.conf:

mdadm --examine --scan >> /etc/mdadm.conf
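The DEVICE lines can also be generated rather than typed, which helps when the rescue shell has no comfortable editor. The helper below is a sketch with an invented name, `write_device_lines`; it takes the target file, the drive letters, and the partition numbers, and it overwrites the target file.

```shell
# Hypothetical helper: write one DEVICE line per partition number.
# Usage: write_device_lines FILE DRIVE_LETTERS PARTITION_NUMBER...
# Note: FILE is truncated first, so any existing content is lost.
write_device_lines() {
    conf=$1
    drives=$2
    shift 2
    : > "$conf"
    for part in "$@"; do
        printf 'DEVICE /dev/sd[%s]%s\n' "$drives" "$part" >> "$conf"
    done
}

# Example matching the file shown above:
#   write_device_lines /etc/mdadm.conf abcd 1 2 3
```

After generating the file, run the examine and append commands as described above.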

From here you can assemble your devices by name:

mdadm --assemble --scan /dev/md0
mdadm --assemble --scan /dev/md1

Now check to see your assembled RAID arrays:

cat /proc/mdstat

If this method does not work, you may have to try other means. Another way to see what is on your disks is to run an exhaustive probe and then assemble the arrays manually:

mdadm -QE --scan

ARRAY /dev/md1 level=raid1 num-devices=2 UUID=03e965cf:42e2070c:eeb11af9:065b0b59
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c10a6566:11ce9088:da0e5da7:e1449030
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=6d83baec:d8c4f50b:3ccc3173:326118cf

If you are familiar with Multidisk technology, you will notice that the output is very similar to the contents of the mdadm.conf file. In rescue mode, this information is critical because you can ONLY assemble disks using the UUID numbers.

Let's assemble md1:

mdadm --assemble --uuid 03e965cf:42e2070c:eeb11af9:065b0b59 /dev/md1

Notice that we do not list the /dev/sdX1 disks. This is because the assemble will use the UUID, which should be the same on each member. You will notice that this is the same UUID that appeared when we ran the 'mdadm --examine /dev/sda1' command.
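With several arrays, copying UUIDs by hand is error-prone. The sketch below turns the ARRAY lines from the probe into the matching assemble commands and prints them for review instead of running them. The function name `assemble_commands` is hypothetical, and it assumes ARRAY lines in the format shown above (device name second, a `UUID=` field somewhere after it).

```shell
# Hypothetical helper: read "ARRAY ..." lines on standard input and print
# one mdadm assemble command per array, using the UUID from each line.
assemble_commands() {
    awk '$1 == "ARRAY" {
        uuid = ""
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/) uuid = substr($i, 6)   # text after "UUID="
        if (uuid != "")
            printf "mdadm --assemble --uuid %s %s\n", uuid, $2
    }'
}

# Example use in the rescue shell (prints commands, runs nothing):
#   mdadm -QE --scan | assemble_commands
```

Review the printed commands against your notes, then run the ones you want one at a time.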

Now check the status using 'cat /proc/mdstat'.

Once the device is assembled, you can add the partitions that you created on your replacement disks (/dev/sdb1 in my example here):

mdadm --manage /dev/md1 --add /dev/sdb1

Now check the status using 'cat /proc/mdstat' or with 'watch cat /proc/mdstat'.

If one of the disks drops out before the sync completes, the rebuild of the array will start over from the beginning. A reboot will also cause the sync to restart from the beginning.
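Since an interrupted sync starts over, it is worth confirming that the rebuild has finished before rebooting. As a rough sketch, the filter below prints the percentage for any array that is still recovering or resyncing. The name `rebuild_progress` is invented here, and it assumes the usual mdstat progress line containing "recovery =" or "resync =" followed by a percentage.

```shell
# Hypothetical helper: print "<array> <percent>" for every array that shows
# a recovery or resync progress line.
# Reads the file given as $1, or /proc/mdstat when no argument is supplied.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) {
                    sub(/^=/, "", $i)    # handle "=100.0%" with no space
                    print name, $i
                }
        }
    ' "${1:-/proc/mdstat}"
}

# Example use in the rescue shell:
#   rebuild_progress
```

No output suggests that no array is currently syncing; confirm with 'cat /proc/mdstat' that every array shows all members up before you reboot.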

GRUB issues

Why me?

This section still needs to be written, as ClearOS 7 uses GRUB 2 rather than legacy GRUB.

Making changes to your drive

By default, drives are mounted in read-only mode. You can remount a read-only drive in read/write mode with a command such as:

mount -o remount,rw /sysroot

Change /sysroot to whichever mount point you want to remount, for example the /mnt/sysimage location reported when rescue mode started.
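To validate that the remount actually took, you can look the mount point up in /proc/mounts. The sketch below does that lookup; the function name `mount_mode` is hypothetical, and it assumes the standard /proc/mounts layout with the mount point in the second column and the options in the fourth.

```shell
# Hypothetical helper: print "ro" or "rw" for the given mount point.
# Usage: mount_mode MOUNT_POINT [MOUNTS_FILE]   (default file: /proc/mounts)
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opt, ",")
            for (i = 1; i <= n; i++)
                if (opt[i] == "ro" || opt[i] == "rw") mode = opt[i]
        }
        END { if (mode != "") print mode }
    ' "${2:-/proc/mounts}"
}

# Example use in the rescue shell:
#   mount_mode /sysroot
```

If the same mount point appears more than once, the last entry wins, which is normally the one currently in effect. No output means the path is not a mount point.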

Help

Links

Recovering RAID on Linux in Rescue Mode
