VxVM System Administrator's Guide
For information specific to volume recovery, refer to Chapter 4.
This appendix covers protecting the data critical to booting your system, the UNIX boot process and its failure modes, re-adding and replacing failed boot disks, reinstallation recovery, and plex and volume states.
In order to maintain system availability, the data important to running and booting your system must be mirrored. Furthermore, it must be preserved in such a way that it can be used in case of failure.
The following are some suggestions on how to protect your system and data:
Place the root disk under Volume Manager control; doing so converts the root and swap devices to volumes (rootvol and swapvol). You should then mirror the root disk so that an alternate root disk exists for booting purposes. By mirroring disks critical to booting, you ensure that no single disk failure will leave your system unbootable and unusable.
When you use vxassist mirror to create mirrors, it locates the mirrors such that the loss of one disk will not result in a loss of data. By default, vxassist does not create mirrored volumes; you can edit the file /etc/default/vxassist to set the default layout to mirrored. Refer to Chapter 4, "Volume Administration," for information on the vxassist defaults file.
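For example, to mirror an existing volume named homevol (a hypothetical volume name used here for illustration), you could enter:

vxassist mirror homevol

To make mirroring the default layout for new volumes, a line such as the following can be added to /etc/default/vxassist (the exact set of attributes supported in the defaults file is described in Chapter 4):

layout=mirror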
If hot-relocation is to be able to recover the contents of a failed root disk, the rootdg disk group should contain enough contiguous spare or free space to accommodate the volumes on the root disk (the rootvol and swapvol volumes require contiguous disk space).
(Note that the rootvol, swapvol, and usr volumes cannot be Dirty Region Logging volumes.)
The boot process starts when the system is turned on or reset. The first thing that runs is the Basic Input/Output System (BIOS) initialization routine or some other ROM-based boot code. Among other things, this routine loads the default system configuration, which describes up to two floppy drives (A: and B:) and two hard drives (C: and D:). This configuration is normally kept in non-volatile RAM (NVRAM) on the system, and is configurable through a system-specific interface (some BIOSes have on-board configuration capabilities; others require a system floppy to reconfigure the system). The four disk devices are the disks that are available for use through the BIOS interface; in most cases, these are the only devices available during the early stages of the boot process, until UNIX is actually loaded and running. (Some disk controllers allow access to more than two hard drives during the early stages of booting.)
Once the default configuration is loaded and checked by the system BIOS, the BIOS initialization routine checks to see if any peripherals on the bus have their own BIOS and initialization routines, and if so, it runs them. This allows attached hardware to initialize itself and change the default configuration, if necessary. Many disk controllers have their own BIOS routines and will change the default configuration so that the C: and D: drive entries in the system configuration point to drives that are attached to that controller.
Once all peripheral BIOS routines have run, the system attempts to boot an operating system from one of the disk devices. It first checks the A: floppy drive to see if a floppy is inserted. If drive A: does not contain a floppy, the BIOS attempts to read and execute a program called fdisk boot from the first block of the drive designated as C:. This program in turn reads and executes another program, called the UNIX boot program, from the disk's active partition (which is the UNIX partition). The boot program prepares for the loading of the UNIX operating system, loads UNIX from the /stand file system on the disk, and starts it up. The boot program also passes UNIX some information about the system configuration that will be used during the UNIX part of the boot process.
When UNIX is started, it examines the system and the arguments passed from boot and does its own initialization. It eventually mounts the root file system, sets up the initial swap area, and executes the init program (located in /sbin/init) to bring the system up. init runs the VxVM startup routines that load the complete Volume Manager configuration, check the root file system, and so on.
Disk controllers can remap the C: and D: disk locations during the BIOS initialization process. The exact actions taken depend entirely on the controller involved. Some controllers are very simple and just map the first two disks found into C: and D:; more advanced controllers can be configured to use specific devices or to check for failed disks and, if possible, substitute others. The basic function, however, is to point the entry for disk C: in the system BIOS configuration at the disk that should be used for the rest of the boot process (that is, where to find fdisk boot, the UNIX boot program, and the UNIX operating system itself). If no disk is configured as C:, or if that disk does not have the necessary contents for booting UNIX, the boot will fail.

While controller features vary greatly, it is possible to classify controllers into three groups: simple controllers, configurable controllers, and auto-failover controllers.
A simple controller always maps SCSI disk 0 into C: and SCSI disk 1 into D:. Some simple controllers (such as the Adaptec 1540/1542 B) will not map SCSI disk 1 into D: if no SCSI disk 0 responds on the bus. This can be inconvenient, since a failure of SCSI disk 0 will require that the administrator reconfigure the disks on the controller in order to boot the system.
A configurable controller allows the administrator to specify which disks are mapped into C: and D:.

When an auto-failover controller maps disks into C: and D:, it performs a validation of those disks (such as making sure the specified disk is still attached to the controller and is powered on). If a specified disk fails the validation, another disk on the controller is chosen and mapped into the configuration at the location of the failed disk.
For example, an auto-failover controller might map the responding disk with the lowest SCSI ID into C: and the disk with the next lowest SCSI ID into D:. It should be noted that some auto-failover controllers can, at the time of installation, be configured to perform in the simple controller mode. It is essential that the administrator have full knowledge of the capabilities of the controller. Controller-specific information may be found in most user manuals available from the manufacturer.
The early stages of the boot process depend entirely on the disk that is mapped into C:. This disk is usually referred to as the boot disk, since this is the disk that will be used for the early stages of the boot process. The system will not boot if:

- no disk is mapped into C: for the system BIOS to locate
- the disk mapped into C: is missing boot-critical data (for example, it has no fdisk boot program)
- the disk's partitioning is damaged (for example, there is no active UNIX fdisk partition on the disk)
If the boot disk fails, the system cannot be booted unless another suitable disk can be mapped into C: by the controller. (For example, for the Adaptec 1540/1542 B controller, the administrator would be forced to replace the failed SCSI disk 0 with a properly configured disk and change its SCSI ID to 0.) By mirroring the system's boot-critical data to another disk with VxVM, that backup disk can be mapped into C: in case of a primary boot disk failure and can be used to bring up the system.
Unfortunately, rearranging the disks so that the backup boot disk is mapped into C: can mean disconnecting and reconnecting drives, moving jumpers to reorder the disks, and so on, which is inconvenient. With some disk controllers, disks other than C: are available for use during the boot process. Even with auto-failover controllers, the system may be unbootable because of a failure later in the boot process (such as an invalid UNIX partition table) that the controller cannot detect.
To avoid this, VxVM supports a special boot floppy that can usually make the system boot from an alternate drive without rearranging the hardware.
The VxVM boot floppy allows you to select a disk other than the one mapped into C: for use as the boot disk. This can be very convenient when the disk that is mapped into C: has failed completely or contains stale or corrupt data. When the system is booted from a VxVM boot floppy in drive A:, the system BIOS reads the fdisk boot and boot programs from the floppy, circumventing data problems on the disk mapped into C:. The boot program on this floppy is slightly different from the normal hard-drive boot program. Once the system has initialized and the floppy boot program is running, the following prompt appears on the screen:
Enter kernel name [C:unix]:
The kernel name specified can have two parts: a disk name and a file name. The default (as shown) is to boot unix from the disk mapped into C: (C:unix). You can select an alternate by entering the disk identifier and/or the operating system name.
You should normally boot the kernel named unix; booting a different kernel can have negative effects on your system.
To boot the default kernel, unix, from a different disk, you can simply enter the disk to be used as the boot disk. For example, entering D: and pressing Return will use the drive mapped into D: (if any exists) as the boot disk.

The vxmkboot utility is used to create VxVM boot floppies. To do this, place a formatted floppy in the first floppy drive on the system. (See the manual pages for formatting a floppy and floppy devices.) The boot image is placed on the floppy by issuing the following command at a shell prompt:
/etc/vx/bin/vxmkboot
If successful, the command will display the following:
xx+0 records in
xx+0 records out
where xx is a number indicating the size of the boot program. If a failure occurs, an error message describing the error will be printed and a message will be displayed indicating that the VxVM boot floppy creation failed.
After VxVM is installed, you should create several boot floppies and keep these in a safe place.
You should also mirror the boot disk onto another disk, for example by using the Volume Manager Support Operations (vxdiskadm). If the controller can map the mirrored disk into C: when it decides the normal boot disk has failed, the system will automatically use the backup disk and therefore reboot without manual intervention in the case of boot disk failures that are detected by the controller. This section provides suggestions for system configurations for the three types of controllers described previously.
A simple controller always attempts to map SCSI disk 0 into C: and SCSI disk 1 into D:, and these are the only disks available during the system boot. If disk 0 does not respond on the bus, the controller will not map any disk into C: or D:, making the system totally unbootable, even with the VxVM boot floppy. Therefore, the best possible configuration is to use vxrootmir to mirror the boot disk to the disk with SCSI ID 1. This allows you to boot using the VxVM boot floppy in the case of data failure on the boot disk. The Adaptec 1540/1542 B and WD7000 are examples of simple controllers.
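For example, assuming the disk with SCSI ID 1 has been added to rootdg under the disk media name disk02 (a hypothetical name used here for illustration), the boot disk could be mirrored onto it with:

/etc/vx/bin/vxrootmir disk02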
A configurable controller allows the administrator to choose which disks are mapped into the C: and D: drives. Hence, a suggested configuration could be to use vxrootmir to mirror the boot disk onto another available disk and, in case of a failure on the disk mapped to C:, remap the mirrored boot disk to C: and reboot the system. Such remapping typically requires a system floppy provided by the manufacturer of the controller. An example of a configurable controller is the Adaptec 1542 C.

An auto-failover controller maps only its own disks into C: and D: (for example, if controller 1 disk 0 fails, it will map controller 1 disk 1 into C: and nothing from the second controller into D:). The best choice for configuring a system in this case is to mirror the boot disk (SCSI ID 0 on the first controller) to SCSI ID 1 on the first controller. If multiple controllers are available, it is also possible to mirror the boot disk to SCSI ID 0 on the second controller. This gives you the ability to have the system fail over automatically, even if all disks on the first controller become unavailable (for reasons such as cable/terminator failure or an electronic failure on the controller itself). The above applies only in the case where all the controllers attached to the system are auto-failover controllers. Examples of auto-failover controllers are the Adaptec 1742/1744 and DPT 2012B controllers.
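As a sketch of such a configuration, assuming vxrootmir can be run once per target disk and assuming hypothetical disk media names disk02 (SCSI ID 1 on the first controller) and disk03 (SCSI ID 0 on the second controller), the boot disk could be mirrored twice:

/etc/vx/bin/vxrootmir disk02
/etc/vx/bin/vxrootmir disk03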
Some auto-failover controllers map the disk with SCSI ID 6 to C:, if such a disk exists; among the other disks, the disk with the lowest ID is mapped to D:. In the absence of a disk with ID 6, the controller maps the disk with the highest ID to C:, and the disk with the next highest ID to D:. It should be noted that if a disk with ID 6 is present but fails due to an electronic or media failure, the controller will not auto-failover to D:. The VxVM boot floppy should be used in order to boot from the desired disk.
If the controller fails to map any disks when the regular boot disk fails, rearrange the physical disk devices so that the alternate boot disk is mapped into C:. If the controller has auto-failover capabilities, the system may manage to boot itself despite the errors. You will usually find out about the failure via mail received from the Volume Manager when it notices the failure.
If the controller does not have auto-failover capabilities, or if the failure was not detectable by the controller, the drive mapped into C: by the controller is incapable of booting the system. The easiest way to boot the system in this situation is to use the VxVM boot floppy to specify an alternate boot disk other than the one mapped into C:.
To boot with the VxVM boot floppy, place the floppy in floppy drive A: and power up the machine. After the system initialization, you should see the following on the screen:
Booting...
Entering BOOT interactive session... [? for help]
[boot]#
You should now enter the keyword DISK= followed by the letter corresponding to the alternate boot disk (see the boot(4) manual page for more information on boot keywords). The letter will depend on your configuration, as well as on any auto-failover procedures taken by the controller. For example, with a simple controller, the system has probably been configured so that the disk mapped into D: is the alternate disk. In this case, you would enter DISK=D: at the first [boot]# prompt and go at the next [boot]# prompt to boot the system from the alternate disk.
Note that auto-failover controllers can confuse the drive mappings. For example, consider a three-disk system that has a simple auto-failover controller which maps the two disks with the lowest SCSI IDs into C: and D:. Normally, the disk with SCSI ID 0 is mapped into C: and the disk with SCSI ID 1 is mapped into D:. If the first disk has failed completely, the controller maps the disk with SCSI ID 1 into C: and the disk with SCSI ID 2 into D:. If the system still fails to boot off C: (SCSI disk 1) and the boot disk is also mirrored to SCSI disk 2, you would specify DISK=D: to boot off the third disk on the system.

If the disk specified to the VxVM boot floppy is not a valid boot disk, the boot program will print an error. For example, if you specify a disk that does not exist, the screen will show:
get_hdfs: Can't get hard disk driver parameters
In this case, you should recheck the drive configuration and specify a different disk at the [boot]# prompt.
Most controllers that auto-failover will output diagnostics describing the kind of failure and the mapping being done. For example, the Adaptec 1740 family of controllers provides output as shown below if the SCSI disk 0 fails to respond on the bus during the controller BIOS initialization:
Adaptec AHA-1740 BIOS vX.XX
Copyright 1992, Adaptec Inc.
[ Standard Mode ]
Target 0 - Device Not Found
Target 1 - Drive C: (80h)
The screen clears soon after this message appears.
If the system BIOS cannot begin the boot from the hard disk, it displays a message similar to the following:

NO ROM BASIC
SYSTEM HALTED
This means that the system BIOS was unable to read the fdisk boot program from the boot drive. This can occur if no disk was mapped into C: by the controller, if the SCSI bus has locked up, or if the drive mapped into C: has no fdisk boot program on it.
Common causes for this problem include hardware faults, such as the drive mapped into C: not being powered on, or the SCSI bus locking up.
If no hardware problems are found, the error is probably due to data errors on the disk mapped into C:. In order to repair this problem, attempt to boot the system from an alternate boot disk. If your controller allows you to use the boot floppy and you are unable to boot from an alternate boot disk, there is still some type of hardware problem. Similarly, if swapping the failed boot disk with an alternate boot disk fails to allow the system to boot, this also indicates a hardware problem.
The active fdisk partition on a disk determines the disk partition from which the boot program should be read. (See the hd(7) and fdisk(1M) manual pages for more information on disk partitioning and fdisk.)
Normally, the boot disk will have one UNIX partition that is marked as active. If the fdisk boot program cannot find an active partition to boot from, it will display the following message:
Invalid Partition Table
The most likely reasons for this problem are:

- a disk failure damaged the partition table, leaving no valid UNIX partition
- the fdisk program was used to mark the UNIX partition as no longer active
To check for either condition, boot from an alternate disk and use fdisk to look at the fdisk partitions. If the UNIX partition is not marked active, mark it active and save the changes. After you have marked the UNIX partition as active, try rebooting the system from the disk. If there is no UNIX partition, you must re-add the disk. Refer to "Re-adding a Failed Boot Disk" for details.
If the boot program fails to load or start execution properly, the system will display:
Missing operating system
This can occur if:

- there is no boot program on the disk
- the boot program was accidentally corrupted due to operator error
It is possible that the boot program on the disk was corrupted by a transient disk error, or was perhaps accidentally overwritten. If this is the case, an attempt can be made to rewrite the boot program to disk using the command:
/etc/vx/bin/vxbootsetup disk01
If this command fails, or if the console shows errors writing to the device, the disk should be replaced as described in "Replacing a Failed Boot Disk." If this command completes, but you continue to have problems with the drive, consider replacing it anyway.
Once the boot program has loaded, it will attempt to access the boot disk through the normal UNIX partition information. If this information is damaged, the boot program will fail with the following error:
boot: No file system to boot from
If this message appears during the boot, the system should be booted from an alternate boot disk. While booting, most disk drivers will display errors on the console about the invalid UNIX partition information on the failing disk. The messages will look similar to this:
WARNING: Disk Driver: HA 0 TC 0 UNIX 0, Invalid disk VTOC
This indicates that the failure was due to an invalid disk partition. You can attempt to re-add the disk as described in "Re-adding a Failed Boot Disk." However, if the reattach fails, the disk will need to be replaced as described in "Replacing a Failed Boot Disk."
Once the boot program has found a valid UNIX partition table, it will attempt to read and execute several files in the /stand file system. If it has any problem finding these files, it will display a message similar to:
boot: Cannot load file: file not found
where file can be one of /etc/initprog/sip, /etc/initprog/mip, or unix.
The possible causes for these failures are:

- the /stand file system does not exist
- there are data errors in the files within /stand

If the failure is due to a gross problem (such as the lack of a /stand file system), the system is unbootable and irrecoverable; the system will need to be reinstalled. See "Reinstallation Recovery."
If the failure is due to data errors in the /stand file system, the system can be booted from an alternate boot disk to investigate the problem. When the system boots, VxVM will notice errors on the stand volume and detach the mirror of the stand volume that resides on the failing boot disk. If the errors are correctable, VxVM will attempt to correct them, and if successful, the disk can continue to be used; otherwise, the disk should be replaced as described in "Replacing a Failed Boot Disk." To determine if the errors were corrected, print information about the stand volume by issuing the command:
vxprint -tph -e 'assoc=="standvol"'
For example, if the failing boot disk is named disk01 and the alternate disk is named disk02, the output from vxprint should resemble the following:
PL NAME        VOLUME      KSTATE   STATE  LENGTH LAYOUT NCOL/WID MODE
SD NAME        PLEX        DISK     DISKOFFS LENGTH [COL/]OFF DEVICE MODE
pl standvol-01 standvol    DETACHED STALE  32768  CONCAT -        RW
sd disk01-01   standvol-01 disk01   0      32768  0      c0b0t0d0 ENA
pl standvol-02 standvol    ENABLED  ACTIVE 32768  CONCAT -        RW
sd disk02-03   standvol-02 disk02   0      32768  0      c0b0t1d0 ENA

Note that the first mirror, standvol-01, has a kernel state (KSTATE) of DETACHED and a state of STALE. This is because the failures on the disk on which it resides, disk01, caused VxVM to remove the plex from active use in the volume. You can attempt to correct the errors once again by resynchronizing the failed mirror with the volume by issuing the command:
vxrecover standvol
If the Volume Manager fails to correct the error, the disk driver will print notices of the disk failure to the system console and the vxrecover utility will print an error message. If this occurs, the disk has persistent failures and should be replaced.
If the failure is due to the lack of a /stand slice, the system will boot normally and the vxprint command above will show all plexes with a KSTATE of ENABLED and a state of ACTIVE. The vxbootsetup utility can be used to attempt to fix this. For example, if the failed disk is disk01, then the command:
/etc/vx/bin/vxbootsetup disk01
will attempt to correct the partitioning problems on the disk. If this command fails, the disk will need to be replaced as described in "Replacing a Failed Boot Disk."
If the partitions that underlie the root volume are missing or damaged, the kernel cannot open the root disk, and messages similar to the following are displayed during the boot:

WARNING: vxvm: Can't open disk ROOTDISK in group ROOTDG.
	If it is removable media (such as a floppy), it may not be mounted or ready.
	Otherwise, there may be problems with the drive.
	Kernel error code 19
WARNING: root.c: failed to open disk ROOTDISK, error 19.
WARNING: root.c: failed to set up the root disk, error 19.
PANIC: vfs_mountroot: cannot mount root
If this problem (or the corresponding problem involving the swap area) occurs, boot the system from an alternate boot disk and use the vxbootsetup utility (as described above) to attempt to recreate the needed partitions. If this command fails, the failed disk will need to be replaced as described in "Replacing a Failed Boot Disk."
Another possible problem can occur if errors in the VxVM headers on the boot disk prevent VxVM from properly identifying the disk. In this case, VxVM will not be able to determine the name of that disk. This is a problem because mirrors are associated with disk names, and therefore, any mirrors on that disk are unusable.
If either of these situations occurs, the VxVM utility vxconfigd will notice the problem when it is configuring the system as part of the init processing of the boot sequence. vxconfigd will display a message describing the error, describe what can be done about it, and halt the system. The message resembles the following:
vxvm:vxconfigd: ERROR: enable failed: Error in disk group configuration copies
	No valid disk found containing disk group; transactions are disabled.
vxvm:vxconfigd: FATAL ERROR: Rootdg cannot be imported during boot

Errors were encountered in starting the root disk group, as a result the
Volume Manager is unable to configure the root volume, which contains your
root file system. The system will be halted because the Volume Manager
cannot continue.

You will need to do one of the following:

a) Boot to a floppy and fix your /dev/rroot to no longer be a volume, and
   then boot a kernel that was not configured to use a volume for the root
   file system.

b) Re-install your system from the original operating system package.
Once the system has booted, the exact problem needs to be determined. If the mirrors on the boot disk were simply stale, they will be caught up automatically as the system comes up. If, on the other hand, there was a problem with the private area on the disk, you will need to re-add or replace the disk.
If the mirrors on the boot disk were unavailable, you should get mail from the VxVM utilities describing the problem. Another way to discover the problem is to list the disks with the vxdisk utility. For example, if the problem is a failure in the private area of disk01 (due to media failures or accidentally overwriting the VxVM private region on the disk, for example), the command vxdisk list might show the following output:
DEVICE      TYPE     DISK     GROUP    STATUS
-           -        disk01   rootdg   failed was: c0b0t0d0s7
If hot-relocation re-creates the mirrors of boot-critical volumes on another disk, you must then run the vxbootsetup utility, which configures the disk with the new mirrors as a bootable disk.
Hot-relocation may fail for a root disk if the rootdg disk group does not contain sufficient spare or free space to accommodate the volumes from the failed root disk. The rootvol and swapvol volumes require contiguous disk space. If the root volume and other volumes on the failed root disk cannot be relocated to the same new disk, each of these volumes can be relocated to a different disk. Mirrors of rootvol and swapvol volumes must be cylinder-aligned, so they can only be created on disks with enough space to allow their subdisks to begin and end on cylinder boundaries; hot-relocation will fail if such disks are not available.
Normally, a failed disk is removed and replaced using the Volume Manager Support Operations (vxdiskadm). Data that is not critical for booting the system is only accessed by the Volume Manager after the system is fully operational, so it does not matter where that data is located; the Volume Manager can find it. However, boot-critical data must be placed in specific areas on the bootable disks in order for the boot process to find it. The location of this data is constrained by the controller-specific configuration actions performed by the disk controller involved and by the system BIOS. Therefore, the process of replacing a boot disk is slightly more complex. When a disk fails, there are two possible routes that can be taken to correct the problem:

- If the errors are transient or correctable, the same disk can be re-used; this is known as re-adding a disk.
- If the disk has truly failed, it should be replaced.
For example, consider a system with two disks, disk01 and disk02, which are normally mapped into the system configuration during boot as disks C: and D:, respectively. A failure has caused disk01 to become detached. This can be confirmed by listing the disks with the vxdisk utility, as in:
vxdisk list
This would result in the following output:
DEVICE      TYPE     DISK     GROUP    STATUS
c0b0t0d0s7  sliced   -        -        error
c0b0t1d0s7  sliced   disk02   rootdg   online
-           -        disk01   rootdg   failed was: c0b0t0d0s7
Notice that the disk disk01 has no device associated with it, and has a status of failed with an indication of the device from which it was detached. It is also possible that the device c0b0t0d0s7 would not be listed at all; this would occur if the disk failed totally and the disk controller did not notice it on the bus.
In some cases, the vxdisk list output may differ. For example, if the boot disk has uncorrectable failures associated with the UNIX partition table (such as a missing root partition that cannot be corrected), but no errors in the VxVM private area, the output of the vxdisk list command resembles the following:
DEVICE      TYPE     DISK     GROUP    STATUS
c0b0t0d0s7  sliced   disk01   rootdg   online
c0b0t1d0s7  sliced   disk02   rootdg   online
However, because the error was not correctable by the described procedures, the disk is still deemed to have failed. In this case, it is necessary to detach the failing disk from its device manually. This is done using the "Remove a disk for replacement" function of the vxdiskadm utility (see the vxdiskadm(1M) manual page or the VERITAS Volume Manager User's Guide for more information about vxdiskadm). Once the disk is detached from the device, any special procedures for correcting the problem can be followed (such as reformatting the device).
To re-add the disk, use the "Replace a failed or removed disk" function of the vxdiskadm utility to replace the disk, and select the same device as the replacement. In the above examples, this would mean replacing disk01 with the device c0b0t0d0s7.
If hot-relocation is enabled when a mirrored boot disk fails, it will attempt to create a new mirror and remove the failed subdisks from the failing boot disk. If a re-add succeeds after a successful hot-relocation, the root and/or other volumes affected by the disk failure will no longer exist on the re-added disk. However, the re-added disk can still be used for other purposes.
If a re-add of the disk fails, the disk should be replaced.
To replace a failed boot disk, first detach the disk from its device using the "Remove a disk for replacement" function of vxdiskadm (see the vxdiskadm(1M) manual page or the VERITAS Volume Manager User's Guide for more information about vxdiskadm). Once the disk is detached, shut down the system and replace the hardware. The replacement disk should have at least as much storage capacity as was in use on the disk being replaced. It should be large enough that the region of the disk used for storing subdisks can accommodate all subdisks of the original disk at their current disk offsets. To determine the minimum size of a replacement disk, you need to determine how much space was in use on the disk that failed.
To approximate the size of the replacement disk, use the command:
vxprint -st -e 'sd_disk="diskname"'
where diskname is the name of the disk that failed. From the resulting output, add the values in the DISKOFFS and LENGTH columns for the last subdisk listed. The total is in units of 512-byte sectors; divide the sum by 2 for the total in kilobytes.
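For example, using values from the vxprint output shown later in this appendix, if the last subdisk on the failed disk has a DISKOFFS of 620544 and a LENGTH of 10240, the replacement disk must accommodate at least 620544 + 10240 = 630784 sectors, or 630784 / 2 = 315392 kilobytes.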
Once the system is back up, use vxdiskadm's "Replace a failed or removed disk" function to replace the failing disk with the new device that was just added.

If the disk failure was transient, you can instead use the vxreattach command to reattach the disks without plexes being flagged as stale, as long as the reattach happens before any volumes on the disk are started.
The vxreattach command is called as part of disk recovery from the vxdiskadm menus and during the boot process. If possible, vxreattach will reattach the failed disk media record to the disk with the same device name, in the disk group in which it was located before, and will retain its original disk media name. After a reattach takes place, recovery may or may not be necessary. The reattach may fail if the original (or another) cause for the disk failure still exists.
The command vxreattach -c checks whether a reattach is possible, but does not actually perform the operation. Instead, it displays the disk group and disk media name where the disk can be reattached.
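For example, assuming vxreattach accepts the disk access name of the failed device as an argument, the reattach of the device from the earlier example could be checked with:

vxreattach -c c0b0t0d0s7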
Refer to the vxreattach(1M) manual page for more information on the vxreattach command.
If the root disk fails and no usable mirror of it exists, or if key areas of the Volume Manager configuration are destroyed, the system may have to be reinstalled. If these types of failures occur, you should attempt to preserve as much of the original Volume Manager configuration as possible. Any volumes not directly involved in the failure may be saved. You do not have to reconfigure any volumes that are preserved.
This section describes the procedures used to reinstall VxVM and preserve as much of the original configuration as possible after a failure.
The system root disk is always involved in reinstallation. Other disks may also be involved. If the root disk was placed under Volume Manager control (either during Volume Manager installation or by later encapsulation), that disk and any volumes or mirrors on it are lost during reinstallation. In addition, any other disks that are involved in the reinstallation (or that are removed and replaced) may lose Volume Manager configuration data (including volumes and mirrors).
If a disk (including the root disk) is not under Volume Manager control prior to the failure, no Volume Manager configuration data is lost at reinstallation. Any other disks to be replaced can be replaced by following the procedures in the VERITAS Volume Manager User's Guide. Although it simplifies the recovery process after reinstallation, not having the root disk under Volume Manager control increases the likelihood of a reinstallation being necessary. By having the root disk under VxVM control and creating mirrors of the root disk contents, you can eliminate many of the problems that require system reinstallation.
When reinstallation is necessary, the only volumes saved are those that reside on, or have copies on, disks that are not directly involved with the failure and reinstallation. Any volumes on the root disk and other disks involved with the failure and/or reinstallation are lost during reinstallation. If backup copies of these volumes are available, the volumes can be restored after reinstallation. The exceptions are the root, stand, and usr file systems, which cannot be restored from backup.
The reinstallation procedure consists of the following steps: replacing any failed hardware; reinstalling the base operating system; reinstalling the Volume Manager package without running the vxinstall command; recovering the preserved Volume Manager configuration; and cleaning up the configuration to remove the remnants of the volumes that resided on the root disk (rootvol, swapvol, etc.).

Before reinstalling, if possible, physically remove any disks not involved in the failure so that the reinstallation cannot touch them.
For example, if the system has a home file system on the second disk, that file system may still be recoverable; removing the second disk before reinstalling ensures that the home file system remains intact. While the operating system installation progresses, make sure no disks other than the root disk are accessed in any way. If anything is written on a disk other than the root disk, the Volume Manager configuration on that disk could be destroyed.
When the operating system has been reinstalled, reinstall the Volume Manager. Use the pkgadd command to add the package from the CD-ROM. Do not initialize the Volume Manager (that is, do not run vxinstall) after the reinstallation.
To recover the preserved Volume Manager configuration, first shut the system down to single-user mode:

shutdown -g0 -iS -y

Then enter the following commands. First remove the installation lock file so that the Volume Manager can start:

rm -rf /etc/vx/reconfig.d/state.d/install-db

Start some Volume Manager I/O daemons:

vxiod set 10

Start the Volume Manager configuration daemon, vxconfigd, in disabled mode by entering the command:

vxconfigd -m disable

Initialize the vxconfigd daemon by entering:

vxdctl init

Enable vxconfigd by entering:

vxdctl enable
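To verify what has been recovered, you can display the imported configuration; for example, using a form of the vxprint command shown elsewhere in this appendix:

vxprint -th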
The configuration preserved on the disks not involved with the reinstallation has now been recovered. However, because the root disk has been reinstalled, it appears to the Volume Manager as a non-VxVM disk. Therefore, the configuration of the preserved disks does not include the root disk as part of the VxVM configuration.
If the root disk of your system and any other disks involved in the reinstallation were not under Volume Manager control at the time of failure and reinstallation, then the reconfiguration is complete at this point. If any other disks containing volumes or mirrors are to be replaced, follow the replacement procedures in the VERITAS Volume Manager User's Guide. There are several methods available to replace a disk; choose the method that you prefer.
If the root disk (or another disk) was involved with the reinstallation, any volumes or mirrors on that disk (or on other disks no longer attached to the system) are now inaccessible. If a volume had only one plex (contained on a disk that was reinstalled, removed, or replaced), then the data in that volume is lost and must be restored from backup. In addition, the system's root file system, swap area, and stand area are no longer located on volumes. To correct these problems, follow the instructions in "Configuration Cleanup."
The following types of cleanup are described: rootability cleanup, volume cleanup, and disk cleanup.

To begin the rootability cleanup, remove the volumes associated with rootability. These are:

- rootvol, which contains the root file system
- swapvol, which contains the swap area
- standvol, which contains the stand file system

To remove the root volume, use the vxedit command, as follows:
vxedit -fr rm rootvol
Repeat this command, using swapvol and standvol in place of rootvol, to remove the swap and stand volumes.
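Concretely, the two additional commands are:

vxedit -fr rm swapvol
vxedit -fr rm standvol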
To restore the volumes, first establish which VM disks were removed or reinstalled. To list the disks, enter:

vxdisk list

The display from the vxdisk list command is similar to this:
DEVICE      TYPE     DISK     GROUP    STATUS
c0b0t0d0s7  sliced   -        -        error
c0b0t1d0s7  sliced   disk02   rootdg   online
c0b0t2d0s7  sliced   disk03   rootdg   online
-           -        disk01   rootdg   failed was: c0b0t0d0s7
This display shows that the reinstalled root device, c0b0t0d0, is not associated with a VM disk and is marked with a status of error. disk02 and disk03 were not involved in the reinstallation and are recognized by the Volume Manager and associated with their devices (c0b0t1d0s7 and c0b0t2d0s7). The former disk01, which was the VM disk associated with the replaced disk device, is no longer associated with the device (c0b0t0d0s7).
If other disks (with volumes or mirrors on them) were removed or replaced during the reinstallation, those disks will each have a disk device listed in the error state and a VM disk listed as not associated with a device.

Once you know which disks have been removed or replaced, locate all the mirrors on the failed disks, using the command:
vxprint -sF "%vname" -e'sd_disk = "disk"'
where disk is the name of a disk with a failed status. Be sure to enclose the disk name in quotes in the command; otherwise, the command will return an error message. The vxprint command returns a list of volumes that have mirrors on the failed disk. Repeat this command for every disk with a failed status.
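For example, to list the volumes with mirrors on the failed disk disk01 from the display above, you would enter:

vxprint -sF "%vname" -e'sd_disk = "disk01"'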
Next, check the status of each volume, using the command:

vxprint -th volume_name

The vxprint command displays the status of the volume, its plexes, and the portions of disks that make up those plexes. For example, consider a volume named v01 whose only plex resides on the reinstalled disk named disk01. The vxprint -th v01 command produces the following display:
V  NAME       USETYPE  KSTATE   STATE    LENGTH READPOL  PREFPLEX
PL NAME       VOLUME   KSTATE   STATE    LENGTH LAYOUT   NCOL/WID MODE
SD NAME       PLEX     DISK     DISKOFFS LENGTH [COL/]OFF DEVICE  MODE

v  v01        fsgen    DISABLED ACTIVE   24000  SELECT   -
pl v01-01     v01      DISABLED NODEVICE 24000  CONCAT   -        RW
sd disk01-06  v01-01   disk01   245759   24000  0        c1b0t5d1 ENA
The only plex of the volume is shown in the line beginning with pl. The STATE field for the plex named v01-01 is NODEVICE. The plex has space on a disk that has been replaced, removed, or reinstalled. Therefore, the plex is no longer valid and must be removed.
Because v01-01 was the only plex of the volume, the volume contents are irrecoverable except by restoring the volume from a backup. The volume must also be removed. If a backup copy of the volume exists, you can restore the volume later. Keep a record of the volume name and its length, as you will need it for the backup procedure.
To remove the volume v01, use the vxedit command:
vxedit -r rm v01
Consider another volume, v02, which has one plex striped across three disks, one of which is the reinstalled disk disk01. The vxprint -th v02 command returns:
V  NAME       USETYPE  KSTATE   STATE    LENGTH READPOL  PREFPLEX
PL NAME       VOLUME   KSTATE   STATE    LENGTH LAYOUT   NCOL/WID MODE
SD NAME       PLEX     DISK     DISKOFFS LENGTH [COL/]OFF DEVICE  MODE

v  v02        fsgen    DISABLED ACTIVE   30720  SELECT   -
pl v02-01     v02      DISABLED NODEVICE 30720  STRIPE   -        RW
sd disk02-02  v02-01   disk02   424144   10240  0        c1b0t2d0 ENA
sd disk01-05  v02-01   disk01   620544   10240  0        c1b0t2d1 DIS
sd disk03-01  v02-01   disk03   620544   10240  0        c1b0t2d2 ENA
The plex v02-01 is striped (the lines beginning with sd represent the stripes). One of the stripe areas is located on the failed disk, so the plex named v02-01 has a state of NODEVICE. Since this is the only plex of the volume, the volume is invalid and must be removed. If a copy of v02 exists on the backup media, it can be restored later. Keep a record of the volume name and length of any volume you intend to restore from backup.
Use the vxedit command to remove the volume, as described earlier.
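Following the same form used for v01, the command is:

vxedit -r rm v02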
A volume with one mirror on a failed disk may also have other mirrors on disks that are still valid; in that case, its data need not be restored from backup. The output of the vxprint -th command for a volume with one plex on a failed disk (disk01) and another plex on a valid disk (disk02) would look like this:
V  NAME       USETYPE  KSTATE   STATE    LENGTH READPOL  PREFPLEX
PL NAME       VOLUME   KSTATE   STATE    LENGTH LAYOUT   NCOL/WID MODE
SD NAME       PLEX     DISK     DISKOFFS LENGTH [COL/]OFF DEVICE  MODE

v  v03        fsgen    DISABLED ACTIVE   30720  SELECT   -
pl v03-01     v03      DISABLED ACTIVE   30720  STRIPE   -        RW
sd disk02-01  v03-01   disk02   620544   10240  0        c1b0t3d0 ENA
pl v03-02     v03      DISABLED NODEVICE 10240  CONCAT   -        RW
sd disk01-04  v03-02   disk01   262144   10240  0        c1b0t2d2 ENA
This volume has two plexes, v03-01 and v03-02. The first plex (v03-01) does not use any space on the invalid disk, so it can still be used. The second plex (v03-02) uses space on the invalid disk disk01 and has a state of NODEVICE. Plex v03-02 must be removed. However, the volume still has one valid plex containing valid data. If the volume needs to be mirrored, another plex can be added later. Note the name of the volume if you wish to create another plex later.
To remove an invalid plex, use the vxplex command. To remove the plex v03-02, enter the following command:
vxplex -o rm dis v03-02
Once all the invalid volumes and plexes have been removed, the disk configuration can be cleaned up. Each disk that was removed, reinstalled, or replaced (as shown in the output of the vxdisk list command) must be removed from the configuration.
To remove the disk, use the vxdg command. To remove the failed disk disk01, enter:
vxdg rmdisk disk01
If the vxdg command returns an error message, some invalid mirrors exist. Repeat the processes described in the section "Volume Cleanup" until all invalid volumes and mirrors are removed.
To add the root disk to Volume Manager control, use the Volume Manager Support Operations (vxdiskadm). Enter:

vxdiskadm

From the vxdiskadm main menu, select menu item 2 (Encapsulate a disk). Follow the instructions and encapsulate the root disk for the system. For more information, see the VERITAS Volume Manager User's Guide. When the encapsulation is complete, reboot the system to multi-user mode.
Once the root disk is encapsulated, any other disks that were replaced should be added back using vxdiskadm. If the disks were reinstalled during the operating system reinstallation, they should be encapsulated; otherwise, they can simply be added.
Once all the disks have been added to the system, any volumes that were completely removed as part of the configuration cleanup can be recreated and their contents restored from backup. The volume recreation can be done using vxassist or the Visual Administrator interface.
To recreate the volumes v01 and v02 using the vxassist command, enter:

vxassist make v01 24000
vxassist make v02 30720 layout=stripe nstripe=3
Once the volumes are created, they can be restored from backup using normal backup/restore procedures.
Any volumes that had plexes removed as part of the volume cleanup can have these mirrors recreated by following the instructions for mirroring a volume (via vxassist or the Visual Administrator), as described in the VERITAS Volume Manager User's Guide.
To replace the plex removed from volume v03 using vxassist, enter:

vxassist mirror v03

Once you have restored the volumes and plexes lost during reinstallation, the recovery is complete and your system should be configured as it was prior to the failure.
A plex can also be created without an associated volume, using the vxmake plex command; a plex created with this command can later be attached to a volume if required.
Plexes that are associated with a volume have one of the following states:
A Dirty Region Logging or RAID-5 log plex is a special case, as its state is always set to LOG.

Plex creation sets the state of a plex to EMPTY to indicate that the plex is not yet initialized.

A plex is in the CLEAN state when it is known to contain a consistent copy (mirror) of the volume contents and an operation has disabled the volume. As a result, when all plexes of a volume are clean, no action is required to guarantee that the plexes are identical when that volume is started.

A plex can be in the ACTIVE state in two situations:

- when the volume is started and the plex fully participates in normal volume I/O
- when the volume was stopped as a result of a system crash and the plex was ACTIVE at the moment of the crash

In the latter case, when the volume is started, the recovery operations ensure that the contents of the plexes marked as ACTIVE are made identical. ACTIVE should be the most common state you see for any volume's plexes.

A plex that is not up-to-date with respect to the volume contents is placed in the STALE state. Also, if an I/O error occurs on a plex, the kernel stops using and updating the contents of that plex, and an operation sets the state of the plex to STALE.
A vxplex att operation recovers the contents of a STALE plex from an ACTIVE plex. Atomic copy operations copy the contents of the volume to the STALE plexes. The system administrator can force a plex to the STALE state with a vxplex det operation.
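For example, to force the plex v03-02 (a plex name from the examples earlier in this appendix) into the STALE state and then recover its contents from the volume's ACTIVE plex, you might enter:

vxplex det v03-02
vxplex att v03 v03-02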
A vxmend off operation indefinitely detaches a plex from a volume by setting the plex state to OFFLINE. Although the detached plex maintains its association with the volume, changes to the volume do not update the OFFLINE plex until the plex is put online and reattached with the vxplex att operation. When this occurs, the plex is placed in the STALE state, which causes its contents to be recovered at the next vxvol start operation.

The TEMP state facilitates some plex operations that cannot occur in a truly atomic fashion. For example, attaching a plex to an enabled volume requires copying volume contents to the plex before it can be considered fully attached.
A utility will set the plex state to TEMP at the start of such an operation and to an appropriate state at the end of the operation. If the system goes down for any reason, a TEMP plex state indicates that the operation is incomplete; a subsequent vxvol start will dissociate plexes in the TEMP state.
The TEMPRM plex state resembles the TEMP state, except that at the completion of the operation the TEMPRM plex is removed. Some subdisk operations require a temporary plex. Associating a subdisk with a plex, for example, requires updating the subdisk with the volume contents before actually associating the subdisk. This update requires associating the subdisk with a temporary plex, marked TEMPRM, until the operation completes and removes the TEMPRM plex.
If the system goes down for any reason, the TEMPRM state indicates that the operation did not complete successfully. A subsequent operation will dissociate and remove TEMPRM plexes.
The TEMPRMSD plex state is used by vxassist when attaching new plexes. If the operation does not complete, the plex and its subdisks are removed.

The IOFAIL plex state is associated with persistent state logging. On detection of the failure of an ACTIVE plex, vxconfigd places that plex in the IOFAIL state so that it is disqualified from the recovery selection process at volume start time.

The vxvol start operation makes all CLEAN plexes ACTIVE. If all goes well until shutdown, the volume-stopping operation marks all ACTIVE plexes CLEAN and the cycle continues. Having all plexes CLEAN at startup (before vxvol start makes them ACTIVE) indicates a normal shutdown and optimizes startup.

The plex kernel state indicates the accessibility of the plex: a plex can be in offline (DISABLED), maintenance (DETACHED), or online (ENABLED) mode of operation. The following are the plex kernel states:
DISABLED -- The plex may not be accessed.

DETACHED -- A write to the volume is not reflected to the plex. A read request from the volume will never be satisfied from the plex. Plex operations and ioctl functions are accepted.

ENABLED -- A write request to the volume will be reflected to the plex. A read request from the volume will be satisfied from the plex.
The following are the volume states:

CLEAN -- The volume is not started (kernel state is DISABLED) and its plexes are synchronized.

ACTIVE -- The volume has been started (kernel state is currently ENABLED) or was in use (kernel state was ENABLED) when the machine was rebooted. If the volume is currently ENABLED, the state of its plexes at any moment is not certain (since the volume is in use). If the volume is currently DISABLED, the plexes cannot be guaranteed to be consistent, but will be made consistent when the volume is started.

EMPTY -- The volume contents are not initialized. The kernel state is always DISABLED when the volume is EMPTY.

SYNC -- The volume is either in read-writeback recovery mode (kernel state is currently ENABLED) or was in read-writeback mode when the machine was rebooted (kernel state is DISABLED). With read-writeback recovery, plex consistency is recovered by reading data from blocks of one plex and writing the data to all other writable plexes. If the volume is ENABLED, the plexes are being resynchronized via the read-writeback recovery. If the volume is DISABLED, the plexes were being resynchronized via read-writeback when the machine rebooted, and therefore still need to be synchronized.

NEEDSYNC -- The volume will require a resynchronization operation the next time it is started.

The interpretation of these states during volume startup is modified by the persistent state log for the volume (for example, the DIRTY/CLEAN flag). If the clean flag is set, an ACTIVE volume was not written to by any processes, or was not even open, at the time of the reboot; therefore, it can be considered CLEAN. The clean flag will always be set in any case where the volume is marked CLEAN.
The following are the RAID-5 volume states:

CLEAN -- The volume is not started (kernel state is DISABLED) and its parity is good. The RAID-5 plex stripes are consistent.

ACTIVE -- The volume has been started (kernel state is currently ENABLED) or was in use (kernel state was ENABLED) when the machine was rebooted. If the volume is currently ENABLED, the state of its RAID-5 plex at any moment is not certain (since the volume is in use). If the volume is currently DISABLED, the parity cannot be guaranteed to be synchronized.

EMPTY -- The volume contents are not initialized. The kernel state is always DISABLED when the volume is EMPTY.

SYNC -- The volume is either undergoing a parity resynchronization (kernel state is currently ENABLED) or was having its parity resynchronized when the machine was rebooted (kernel state is DISABLED).

NEEDSYNC -- The volume will require a parity resynchronization operation the next time it is started.

REPLAY -- The volume is in a transient state as part of a log replay. A log replay occurs when it becomes necessary to use logged parity and data.
The volume kernel state indicates the accessibility of the volume: a volume can be in offline (DISABLED), maintenance (DETACHED), or online (ENABLED) mode of operation. The following are the volume kernel states:

DISABLED -- The volume cannot be accessed.

DETACHED -- The volume cannot be read or written, but plex device operations and ioctl functions are accepted.

ENABLED -- The volume can be read and written.