Managing the VxFS filesystem

VxFS disk layout

Three disk layouts are available with VxFS:


Version 1
The Version 1 disk layout is the original VxFS disk layout provided with pre-2.0 versions of Veritas filesystems.

Version 2
The Version 2 disk layout is a newer and more complex layout, supporting features such as filesets, dynamic inode allocation, and enhanced security (including support for ACLs).

Version 4
In addition to Version 2 features, Version 4 supports large files, allows variable extent sizes in a file, and supports quotas.
All three disk layout versions are currently supported by VxFS. Although new filesystems are created with the Version 4 layout by default, mkfs(ADM) allows the user to specify the Version 1 or Version 2 layout instead (using the -o version=n option).

The vxupgrade command is provided to upgrade existing VxFS filesystems to the Version 4 layout. (Refer to vxupgrade(ADM) for details.)

Although the Version 4 layout is more complex, it shares many of the Version 1 and Version 2 features and characteristics. Since the Version 1 layout is simpler and easier to understand, familiarize yourself with the Version 1 layout before attempting to understand the more complex Version 2 and Version 4 layouts.

The VxFS version 1 disk layout

The VxFS Version 1 disk layout is composed of the superblock, the intent log, and one or more allocation units.

Superblock

The superblock contains important information about the filesystem, such as:

The superblock is always in a fixed location, offset from the start of the filesystem by 1024 bytes. The superblock is 1024 bytes long.
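As a minimal sketch of the fixed placement described above, the following Python reads the 1024 bytes that start 1024 bytes into a filesystem image. The function name and the use of a file-like object are illustrative assumptions; the internal field layout of the superblock is not described here, so the bytes are returned uninterpreted.

```python
import io

SUPERBLOCK_OFFSET = 1024  # bytes from the start of the filesystem (from the text)
SUPERBLOCK_SIZE = 1024    # superblock length in bytes (from the text)


def read_superblock(image):
    """Return the raw superblock bytes from a seekable binary file object.

    Hypothetical helper: it only locates the superblock and does not
    decode any of its fields.
    """
    image.seek(SUPERBLOCK_OFFSET)
    raw = image.read(SUPERBLOCK_SIZE)
    if len(raw) != SUPERBLOCK_SIZE:
        raise ValueError("image too short to contain a superblock")
    return raw


# Usage with an in-memory stand-in for a device image.
fake_image = io.BytesIO(b"\x00" * 1024 + b"\xAB" * 1024 + b"\x00" * 2048)
superblock = read_superblock(fake_image)
```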

Copies of the superblock are kept in allocation unit headers (see ``Allocation unit''); these copies can be used for recovery purposes if the superblock is corrupted or destroyed.

Intent log

In the event of system failure, the VxFS filesystem uses intent logging to guarantee filesystem integrity.

The intent log is a circular activity log with a default size of 512 blocks. If the filesystem is less than 4MB, the log size is reduced to avoid wasting space. This log contains records of the intention of the system to update a filesystem structure. An update to the filesystem structure (a transaction) is divided into separate subfunctions for each data structure that needs to be updated. A composite log record of the transaction is created that contains the subfunctions that constitute the transaction.

For example, the creation of a file that would expand the directory in which the file is contained produces a transaction consisting of the following subfunctions:

VxFS maintains log records in the intent log for all pending changes to the filesystem structure and ensures that the log records are written to disk in advance of the changes to the filesystem. Once the intent log has been written, the transaction's other updates to the filesystem can be written in any order. In the event of a system failure, the pending changes to the filesystem are either nullified or completed by the fsck utility. The VxFS intent log generally only records changes to the filesystem structure. File data changes are not normally logged.
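The ordering rule above (log record first, structure updates afterwards, recovery from the log) can be illustrated with a toy model. This is only a sketch of the general write-ahead idea, not the VxFS on-disk format: the class, its method names, and the subfunction labels are all hypothetical, and recovery here always completes a pending transaction, whereas fsck may instead nullify it.

```python
class IntentLogSketch:
    """Toy model of intent logging; not the VxFS on-disk format."""

    def __init__(self):
        self.log = []          # composite log records already on "disk"
        self.structures = {}   # filesystem structures on "disk"

    def log_transaction(self, txid, subfunctions):
        # The composite record holding every subfunction is written
        # before any filesystem structure is touched.
        self.log.append(
            {"txid": txid, "subfunctions": list(subfunctions), "done": False}
        )

    def apply(self, txid, crash_after=None):
        record = next(r for r in self.log if r["txid"] == txid)
        for count, (name, value) in enumerate(record["subfunctions"]):
            if crash_after is not None and count >= crash_after:
                return False   # simulated system failure mid-transaction
            self.structures[name] = value
        record["done"] = True
        return True

    def replay(self):
        # Recovery: complete every logged transaction not marked done.
        for record in self.log:
            if not record["done"]:
                for name, value in record["subfunctions"]:
                    self.structures[name] = value
                record["done"] = True


fs = IntentLogSketch()
fs.log_transaction(
    1,
    [("free_inode_map", "inode 9 allocated"), ("directory", "entry added")],
)
completed = fs.apply(1, crash_after=1)  # fails after the first subfunction
fs.replay()                             # recovery finishes the transaction
```

Because the log record is complete before any update starts, the order in which the individual structures are rewritten does not matter to recovery.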

Allocation unit

An ``allocation unit'' is a group of consecutive blocks in a filesystem that contains resource summaries, free resource maps, inodes, and data blocks. Each component of an allocation unit begins on a block boundary. The VxFS Version 1 allocation unit structure is as follows:

One or more allocation units exist per filesystem. Allocation units are located immediately after the intent log. The number and size of allocation units can be specified when the filesystem is made. All of the allocation units, except possibly the last one, are of equal size. If space is limited, the last allocation unit can have a partial set of data blocks to allow use of all remaining blocks.

Allocation unit header

The allocation unit header contains a copy of the filesystem's superblock that is used to verify that the allocation unit matches the superblock of the filesystem. The superblock copies contained in allocation unit headers can also be used for recovery purposes if the superblock is corrupted or destroyed. The allocation unit header occupies the first block of each allocation unit.

Allocation unit summary

The allocation unit summary contains the number of inodes with extended operations pending, number of free inodes, and number of free extents in the allocation unit.

Free inode map

The free inode map is a bitmap that indicates which inodes are free and which are allocated. A free inode is indicated by the bit being on. Inodes zero and one are reserved by the filesystem; inode two is the inode for the root directory; inode three is the inode for the lost+found directory.
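A short sketch of the bitmap convention described above, in which a set bit means the inode is free. The bit ordering within each byte and the helper names are assumptions made for illustration; the text does not specify them, nor does it say how the reserved inodes 0 and 1 are flagged, so the example leaves them untouched.

```python
def make_free_inode_map(num_inodes):
    """Return a bytearray bitmap with every inode marked free (bit on)."""
    bitmap = bytearray((num_inodes + 7) // 8)
    for ino in range(num_inodes):
        bitmap[ino // 8] |= 1 << (ino % 8)
    return bitmap


def mark_allocated(bitmap, ino):
    """Clear the bit for an inode, marking it allocated."""
    bitmap[ino // 8] &= ~(1 << (ino % 8)) & 0xFF


def is_free(bitmap, ino):
    """Return True if the bit for this inode is on."""
    return bool(bitmap[ino // 8] & (1 << (ino % 8)))


inode_map = make_free_inode_map(32)
mark_allocated(inode_map, 2)  # root directory
mark_allocated(inode_map, 3)  # lost+found directory
```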

Extended inode operations map

The extended inode operations map keeps track of inodes on which operations would remain pending for too long to reside in the intent log. The intent log must complete faster than it wraps, so lengthy operations are posted directly in the inode. The extended inode operations map is in the same format as the free inode map. This map is updated to identify the inodes that have extended operations that need to be completed. This map allows the fsck utility to quickly identify which inodes had extended operations pending at the time of a system failure.

Free extent map

The free extent map is a series of independent 512-byte bitmaps that are each referred to as a free extent map section. The first region of 2048 bits represents a section of 2048 one-block extents. The second region of 1024 bits represents a section of 1024 two-block extents. This sectioning continues for all powers of 2 up to the single bit that represents one 2048-block extent.

The one-block bitmaps always represent the true allocation of blocks from the allocation unit. The remaining bitmaps remap these same blocks, in a ``binary-buddy'' scheme, in increasingly larger-sized groups. As smaller extents are needed, the larger groups of blocks mapped by the buddy maps are broken apart to create the smaller extents.
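The relationship between the one-block bitmap and the larger buddy maps can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, and the sketch flags a larger extent as free whenever both of its halves are free, while the text does not say whether a free block is recorded at every level or only at the largest level that contains it.

```python
def build_buddy_maps(one_block_free, max_extent=2048):
    """Derive the larger-extent free maps from the one-block free map.

    one_block_free: list of bools, True meaning the block is free.
    Returns {extent_size_in_blocks: [bool, ...]}.
    """
    if len(one_block_free) != max_extent:
        raise ValueError("a section covers exactly max_extent blocks")
    maps = {1: list(one_block_free)}
    size = 1
    while size < max_extent:
        prev = maps[size]
        # An extent of 2*size blocks is free only if both buddies are free.
        maps[size * 2] = [prev[i] and prev[i + 1] for i in range(0, len(prev), 2)]
        size *= 2
    return maps


free = [True] * 2048
free[5] = False                      # block 5 is allocated
maps = build_buddy_maps(free)
total_bits = sum(len(m) for m in maps.values())
```

The twelve regions hold 2048 + 1024 + ... + 1 = 4095 bits, which fits in the 512-byte (4096-bit) section described above.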

Inodes

An inode is a data structure that contains information about a file. The VxFS default inode size is currently 256 bytes.

Each inode stores information, such as the following, about a particular file:

There are up to ten direct extent address-size pairs per inode. Each direct extent address indicates the starting block number of a direct extent; direct extent sizes can vary.

If all of the direct extents are used, two indirect address extents are available for use in each inode.

Each indirect address extent is 8KB long and contains 2048 entries. All indirect data extents for a given file have the same size.
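To illustrate how address-size pairs describe a file, the sketch below translates a logical block number within a file into a filesystem block number using only the direct extents. The function name and the tuple representation are assumptions for illustration; handling of the indirect address extents is omitted.

```python
MAX_DIRECT_EXTENTS = 10  # up to ten direct address-size pairs per inode


def map_logical_block(direct_extents, logical_block):
    """Map a file-relative block number to a filesystem block number.

    direct_extents: list of (start_block, size_in_blocks) pairs in file order.
    Returns None when the block lies beyond the direct extents, where the
    indirect address extents would have to be consulted.
    """
    if len(direct_extents) > MAX_DIRECT_EXTENTS:
        raise ValueError("an inode holds at most ten direct extents")
    offset = logical_block
    for start, size in direct_extents:
        if offset < size:
            return start + offset
        offset -= size
    return None


# A file made of a 4-block extent at block 100 and an 8-block extent at 500.
extents = [(100, 4), (500, 8)]
first = map_logical_block(extents, 0)
second_extent_start = map_logical_block(extents, 4)
past_end = map_logical_block(extents, 12)
```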

The inode list is a series of inodes. There is one inode in the list for every file.

Padding

It might be desirable to align data blocks to a physical boundary. To facilitate this, the system administrator can specify that a gap be left between the end of the inode list and the first data block.

VxFS data blocks

The balance of the allocation unit is occupied by data blocks. Data blocks contain the actual data stored in files and directories.

The VxFS version 2 disk layout

Many aspects of the Version 1 disk layout are preserved in the Version 2 disk layout. However, the Version 2 layout differs from the Version 1 layout in that it includes support for the following:

The addition of features such as filesets and dynamic allocation of inodes has affected the disk layout in various ways. In particular, many of the filesystem structures are now located in files (referred to as ``structural files'') rather than in fixed disk areas. This provides a simple mechanism for dynamic growth of structures. For example, inodes are now stored in structural files and allocated as needed. In general, filesystem structures that deal with space allocation are still in fixed disk locations, while most other structures are dynamically allocated and have become clients of the filesystem's disk space allocation scheme.

Because many disk layout characteristics are shared by both the Version 1 and Version 2 disk layouts, you should have a general understanding of the Version 1 layout. Structures that are common to both disk layouts are described in detail in ``The VxFS version 1 disk layout'' and are only mentioned briefly here.

The relatively complex nature of the Version 2 layout is covered in the following general areas:

VxFS version 2 basic layout

This section describes the structural elements of the filesystem that exist in fixed locations on the disk.

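The VxFS Version 2 disk layout is composed of the superblock, the object location table, the intent log, the object location table replica, and one or more allocation units.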

The superblock

The superblock contains important information about the filesystem. Refer to ``Superblock'' for details.

The Version 2 superblock differs from the Version 1 superblock in that it contains pointers to the object location table and its replica.

Object location table

The Object Location Table (OLT) can be considered an extension of the superblock. The OLT contains information used at mount time to locate filesystem structures that are not in fixed locations. The OLT is located immediately after the superblock (starting at block 2) and is 8KB long.

The OLT is replicated and its replica is located immediately after the intent log. The OLT and its replica are separated to minimize the potential for losing both copies of the vital OLT information in the event of localized disk damage.

The contents and use of the OLT are described in detail in ``Locating dynamic structures''.

Intent log

The intent log is a circular activity log used by VxFS to guarantee filesystem integrity. Refer to ``Intent log'' for details.

Allocation unit

An allocation unit is a group of consecutive blocks in a filesystem that contains a resource summary, free resource map, and data blocks. Allocation units also contain copies of the superblock that can be used for recovery purposes.

The Version 2 allocation unit is similar to that of Version 1, but is located after the OLT replica. All of the Version 2 allocation unit components deal with the allocation of disk space. Those components of the Version 1 allocation unit that deal with inode allocation have been relocated elsewhere for Version 2. In particular, the inode list now resides in an inode list file and the inode allocation information now resides in an inode allocation unit (see ``Inode allocation unit'').

The contents of the allocation unit are:


allocation unit header
Refer to ``Allocation unit'' for details.

allocation unit summary
The allocation unit summary summarizes the resources (data blocks) used in the allocation unit. This includes information on the number of free extents of each size in the allocation unit and a flag indicating the status of the summary.

free extent map
Refer to ``Free extent map'' for details.

padding
Refer to ``Padding'' for details.

VxFS Data Blocks
The balance of the allocation unit is occupied by data blocks.

One or more allocation units exist per filesystem. The number and size of allocation units can be specified when the filesystem is made. All of the allocation units, except possibly the last one, are of equal size. If space is limited, the last allocation unit can have a partial set of data blocks to allow use of all remaining blocks.

Filesets and structural files

With the Version 2 layout, many structural elements of the filesystem are encapsulated in files to allow dynamic allocation of the filesystem structure. Files that store this filesystem structural data are referred to as ``structural files''. As the filesystem grows, more space is allocated to the structural files. Structural files are intended for filesystem use only and are generally invisible to users.

The Version 2 layout supports ``filesets'', which are collections of files that exist within a filesystem. Each filesystem contains at least two fileset types:


attribute fileset
A special fileset that stores the structural elements of the filesystem in the form of structural files. These files are the ``property'' of the filesystem and are not normally visible to the user. ACLs are also stored in the attribute fileset.

unnamed fileset
A fileset that contains the files that are visible and accessible by users.
Structural files exist in the attribute fileset only and include:

fileset header file
A file containing a series of fileset headers. (See ``Fileset header'' for details.)

inode list file
A file containing a series of inodes. (See ``Inodes'' for details.)

inode allocation unit file
A file containing a series of inode allocation units. (See ``Inode allocation unit'' for details.)

current usage table file
A file containing a series of fileset usage entries. (See ``Current usage table'' for details.)

link count table file
A file containing a link count for each inode in the attribute fileset. (See ``Link count table'' for details.)

Although structural files are located in the attribute fileset, they can ``belong'' to another fileset. For example, the inode list file for the unnamed fileset is in the attribute fileset, but the structural details that it contains are only applicable to the unnamed fileset.

Each fileset is defined by structural files as follows:

Fileset metadata that cannot be reconstructed from the inode list is replicated to help fsck(ADM) reconstruct the filesystem in the event of disk damage.

Fileset header

A fileset header exists for each fileset and contains information about the contents and characteristics of that fileset. All fileset headers are stored in a single fileset header file in the attribute fileset. The fileset header file contains one fileset header per fileset. Each fileset header entry is 1 block long. The fileset header file is replicated because fileset headers cannot be rebuilt from other data structures.

The fileset header for a given fileset includes information such as:

Inodes

An inode is a data structure that contains information about a file. The Version 2 inode structure is similar to that of Version 1, with the addition of fields supporting ACLs. Refer to ``Inodes'' for details on the inode contents.

Version 2 inodes differ from Version 1 inodes in that they are located in structural files to facilitate dynamic allocation. Instead of allocating a fixed number of inodes into the filesystem at mkfs(ADM) time, a minimum number of inodes is allocated by mkfs and additional inodes are later allocated as they are needed during filesystem use.

The inode list is a series of inodes located in the inode list file. There is one inode in the list for every file in a given fileset. The inode list file is replicated in that it is referenced by two inodes that point to the same set of data blocks. Although the inode addresses are replicated for recovery purposes, the inodes themselves are not.

An inode extent is an extent that contains inodes and is 8K long by default. Inode extents are dynamically allocated to store inodes as they are needed.

The initial inode list extents contain the inodes first allocated by mkfs for each fileset in a filesystem.

``Inode lists'' illustrates the initial inode list extents allocated for the unnamed and attribute filesets. Each of these extents contains 32 inodes and is 8K long.

Inode lists

The construction of the unnamed fileset's inode list resembles that of the VxFS Version 1 disk layout, with the first two inodes unallocated and inodes 2 and 3 preassigned to the root and lost+found directories. The attribute fileset's inode list is similarly constructed, with certain inodes allocated for specific files and other inodes unallocated.

There are two initial inode list extents for the attribute fileset. These contain the inodes for all structural files needed to find and set up the filesystem.

The attribute fileset's inode list contains a few entries that are replicas of one another. For example, inodes 4 and 36 both reference copies of the fileset header file. The replicated inodes are used by fsck to reconstruct the filesystem in the event of damage to either one of the replicas. Although the two initial inode list extents belonging to the attribute fileset are logically contiguous, they are physically separated. This helps to ensure the integrity of the replicated information and reduces the chance that localized disk damage might result in complete loss of the filesystem.

Inodes 6 and 38 in the attribute fileset reference the inode list file for the attribute fileset. The contents of this file are the two inode extents pictured for the attribute fileset. Likewise, the attribute fileset inodes 7 and 39 reference the inode list file for the unnamed fileset. This file contains the single extent pictured for the unnamed fileset. All of the unused inodes in the initial extents of the attribute inode list are reserved for future use.
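The inode numbers quoted above follow from simple arithmetic: an 8K inode extent holding 256-byte inodes contains 32 of them, so each replica pair (4 and 36, 6 and 38, 7 and 39) occupies the same slot in the first and second initial extents. The sketch below shows this, assuming that inode numbers run sequentially across logically contiguous extents; the function name is hypothetical.

```python
INODE_SIZE = 256                  # default inode size in bytes
INODE_EXTENT_BYTES = 8 * 1024     # default inode extent size
INODES_PER_EXTENT = INODE_EXTENT_BYTES // INODE_SIZE


def locate_inode(ino):
    """Return (logical extent index, byte offset within that extent)."""
    extent_index, slot = divmod(ino, INODES_PER_EXTENT)
    return extent_index, slot * INODE_SIZE


primary = locate_inode(4)    # fileset header file inode
replica = locate_inode(36)   # its replica
```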

Inode allocation unit

An inode allocation unit (IAU) contains inode allocation information for a given fileset. Each fileset contains one or more IAUs, each of which details allocation for a set number of inodes. The number of inodes per IAU varies, depending on the block size being used. One IAU exists for every 16,384 inodes in a fileset with the default block size (1024 bytes). If an IAU is damaged, the information that it contains can be reconstructed by examining the fileset's inode list.

The IAUs for a fileset are stored in sequential order in the fileset's IAU file. The fileset header identifies the attribute fileset inode associated with that fileset's IAU file.

All IAU components begin on a block boundary and have the following structure and content.


IAU header
The IAU header verifies that the IAU matches the fileset. The IAU header occupies the first block of each IAU. If damaged, the IAU can be reconstructed from inode and other information.

IAU summary
The IAU summary summarizes the resources used in the IAU. It includes information on the number of free inodes in the IAU and the number of inodes with extended operation sets in the IAU. The IAU summary is 1 block long.

Free inode map
The free inode map is a bitmap that indicates which inodes are free and which are allocated. A free inode is indicated by the bit being on. The length of the free inode map is 2K for filesystems with 1K or 2K block sizes and is equal to the block size for filesystems with larger block sizes.

Extended inode operations map
The extended inode operations map keeps track of inodes on which operations would remain pending for too long to reside in the intent log. The structure of this map is the same as the extended inode operations map located in the Version 1 allocation unit. Refer to ``Extended inode operations map'' for details.
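The figure of 16,384 inodes per IAU at the default block size matches a 2K free inode map with one bit per inode (2048 bytes x 8 bits). The sketch below applies the same reasoning to find which IAU covers a given inode. Treating the map length as the quantity that determines the number of inodes per IAU at larger block sizes is an inference, not something the text states, and the function names are hypothetical.

```python
def inodes_per_iau(block_size):
    """Inodes covered by one IAU, assuming one map bit per inode."""
    # The free inode map is 2K for 1K and 2K block sizes, and one block
    # long for larger block sizes.
    map_bytes = 2048 if block_size <= 2048 else block_size
    return map_bytes * 8


def iau_for_inode(ino, block_size=1024):
    """Return (IAU index, bit position within that IAU's free inode map)."""
    return divmod(ino, inodes_per_iau(block_size))


default_count = inodes_per_iau(1024)
location = iau_for_inode(16384)
```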

Link count table

The Link Count Table (LCT) contains a reference count for each inode in the associated fileset. This reference count is identical to the conventional link field of an inode. Each LCT entry contains the actual reference count for the associated fileset inode. The link count field in an inode itself is set to either 0 or 1, and the actual number of links is stored in the LCT entry for the associated fileset inode.

The current layout only uses the LCT for inodes in the attribute fileset. The LCT supports quick updates of the link count for attribute fileset inodes.

The link count table can be reconstructed from the inode list, so it is not replicated.

Current usage table

The Current Usage Table (CUT) is a file that contains usage-related information for each fileset. The information contained in the CUT changes frequently and is not replicated. The information in the CUT can, however, be reconstructed from the inode list if the CUT is damaged.

The CUT file contains one entry per fileset. The CUT entry for a given fileset contains information such as the number of blocks currently used by the fileset.

Locating dynamic structures

The existence of dynamic structures in the Version 2 disk layout makes the task of initially locating those structures difficult. The Object Location Table (OLT) contains information needed to locate important filesystem structural elements at mount time.

Using the object location table

The OLT records the starting block numbers of the initial inode list extents for the attribute fileset and indicates which inodes within those initial extents reference the fileset header file.

The OLT is composed of records for the following:


fileset header inodes
This record identifies the inode numbers of the fileset header file and its replica.

initial inode list extent addresses
This record identifies the addresses of the beginning of each of two 8K inode extents. These are the initial inode list extents, which contain the inodes for all structural files belonging to the attribute fileset.

current usage table inode
This record identifies the inode number of the file that contains the current usage table.

Mounting with the object location table

The superblock plays an important role in locating the OLT at mount time in that it contains pointers to both the OLT and its replica.

Using the OLT, the process of mounting a VxFS Version 2 filesystem is as follows:

  1. Read in the superblock.

  2. Validate the superblock and its replicas (located in the allocation unit headers).

  3. Read and validate the OLT and its replica at the locations recorded in the superblock.

  4. Obtain the addresses of the initial inode list extents for the attribute fileset from the OLT.

  5. Read in these initial inode extents.

  6. Find the fileset header file, based on the fileset header file inode number recorded in the OLT.

  7. Read the contents of the fileset header file. Each fileset header file entry represents a particular fileset and indicates the inode numbers of its inode list file and IAU file. The attribute fileset is set up first so that subsequent references to its inode list can be resolved.
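The steps above can be sketched as a chain of lookups over a toy in-memory model. Everything about the model is hypothetical: the dictionary keys, the block addresses other than the OLT at block 2, the IAU inode numbers 8 and 9, and the use of simple equality as the validation check. Only the order of operations and the inode numbers 4, 6, and 7 come from the text.

```python
def mount_sketch(disk):
    """Follow the seven mount steps over a dict-based stand-in for a disk."""
    sb = disk["superblock"]                                  # step 1
    for copy in disk["au_header_superblocks"]:               # step 2
        if copy != sb:
            raise ValueError("superblock replica mismatch")
    olt = disk["blocks"][sb["olt_block"]]                    # step 3
    if olt != disk["blocks"][sb["olt_replica_block"]]:
        raise ValueError("OLT and its replica disagree")
    inodes = {}
    for address in olt["initial_ilist_extents"]:             # steps 4 and 5
        inodes.update(disk["blocks"][address])
    header_inode = inodes[olt["fileset_header_inode"]]       # step 6
    headers = disk["blocks"][header_inode["data_block"]]     # step 7
    return {h["name"]: (h["ilist_inode"], h["iau_inode"]) for h in headers}


olt = {"initial_ilist_extents": [100, 600], "fileset_header_inode": 4}
superblock = {"olt_block": 2, "olt_replica_block": 900}
disk = {
    "superblock": superblock,
    "au_header_superblocks": [dict(superblock)],
    "blocks": {
        2: olt,
        900: dict(olt),
        100: {4: {"data_block": 300}},   # first initial inode list extent
        600: {36: {"data_block": 301}},  # second extent, holding the replica
        300: [
            {"name": "attribute", "ilist_inode": 6, "iau_inode": 8},
            {"name": "unnamed", "ilist_inode": 7, "iau_inode": 9},
        ],
    },
}
filesets = mount_sketch(disk)
```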

The VxFS version 3 disk layout

The VxFS version 3 disk layout is a triple indirect disk layout that is not used on the system.

The VxFS version 4 disk layout

Many aspects of the previous VxFS disk layouts are preserved in the Version 4 disk layout. However, the Version 4 layout differs from the Version 2 layout in that it includes support for the following:

Large files

A ``large file'' has a size that is 2GB or larger, limited by the offset size provided by the operating system. For this release, the offset size is a 64-bit quantity, so a large file can have sizes up to 2^64 bytes.

The filesystem size is limited, however, to be less than 2^31 sectors. The largest non-sparse file is thus limited to be less than 2^40 bytes.
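The two limits above agree if a sector is 512 bytes. That sector size is an assumption here, since the text does not state it; the short calculation below shows the arithmetic.

```python
SECTOR_SIZE = 512        # bytes per sector; assumed, not stated in the text
MAX_SECTORS = 2 ** 31    # filesystem size limit in sectors (exclusive)
OFFSET_BITS = 64         # file offset width in this release

max_filesystem_bytes = MAX_SECTORS * SECTOR_SIZE   # bound on non-sparse data
max_offset_bytes = 2 ** OFFSET_BITS                # bound on file offsets
```

A sparse file can have a nominal size beyond the filesystem limit because its holes occupy no data blocks, which is why only non-sparse files are bounded by the smaller figure.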

In order to represent this amount of data for one file, the VxFS Version 4 disk layout provides extents with variable sizes, and organizes them in a ``balanced tree'' data structure. The new organization is specified in an inode type field.

A variable extent defines its type, size, and length. One extent can be as small as 1 filesystem block, or as large as 128 billion filesystem blocks. Depending on whether there are empty regions in a file (``holes'') or if there are no large sequential runs of data blocks available, the allocation of extents for a large file can vary from one extent to a large number of extents.

As a file grows, an indirect block might be needed. Indirect blocks holding up to 512 variable sized extents are organized in a tree. The entries are entered according to their offset order in the file; if needed, further indirect blocks are allocated to provide multiple possible levels of indirect blocks. The tree is maintained with approximately equal depth in indirect blocks, providing efficient access to any part of the file.
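As a rough illustration of why the tree stays shallow, the sketch below estimates how many levels of indirect blocks are needed to reach a given number of extents. It assumes that every indirect block, including interior ones, holds the full 512 entries and ignores any extents held directly in the inode; both are simplifications, and the function name is hypothetical.

```python
def indirect_levels(num_extents, fanout=512):
    """Smallest number of indirect levels whose capacity covers num_extents."""
    if num_extents < 1:
        raise ValueError("num_extents must be positive")
    levels = 1
    capacity = fanout
    while capacity < num_extents:
        capacity *= fanout
        levels += 1
    return levels


one_level = indirect_levels(512)
two_levels = indirect_levels(513)
three_levels = indirect_levels(512 ** 2 + 1)
```

Under these assumptions, two levels already cover 262,144 extents, so even heavily fragmented files need only a few block reads to locate any offset.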

Quotas

The VxFS filesystem allows you to place per-user quota limits on the use of two principal resources of a filesystem: inodes and data blocks. For each of these resources, you can assign a user a quota. A quota consists of two limits for each resource, known as the ``soft'' and ``hard'' limits. The user cannot exceed the hard limit on blocks or inodes under any circumstances. The soft limit is smaller than the hard limit, and can be exceeded for a limited amount of time. This allows users to temporarily exceed the soft limits if needed, as long as they reduce their resource use before the time limit expires.

This design allows you to configure a quota policy that provides warnings to users before they are denied additional resources.
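The soft and hard limit behavior can be sketched as a single check. This is an illustrative model only, not the logic of the VxFS quota commands: the function, its parameters, and the returned strings are hypothetical, and times are plain numbers of seconds.

```python
def check_quota(usage, soft, hard, over_soft_since, now, time_limit):
    """Classify a resource usage level against a user's quota.

    usage: blocks or inodes the user would hold after the request.
    over_soft_since: time at which usage first exceeded the soft limit,
                     or None if it has not been exceeded before.
    """
    if usage > hard:
        return "denied"        # the hard limit can never be exceeded
    if usage > soft:
        if over_soft_since is not None and now - over_soft_since > time_limit:
            return "denied"    # time limit for exceeding the soft limit expired
        return "warning"       # allowed for now, but the user is over quota
    return "ok"


week = 7 * 24 * 3600
within = check_quota(80, soft=100, hard=120, over_soft_since=None, now=0, time_limit=week)
over_soft = check_quota(110, soft=100, hard=120, over_soft_since=0, now=3600, time_limit=week)
expired = check_quota(110, soft=100, hard=120, over_soft_since=0, now=week + 1, time_limit=week)
over_hard = check_quota(121, soft=100, hard=120, over_soft_since=None, now=0, time_limit=week)
```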

You can set the following for each user:

Quota information associated with user IDs is stored in quota files. The quota file can be in any filesystem.

Quota administration for VxFS is performed using the edquota_vxfs(ADM), repquota_vxfs(ADM), quot_vxfs(ADM), quota_vxfs(ADM), quotaon_vxfs(ADM), and quotaoff_vxfs(ADM) commands. Refer to the manual pages for these commands for more information. The VxFS quota commands work only on VxFS filesystems.

Logical filesystem partitioning

The VxFS Version 4 disk layout logically divides the entire filesystem device into allocation units and then uses the VxFS allocator to store all the filesystem structural data into special structural files. In Version 4, the first allocation unit starts at block zero and all allocation units are a fixed length of 32 blocks.

The Version 4 disk layout places structural files into a special structural fileset, which is similar to the Version 2 attribute fileset. In Version 4, however, there are several new structural files introduced.

The structural files added with Version 4 include:


Label file
Encapsulates the superblock and superblock replicas.

Device file
Records device information such as volume length and volume label, and contains pointers to other structural files such as the extent map file and the AU summary file.

OLT file
Contains the object location table.

OLT Replica file
Full data replica of the object location table.

Log file
The filesystem intent log.

AU Summary file
Contains the AU summary for each allocation unit.

Extent Map file
Contains the free extent map for each allocation unit.

Extent Map replica file
Consists of a replica inode that points to the same data blocks as the Extent Map file.

Other optional files
The filesystem may optionally have other structural files, depending on which features are in use (for example, there may be a structural quotas file).


© 2007 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 05 June 2007