Three disk layouts are available with VxFS:
The vxupgrade command is provided to upgrade existing VxFS filesystems to the Version 4 layout. (Refer to vxupgrade(ADM) for details.)
Although the Version 4 layout is more complex, it shares many of the Version 1 and Version 2 features and characteristics. Since the Version 1 layout is simpler and easier to understand, familiarize yourself with the Version 1 layout before attempting to understand the more complex Version 2 and Version 4 layouts.
The VxFS Version 1 disk layout is composed of
The superblock contains important information about the filesystem, such as:
Copies of the superblock are kept in allocation unit headers (see ``Allocation unit''); these copies can be used for recovery purposes if the superblock is corrupted or destroyed.
In the event of system failure, the VxFS filesystem uses intent logging to guarantee filesystem integrity.
The intent log is a circular activity log with a default size of 512 blocks. If the filesystem is less than 4MB, the log size is reduced to avoid wasting space. This log contains records of the intention of the system to update a filesystem structure. An update to the filesystem structure (a transaction) is divided into separate subfunctions for each data structure that needs to be updated. A composite log record of the transaction is created that contains the subfunctions that constitute the transaction.
For example, the creation of a file that would expand the directory in which the file is contained produces a transaction consisting of the following subfunctions:
VxFS maintains log records in the intent log for all pending changes to the filesystem structure and ensures that the log records are written to disk in advance of the changes to the filesystem. Once the intent log has been written, the transaction's other updates to the filesystem can be written in any order. In the event of a system failure, the pending changes to the filesystem are either nullified or completed by the fsck utility. The VxFS intent log generally only records changes to the filesystem structure. File data changes are not normally logged.
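The write-ahead ordering described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names, the structure names, and the choice to complete (rather than nullify) interrupted transactions during recovery are hypothetical and do not describe the actual VxFS on-disk format or the fsck implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """A composite log record: the subfunctions that make up one update."""
    txid: int
    subfunctions: list      # (structure_name, new_value) pairs, hypothetical
    done: bool = False      # True once every structure update has been applied


@dataclass
class ToyFilesystem:
    structures: dict = field(default_factory=dict)  # stands in for on-disk structures
    intent_log: list = field(default_factory=list)  # stands in for the intent log

    def log(self, tx):
        # Step 1: the log record is written before any structure is changed.
        self.intent_log.append(tx)

    def apply(self, tx, crash_after=None):
        # Step 2: the structure updates themselves; crash_after simulates a
        # system failure after that many subfunctions have been applied.
        for i, (name, value) in enumerate(tx.subfunctions):
            if crash_after is not None and i >= crash_after:
                return
            self.structures[name] = value
        tx.done = True

    def replay(self):
        # Recovery: this sketch completes every logged but unfinished
        # transaction (the text says pending changes are either nullified
        # or completed).
        for tx in self.intent_log:
            if not tx.done:
                for name, value in tx.subfunctions:
                    self.structures[name] = value
                tx.done = True
```

Because the log record reaches disk first, replay always has enough information to bring every structure touched by the transaction to a consistent state, regardless of the order in which the individual updates were written.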
An ``allocation unit'' is a group of consecutive blocks in a filesystem that contains resource summaries, free resource maps, inodes, and data blocks. Each component of an allocation unit begins on a block boundary. The VxFS Version 1 allocation unit structure is as follows:
One or more allocation units exist per filesystem. Allocation units are located immediately after the intent log. The number and size of allocation units can be specified when the filesystem is made. All of the allocation units, except possibly the last one, are of equal size. If space is limited, the last allocation unit can have a partial set of data blocks to allow use of all remaining blocks.
The allocation unit header contains a copy of the filesystem's superblock that is used to verify that the allocation unit matches the superblock of the filesystem. The superblock copies contained in allocation unit headers can also be used for recovery purposes if the superblock is corrupted or destroyed. The allocation unit header occupies the first block of each allocation unit.
The allocation unit summary contains the number of inodes with extended operations pending, number of free inodes, and number of free extents in the allocation unit.
The free inode map is a bitmap that indicates which inodes are free and which are allocated. A free inode is indicated by the bit being on. Inodes zero and one are reserved by the filesystem; inode two is the inode for the root directory; inode three is the inode for the lost+found directory.
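As a rough illustration of the free inode map, the following sketch models a bitmap in which a set bit means "free". The bit ordering within each byte and the helper names are assumptions made for the example; only the "bit on means free" rule and the preassigned inodes 0 through 3 come from the description above.

```python
def _set_bit(bitmap, ino):
    bitmap[ino // 8] |= 1 << (ino % 8)


def _clear_bit(bitmap, ino):
    bitmap[ino // 8] &= ~(1 << (ino % 8)) & 0xFF


def is_free(bitmap, ino):
    """A free inode is indicated by its bit being on."""
    return bool(bitmap[ino // 8] & (1 << (ino % 8)))


def new_inode_map(n_inodes):
    """Build a map where inodes 0-3 are in use and all others are free."""
    bitmap = bytearray((n_inodes + 7) // 8)
    for ino in range(4, n_inodes):   # 0, 1 reserved; 2 root; 3 lost+found
        _set_bit(bitmap, ino)
    return bitmap


def allocate_inode(bitmap):
    """Return the lowest free inode number and mark it allocated, or None."""
    for ino in range(len(bitmap) * 8):
        if is_free(bitmap, ino):
            _clear_bit(bitmap, ino)
            return ino
    return None
```

With a freshly built map, the first call to `allocate_inode` skips the four preassigned inodes and hands out the first inode whose bit is on.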
The extended inode operations map keeps track of inodes on which operations would remain pending for too long to reside in the intent log. The intent log must complete faster than it wraps, so lengthy operations are posted directly in the inode. The extended inode operations map is in the same format as the free inode map. This map is updated to identify the inodes that have extended operations that need to be completed. This map allows the fsck utility to quickly identify which inodes had extended operations pending at the time of a system failure.
The free extent map is a series of independent 512-byte bitmaps that are each referred to as a free extent map section. The first region of 2048 bits represents a section of 2048 one-block extents. The second region of 1024 bits represents a section of 1024 two-block extents. This sectioning continues for all powers of 2 up to the single bit that represents one 2048-block extent.
The one-block bitmaps always represent the true allocation of blocks from the allocation unit. The remaining bitmaps remap these same blocks, in a ``binary-buddy'' scheme, in increasingly larger-sized groups. As smaller extents are needed, the larger groups of blocks mapped by the buddy maps are broken apart to create the smaller extents.
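The sectioning arithmetic and the splitting behaviour can be sketched as follows. The first function reproduces the region sizes given above (2048 + 1024 + ... + 1 = 4095 bits, which fits in one 512-byte section). The second is a conventional binary-buddy allocator that tracks free extents in per-size sets instead of bitmaps; it is an illustration of the splitting idea only, and its names and data representation are assumptions, not the VxFS implementation.

```python
SECTION_BLOCKS = 2048   # blocks covered by one free extent map section


def section_layout(blocks=SECTION_BLOCKS):
    """Return (extent_size_in_blocks, number_of_bits) for each bitmap region."""
    layout = []
    extent_size = 1
    while extent_size <= blocks:
        layout.append((extent_size, blocks // extent_size))
        extent_size *= 2
    return layout


def new_free_sets(blocks=SECTION_BLOCKS):
    """Start with the whole section free as a single largest extent."""
    free = {size: set() for size, _ in section_layout(blocks)}
    free[blocks].add(0)
    return free


def allocate(free, size, blocks=SECTION_BLOCKS):
    """Allocate one extent of `size` blocks (a power of 2).

    Returns the starting block, or None if no extent is large enough.
    Larger extents are broken apart as smaller ones are needed.
    """
    candidate = size
    while candidate <= blocks and not free[candidate]:
        candidate *= 2
    if candidate > blocks:
        return None
    start = min(free[candidate])
    free[candidate].remove(start)
    while candidate > size:
        candidate //= 2
        free[candidate].add(start + candidate)   # upper half remains free
    return start
```

Starting from an empty section, a request for a one-block extent splits the 2048-block extent all the way down, leaving one free buddy of each smaller size behind for later requests.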
An inode is a data structure that contains information about a file. The VxFS default inode size is currently 256 bytes.
Each inode stores information, such as the following, about a particular file:
If all of the direct extents are used, two indirect address extents are available for use in each inode.
The inode list is a series of inodes. There is one inode in the list for every file.
It might be desirable to align data blocks to a physical boundary. To facilitate this, the system administrator can specify that a gap be left between the end of the inode list and the first data block.
The balance of the allocation unit is occupied by data blocks. Data blocks contain the actual data stored in files and directories.
Many aspects of the Version 1 disk layout are preserved in the Version 2 disk layout. However, the Version 2 layout differs from the Version 1 layout in that it includes support for the following:
Because many disk layout characteristics are shared by both the Version 1 and Version 2 disk layouts, you should have a general understanding of the Version 1 layout. Structures that are common to both disk layouts are described in detail in ``The VxFS version 1 disk layout'' and are only mentioned briefly here.
The relatively complex nature of the Version 2 layout is covered in the following general areas:
This section describes the structural elements of the filesystem that exist in fixed locations on the disk.
The VxFS Version 2 disk layout is composed of:
The superblock contains important information about the filesystem. Refer to ``Superblock'' for details.
The Version 2 superblock differs from the Version 1 superblock in that it contains pointers to the object location table and its replica.
The Object Location Table (OLT) can be considered an extension of the superblock. The OLT contains information used at mount time to locate filesystem structures that are not in fixed locations. The OLT is located immediately after the superblock (starting at block 2) and is 8KB long.
The OLT is replicated and its replica is located immediately after the intent log. The OLT and its replica are separated to minimize the potential for losing both copies of the vital OLT information in the event of localized disk damage.
The contents and use of the OLT are described in detail in ``Locating dynamic structures''.
The intent log is a circular activity log used by VxFS to guarantee filesystem integrity. Refer to ``Intent log'' for details.
An allocation unit is a group of consecutive blocks in a filesystem that contains a resource summary, free resource map, and data blocks. Allocation units also contain copies of the superblock that can be used for recovery purposes.
The Version 2 allocation unit is similar to that of Version 1, but is located after the OLT replica. All of the Version 2 allocation unit components deal with the allocation of disk space. Those components of the Version 1 allocation unit that deal with inode allocation have been relocated elsewhere for Version 2. In particular, the inode list now resides in an inode list file and the inode allocation information now resides in an inode allocation unit (see ``Inode allocation unit'').
The contents of the allocation unit are
One or more allocation units exist per filesystem. The number and size of allocation units can be specified when the filesystem is made. All of the allocation units, except possibly the last one, are of equal size. If space is limited, the last allocation unit can have a partial set of data blocks to allow use of all remaining blocks.
With the Version 2 layout, many structural elements of the filesystem are encapsulated in files to allow dynamic allocation of the filesystem structure. Files that store this filesystem structural data are referred to as ``structural files''. As the filesystem grows, more space is allocated to the structural files. Structural files are intended for filesystem use only and are generally invisible to users.
The Version 2 layout supports ``filesets'', which are collections of files that exist within a filesystem. Each filesystem contains at least two fileset types:
Although structural files are located in the attribute fileset, they can ``belong'' to another fileset. For example, the inode list file for the unnamed fileset is in the attribute fileset, but the structural details that it contains are only applicable to the unnamed fileset.
Each fileset is defined by structural files as follows:
A fileset header exists for each fileset and contains information about the contents and characteristics of that fileset. All fileset headers are stored in a single fileset header file in the attribute fileset. The fileset header file contains one fileset header per fileset. Each fileset header entry is 1 block long. The fileset header file is replicated because fileset headers cannot be rebuilt from other data structures.
The fileset header for a given fileset includes information such as:
An inode is a data structure that contains information about a file. The Version 2 inode structure is similar to that of Version 1, with the addition of fields supporting ACLs. Refer to ``Inodes'' for details on the inode contents.
Version 2 inodes differ from Version 1 inodes in that they are located in structural files to facilitate dynamic allocation. Instead of allocating a fixed number of inodes into the filesystem at mkfs(ADM) time, a minimum number of inodes is allocated by mkfs and additional inodes are later allocated as they are needed during filesystem use.
The inode list is a series of inodes located in the inode list file. There is one inode in the list for every file in a given fileset. The inode list file is replicated in that it is referenced by two inodes that point to the same set of data blocks. Although the inode addresses are replicated for recovery purposes, the inodes themselves are not.
An inode extent is an extent that contains inodes and is 8K long by default. Inode extents are dynamically allocated to store inodes as they are needed.
The initial inode list extents contain the inodes first allocated by mkfs for each fileset in a filesystem.
``Inode lists'' illustrates the initial inode list extents allocated for the unnamed and attribute filesets. Each of these extents contains 32 inodes and is 8K long.
Figure: Inode lists
The construction of the unnamed fileset's inode list resembles that of the VxFS Version 1 disk layout, with the first two inodes unallocated and inodes 2 and 3 preassigned to the root and lost+found directories. The attribute fileset's inode list is similarly constructed, with certain inodes allocated for specific files and other inodes unallocated.
There are two initial inode list extents for the attribute fileset. These contain the inodes for all structural files needed to find and set up the filesystem.
The attribute fileset's inode list contains a few entries that are replicas of one another. For example, inodes 4 and 36 both reference copies of the fileset header file. The replicated inodes are used by fsck to reconstruct the filesystem in the event of damage to either one of the replicas. Although the two initial inode list extents belonging to the attribute fileset are logically contiguous, they are physically separated. This helps to ensure the integrity of the replicated information and reduces the chance that localized disk damage might result in complete loss of the filesystem.
Inodes 6 and 38 in the attribute fileset reference the inode list file for the attribute fileset. The contents of this file are the two inode extents pictured for the attribute fileset. Likewise, the attribute fileset inodes 7 and 39 reference the inode list file for the unnamed fileset. This file contains the single extent pictured for the unnamed fileset. All of the unused inodes in the initial extents of the attribute inode list are reserved for future use.
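The replica pairs mentioned above (4 and 36, 6 and 38, 7 and 39) all sit at the same slot in the two initial 32-inode extents of the attribute fileset. The sketch below encodes that observation. It is an inference from those three pairs only: the function names are hypothetical, and the text does not say that every inode in the first extent has a replica.

```python
INODES_PER_EXTENT = 32   # each initial inode list extent holds 32 inodes


def extent_and_slot(ino):
    """Split an attribute fileset inode number into (extent index, slot)."""
    return divmod(ino, INODES_PER_EXTENT)


def replica_of(ino):
    """Return the same slot in the second initial extent.

    Assumes `ino` lies in the first initial extent and is one of the
    replicated entries; only the pairs named in the text are confirmed.
    """
    extent, slot = extent_and_slot(ino)
    if extent != 0:
        raise ValueError("expected an inode in the first initial extent")
    return INODES_PER_EXTENT + slot
```

For example, `replica_of(4)` yields 36, matching the two inodes that reference copies of the fileset header file.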
An inode allocation unit (IAU) contains inode allocation information for a given fileset. Each fileset contains one or more IAUs, each of which details allocation for a set number of inodes. The number of inodes per IAU varies, depending on the block size being used. One IAU exists for every 16,384 inodes in a fileset with the default block size (1024 bytes). If an IAU is damaged, the information that it contains can be reconstructed by examining the fileset's inode list.
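Given the figure of 16,384 inodes per IAU at the default block size, finding the IAU that covers a given inode is simple integer arithmetic. The following sketch assumes inodes are numbered consecutively from zero within a fileset and that IAUs cover them in order; the function names are illustrative.

```python
INODES_PER_IAU = 16_384   # default block size (1024 bytes); varies with block size


def locate_in_iau(ino, inodes_per_iau=INODES_PER_IAU):
    """Return (IAU index, position within that IAU) for an inode number."""
    return divmod(ino, inodes_per_iau)


def iaus_needed(n_inodes, inodes_per_iau=INODES_PER_IAU):
    """Number of IAUs required to describe n_inodes inodes (rounded up)."""
    return -(-n_inodes // inodes_per_iau)
```

Under these assumptions, inode 40000 falls in the third IAU (index 2) at position 7232.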
The IAUs for a fileset are stored in sequential order in the fileset's IAU file. The fileset header identifies the attribute fileset inode associated with that fileset's IAU file.
All IAU components begin on a block boundary and have the following structure and content.
The Link Count Table (LCT) contains a reference count for each inode in the associated fileset. This reference count is identical to the conventional link field of an inode. Each LCT entry contains the actual reference count for the associated fileset inode. The link count field in an inode itself is set to either 0 or 1, and the actual number of links is stored in the LCT entry for the associated fileset inode.
The current layout only uses the LCT for inodes in the attribute fileset. The LCT supports quick updates of the link count for attribute fileset inodes.
The link count table can be reconstructed from the inode list, so it is not replicated.
The Current Usage Table (CUT) is a file that contains usage-related information for each fileset. The information contained in the CUT changes frequently and is not replicated. The information in the CUT can, however, be reconstructed from the inode list if the CUT is damaged.
The CUT file contains one entry per fileset. The CUT entry for a given fileset contains information such as the number of blocks currently used by the fileset.
The existence of dynamic structures in the Version 2 disk layout makes the task of initially locating those structures difficult. The Object Location Table (OLT) contains information needed to locate important filesystem structural elements at mount time.
The OLT records the starting block numbers of the initial inode list extents for the attribute fileset and indicates which inodes within those initial extents reference the fileset header file.
The OLT is composed of records for the following:
The superblock plays an important role in locating the OLT at mount time in that it contains pointers to both the OLT and its replica.
Using the OLT, the process of mounting a VxFS Version 2 filesystem is as follows:
The VxFS version 3 disk layout is a triple indirect disk layout that is not used on the system.
Many aspects of the previous VxFS disk layouts are preserved in the Version 4 disk layout. However, the Version 4 layout differs from the Version 2 layout in that it includes support for the following:
A ``large file'' has a size that is 2GB or larger, limited by the offset size provided by the operating system. For this release, the offset size is a 64-bit quantity, so a large file can have sizes up to 2^64 bytes.
The filesystem size is limited, however, to be less than 2^31 sectors. The largest non-sparse file is thus limited to be less than 2^40 bytes.
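The step from the sector limit to the byte limit can be checked with a short calculation. The 512-byte sector size is an assumption here; it is not stated in the text, but it is the value that makes the two limits agree.

```python
SECTOR_SIZE = 512   # bytes per sector (assumed, not stated in the text)


def max_offset_bytes(offset_bits=64):
    """Largest file size addressable with an offset of the given width."""
    return 2 ** offset_bits


def max_filesystem_bytes(sectors=2 ** 31, sector_size=SECTOR_SIZE):
    """Upper bound on filesystem size, and so on any non-sparse file."""
    return sectors * sector_size


# 2^31 sectors * 2^9 bytes per sector = 2^40 bytes (1 TB)
assert max_filesystem_bytes() == 2 ** 40
```

A sparse file can still have a larger apparent size, because holes consume offset range without consuming sectors.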
In order to represent this amount of data for one file, the VxFS Version 4 disk layout provides extents with variable sizes, and organizes them in a ``balanced tree'' data structure. The new organization is specified in an inode type field.
A variable extent defines its type, size, and length. One extent can be as small as 1 filesystem block, or as large as 128 billion filesystem blocks. Depending on whether there are empty regions in a file (``holes'') or if there are no large sequential runs of data blocks available, the allocation of extents for a large file can vary from one extent to a large number of extents.
As a file grows, an indirect block might be needed. Indirect blocks holding up to 512 variable sized extents are organized in a tree. The entries are entered according to their offset order in the file; if needed, further indirect blocks are allocated to provide multiple possible levels of indirect blocks. The tree is maintained with approximately equal depth in indirect blocks, providing efficient access to any part of the file.
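A lookup through such a tree can be sketched as follows. The fan-out of 512 entries per indirect block comes from the description above; everything else (the field names, the bottom-up construction, representing a hole as an extent with no disk block) is a simplifying assumption and not the actual Version 4 on-disk format.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

FANOUT = 512   # entries per indirect block


@dataclass
class Extent:
    file_offset: int            # first file block covered by this extent
    length: int                 # number of blocks in the extent
    disk_block: Optional[int]   # starting disk block, or None for a hole


@dataclass
class IndirectBlock:
    keys: list       # file offset of the first block covered by each child
    children: list   # Extent or IndirectBlock objects, in offset order


def build_tree(extents, fanout=FANOUT):
    """Build the tree bottom-up so every leaf is at the same depth."""
    if not extents:
        raise ValueError("at least one extent is required")
    level = sorted(extents, key=lambda e: e.file_offset)
    keys = [e.file_offset for e in level]
    while True:
        nodes = [IndirectBlock(keys[i:i + fanout], level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
        if len(nodes) == 1:
            return nodes[0]
        level = nodes
        keys = [n.keys[0] for n in nodes]


def lookup(node, file_block):
    """Map a file block number to a disk block, or None for holes or EOF."""
    while isinstance(node, IndirectBlock):
        i = bisect_right(node.keys, file_block) - 1
        if i < 0:
            return None
        node = node.children[i]
    rel = file_block - node.file_offset
    if rel >= node.length or node.disk_block is None:
        return None
    return node.disk_block + rel
```

Because entries are kept in offset order, each level needs only a binary search, and the equal depth keeps the cost of reaching any part of the file about the same.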
The VxFS filesystem allows you to place per-user quota limits on the use of two principal resources of a filesystem: inodes and data blocks. For each of these resources, you can assign a user a quota. A quota consists of two limits for each resource, known as the ``soft'' and ``hard'' limits. The user cannot exceed the hard limit on blocks or inodes under any circumstances. The soft limit is smaller than the hard limit, and can be exceeded for a limited amount of time. This allows users to temporarily exceed the soft limits if needed, as long as they reduce their resource use before the time limit expires.
This design allows you to configure a quota policy that provides warnings to users before they are denied additional resources.
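The interaction between the soft limit, the hard limit, and the time limit can be modelled with a small check like the one below. It is a sketch of the policy as described, not of the VxFS quota code: the class, field names, and warning strings are hypothetical, and the same check would be applied separately to blocks and to inodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quota:
    soft: int                                # may be exceeded temporarily
    hard: int                                # may never be exceeded
    time_limit: float                        # seconds allowed above the soft limit
    over_soft_since: Optional[float] = None  # when the soft limit was first exceeded


def charge(quota, usage, request, now):
    """Decide whether `request` more units may be allocated.

    Returns (allowed, warning); warning is None when usage stays at or
    below the soft limit.
    """
    new_usage = usage + request
    if new_usage > quota.hard:
        return False, "hard limit exceeded"
    if new_usage > quota.soft:
        if quota.over_soft_since is None:
            quota.over_soft_since = now          # start the grace period
        if now - quota.over_soft_since > quota.time_limit:
            return False, "soft limit time expired"
        return True, "over soft limit"
    quota.over_soft_since = None                 # back under the soft limit
    return True, None
```

The "over soft limit" result corresponds to the warning stage: the request succeeds, but the user is told to reduce usage before the time limit expires.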
You can set the following for each user:
Quota information associated with user IDs is stored in quota files. The quota file can be in any filesystem.
Quota administration for VxFS is performed using the edquota_vxfs(ADM), repquota_vxfs(ADM), quot_vxfs(ADM), quota_vxfs(ADM), quotaon_vxfs(ADM), and quotaoff_vxfs(ADM) commands. Refer to the manual pages for these commands for more information. The VxFS quota commands work only on VxFS filesystems.
The VxFS Version 4 disk layout logically divides the entire filesystem device into allocation units and then uses the VxFS allocator to store all the filesystem structural data in special structural files. In Version 4, the first allocation unit starts at block zero and all allocation units are a fixed length of 32K blocks.
The Version 4 disk layout places structural files into a special structural fileset, which is similar to the Version 2 attribute fileset. In Version 4, however, there are several new structural files introduced.
The structural files added with Version 4 include: