
Chapter 5. Data Extents

5.1. Extent List
5.2. B+tree Extent List
XFS allocates space for a file using extents, each described by a starting location and a length. An XFS extent also records its logical starting offset within the file, which allows a file's extent map to support sparse files (i.e. "holes" in the file) automatically. A flag specifies whether the extent has been preallocated but not yet written to (an unwritten extent).
A file can have more than one extent if one chunk of contiguous disk space is not available for the file. As a file grows, the XFS space allocator will attempt to keep space contiguous and merge extents. If more than one file is being allocated space in the same AG at the same time, the files' extents can become interleaved, leaving each file with multiple extents. The effect of this can vary depending on the extent allocator used in the XFS driver.
An extent is 128 bits in size and uses the following packed layout:
[Figure: packed bit layout of the 128-bit extent record]
The extent is represented by the xfs_bmbt_rec_t structure, which uses a big-endian format on disk. In-core management of extents uses the xfs_bmbt_irec_t structure, which is the unpacked version of xfs_bmbt_rec_t:
typedef struct xfs_bmbt_irec {
     xfs_fileoff_t             br_startoff;
     xfs_fsblock_t             br_startblock;
     xfs_filblks_t             br_blockcount;
     xfs_exntst_t              br_state;
} xfs_bmbt_irec_t;
The extent br_state field uses the following enum declaration:
typedef enum {
     XFS_EXT_NORM,
     XFS_EXT_UNWRITTEN,
     XFS_EXT_INVALID
} xfs_exntst_t;
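The relationship between the packed record and its unpacked form can be sketched in a few lines of C. This is a minimal illustration, not the kernel's code: the type and function names (extent_irec, extent_unpack, be64_load) are hypothetical, and the field widths are assumed to be, from MSB to LSB, a 1-bit flag, a 54-bit file offset, a 52-bit start block and a 21-bit block count.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the in-core types; names are illustrative. */
enum ext_state { EXT_NORM, EXT_UNWRITTEN };

struct extent_irec {
    uint64_t startoff;    /* logical file offset, in blocks (54 bits) */
    uint64_t startblock;  /* absolute starting block (52 bits) */
    uint64_t blockcount;  /* length in blocks (21 bits) */
    enum ext_state state; /* derived from the 1-bit flag */
};

/* Read a big-endian 64-bit word from a byte buffer. */
uint64_t be64_load(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Unpack one 16-byte on-disk record. Assumed layout, MSB first:
 * flag:1 | startoff:54 | startblock:52 | blockcount:21.
 * The start block straddles the two 64-bit words: its top 9 bits
 * are the low bits of the first word, the rest lead the second.
 */
struct extent_irec extent_unpack(const unsigned char rec[16])
{
    uint64_t l0 = be64_load(rec);
    uint64_t l1 = be64_load(rec + 8);
    struct extent_irec irec;

    irec.state = (l0 >> 63) ? EXT_UNWRITTEN : EXT_NORM;
    irec.startoff = (l0 & ((UINT64_C(1) << 63) - 1)) >> 9;
    irec.startblock = ((l0 & 0x1ff) << 43) | (l1 >> 21);
    irec.blockcount = l1 & 0x1fffff;
    return irec;
}
```

Loading the words byte by byte keeps the sketch independent of host endianness, which matters because the on-disk format is always big-endian.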
Some other points about extents:
The following two subsections cover the two methods of storing extent information for a file. The first is the fastest and simplest: the inode itself contains the complete extent array for the file's data. The second is a slower and more complex B+tree, which can handle thousands to millions of extents efficiently.

5.1. Extent List

Local extents are where the entire extent array is stored within the inode's data fork itself. This is the most efficient format in terms of speed and resource consumption. The trade-off is that the file can only have a few extents before the inode runs out of space.
The "data fork" of the inode contains an array of extents, the size of the array determined by the inode's di_nextents value.
[Figure: extent array stored in the inode's data fork]
The number of extents that can fit in the inode depends on the inode size and di_forkoff. For a default 256 byte inode with no extended attributes, a file can have up to 9 extents with this format: di_u starts at offset 0x64 (100 bytes), which leaves 156 bytes for 16-byte extent records. Beyond this, extents have to use the B+tree format.
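The capacity arithmetic can be written out as a short C sketch. It assumes a 100-byte inode core (di_u at offset 0x64), 16-byte extent records, and that a non-zero di_forkoff gives the start of the attribute fork in 8-byte units from the beginning of the literal area; the function name max_inline_extents is hypothetical.

```c
#include <stddef.h>

enum { EXTENT_REC_SIZE = 16 }; /* one 128-bit extent record */

/*
 * Number of extent records that fit in the inode's data fork.
 * inode_size and core_size are in bytes. forkoff is di_forkoff,
 * assumed here to be in 8-byte units, with 0 meaning there is no
 * attribute fork and the data fork owns the whole literal area.
 */
size_t max_inline_extents(size_t inode_size, size_t core_size,
                          unsigned forkoff)
{
    size_t literal = inode_size - core_size;
    size_t data_fork = forkoff ? (size_t)forkoff * 8 : literal;

    return data_fork / EXTENT_REC_SIZE;
}
```

Under these assumptions, max_inline_extents(256, 100, 0) gives 156 / 16 = 9, and any attribute fork reduces the count further.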

xfs_db Example:

An 8MB file with one extent:
xfs_db> inode <inode#>
xfs_db> p
core.magic = 0x494e
core.mode = 0100644
core.version = 1
core.format = 2 (extents)
...
core.size = 8294400
core.nblocks = 2025
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 0
...
u.bmx[0] = [startoff,startblock,blockcount,extentflag]
          0:[0,25356,2025,0]
A 24MB file with three extents:
xfs_db> inode <inode#>
xfs_db> p
...
core.format = 2 (extents)
...
core.size = 24883200
core.nblocks = 6075
core.nextents = 3
...
u.bmx[0-2] = [startoff,startblock,blockcount,extentflag]
          0:[0,27381,2025,0]
          1:[2025,31431,2025,0]
          2:[4050,35481,2025,0]
Raw disk version of the inode with the third extent highlighted (di_u always starts at offset 0x64):
[Figure: raw disk dump of the inode, with the third extent highlighted]
We can expand the highlighted section into the following bit array from MSB to LSB with the file offset and the block count highlighted:
[Figure: bit-level expansion of the third extent, with the file offset and block count highlighted]
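The on-disk words for the third extent can be re-derived from the xfs_db output above. The following C sketch packs [startoff,startblock,blockcount,extentflag] into two 64-bit words, assuming the field widths 1, 54, 52 and 21 bits from MSB to LSB; extent_words and extent_pack are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* The two 64-bit halves of a packed record, in host byte order. */
struct extent_words {
    uint64_t l0; /* flag, file offset, top 9 bits of start block */
    uint64_t l1; /* low 43 bits of start block, block count */
};

/*
 * Pack the four fields, assuming the layout
 * flag:1 | startoff:54 | startblock:52 | blockcount:21.
 * The words must still be stored big-endian when written to disk.
 */
struct extent_words extent_pack(uint64_t startoff, uint64_t startblock,
                                uint64_t blockcount, int unwritten)
{
    struct extent_words w;

    /* Reject values that do not fit their assumed field widths. */
    assert(startoff < (UINT64_C(1) << 54));
    assert(startblock < (UINT64_C(1) << 52));
    assert(blockcount < (UINT64_C(1) << 21));

    w.l0 = ((uint64_t)(unwritten != 0) << 63)
         | (startoff << 9)
         | (startblock >> 43);
    /* The shift discards the 9 bits already placed in l0. */
    w.l1 = (startblock << 21) | blockcount;
    return w;
}
```

For the third extent, extent_pack(4050, 35481, 2025, 0) yields l0 = 0x00000000001fa400 and l1 = 0x00000011532007e9.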
A 4MB file with two extents and a hole in the middle. The first extent contains 64KB of data; the second starts about 4MB into the file and contains 32KB (created by writing 64KB, seeking with lseek to about 4MB, then writing 32KB):
xfs_db> inode <inode#>
xfs_db> p
...
core.format = 2 (extents)
...
core.size = 4063232
core.nblocks = 24
core.nextents = 2
...
u.bmx[0-1] = [startoff,startblock,blockcount,extentflag]
          0:[0,37506,16,0]
          1:[984,37522,8,0]
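The hole in this file can be recovered from the extent list alone, since any gap between the end of one extent and the startoff of the next is unmapped. The C sketch below is illustrative only: the struct and function names are hypothetical, extents are assumed to be sorted by startoff and non-overlapping, and the 4096-byte block size is inferred from the earlier example (8294400 bytes / 2025 blocks).

```c
#include <stddef.h>
#include <stdint.h>

/* Only the logical fields matter for finding holes. */
struct extent {
    uint64_t startoff;   /* logical file offset, in blocks */
    uint64_t blockcount; /* length, in blocks */
};

/*
 * Count the blocks not covered by any extent between file offset 0
 * and the end of the last extent. Assumes the array is sorted by
 * startoff and the extents do not overlap.
 */
uint64_t hole_blocks(const struct extent *ext, size_t n)
{
    uint64_t next = 0;  /* first block after the previous extent */
    uint64_t holes = 0;

    for (size_t i = 0; i < n; i++) {
        if (ext[i].startoff > next)
            holes += ext[i].startoff - next;
        next = ext[i].startoff + ext[i].blockcount;
    }
    return holes;
}

/* Byte offset just past the last mapped block (0 if no extents). */
uint64_t mapped_end_bytes(const struct extent *ext, size_t n,
                          uint64_t block_size)
{
    if (n == 0)
        return 0;
    return (ext[n - 1].startoff + ext[n - 1].blockcount) * block_size;
}
```

For the two extents above, hole_blocks reports 968 blocks (984 - 16), and mapped_end_bytes with a 4096-byte block size gives 4063232, matching core.size.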