Beyond the simple extent array, to efficiently manage large extent maps, XFS uses B+trees. The root node of the B+tree is stored in the inode's data fork. All block pointers for extent B+trees are 64-bit absolute block numbers.
For a single level B+tree, the root node points to the B+tree's leaves. Each leaf occupies one filesystem block and contains a header and an array of extents sorted by the file's offset. Each leaf has left and right (or backward and forward) block pointers to adjacent leaves. For a standard 4KB filesystem block, a leaf can contain up to 254 extents before a B+tree rebalance is triggered.
For a multi-level B+tree, the root node points to other B+tree nodes which eventually point to the extent leaves. B+tree keys are based on the file's offset. The nodes at each level in the B+tree point to the adjacent nodes.
The base B+tree node is used for extents, directories and extended attributes. The structures used for inode's B+tree root are:
typedef struct xfs_bmdr_block {
__be16 bb_level;
__be16 bb_numrecs;
} xfs_bmdr_block_t;
typedef struct xfs_bmbt_key {
xfs_dfiloff_t br_startoff;
} xfs_bmbt_key_t, xfs_bmdr_key_t;
typedef xfs_dfsbno_t xfs_bmbt_ptr_t, xfs_bmdr_ptr_t;
On disk, the B+tree node starts with the xfs_bmbr_block_t
header followed by an array of xfs_bmbt_key_t
values and then an array of xfs_bmbt_ptr_t
values. The size of both arrays is specified by the header's bb_numrecs
value.
The root node in the inode can only contain up to 19 key/pointer pairs for a standard 256 byte inode before a new level of nodes is added between the root and the leaves. This will be less if di_forkoff
is not zero (i.e. attributes are in use on the inode).
The subsequent nodes and leaves of the B+tree use the xfs_bmbt_block_t
declaration:
typedef struct xfs_btree_lblock xfs_bmbt_block_t;
typedef struct xfs_btree_lblock {
__be32 bb_magic;
__be16 bb_level;
__be16 bb_numrecs;
__be64 bb_leftsib;
__be64 bb_rightsib;
} xfs_btree_lblock_t;
For intermediate nodes, the data following xfs_bmbt_block_t
is the same as the root node: array of xfs_bmbt_key_t
value followed by an array of xfs_bmbt_ptr_t
values that starts halfway through the block (offset 0x808 for a 4096 byte filesystem block).
For leaves, an array of xfs_bmbt_rec_t
extents follow the xfs_bmbt_block_t
header.
Nodes and leaves use the same value for bb_magic
:
#define XFS_BMAP_MAGIC 0x424d4150 /* 'BMAP' */
The bb_level
value determines if the node is an intermediate node or a leaf. Leaves have a bb_level
of zero, nodes are one or greater.
Intermediate nodes, like leaves, can contain up to 254 pointers to leaf blocks for a standard 4KB filesystem block size as both the keys and pointers are 64 bits in size.
The following diagram illustrates a single level extent B+tree:
The following diagram illustrates a two level extent B+tree:
xfs_db Example:
TODO: