When the "leaf" information fills a block, the extents undergo another separation. All "freeindex" information moves into its own extent. In Leaf Directories (Section 6.3, “Leaf Directories”), the single "leaf" block maintained the best free space information for each "data" block. This is not possible with more than one leaf.
The "data" blocks stay the same as in leaf directories.
The "leaf" blocks eventually change into a B+tree with the generic B+tree header pointing to directory "leaves" as described in Leaf Directories. The top-level blocks are called "nodes". The directory can exist in a state where there is still a single leaf block before it is split. Whether a block is a node or a leaf has to be determined by inspecting the magic value in its header. The combined leaf/freeindex block has a magic value of XFS_DIR2_LEAF1_MAGIC (0xd2f1), a node directory's leaf/leaves have a magic value of XFS_DIR2_LEAFN_MAGIC (0xd2ff), and intermediate nodes have a magic value of XFS_DA_NODE_MAGIC (0xfebe).
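The magic check described above can be sketched as follows. This is only an illustration: the enum and the function name classify_da_block are hypothetical, and the sketch assumes the 16-bit big-endian magic sits at byte offset 8 of the block, directly after the 32-bit forw and back pointers of xfs_da_blkinfo_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic values quoted in the text. */
#define XFS_DIR2_LEAF1_MAGIC 0xd2f1 /* combined leaf/freeindex block */
#define XFS_DIR2_LEAFN_MAGIC 0xd2ff /* leaf block of a node directory */
#define XFS_DA_NODE_MAGIC    0xfebe /* intermediate B+tree node */

/* Assumption: forw (4 bytes) and back (4 bytes) precede the magic. */
#define DA_MAGIC_OFFSET 8

/* Hypothetical classification, used only in this sketch. */
enum dir_blk_kind {
    DIR_BLK_UNKNOWN,
    DIR_BLK_LEAF1,
    DIR_BLK_LEAFN,
    DIR_BLK_NODE
};

/* Classify a raw leaf/node block by its big-endian 16-bit magic. */
static enum dir_blk_kind classify_da_block(const uint8_t *blk, size_t len)
{
    uint16_t magic;

    assert(blk != NULL);
    if (len < DA_MAGIC_OFFSET + 2)
        return DIR_BLK_UNKNOWN;

    magic = (uint16_t)((blk[DA_MAGIC_OFFSET] << 8) | blk[DA_MAGIC_OFFSET + 1]);
    switch (magic) {
    case XFS_DIR2_LEAF1_MAGIC:
        return DIR_BLK_LEAF1;
    case XFS_DIR2_LEAFN_MAGIC:
        return DIR_BLK_LEAFN;
    case XFS_DA_NODE_MAGIC:
        return DIR_BLK_NODE;
    default:
        return DIR_BLK_UNKNOWN;
    }
}
```

A reader walking the tree would call this on the block at the start of the leaf extent and descend only while the result is DIR_BLK_NODE.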
The new "freeindex" block(s) contain only the bests for each data block.
The freeindex block uses the following structures:
typedef struct xfs_dir2_free_hdr {
    __uint32_t magic;
    __int32_t firstdb;
    __int32_t nvalid;
    __int32_t nused;
} xfs_dir2_free_hdr_t;
typedef struct xfs_dir2_free {
    xfs_dir2_free_hdr_t hdr;
    xfs_dir2_data_off_t bests[1];
} xfs_dir2_free_t;
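As a rough illustration of how these structures map onto raw bytes, the sketch below decodes a freeindex block from a byte buffer. It assumes the usual XFS big-endian on-disk encoding, a 16-byte header, and 16-bit bests entries following it. The names free_hdr, parse_free_hdr and free_best are hypothetical helpers, not part of XFS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FREE_HDR_SIZE 16u /* magic + firstdb + nvalid + nused */

/* Host-endian copy of the header fields (hypothetical type). */
struct free_hdr {
    uint32_t magic;
    int32_t firstdb;
    int32_t nvalid;
    int32_t nused;
};

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the header; returns 0 on success, -1 if the buffer is too short. */
static int parse_free_hdr(const uint8_t *blk, size_t len, struct free_hdr *out)
{
    assert(blk != NULL && out != NULL);
    if (len < FREE_HDR_SIZE)
        return -1;
    out->magic = be32(blk);
    out->firstdb = (int32_t)be32(blk + 4);
    out->nvalid = (int32_t)be32(blk + 8);
    out->nused = (int32_t)be32(blk + 12);
    return 0;
}

/* Read bests[i]; returns 0 on success, -1 if i lies outside the buffer. */
static int free_best(const uint8_t *blk, size_t len, int32_t i, uint16_t *best)
{
    size_t off;

    assert(blk != NULL && best != NULL);
    if (i < 0)
        return -1;
    off = FREE_HDR_SIZE + (size_t)i * 2u;
    if (off + 2u > len)
        return -1;
    *best = (uint16_t)((blk[off] << 8) | blk[off + 1]);
    return 0;
}
```

A caller would normally also check that i is below hdr.nvalid before trusting the value returned by free_best.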
The leaf blocks can be located in any order; the only way to determine the appropriate one is by the node block's hash/before values. Given a hash to look up, read the node's btree array and find the first hashval in the array that is greater than or equal to the given hash. The entry can then be found in the block pointed to by the corresponding before value.
typedef struct xfs_da_intnode {
    struct xfs_da_node_hdr {
        xfs_da_blkinfo_t info;
        __uint16_t count;
        __uint16_t level;
    } hdr;
    struct xfs_da_node_entry {
        xfs_dahash_t hashval;
        xfs_dablk_t before;
    } btree[1];
} xfs_da_intnode_t;
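The search of the btree array can be sketched as below. This is a minimal illustration, not the kernel's routine: node_entry and node_lookup are hypothetical names, the entries are assumed to be already converted to host byte order and sorted by hashval, and the sketch assumes an entry covers every hash up to and including its hashval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian copy of one btree[] entry (hypothetical type). */
struct node_entry {
    uint32_t hashval; /* largest hash reachable through 'before' */
    uint32_t before;  /* directory block offset of the child */
};

/*
 * Binary search for the first entry whose hashval is >= hash.
 * Returns 0 and stores the child block in *blk, or -1 when the hash is
 * larger than every hashval in the node (no such entry in the directory).
 */
static int node_lookup(const struct node_entry *btree, size_t count,
                       uint32_t hash, uint32_t *blk)
{
    size_t lo = 0, hi = count;

    assert(btree != NULL && blk != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (btree[mid].hashval < hash)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == count)
        return -1;
    *blk = btree[lo].before;
    return 0;
}
```

When the node's level is greater than 1, the block returned is itself a node and the same search would be repeated there.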
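The freeindex's bests array starts immediately after the header, as in the structure above, and grows toward the end of the block. This differs from the leaf directory format, where the bests array sits at the end of the leaf block and grows toward the start.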
When a data block becomes unused (i.e. all entries in it have been deleted), the block is freed, the data extents contain a hole, the freeindex's hdr.nused value is decremented, and the associated bests[] entry is set to 0xffff.
As the first data block always contains "." and "..", it's invalid for the directory to have a hole at the start.
The freeindex's hdr.nused should always be the same as the number of allocated data directory blocks containing name/inode data and will always be less than or equal to hdr.nvalid. hdr.nvalid should be the same as the index of the last data directory block plus one (i.e. when the last data block is freed, both nused and nvalid are decremented).
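The bookkeeping for freeing a data block can be sketched as follows. The sketch rests on stated assumptions: nused counts the bests[] slots that are not 0xffff, nvalid is one past the last slot still in use, and freeing the last block also trims any holes that then trail the array. The function free_data_block and the NULLDATAOFF name are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define NULLDATAOFF 0xffffu /* bests[] marker for a freed data block */

/*
 * Record that data block 'idx' (relative to firstdb) has been freed.
 * bests, nvalid and nused are host-endian copies of the freeindex fields.
 */
static void free_data_block(uint16_t *bests, int32_t *nvalid, int32_t *nused,
                            int32_t idx)
{
    assert(bests != 0 && nvalid != 0 && nused != 0);
    assert(idx >= 0 && idx < *nvalid);
    assert(bests[idx] != NULLDATAOFF);

    bests[idx] = NULLDATAOFF; /* the data extents now have a hole here */
    (*nused)--;

    /* Freeing the last block shrinks nvalid past any trailing holes. */
    if (idx == *nvalid - 1) {
        int32_t i = idx;

        while (i >= 0 && bests[i] == NULLDATAOFF)
            i--;
        *nvalid = i + 1;
    }
}
```

Freeing a block in the middle therefore leaves nvalid alone and opens a gap between nused and nvalid, while freeing the last block lowers both.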
xfs_db Example:
For the node directory examples, we are using a filesystem with a 4KB block size and a 16KB directory block size. The directory has over 2000 entries:
xfs_db> sb 0
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
...
dirblklog = 2
...
xfs_db> inode <inode#>
xfs_db> p
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 2 (extents)
...
core.size = 81920
core.nblocks = 36
core.extsize = 0
core.nextents = 8
...
u.bmx[0-7] = [startoff,startblock,blockcount,extentflag] 0:[0,7368,4,0]
1:[4,7408,4,0] 2:[8,7444,4,0] 3:[12,7480,4,0] 4:[16,7520,4,0]
5:[8388608,7396,4,0] 6:[8388612,7524,8,0] 7:[16777216,7516,4,0]
As can already be observed, all extents are allocated in multiples of 4 blocks.
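The startoff values in the extent list fall into three ranges, one each for data, leaf/node and freeindex blocks. A small sketch of that classification is shown below, assuming each range is 32GB of directory address space (so with 4KB filesystem blocks the leaf range starts at 8388608 and the freeindex range at 16777216, matching the output above). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three directory address ranges. */
enum dir_section { DIR_SEC_DATA, DIR_SEC_LEAF, DIR_SEC_FREE };

/* Assumption: each range spans 2^35 bytes (32GB) of directory offset. */
#define DIR_SPACE_LOG 35

/*
 * Classify an extent by its startoff, given in filesystem blocks.
 * blocklog is log2 of the filesystem block size (12 for 4KB).
 */
static enum dir_section dir_section_of(uint64_t startoff, unsigned blocklog)
{
    uint64_t leaf_first, free_first;

    assert(blocklog <= DIR_SPACE_LOG);
    leaf_first = (UINT64_C(1) << DIR_SPACE_LOG) >> blocklog;
    free_first = (UINT64_C(2) << DIR_SPACE_LOG) >> blocklog;

    if (startoff >= free_first)
        return DIR_SEC_FREE;
    if (startoff >= leaf_first)
        return DIR_SEC_LEAF;
    return DIR_SEC_DATA;
}
```

Applied to the eight extents above, this labels extents 0 to 4 as data, extents 5 and 6 as leaf/node, and extent 7 as freeindex.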
Blocks 0 to 19 (16+4-1) are used for the data. Looking at blocks 16-19, it can be seen that the layout is the same as the single-leaf format, except the length values are a lot larger to accommodate the increased directory block size:
xfs_db> dblock 16
xfs_db> type dir2
xfs_db> p
dhdr.magic = 0x58443244
dhdr.bestfree[0].offset = 0xb0
dhdr.bestfree[0].length = 0x3f50
dhdr.bestfree[1].offset = 0
dhdr.bestfree[1].length = 0
dhdr.bestfree[2].offset = 0
dhdr.bestfree[2].length = 0
du[0].inumber = 120224
du[0].namelen = 15
du[0].name = "frame002043.tst"
du[0].tag = 0x10
du[1].inumber = 120225
du[1].namelen = 15
du[1].name = "frame002044.tst"
du[1].tag = 0x30
du[2].inumber = 120226
du[2].namelen = 15
du[2].name = "frame002045.tst"
du[2].tag = 0x50
du[3].inumber = 120227
du[3].namelen = 15
du[3].name = "frame002046.tst"
du[3].tag = 0x70
du[4].inumber = 120228
du[4].namelen = 15
du[4].name = "frame002047.tst"
du[4].tag = 0x90
du[5].freetag = 0xffff
du[5].length = 0x3f50
du[5].tag = 0
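The tag values in this output step by 0x20 because each entry occupies 32 bytes. The arithmetic can be sketched as below, assuming the entry layout of an 8-byte inode number, a 1-byte name length, the name, and a 2-byte tag, padded to a multiple of 8 bytes, after a 16-byte data block header. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DATA_HDR_SIZE 16u /* magic + three bestfree {offset, length} pairs */
#define DATA_ALIGN    8u  /* entries are padded to 8-byte boundaries */

/* On-disk size of a data entry with a name of 'namelen' bytes. */
static unsigned data_entsize(unsigned namelen)
{
    unsigned raw;

    assert(namelen > 0 && namelen <= 255);
    raw = 8u + 1u + namelen + 2u; /* inumber + namelen + name + tag */
    return (raw + DATA_ALIGN - 1u) & ~(DATA_ALIGN - 1u);
}

/* Offset of the entry that follows one starting at 'tag'. */
static unsigned next_entry_offset(unsigned tag, unsigned namelen)
{
    return tag + data_entsize(namelen);
}
```

With 15-character names the first entry sits at 0x10 and five entries end at 0xb0, which is where the free region of length 0x3f50 begins; 0xb0 + 0x3f50 is 0x4000, the 16KB directory block size.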
Next, the "node" block. The fields are prefixed with 'n' for node blocks:
xfs_db> dblock 8388608
xfs_db> type dir2
xfs_db> p
nhdr.info.forw = 0
nhdr.info.back = 0
nhdr.info.magic = 0xfebe
nhdr.count = 2
nhdr.level = 1
nbtree[0-1] = [hashval,before] 0:[0xa3a440ac,8388616] 1:[0xf3a440bc,8388612]
The two following leaf blocks were allocated at once, as XFS knows it needs at least two blocks when allocating a B+tree, so the length of the extent is 8 fsblocks. Hashes up to and including 0xa3a440ac are located at directory offset 8388616, and hashes above that, up to and including 0xf3a440bc, are at offset 8388612. Hashes above 0xf3a440bc don't exist in this directory.
xfs_db> dblock 8388616
xfs_db> type dir2
xfs_db> p
lhdr.info.forw = 8388612
lhdr.info.back = 0
lhdr.info.magic = 0xd2ff
lhdr.count = 1023
lhdr.stale = 0
lents[0].hashval = 0x2e
lents[0].address = 0x2
lents[1].hashval = 0x172e
lents[1].address = 0x4
lents[2].hashval = 0x23a04084
lents[2].address = 0x116
...
lents[1021].hashval = 0xa3a440a4
lents[1021].address = 0x1fa2
lents[1022].hashval = 0xa3a440ac
lents[1022].address = 0x1fca
xfs_db> dblock 8388612
xfs_db> type dir2
xfs_db> p
lhdr.info.forw = 0
lhdr.info.back = 8388616
lhdr.info.magic = 0xd2ff
lhdr.count = 1027
lhdr.stale = 0
lents[0].hashval = 0xa3a440b4
lents[0].address = 0x1f52
lents[1].hashval = 0xa3a440bc
lents[1].address = 0x1f7a
...
lents[1025].hashval = 0xf3a440b4
lents[1025].address = 0x1f66
lents[1026].hashval = 0xf3a440bc
lents[1026].address = 0x1f8e
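Within a leaf block, the lents array is sorted by hashval, so an entry can be found with a binary search. The sketch below is illustrative only: leaf_entry and leaf_lookup are hypothetical names and the entries are assumed to be in host byte order. Different names can share a hash, so a real lookup would still compare the name stored at each matching address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian copy of one lents[] entry (hypothetical type). */
struct leaf_entry {
    uint32_t hashval;
    uint32_t address;
};

/*
 * Find the first entry whose hashval equals 'hash'.
 * Returns 0 and stores its index in *idx, or -1 if no entry matches.
 */
static int leaf_lookup(const struct leaf_entry *ents, size_t count,
                       uint32_t hash, size_t *idx)
{
    size_t lo = 0, hi = count;

    assert(ents != NULL && idx != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ents[mid].hashval < hash)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == count || ents[lo].hashval != hash)
        return -1;
    *idx = lo;
    return 0;
}
```

Because the search returns the first match, a caller can walk forward from *idx while the hashval stays equal to check every candidate.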
An example lookup using xfs_db:
xfs_db> hash frame001845.tst
0xf3a26094
Doing a binary search through the array, we get address 0x1ce6. Addresses are
stored in units of 8 bytes, so this is byte offset 0xe730. Each fsblock is 4KB
in size (0x1000), so it will be offset 0x730 into directory offset 14. From the
extent map, this will be fsblock 7482:
xfs_db> fsblock 7482
xfs_db> type text
xfs_db> p
...
730: 00 00 00 00 00 01 d4 da 0f 66 72 61 6d 65 30 30 .........frame00
740: 31 38 34 35 2e 74 73 74 00 00 00 00 00 00 27 30 1845.tst.......0
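The arithmetic used in this lookup can be sketched as follows. The sketch assumes leaf addresses are stored in units of 8 bytes and that the extent map is available as an array of host-endian records. The extent type and dir_addr_to_fsblock are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One u.bmx[] record: [startoff, startblock, blockcount]. */
struct extent {
    uint64_t startoff;
    uint64_t startblock;
    uint64_t blockcount;
};

/*
 * Translate a leaf entry address into a filesystem block and the byte
 * offset inside that block. blocklog is log2 of the filesystem block size.
 * Returns 0 on success, -1 if the offset falls into a hole.
 */
static int dir_addr_to_fsblock(uint32_t address, unsigned blocklog,
                               const struct extent *map, size_t nextents,
                               uint64_t *fsblock, uint32_t *blkoff)
{
    uint64_t byteoff = (uint64_t)address << 3; /* 8-byte units */
    uint64_t fileblk = byteoff >> blocklog;
    size_t i;

    assert(map != NULL && fsblock != NULL && blkoff != NULL);
    assert(blocklog < 32);
    *blkoff = (uint32_t)(byteoff & ((UINT64_C(1) << blocklog) - 1));

    for (i = 0; i < nextents; i++) {
        if (fileblk >= map[i].startoff &&
            fileblk < map[i].startoff + map[i].blockcount) {
            *fsblock = map[i].startblock + (fileblk - map[i].startoff);
            return 0;
        }
    }
    return -1;
}
```

For address 0x1ce6 with the data extents from the inode above, the byte offset is 0xe730, which is directory offset 14 and lands in extent 3 ([12,7480,4,0]) at fsblock 7480 + 2 = 7482, offset 0x730.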
Looking at the freeindex information (fields with an 'f' tag):
xfs_db> fsblock 7516
xfs_db> type dir2
xfs_db> p
fhdr.magic = 0x58443246
fhdr.firstdb = 0
fhdr.nvalid = 5
fhdr.nused = 5
fbests[0-4] = 0:0x10 1:0x10 2:0x10 3:0x10 4:0x3f50
In the raw disk layout, old data after the array is not cleared. The fbests array is highlighted:
TODO: Example with a hole in the middle