When the shortform directory data exceeds the space available in the inode, the directory data is moved into a new single directory block outside the inode. The inode's format is changed from "local" to "extent". The following points apply to block directories:
All directory data is stored within the one directory block, including the mandatory "." and ".." entries.
The block also contains "leaf" and "freespace index" information.
The location of the block is defined by the inode's in-core extent list (Section 5.1, “Extent List”): the di_u.u_bmx[0] value. The file offset in the extent must always be zero and the length = (directory block size / filesystem block size). The block number points to the filesystem block containing the directory data.
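As a small illustration of the length rule, the sketch below derives the expected extent length from the two superblock values that xfs_db prints (blocksize and dirblklog). It assumes dirblklog is the base-2 logarithm of the directory block size measured in filesystem blocks; the helper names are illustrative and are not part of XFS.

```c
#include <assert.h>
#include <stdint.h>

/* Directory block size in bytes: the filesystem block size scaled by
 * 2^dirblklog (assumption: dirblklog is a log2 multiplier). */
uint32_t dir_block_size(uint32_t blocksize, uint8_t dirblklog)
{
    assert(blocksize != 0);
    return blocksize << dirblklog;
}

/* Expected blockcount of u.bmx[0] for a block directory:
 * directory block size / filesystem block size. */
uint32_t dir_extent_length(uint32_t blocksize, uint8_t dirblklog)
{
    return dir_block_size(blocksize, dirblklog) / blocksize;
}
```

With blocksize = 4096 and dirblklog = 0, the directory block is a single filesystem block, so the extent length is 1.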
Block directory data is stored in the following structures:
#define XFS_DIR2_DATA_FD_COUNT 3

typedef struct xfs_dir2_block {
    xfs_dir2_data_hdr_t     hdr;
    xfs_dir2_data_union_t   u[1];
    xfs_dir2_leaf_entry_t   leaf[1];
    xfs_dir2_block_tail_t   tail;
} xfs_dir2_block_t;

typedef struct xfs_dir2_data_hdr {
    __uint32_t              magic;
    xfs_dir2_data_free_t    bestfree[XFS_DIR2_DATA_FD_COUNT];
} xfs_dir2_data_hdr_t;

typedef struct xfs_dir2_data_free {
    xfs_dir2_data_off_t     offset;
    xfs_dir2_data_off_t     length;
} xfs_dir2_data_free_t;

typedef union {
    xfs_dir2_data_entry_t   entry;
    xfs_dir2_data_unused_t  unused;
} xfs_dir2_data_union_t;

typedef struct xfs_dir2_data_entry {
    xfs_ino_t               inumber;
    __uint8_t               namelen;
    __uint8_t               name[1];
    xfs_dir2_data_off_t     tag;
} xfs_dir2_data_entry_t;

typedef struct xfs_dir2_data_unused {
    __uint16_t              freetag;    /* 0xffff */
    xfs_dir2_data_off_t     length;
    xfs_dir2_data_off_t     tag;
} xfs_dir2_data_unused_t;

typedef struct xfs_dir2_leaf_entry {
    xfs_dahash_t            hashval;
    xfs_dir2_dataptr_t      address;
} xfs_dir2_leaf_entry_t;

typedef struct xfs_dir2_block_tail {
    __uint32_t              count;
    __uint32_t              stale;
} xfs_dir2_block_tail_t;
The tag in the xfs_dir2_data_entry_t structure stores the entry's own offset from the start of the block.
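Because the name is variable length, the tag does not sit at a fixed position. The following sketch computes the on-disk size of an entry and the position of its tag. It assumes the layout shown above (8-byte inumber, 1-byte namelen, name bytes, 2-byte tag), that entries are padded to an 8-byte boundary, and that the tag occupies the last two bytes of the padded entry; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIR2_DATA_ALIGN 8u  /* mirrors XFS_DIR2_DATA_ALIGN */

/* Bytes occupied by one entry: inumber + namelen + name + tag,
 * rounded up to the next multiple of 8. */
size_t dir2_entry_size(uint8_t namelen)
{
    assert(namelen > 0);  /* a directory entry always has a name */
    size_t raw = sizeof(uint64_t)    /* inumber */
               + sizeof(uint8_t)     /* namelen */
               + namelen             /* name bytes, no terminator */
               + sizeof(uint16_t);   /* tag */
    return (raw + DIR2_DATA_ALIGN - 1) & ~(size_t)(DIR2_DATA_ALIGN - 1);
}

/* Offset of the tag field relative to the start of the entry. */
size_t dir2_entry_tag_offset(uint8_t namelen)
{
    return dir2_entry_size(namelen) - sizeof(uint16_t);
}
```

For a 15-character name this gives an entry size of 32 bytes, which matches the 0x20 spacing between the frame*.tst entries in the xfs_db example in this section.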
The start of a free space region is marked with the xfs_dir2_data_unused_t structure, where the freetag is 0xffff. The freetag and length overwrite the inumber of an entry. The tag is located at length - sizeof(tag) from the start of the unused entry on disk.
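A minimal sketch of writing such a marker into a block buffer follows. It assumes big-endian on-disk fields (as the raw dump in this section shows) and leaves bounds checking against the block size to the caller; dir2_mark_unused and put_be16 are hypothetical helpers, not XFS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIR2_FREE_TAG 0xffffu

/* Store a 16-bit value in big-endian byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Mark `length` bytes starting at `offset` in `block` as unused.
 * The caller must ensure offset + length lies within the block. */
void dir2_mark_unused(uint8_t *block, uint16_t offset, uint16_t length)
{
    assert(block != NULL);
    assert(length >= 8 && length % 8 == 0 && offset % 8 == 0);

    uint8_t *p = block + offset;
    put_be16(p, DIR2_FREE_TAG);        /* freetag: first 2 bytes of old inumber */
    put_be16(p + 2, length);           /* length: next 2 bytes of old inumber */
    put_be16(p + length - 2, offset);  /* tag: last 2 bytes of the region */
}
```

Marking offset 0xb0 with length 0x20 writes ff ff 00 20 at 0xb0 and the tag 00 b0 at 0xce.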
The bestfree array in the header points to as many as three of the largest regions of free space within the block for storing new entries, sorted from largest to third largest. If there are fewer than three free regions, the remaining bestfree elements are zeroed. The offset specifies the offset from the start of the block in bytes, and the length specifies the size of the free space in bytes. The location each points to must contain the above xfs_dir2_data_unused_t structure. As a block cannot exceed 64KB in size, each is a 16-bit value. bestfree is used to optimise the time required to locate space to create an entry. It saves scanning through the block to find a suitable location every time an entry is created.
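One way to keep such a table ordered is a simple insertion into a three-slot array, sketched below. This is an illustration of the ordering rule only, not the kernel's implementation; struct free_slot, bestfree_insert and bestfree_find are hypothetical names, and the "use the largest region" policy in bestfree_find is an assumption chosen for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define DIR2_DATA_FD_COUNT 3

struct free_slot {
    uint16_t offset;  /* byte offset of the free region in the block */
    uint16_t length;  /* size of the free region in bytes; 0 = slot unused */
};

/* Insert a free region, keeping slots sorted by descending length.
 * Returns the slot index used, or -1 if the region is no larger than
 * the three regions already tracked. */
int bestfree_insert(struct free_slot bf[DIR2_DATA_FD_COUNT],
                    uint16_t offset, uint16_t length)
{
    assert(length > 0);
    for (int i = 0; i < DIR2_DATA_FD_COUNT; i++) {
        if (length > bf[i].length) {
            /* Shift smaller regions down; the smallest falls off the end. */
            for (int j = DIR2_DATA_FD_COUNT - 1; j > i; j--)
                bf[j] = bf[j - 1];
            bf[i].offset = offset;
            bf[i].length = length;
            return i;
        }
    }
    return -1;
}

/* Return the offset of a tracked region that can hold `need` bytes,
 * or -1. Slot 0 is the largest, so if it is too small nothing fits. */
int bestfree_find(const struct free_slot bf[DIR2_DATA_FD_COUNT], uint16_t need)
{
    return (bf[0].length >= need) ? (int)bf[0].offset : -1;
}
```

Starting from a table holding only {0x130, 0xe78}, inserting a region {0xb0, 0x20} places it in slot 1, which is the state shown in the removal example later in this section.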
The tail structure specifies the number of elements in the leaf array and the number of stale entries in the array. The tail is always located at the end of the block. The leaf data immediately precedes the tail structure.
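The positions of both structures follow directly from the directory block size and the count field, as this sketch shows. It assumes the tail is two 32-bit fields (8 bytes) and each leaf entry is a 32-bit hash plus a 32-bit address (8 bytes), consistent with the raw dump in this section; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define TAIL_SIZE       8u  /* count + stale, 32 bits each */
#define LEAF_ENTRY_SIZE 8u  /* hashval + address, 32 bits each */

/* Byte offset of the tail: the last 8 bytes of the directory block. */
uint32_t block_tail_offset(uint32_t dirblksize)
{
    assert(dirblksize >= TAIL_SIZE);
    return dirblksize - TAIL_SIZE;
}

/* Byte offset of leaf[0]: `count` entries packed directly before the tail. */
uint32_t block_leaf_offset(uint32_t dirblksize, uint32_t count)
{
    uint32_t tail = block_tail_offset(dirblksize);
    assert(count <= tail / LEAF_ENTRY_SIZE);
    return tail - count * LEAF_ENTRY_SIZE;
}
```

For a 4096-byte block with count = 10, the tail is at 0xff8 and the leaf array starts at 0xfa8.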
The leaf array, which grows from the end of the block just before the tail structure, contains an array of hash/address pairs for quickly looking up a name by a hash value. Hash values are covered by the introduction to directories. The address on-disk is the offset into the block divided by 8 (XFS_DIR2_DATA_ALIGN). Hash/address pairs are stored on disk to optimise lookup speed for large directories. If they were not stored, the hashes would have to be calculated for all entries each time a lookup occurs in a directory.
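For readers who want to reproduce the hash values by hand, the sketch below is modelled on the rotate-and-XOR name hash used by the Linux XFS implementation (xfs_da_hashname). It is written from memory and should be checked against the kernel source before being relied on; da_hashname is a local name chosen for this example. It does reproduce the values 0x2e for "." and 0x172e for ".." that appear in the xfs_db output in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (0 < s < 32). */
static uint32_t rol32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s));
}

/* Hash a directory entry name, four bytes per round, then fold in
 * the remaining 0-3 bytes. */
uint32_t da_hashname(const uint8_t *name, size_t namelen)
{
    uint32_t hash = 0;

    assert(name != NULL);
    for (; namelen >= 4; namelen -= 4, name += 4)
        hash = ((uint32_t)name[0] << 21) ^ ((uint32_t)name[1] << 14) ^
               ((uint32_t)name[2] << 7)  ^  (uint32_t)name[3] ^
               rol32(hash, 7 * 4);

    switch (namelen) {
    case 3:
        return ((uint32_t)name[0] << 14) ^ ((uint32_t)name[1] << 7) ^
               (uint32_t)name[2] ^ rol32(hash, 7 * 3);
    case 2:
        return ((uint32_t)name[0] << 7) ^ (uint32_t)name[1] ^
               rol32(hash, 7 * 2);
    case 1:
        return (uint32_t)name[0] ^ rol32(hash, 7 * 1);
    default:
        return hash;
    }
}
```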
xfs_db Example:
A directory is created with 8 entries, directory block size = filesystem block size:
xfs_db> sb 0
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
...
dirblklog = 0
...
xfs_db> inode <inode#>
xfs_db> p
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 2 (extents)
core.nlinkv1 = 2
...
core.size = 4096
core.nblocks = 1
core.extsize = 0
core.nextents = 1
...
u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,2097164,1,0]
Go to the "startblock" and show the raw disk data:
xfs_db> dblock 0
xfs_db> type text
xfs_db> p
000: 58 44 32 42 01 30 0e 78 00 00 00 00 00 00 00 00 XD2B.0.x........
010: 00 00 00 00 02 00 00 80 01 2e 00 00 00 00 00 10 ................
020: 00 00 00 00 00 00 00 80 02 2e 2e 00 00 00 00 20 ................
030: 00 00 00 00 02 00 00 81 0f 66 72 61 6d 65 30 30 .........frame00
040: 30 30 30 30 2e 74 73 74 80 8e 59 00 00 00 00 30 0000.tst..Y....0
050: 00 00 00 00 02 00 00 82 0f 66 72 61 6d 65 30 30 .........frame00
060: 30 30 30 31 2e 74 73 74 d0 ca 5c 00 00 00 00 50 0001.tst.......P
070: 00 00 00 00 02 00 00 83 0f 66 72 61 6d 65 30 30 .........frame00
080: 30 30 30 32 2e 74 73 74 00 00 00 00 00 00 00 70 0002.tst.......p
090: 00 00 00 00 02 00 00 84 0f 66 72 61 6d 65 30 30 .........frame00
0a0: 30 30 30 33 2e 74 73 74 00 00 00 00 00 00 00 90 0003.tst........
0b0: 00 00 00 00 02 00 00 85 0f 66 72 61 6d 65 30 30 .........frame00
0c0: 30 30 30 34 2e 74 73 74 00 00 00 00 00 00 00 b0 0004.tst........
0d0: 00 00 00 00 02 00 00 86 0f 66 72 61 6d 65 30 30 .........frame00
0e0: 30 30 30 35 2e 74 73 74 00 00 00 00 00 00 00 d0 0005.tst........
0f0: 00 00 00 00 02 00 00 87 0f 66 72 61 6d 65 30 30 .........frame00
100: 30 30 30 36 2e 74 73 74 00 00 00 00 00 00 00 f0 0006.tst........
110: 00 00 00 00 02 00 00 88 0f 66 72 61 6d 65 30 30 .........frame00
120: 30 30 30 37 2e 74 73 74 00 00 00 00 00 00 01 10 0007.tst........
130: ff ff 0e 78 00 00 00 00 00 00 00 00 00 00 00 00 ...x............
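To relate the raw bytes to the structures, the following sketch decodes a single data entry from a byte buffer. It assumes big-endian fields (as the dump shows), the entry layout given earlier, and 8-byte padding with the tag in the last two bytes; struct dir_entry and decode_entry are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct dir_entry {
    uint64_t inumber;
    uint8_t  namelen;
    char     name[256];  /* NUL-terminated copy of the on-disk name */
    uint16_t tag;
};

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the entry starting at p. Returns the padded on-disk size so
 * the caller can step to the next entry. */
size_t decode_entry(const uint8_t *p, struct dir_entry *out)
{
    assert(p != NULL && out != NULL);
    out->inumber = get_be64(p);
    out->namelen = p[8];
    memcpy(out->name, p + 9, out->namelen);
    out->name[out->namelen] = '\0';

    /* inumber + namelen + name + tag, rounded up to a multiple of 8 */
    size_t size = ((size_t)8 + 1 + out->namelen + 2 + 7) & ~(size_t)7;
    out->tag = get_be16(p + size - 2);
    return size;
}
```

Applied to the 32 bytes at offset 0x30 in the dump, this yields inode 33554561 (0x02000081), a 15-byte name "frame000000.tst" and a tag of 0x30. The bytes between the name and the tag are padding.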
The "leaf" and "tail" structures are stored at the end of the block, so as the directory grows, the middle is filled in:
fa0: 00 00 00 00 00 00 01 30 00 00 00 2e 00 00 00 02 .......0........
fb0: 00 00 17 2e 00 00 00 04 83 a0 40 b4 00 00 00 0e ................
fc0: 93 a0 40 b4 00 00 00 12 a3 a0 40 b4 00 00 00 06 ................
fd0: b3 a0 40 b4 00 00 00 0a c3 a0 40 b4 00 00 00 1e ................
fe0: d3 a0 40 b4 00 00 00 22 e3 a0 40 b4 00 00 00 16 ................
ff0: f3 a0 40 b4 00 00 00 1a 00 00 00 0a 00 00 00 00 ................
In a readable format:
xfs_db> type dir2
xfs_db> p
bhdr.magic = 0x58443242
bhdr.bestfree[0].offset = 0x130
bhdr.bestfree[0].length = 0xe78
bhdr.bestfree[1].offset = 0
bhdr.bestfree[1].length = 0
bhdr.bestfree[2].offset = 0
bhdr.bestfree[2].length = 0
bu[0].inumber = 33554560
bu[0].namelen = 1
bu[0].name = "."
bu[0].tag = 0x10
bu[1].inumber = 128
bu[1].namelen = 2
bu[1].name = ".."
bu[1].tag = 0x20
bu[2].inumber = 33554561
bu[2].namelen = 15
bu[2].name = "frame000000.tst"
bu[2].tag = 0x30
bu[3].inumber = 33554562
bu[3].namelen = 15
bu[3].name = "frame000001.tst"
bu[3].tag = 0x50
...
bu[8].inumber = 33554567
bu[8].namelen = 15
bu[8].name = "frame000006.tst"
bu[8].tag = 0xf0
bu[9].inumber = 33554568
bu[9].namelen = 15
bu[9].name = "frame000007.tst"
bu[9].tag = 0x110
bu[10].freetag = 0xffff
bu[10].length = 0xe78
bu[10].tag = 0x130
bleaf[0].hashval = 0x2e
bleaf[0].address = 0x2
bleaf[1].hashval = 0x172e
bleaf[1].address = 0x4
bleaf[2].hashval = 0x83a040b4
bleaf[2].address = 0xe
...
bleaf[8].hashval = 0xe3a040b4
bleaf[8].address = 0x16
bleaf[9].hashval = 0xf3a040b4
bleaf[9].address = 0x1a
btail.count = 10
btail.stale = 0
Note: with block directories, all xfs_db fields are prefixed with "b".
For a simple lookup example, the hash of frame000000.tst is 0xa3a040b4. Looking up that value in the leaf array gives an address of 0x6. Multiplying that by 8 gives offset 0x30, and the inode number stored at that offset is 33554561.
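The lookup can be sketched as a binary search over the leaf array followed by the address-to-offset conversion. This assumes the leaf array is sorted by hash value (as it is in the dump) and that a cleared address of 0 marks a stale entry; leaf_lookup is a hypothetical helper, and a real lookup would also compare the name at the returned offset, since different names can share a hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIR2_DATA_ALIGN 8u  /* mirrors XFS_DIR2_DATA_ALIGN */

struct leaf_entry {
    uint32_t hashval;
    uint32_t address;  /* byte offset / 8; 0 = stale */
};

/* Return the byte offset of the first live entry whose hash matches,
 * or 0 if there is none (offset 0 is the block header, never an entry). */
uint32_t leaf_lookup(const struct leaf_entry *leaf, size_t count, uint32_t hash)
{
    assert(leaf != NULL || count == 0);

    /* Binary search for the first element with hashval >= hash. */
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (leaf[mid].hashval < hash)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Walk any run of equal hashes, skipping stale entries. */
    for (; lo < count && leaf[lo].hashval == hash; lo++)
        if (leaf[lo].address != 0)
            return leaf[lo].address * DIR2_DATA_ALIGN;
    return 0;
}
```

Given the ten hash/address pairs from the dump, searching for the hash stored alongside address 0x6 returns byte offset 0x30.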
When we remove an entry from the middle (frame000004.tst), we can see how the freespace details are adjusted:
bhdr.magic = 0x58443242
bhdr.bestfree[0].offset = 0x130
bhdr.bestfree[0].length = 0xe78
bhdr.bestfree[1].offset = 0xb0
bhdr.bestfree[1].length = 0x20
bhdr.bestfree[2].offset = 0
bhdr.bestfree[2].length = 0
...
bu[5].inumber = 33554564
bu[5].namelen = 15
bu[5].name = "frame000003.tst"
bu[5].tag = 0x90
bu[6].freetag = 0xffff
bu[6].length = 0x20
bu[6].tag = 0xb0
bu[7].inumber = 33554566
bu[7].namelen = 15
bu[7].name = "frame000005.tst"
bu[7].tag = 0xd0
...
bleaf[7].hashval = 0xd3a040b4
bleaf[7].address = 0x22
bleaf[8].hashval = 0xe3a040b4
bleaf[8].address = 0
bleaf[9].hashval = 0xf3a040b4
bleaf[9].address = 0x1a
btail.count = 10
btail.stale = 1
A new "bestfree" value is added for the entry, the start of the entry is marked as unused with 0xffff (which overwrites the inode number of the actual entry), and the length of the freed space is stored immediately after it. The tag remains intact at offset + length - sizeof(tag). The address for the hash is also cleared, and the stale count in the tail is incremented. The affected areas are highlighted below: