When an inode's attribute fork space is used up with shortform attributes and more are added, the attribute format is migrated to "extents".
Extent-based attributes use hash/index pairs to speed up attribute lookups. The first part of the "leaf" contains an array of fixed-size hash/index pairs, with the flags stored alongside them. The remaining part of the leaf block contains the array of name/value pairs, where each element varies in length.
Each leaf is based on the xfs_da_blkinfo_t block header declared in Leaf Directories. The structure encapsulating all other structures is xfs_attr_leafblock_t.
The structures involved are:
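typedef struct xfs_attr_leaf_map {
    __be16 base;                      /* base of free region */
    __be16 size;                      /* length of free region */
} xfs_attr_leaf_map_t;

typedef struct xfs_attr_leaf_hdr {
    xfs_da_blkinfo_t info;            /* block type, links, etc. */
    __be16 count;                     /* count of active leaf entries */
    __be16 usedbytes;                 /* bytes of names/values stored */
    __be16 firstused;                 /* first used byte in name area */
    __u8 holes;                       /* != 0 if block needs compaction */
    __u8 pad1;
    xfs_attr_leaf_map_t freemap[3];   /* largest free regions */
} xfs_attr_leaf_hdr_t;

typedef struct xfs_attr_leaf_entry {
    __be32 hashval;                   /* hash value of name */
    __be16 nameidx;                   /* offset in block of name/value */
    __u8 flags;                       /* local/root/secure/incomplete */
    __u8 pad2;                        /* unused pad byte */
} xfs_attr_leaf_entry_t;

typedef struct xfs_attr_leaf_name_local {
    __be16 valuelen;                  /* number of bytes in value */
    __u8 namelen;                     /* length of name bytes */
    __u8 nameval[1];                  /* name bytes followed by value bytes */
} xfs_attr_leaf_name_local_t;

typedef struct xfs_attr_leaf_name_remote {
    __be32 valueblk;                  /* block number of value bytes */
    __be32 valuelen;                  /* number of bytes in value */
    __u8 namelen;                     /* length of name bytes */
    __u8 name[1];                     /* name bytes */
} xfs_attr_leaf_name_remote_t;

typedef struct xfs_attr_leafblock {
    xfs_attr_leaf_hdr_t hdr;                /* constant-structure header */
    xfs_attr_leaf_entry_t entries[1];       /* hash/index pairs */
    xfs_attr_leaf_name_local_t namelist;    /* grows from bottom of block */
    xfs_attr_leaf_name_remote_t valuelist;  /* grows from bottom of block */
} xfs_attr_leafblock_t;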
Each leaf header uses the following magic number:
#define XFS_ATTR_LEAF_MAGIC 0xfbee
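To illustrate how the header maps onto disk, the following is a minimal sketch that decodes the fixed 32-byte header from a raw block buffer and checks the magic number. It is not the XFS implementation: the helper names (be16_at, be32_at, attr_leaf_hdr_decode) and the host-endian struct are hypothetical, and the 12-byte xfs_da_blkinfo_t layout (forw, back, magic, pad) is an assumption carried over from the Leaf Directories section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_ATTR_LEAF_MAGIC 0xfbee

/* On-disk fields are big-endian (__be16/__be32); read them bytewise. */
static uint16_t be16_at(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical host-endian view of xfs_attr_leaf_hdr_t. */
struct attr_leaf_hdr_host {
    uint32_t forw, back;
    uint16_t magic;
    uint16_t count, usedbytes, firstused;
    uint8_t holes;
    struct { uint16_t base, size; } freemap[3];
};

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
static int attr_leaf_hdr_decode(const uint8_t *blk, size_t len,
                                struct attr_leaf_hdr_host *out)
{
    assert(blk != NULL && out != NULL);
    if (len < 32)
        return -1;
    out->forw = be32_at(blk + 0);
    out->back = be32_at(blk + 4);
    out->magic = be16_at(blk + 8);        /* bytes 10-11: blkinfo pad (assumed) */
    if (out->magic != XFS_ATTR_LEAF_MAGIC)
        return -1;
    out->count = be16_at(blk + 12);
    out->usedbytes = be16_at(blk + 14);
    out->firstused = be16_at(blk + 16);
    out->holes = blk[18];                 /* byte 19: pad1 */
    for (int i = 0; i < 3; i++) {
        out->freemap[i].base = be16_at(blk + 20 + 4 * i);
        out->freemap[i].size = be16_at(blk + 22 + 4 * i);
    }
    return 0;
}
```

With this layout the entries[] array begins at byte 32 of the block, immediately after the three freemap slots.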
The hash/index elements in the entries[] array are packed from the top of the block. Name/values grow from the bottom but are not packed. The freemap contains run-length-encoded entries for the free bytes after the entries[] array, but only the three largest runs are stored (smaller runs are dropped). When the freemap doesn't show enough space for an allocation, the name/value area is compacted and the allocation is tried again. If there still isn't enough space, the block is split. The name/value structures (both local and remote versions) must be 32-bit aligned.
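The 32-bit alignment rule determines how many bytes each name/value structure occupies in the block. The sketch below computes those sizes from the field widths in the structures above. The function names are hypothetical, and the sizes are derived from the struct definitions rather than taken from the XFS sources.

```c
#include <assert.h>
#include <stdint.h>

#define ATTR_LEAF_NAME_ALIGN 4u   /* name/value structures are 32-bit aligned */

/* Round n up to the next multiple of 4. */
static uint32_t attr_align4(uint32_t n)
{
    assert(n <= UINT32_MAX - (ATTR_LEAF_NAME_ALIGN - 1)); /* no wrap-around */
    return (n + ATTR_LEAF_NAME_ALIGN - 1) & ~(ATTR_LEAF_NAME_ALIGN - 1);
}

/* Local entry: valuelen (2) + namelen (1) + name bytes + value bytes. */
static uint32_t attr_local_size(uint8_t namelen, uint16_t valuelen)
{
    return attr_align4(2u + 1u + namelen + valuelen);
}

/* Remote entry: valueblk (4) + valuelen (4) + namelen (1) + name bytes. */
static uint32_t attr_remote_size(uint8_t namelen)
{
    return attr_align4(4u + 4u + 1u + namelen);
}
```

For example, a 5-byte name with a 6-byte local value needs 14 bytes and occupies 16, while a remote entry with an 8-byte name needs 17 bytes and occupies 20.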
For attributes with small values (i.e. the value can be stored within the leaf), the XFS_ATTR_LOCAL flag is set for the attribute. The entry details are stored using the xfs_attr_leaf_name_local_t structure. For large attribute values that cannot be stored within the leaf, separate filesystem blocks are allocated to store the value. They use the xfs_attr_leaf_name_remote_t structure.
Both local and remote entries can be interleaved as they are only addressed by the hash/index entries. The flag is stored with the hash/index pairs so the appropriate structure can be used.
Since duplicate hash keys are possible, for each hash that matches during a lookup, the actual name string must be compared.
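The lookup rule can be sketched as follows: compare the 32-bit hash first, and only on a hash match compare the name bytes. This is a simplified in-memory model, not the XFS lookup code; struct attr_ent and attr_leaf_find are hypothetical names, and a linear scan is used so that the sketch makes no assumption about entry ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded entry: the hash plus the name that nameidx points at. */
struct attr_ent {
    uint32_t hashval;
    const char *name;
    uint8_t namelen;
};

/* Return the index of the entry whose hash AND name both match, or -1. */
static int attr_leaf_find(const struct attr_ent *ents, int count,
                          uint32_t hash, const char *name, uint8_t namelen)
{
    assert(ents != NULL || count == 0);
    for (int i = 0; i < count; i++) {
        if (ents[i].hashval != hash)
            continue;                     /* cheap 32-bit compare first */
        if (ents[i].namelen == namelen &&
            memcmp(ents[i].name, name, namelen) == 0)
            return i;                     /* hash and name both match */
        /* same hash, different name: a collision, keep scanning */
    }
    return -1;
}
```

A hash match alone is never treated as a hit; two different names sharing one hash value are told apart only by the final byte comparison.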
An "incomplete" bit is also used in the attribute flags. It indicates that an attribute is in the middle of being created and must not be shown to the user if the system crashes while the bit is set. The bit is cleared once the attribute has been fully set up. This is needed because some large attributes cannot be created inside a single transaction.
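A minimal sketch of decoding the flags byte into the four columns that xfs_db prints (incomplete, root, secure, local), and of hiding incomplete attributes. The bit positions below are an assumption, since this section does not list them; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; this section does not define them. */
#define ATTR_F_LOCAL      (1u << 0)   /* value is stored inside the leaf */
#define ATTR_F_ROOT       (1u << 1)
#define ATTR_F_SECURE     (1u << 2)
#define ATTR_F_INCOMPLETE (1u << 7)   /* creation has not finished yet */

/* The four flag columns printed by xfs_db for each entry. */
struct attr_flags {
    bool incomplete, root, secure, local;
};

static struct attr_flags attr_flags_decode(uint8_t flags)
{
    struct attr_flags f = {
        .incomplete = (flags & ATTR_F_INCOMPLETE) != 0,
        .root       = (flags & ATTR_F_ROOT) != 0,
        .secure     = (flags & ATTR_F_SECURE) != 0,
        .local      = (flags & ATTR_F_LOCAL) != 0,
    };
    return f;
}

/* An entry is shown to the user only once the incomplete bit is clear. */
static bool attr_is_visible(uint8_t flags)
{
    return !attr_flags_decode(flags).incomplete;
}
```

The local flag selects which name/value structure to read at nameidx: xfs_attr_leaf_name_local_t when it is set, xfs_attr_leaf_name_remote_t otherwise.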
xfs_db Example:
A single 30KB extended attribute is added to an inode:
xfs_db> inode <inode#>
xfs_db> p
...
core.nblocks = 9
core.nextents = 0
core.naextents = 1
core.forkoff = 15
core.aformat = 2 (extents)
...
a.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,37535,9,0]
xfs_db> ablock 0
xfs_db> p
hdr.info.forw = 0
hdr.info.back = 0
hdr.info.magic = 0xfbee
hdr.count = 1
hdr.usedbytes = 20
hdr.firstused = 4076
hdr.holes = 0
hdr.freemap[0-2] = [base,size] 0:[40,4036] 1:[0,0] 2:[0,0]
entries[0] = [hashval,nameidx,incomplete,root,secure,local]
0:[0xfcf89d4f,4076,0,0,0,0]
nvlist[0].valueblk = 0x1
nvlist[0].valuelen = 30692
nvlist[0].namelen = 8
nvlist[0].name = "big_attr"
Attribute blocks 1 to 8 (filesystem blocks 37536 to 37543) contain the raw binary value data for the attribute.
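The block range follows from simple arithmetic, sketched below. The 4096-byte block size is an assumption inferred from this example (the leaf block ends at offset 0xfff), and the function names are hypothetical. The sketch treats remote blocks as raw value data with no per-block header, as stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks needed to hold valuelen bytes of raw value data. */
static uint32_t attr_remote_blocks(uint32_t valuelen, uint32_t blocksize)
{
    assert(blocksize > 0);
    return valuelen / blocksize + (valuelen % blocksize != 0);
}

/*
 * Map an attribute fork block number to a filesystem block using one
 * extent record [startoff, startblock, ...] that covers it.
 */
static uint64_t attr_dablk_to_fsblock(uint64_t startoff, uint64_t startblock,
                                      uint64_t dablk)
{
    assert(dablk >= startoff);
    return startblock + (dablk - startoff);
}
```

Here 30692 bytes need 8 blocks of 4096 bytes. With the extent [0,37535,9,0], attribute block 1 maps to filesystem block 37536 and attribute block 8 to 37543, while attribute block 0 (the leaf itself) is filesystem block 37535.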
Index 4076 (0xfec) is the offset into the block where the name/value information is. As can be seen by the value, it's at the end of the block:
xfs_db> type text
xfs_db> p
000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 01 00 14 ................
010: 0f ec 00 00 00 28 0f c4 00 00 00 00 00 00 00 00 ................
020: fc f8 9d 4f 0f ec 00 00 00 00 00 00 00 00 00 00 ...O............
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
...
fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................
ff0: 00 00 77 e4 08 62 69 67 5f 61 74 74 72 00 00 00 ..w..big.attr...
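The bytes at offset 0xfec can be decoded by hand or with a short routine such as the sketch below, which reads a remote name structure at a given nameidx. The names attr_remote_host and attr_remote_decode are hypothetical; the field offsets follow xfs_attr_leaf_name_remote_t as declared above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical host-endian view of a remote name entry. */
struct attr_remote_host {
    uint32_t valueblk;
    uint32_t valuelen;
    uint8_t namelen;
    char name[256];          /* up to 255 name bytes plus a terminating NUL */
};

/* Returns 0 on success, -1 if the entry would run past the end of the block. */
static int attr_remote_decode(const uint8_t *blk, size_t blksize,
                              uint16_t nameidx, struct attr_remote_host *out)
{
    assert(blk != NULL && out != NULL);
    if ((size_t)nameidx + 9 > blksize)
        return -1;                        /* fixed part is 4 + 4 + 1 bytes */
    const uint8_t *p = blk + nameidx;
    out->valueblk = be32_at(p);
    out->valuelen = be32_at(p + 4);
    out->namelen = p[8];
    if ((size_t)nameidx + 9 + out->namelen > blksize)
        return -1;
    memcpy(out->name, p + 9, out->namelen);
    out->name[out->namelen] = '\0';
    return 0;
}
```

Applied to the dump above with nameidx 4076, the bytes 00 00 00 01 give valueblk 1, 00 00 77 e4 give valuelen 30692, 08 gives namelen 8, and the next eight bytes spell "big_attr". The three trailing zero bytes pad the structure to a 32-bit boundary.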
A 30KB attribute and a couple of small attributes are added to a file:
xfs_db> inode <inode#>
xfs_db> p
...
core.nblocks = 10
core.extsize = 0
core.nextents = 1
core.naextents = 2
core.forkoff = 15
core.aformat = 2 (extents)
...
u.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,81857,1,0]
a.bmx[0-1] = [startoff,startblock,blockcount,extentflag]
0:[0,81858,1,0]
1:[1,182398,8,0]
xfs_db> ablock 0
xfs_db> p
hdr.info.forw = 0
hdr.info.back = 0
hdr.info.magic = 0xfbee
hdr.count = 3
hdr.usedbytes = 52
hdr.firstused = 4044
hdr.holes = 0
hdr.freemap[0-2] = [base,size] 0:[56,3988] 1:[0,0] 2:[0,0]
entries[0-2] = [hashval,nameidx,incomplete,root,secure,local]
0:[0x1e9d3934,4044,0,0,0,1]
1:[0x1e9d3937,4060,0,0,0,1]
2:[0xfcf89d4f,4076,0,0,0,0]
nvlist[0].valuelen = 6
nvlist[0].namelen = 5
nvlist[0].name = "attr2"
nvlist[0].value = "value2"
nvlist[1].valuelen = 6
nvlist[1].namelen = 5
nvlist[1].name = "attr1"
nvlist[1].value = "value1"
nvlist[2].valueblk = 0x1
nvlist[2].valuelen = 30692
nvlist[2].namelen = 8
nvlist[2].name = "big_attr"
As can be seen in the entries array, the two small attributes have the local flag set and the values are printed.
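The header values in both examples are consistent with the block layout, which the sketch below checks. It assumes a 32-byte header and 8-byte entries (both derived from the structures above), and that the block has a single free region with no holes, as in these examples. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ATTR_LEAF_HDR_SIZE   32u   /* xfs_attr_leaf_hdr_t on disk */
#define ATTR_LEAF_ENTRY_SIZE  8u   /* xfs_attr_leaf_entry_t on disk */

/* First free byte: directly after the packed entries[] array. */
static uint32_t attr_free_base(uint16_t count)
{
    return ATTR_LEAF_HDR_SIZE + ATTR_LEAF_ENTRY_SIZE * count;
}

/* Free bytes between the end of entries[] and the first name/value. */
static uint32_t attr_free_size(uint16_t count, uint16_t firstused)
{
    uint32_t base = attr_free_base(count);
    assert(firstused >= base);    /* name/value area lies above entries[] */
    return firstused - base;
}
```

With count = 3 and firstused = 4044 this gives base 56 and size 3988, matching freemap[0] above. The first example (count = 1, firstused = 4076) gives base 40 and size 4036.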
A raw disk dump shows the attributes. The last attribute added is highlighted (offset 4044 or 0xfcc):