<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/xfs/libxfs, branch vm-bind</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.etezian.org/cgit.cgi/linux.git/atom?h=vm-bind</id>
<link rel='self' href='https://git.etezian.org/cgit.cgi/linux.git/atom?h=vm-bind'/>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/'/>
<updated>2022-06-29T15:47:56+00:00</updated>
<entry>
<title>xfs: don't hold xattr leaf buffers across transaction rolls</title>
<updated>2022-06-29T15:47:56+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-06-25T17:01:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=e53bcffad0326c1ef4b4baec4262b5343e420c44'/>
<id>urn:sha1:e53bcffad0326c1ef4b4baec4262b5343e420c44</id>
<content type='text'>
Now that we've established (again!) that empty xattr leaf buffers are
ok, we no longer need to bhold them to transactions when we're creating
new leaf blocks.  Get rid of the entire mechanism, which should simplify
the xattr code quite a bit.

The original justification for using bhold here was to prevent the AIL
from trying to write the empty leaf block into the fs during the brief
time that we release the buffer lock.  The reason for /that/ was to
prevent recovery from tripping over the empty ondisk block.

Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>xfs: empty xattr leaf header blocks are not corruption</title>
<updated>2022-06-29T15:47:56+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-06-24T22:01:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=7be3bd8856fba99f8b25b9c223250e42292c312e'/>
<id>urn:sha1:7be3bd8856fba99f8b25b9c223250e42292c312e</id>
<content type='text'>
TLDR: Revert commit 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
xfs_attr3_leaf_verify") because it was wrong.

Every now and then we get a corruption report from the kernel or
xfs_repair about empty leaf blocks in the extended attribute structure.
We've long thought that these shouldn't be possible, but prior to 5.18
one would shake loose in the recoveryloop fstests about once a month.

A new addition to the xattr leaf block verifier in 5.19-rc1 makes this
happen every 7 minutes on my testing cloud.  I added a ton of logging to
detect any time we set the header count on an xattr leaf block to zero.
This produced the following dmesg output on generic/388:

XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0!
Call Trace:
 &lt;TASK&gt;
 dump_stack_lvl+0x34/0x44
 xfs_attr3_leaf_create+0x187/0x230
 xfs_attr_shortform_to_leaf+0xd1/0x2f0
 xfs_attr_set_iter+0x73e/0xa90
 xfs_xattri_finish_update+0x45/0x80
 xfs_attr_finish_item+0x1b/0xd0
 xfs_defer_finish_noroll+0x19c/0x770
 __xfs_trans_commit+0x153/0x3e0
 xfs_attr_set+0x36b/0x740
 xfs_xattr_set+0x89/0xd0
 __vfs_setxattr+0x67/0x80
 __vfs_setxattr_noperm+0x6e/0x120
 vfs_setxattr+0x97/0x180
 setxattr+0x88/0xa0
 path_setxattr+0xc3/0xe0
 __x64_sys_setxattr+0x27/0x30
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

So now we know that someone is creating empty xattr leaf blocks as part
of converting a sf xattr structure into a leaf xattr structure.  The
conversion routine logs any existing sf attributes in the same
transaction that creates the leaf block, so we know this is a setxattr
to a file that has no attributes at all.

Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log
recovery.  I also augmented buffer item recovery to call -&gt;verify_struct
on any attr leaf blocks and complain if it finds a failure:

XFS (sda4): Unmounting Filesystem
XFS (sda4): Mounting V5 Filesystem
XFS (sda4): Starting recovery (logdev: internal)
XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0!
Call Trace:
 &lt;TASK&gt;
 dump_stack_lvl+0x34/0x44
 xfs_attr3_leaf_verify+0x3b8/0x420
 xlog_recover_buf_commit_pass2+0x60a/0x6c0
 xlog_recover_items_pass2+0x4e/0xc0
 xlog_recover_commit_trans+0x33c/0x350
 xlog_recovery_process_trans+0xa5/0xe0
 xlog_recover_process_data+0x8d/0x140
 xlog_do_recovery_pass+0x19b/0x720
 xlog_do_log_recovery+0x62/0xc0
 xlog_do_recover+0x33/0x1d0
 xlog_recover+0xda/0x190
 xfs_log_mount+0x14c/0x360
 xfs_mountfs+0x517/0xa60
 xfs_fs_fill_super+0x6bc/0x950
 get_tree_bdev+0x175/0x280
 vfs_get_tree+0x1a/0x80
 path_mount+0x6f5/0xaa0
 __x64_sys_mount+0x103/0x140
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fc61e241eae

And a moment later, the _delwri_submit of the recovered buffers trips
the same verifier and recovery fails:

XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78
XFS (sda4): Unmount and run xfs_repair
XFS (sda4): First 128 bytes of corrupted metadata buffer:
00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00  ........;.......
00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00  .....).x........
00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d  ......I......1&gt;.
00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00  ................
00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00  .P..............
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518).  Shutting down filesystem.
XFS (sda4): Please unmount the filesystem and rectify the problem(s)
XFS (sda4): log mount/recovery failed: error -117
XFS (sda4): log mount failed

I think I see what's going on here -- setxattr is racing with something
that shuts down the filesystem:

Thread 1				Thread 2
--------				--------
xfs_attr_sf_addname
xfs_attr_shortform_to_leaf
&lt;create empty leaf&gt;
xfs_trans_bhold(leaf)
xattri_dela_state = XFS_DAS_LEAF_ADD
&lt;roll transaction&gt;
					&lt;flush log&gt;
					&lt;shut down filesystem&gt;
xfs_trans_bhold_release(leaf)
&lt;discover fs is dead, bail&gt;

Thread 3
--------
&lt;cycle mount, start recovery&gt;
xlog_recover_buf_commit_pass2
xlog_recover_do_reg_buffer
&lt;replay empty leaf buffer from recovered buf item&gt;
xfs_buf_delwri_queue(leaf)
xfs_buf_delwri_submit
_xfs_buf_ioapply(leaf)
xfs_attr3_leaf_write_verify
&lt;trip over empty leaf buffer&gt;
&lt;fail recovery&gt;

As you can see, the bhold keeps the leaf buffer locked and thus prevents
the *AIL* from tripping over the ichdr.count==0 check in the write
verifier.  Unfortunately, it doesn't prevent the log from getting
flushed to disk, which sets up log recovery to fail.

So.  It's clear that the kernel has always had the ability to persist
attr leaf blocks with ichdr.count==0, which means that it's part of the
ondisk format now.

Unfortunately, this check has been added and removed multiple times
throughout history.  It first appeared in[1] kernel 3.10 as part of the
early V5 format patches.  The check was later discovered to break log
recovery and hence disabled[2] during log recovery in kernel 4.10.
Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to
weed out the empty leaf blocks.  This was still not correct because log
recovery would recover an empty attr leaf block successfully only for
regular xattr operations to trip over the empty block during of the
block during regular operation.  Therefore, the check was removed
entirely[4] in kernel 5.7 but removal of the xfs_repair check was
forgotten.  The continued complaints from xfs_repair lead to us
mistakenly re-adding[5] the verifier check for kernel 5.19.  Remove it
once again.

[1] 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
[2] 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier
                   during log replay")
[3] f7140161 ("xfs_repair: junk leaf attribute if count == 0")
[4] f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf
                   block")
[5] 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
                   xfs_attr3_leaf_verify")

Looking at the rest of the xattr code, it seems that files with empty
leaf blocks behave as expected -- listxattr reports no attributes;
getxattr on any xattr returns nothing as expected; removexattr does
nothing; and setxattr can add attributes just fine.

Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
Still-not-fixed-by: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay")
Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block")
Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify")
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
</content>
</entry>
<entry>
<title>xfs: fix variable state usage</title>
<updated>2022-06-16T06:13:32+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-06-06T01:51:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=10930b254d5be1cb4350fb7a456ccd5ea7e3cbd9'/>
<id>urn:sha1:10930b254d5be1cb4350fb7a456ccd5ea7e3cbd9</id>
<content type='text'>
The variable @args is fed to a tracepoint, and that's the only place
it's used.  This is fine for the kernel, but for userspace, tracepoints
are #define'd out of existence, which results in this warning on gcc
11.2:

xfs_attr.c: In function ‘xfs_attr_node_try_addname’:
xfs_attr.c:1440:42: warning: unused variable ‘args’ [-Wunused-variable]
 1440 |         struct xfs_da_args              *args = attr-&gt;xattri_da_args;
      |                                          ^~~~

Clean this up.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
</content>
</entry>
<entry>
<title>xfs: fix TOCTOU race involving the new logged xattrs control knob</title>
<updated>2022-06-16T06:13:32+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-06-06T01:51:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=f4288f01820e2d57722d21874c1fda661003c9b9'/>
<id>urn:sha1:f4288f01820e2d57722d21874c1fda661003c9b9</id>
<content type='text'>
I found a race involving the larp control knob, aka the debugging knob
that lets developers enable logging of extended attribute updates:

Thread 1			Thread 2

echo 0 &gt; /sys/fs/xfs/debug/larp
				setxattr(REPLACE)
				xfs_has_larp (returns false)
				xfs_attr_set

echo 1 &gt; /sys/fs/xfs/debug/larp

				xfs_attr_defer_replace
				xfs_attr_init_replace_state
				xfs_has_larp (returns true)
				xfs_attr_init_remove_state

				&lt;oops, wrong DAS state!&gt;

This isn't a particularly severe problem right now because xattr logging
is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know
what they're doing.

However, the eventual intent is that callers should be able to ask for
the assistance of the log in persisting xattr updates.  This capability
might not be required for /all/ callers, which means that dynamic
control must work correctly.  Once an xattr update has decided whether
or not to use logged xattrs, it needs to stay in that mode until the end
of the operation regardless of what subsequent parallel operations might
do.

Therefore, it is an error to continue sampling xfs_globals.larp once
xfs_attr_change has made a decision about larp, and it was not correct
for me to have told Allison that -&gt;create_intent functions can sample
the global log incompat feature bitfield to decide to elide a log item.

Instead, create a new op flag for the xfs_da_args structure, and convert
all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within
the attr update state machine to look for the operations flag.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Allison Henderson &lt;allison.henderson@oracle.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'guilt/xfs-5.19-larp-cleanups' into xfs-5.19-for-next</title>
<updated>2022-05-30T00:58:59+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>david@fromorbit.com</email>
</author>
<published>2022-05-30T00:58:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=7146bda743e6f543af60584fad2cfbb6ce83d8ac'/>
<id>urn:sha1:7146bda743e6f543af60584fad2cfbb6ce83d8ac</id>
<content type='text'>
This series contains a two key cleanups for the new LARP code.  Most
of it is refactoring and tweaking the code that creates kernel log
messages about enabling and disabling features -- we should be
warning about LARP being turned on once per mount, instead of once
per insmod cycle; we shouldn't be spamming the logs so aggressively
about turning *off* log incompat features.

The second part of the series refactors the LARP code responsible
for getting (and releasing) permission to use xattr log items.  The
implementation code doesn't belong in xfs_log.c, and calls to
logging functions don't belong in libxfs -- they really should be
done by the VFS implementation functions before they start calling
into libraries.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'guilt/xfs-5.19-recovery-buf-cancel' into xfs-5.19-for-next</title>
<updated>2022-05-30T00:58:13+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>david@fromorbit.com</email>
</author>
<published>2022-05-30T00:58:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=621dc801df4c2e523a66c96b250e0133ed7da90e'/>
<id>urn:sha1:621dc801df4c2e523a66c96b250e0133ed7da90e</id>
<content type='text'>
As part of solving the memory leaks and UAF problems in the new LARP
code, kmemleak also reported that log recovery will leak the table
used to hash buffer cancellations if the recovery fails.  Fix this
problem by creating alloc/free helpers that initialize and free the
hashtable contents correctly.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;
</content>
</entry>
<entry>
<title>xfs: move xfs_attr_use_log_assist usage out of libxfs</title>
<updated>2022-05-27T00:34:04+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-05-27T00:34:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=efc2efeba169ff37bbd425631985064365c614e0'/>
<id>urn:sha1:efc2efeba169ff37bbd425631985064365c614e0</id>
<content type='text'>
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that sets up logged xattr updates out of libxfs and into
xfs_xattr.c so that libxfs no longer has to know about xlog_* functions.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
<entry>
<title>xfs: move xfs_attr_use_log_assist out of xfs_log.c</title>
<updated>2022-05-27T00:33:29+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-05-27T00:33:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=d9c61ccb3b09d8f892cccbf662ce0c870f8e4ade'/>
<id>urn:sha1:d9c61ccb3b09d8f892cccbf662ce0c870f8e4ade</id>
<content type='text'>
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that enables logged xattr updates out of "lib"xlog and into
xfs_xattr.c so that it no longer has to know about xlog_* functions.

While we're at it, give xfs_xattr.c its own header file.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
<entry>
<title>xfs: convert buf_cancel_table allocation to kmalloc_array</title>
<updated>2022-05-27T00:27:19+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-05-27T00:27:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=910bbdf2f4d7df46781bc9b723048f5ebed3d0d7'/>
<id>urn:sha1:910bbdf2f4d7df46781bc9b723048f5ebed3d0d7</id>
<content type='text'>
While we're messing around with how recovery allocates and frees the
buffer cancellation table, convert the allocation to use kmalloc_array
instead of the old kmem_alloc APIs, and make it handle a null return,
even though that's not likely.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
<entry>
<title>xfs: refactor buffer cancellation table allocation</title>
<updated>2022-05-27T00:26:17+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@kernel.org</email>
</author>
<published>2022-05-27T00:26:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=2723234923b3294dbcf6019c288c87465e927ed4'/>
<id>urn:sha1:2723234923b3294dbcf6019c288c87465e927ed4</id>
<content type='text'>
Move the code that allocates and frees the buffer cancellation tables
used by log recovery into the file that actually uses the tables.  This
is a precursor to some cleanups and a memory leak fix.

Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
</feed>
