<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs, branch vm-bind</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.etezian.org/cgit.cgi/linux.git/atom?h=vm-bind</id>
<link rel='self' href='https://git.etezian.org/cgit.cgi/linux.git/atom?h=vm-bind'/>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/'/>
<updated>2022-07-16T20:48:55+00:00</updated>
<entry>
<title>Merge tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2022-07-16T20:48:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-07-16T20:48:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=972a278fe60c361eb8f37619f562f092e8786d7c'/>
<id>urn:sha1:972a278fe60c361eb8f37619f562f092e8786d7c</id>
<content type='text'>
Pull btrfs reverts from David Sterba:
 "Due to a recent report [1] we need to revert the radix tree to xarray
  conversion patches.

  There's a problem with sleeping under spinlock, when xa_insert could
  allocate memory under pressure. We use GFP_NOFS so this is a real
  problem that we unfortunately did not discover during review.

  I'm sorry to do such change at rc6 time but the revert is IMO the
  safer option, there are patches to use mutex instead of the spin locks
  but that would need more testing. The revert branch has been tested on
  a few setups, all seem ok.

  The conversion to xarray will be revisited in the future"

Link: https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/ [1]

* tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: turn delayed_nodes_tree into an XArray"
  Revert "btrfs: turn name_cache radix tree into XArray in send_ctx"
  Revert "btrfs: turn fs_info member buffer_radix into XArray"
  Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray"
</content>
</entry>
<entry>
<title>Revert "btrfs: turn delayed_nodes_tree into an XArray"</title>
<updated>2022-07-15T17:15:19+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2022-07-15T11:59:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=088aea3b97e0ae5a2a86f5d142ad10fec8a1b80f'/>
<id>urn:sha1:088aea3b97e0ae5a2a86f5d142ad10fec8a1b80f</id>
<content type='text'>
This reverts commit 253bf57555e451dec5a7f09dc95d380ce8b10e5b.

Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.

Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.

[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/

Reported-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Revert "btrfs: turn name_cache radix tree into XArray in send_ctx"</title>
<updated>2022-07-15T17:14:58+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2022-07-15T11:59:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=5b8418b84303d9a0a0f7f28d6eaed915247ebdc3'/>
<id>urn:sha1:5b8418b84303d9a0a0f7f28d6eaed915247ebdc3</id>
<content type='text'>
This reverts commit 4076942021fe14efecae33bf98566df6dd5ae6f7.

Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.

Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.

[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/

Reported-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Revert "btrfs: turn fs_info member buffer_radix into XArray"</title>
<updated>2022-07-15T17:14:33+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2022-07-15T11:59:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=01cd390903e00c8f42ba0e84f25a70e3d613a15c'/>
<id>urn:sha1:01cd390903e00c8f42ba0e84f25a70e3d613a15c</id>
<content type='text'>
This reverts commit 8ee922689d67b7cfa6acbe2aa1ee76ac72e6fc8a.

Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.

Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.

[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/

Reported-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray"</title>
<updated>2022-07-15T17:14:28+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2022-07-15T11:59:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=fc7cbcd4890e297de5d6487e04344a99b39de9be'/>
<id>urn:sha1:fc7cbcd4890e297de5d6487e04344a99b39de9be</id>
<content type='text'>
This reverts commit 48b36a602a335c184505346b5b37077840660634.

Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.

Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.

[1] https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/

Reported-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-5.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2022-07-11T21:41:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-07-11T21:41:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=5a29232d870d9e63fe5ff30b081be6ea7cc2465d'/>
<id>urn:sha1:5a29232d870d9e63fe5ff30b081be6ea7cc2465d</id>
<content type='text'>
Pull btrfs fixes from David Sterba:
 "A more fixes that seem to me to be important enough to get merged
  before release:

   - in zoned mode, fix leak of a structure when reading zone info, this
     happens on normal path so this can be significant

   - in zoned mode, revert an optimization added in 5.19-rc1 to finish a
     zone when the capacity is full, but this is not reliable in all
     cases

   - try to avoid short reads for compressed data or inline files when
     it's a NOWAIT read, applications should handle that but there are
     two, qemu and mariadb, that are affected"

* tag 'for-5.19-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: drop optimization of zone finish
  btrfs: zoned: fix a leaked bioc in read_zone_info
  btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents
</content>
</entry>
<entry>
<title>btrfs: zoned: drop optimization of zone finish</title>
<updated>2022-07-08T17:18:00+00:00</updated>
<author>
<name>Naohiro Aota</name>
<email>naohiro.aota@wdc.com</email>
</author>
<published>2022-06-29T02:00:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=b3a3b0255797e1d395253366ba24a4cc6c8bdf9c'/>
<id>urn:sha1:b3a3b0255797e1d395253366ba24a4cc6c8bdf9c</id>
<content type='text'>
We have an optimization in do_zone_finish() to send REQ_OP_ZONE_FINISH only
when necessary, i.e. we don't send REQ_OP_ZONE_FINISH when we assume we
wrote fully into the zone.

The assumption is determined by "alloc_offset == capacity". This condition
won't work if the last ordered extent is canceled due to some errors. In
that case, we consider the zone is deactivated without sending the finish
command while it's still active.

This inconstancy results in activating another block group while we cannot
really activate the underlying zone, which causes the active zone exceeds
errors like below.

    BTRFS error (device nvme3n2): allocation failed flags 1, wanted 520192 tree-log 0, relocation: 0
    nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
    active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0
    nvme3n2: I/O Cmd(0x7d) @ LBA 160432128, 127 blocks, I/O Error (sct 0x1 / sc 0xbd) MORE DNR
    active zones exceeded error, dev nvme3n2, sector 0 op 0xd:(ZONE_APPEND) flags 0x4800 phys_seg 1 prio class 0

Fix the issue by removing the optimization for now.

Fixes: 8376d9e1ed8f ("btrfs: zoned: finish superblock zone once no space left for new SB")
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Naohiro Aota &lt;naohiro.aota@wdc.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: zoned: fix a leaked bioc in read_zone_info</title>
<updated>2022-07-08T17:13:32+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-06-30T16:03:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=2963457829decf0c824a443238d251151ed18ff5'/>
<id>urn:sha1:2963457829decf0c824a443238d251151ed18ff5</id>
<content type='text'>
The bioc would leak on the normal completion path and also on the RAID56
check (but that one won't happen in practice due to the invalid
combination with zoned mode).

Fixes: 7db1c5d14dcd ("btrfs: zoned: support dev-replace in zoned filesystems")
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
[ update changelog ]
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents</title>
<updated>2022-07-08T17:13:22+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2022-07-04T11:42:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=a4527e1853f8ff6e0b7c2dadad6268bd38427a31'/>
<id>urn:sha1:a4527e1853f8ff6e0b7c2dadad6268bd38427a31</id>
<content type='text'>
When doing a direct IO read or write, we always return -ENOTBLK when we
find a compressed extent (or an inline extent) so that we fallback to
buffered IO. This however is not ideal in case we are in a NOWAIT context
(io_uring for example), because buffered IO can block and we currently
have no support for NOWAIT semantics for buffered IO, so if we need to
fallback to buffered IO we should first signal the caller that we may
need to block by returning -EAGAIN instead.

This behaviour can also result in short reads being returned to user
space, which although it's not incorrect and user space should be able
to deal with partial reads, it's somewhat surprising and even some popular
applications like QEMU (Link tag #1) and MariaDB (Link tag #2) don't
deal with short reads properly (or at all).

The short read case happens when we try to read from a range that has a
non-compressed and non-inline extent followed by a compressed extent.
After having read the first extent, when we find the compressed extent we
return -ENOTBLK from btrfs_dio_iomap_begin(), which results in iomap to
treat the request as a short read, returning 0 (success) and waiting for
previously submitted bios to complete (this happens at
fs/iomap/direct-io.c:__iomap_dio_rw()). After that, and while at
btrfs_file_read_iter(), we call filemap_read() to use buffered IO to
read the remaining data, and pass it the number of bytes we were able to
read with direct IO. Than at filemap_read() if we get a page fault error
when accessing the read buffer, we return a partial read instead of an
-EFAULT error, because the number of bytes previously read is greater
than zero.

So fix this by returning -EAGAIN for NOWAIT direct IO when we find a
compressed or an inline extent.

Reported-by: Dominique MARTINET &lt;dominique.martinet@atmark-techno.com&gt;
Link: https://lore.kernel.org/linux-btrfs/YrrFGO4A1jS0GI0G@atmark-techno.com/
Link: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&amp;page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582
Tested-by: Dominique MARTINET &lt;dominique.martinet@atmark-techno.com&gt;
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2022-06-26T17:11:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-06-26T17:11:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=82708bb1eb9ebc2d1e296f2c919685761f2fa8dd'/>
<id>urn:sha1:82708bb1eb9ebc2d1e296f2c919685761f2fa8dd</id>
<content type='text'>
Pull btrfs fixes from David Sterba:

 - zoned relocation fixes:
      - fix critical section end for extent writeback, this could lead
        to out of order write
      - prevent writing to previous data relocation block group if space
        gets low

 - reflink fixes:
      - fix race between reflinking and ordered extent completion
      - proper error handling when block reserve migration fails
      - add missing inode iversion/mtime/ctime updates on each iteration
        when replacing extents

 - fix deadlock when running fsync/fiemap/commit at the same time

 - fix false-positive KCSAN report regarding pid tracking for read locks
   and data race

 - minor documentation update and link to new site

* tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Documentation: update btrfs list of features and link to readthedocs.io
  btrfs: fix deadlock with fsync+fiemap+transaction commit
  btrfs: don't set lock_owner when locking extent buffer for reading
  btrfs: zoned: fix critical section of relocation inode writeback
  btrfs: zoned: prevent allocation from previous data relocation BG
  btrfs: do not BUG_ON() on failure to migrate space when replacing extents
  btrfs: add missing inode updates on each iteration when replacing extents
  btrfs: fix race between reflinking and ordered extent completion
</content>
</entry>
</feed>
