path: root/fs
2020-01-17f2fs: fix deadlock allocating bio_post_read_ctx from mempoolEric Biggers
Without any form of coordination, any case where multiple allocations from the same mempool are needed at a time to make forward progress can deadlock under memory pressure.

This is the case for struct bio_post_read_ctx, as one can be allocated to decrypt a Merkle tree page during fsverity_verify_bio(), which itself is running from a post-read callback for a data bio which has its own struct bio_post_read_ctx.

Fix this by freeing the first bio_post_read_ctx before calling fsverity_verify_bio(). This works because verity (if enabled) is always the last post-read step.

This deadlock can be reproduced by trying to read from an encrypted verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching mempool_alloc() to pretend that pool->alloc() always fails.

Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, hitting this bug in practice would require reading from lots of encrypted verity files at the same time. But it's theoretically possible, as N available objects doesn't guarantee forward progress when > N/2 threads each need 2 objects at a time.

Fixes: 95ae251fe828 ("f2fs: add fs-verity support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
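The nested-allocation hazard described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: the function name `simulate` and its parameters are hypothetical, the pool is reduced to a counter, and the schedule is deliberately the worst case (every reader takes its data-bio context before anyone asks for a second one).

```python
def simulate(pool_size: int, readers: int, free_first_before_verity: bool) -> int:
    """Return how many readers finish under a worst-case schedule.

    Hypothetical model: each reader takes one context for its data bio, then
    needs one more context (for the Merkle tree page) before it can finish.
    """
    free = pool_size
    waiting_first = readers   # readers that still need their data-bio context
    holding_first = 0         # readers now inside the post-read callback
    done = 0
    progress = True
    while progress:
        progress = False
        # Worst case: hand out all first contexts before any second one.
        while waiting_first and free:
            free -= 1
            waiting_first -= 1
            holding_first += 1
            progress = True
        while holding_first:
            if free_first_before_verity:
                free += 1     # fixed behaviour: release the data-bio context first
            if free == 0:
                break         # the reader would sleep in the allocator here
            free -= 1         # context for the Merkle tree page
            # The Merkle page read completes: give back its context, plus the
            # data-bio context if it was still held.
            free += 1 if free_first_before_verity else 2
            holding_first -= 1
            done += 1
            progress = True
    return done
```

With a pool of one object (the reproducer's setting) a single reader never finishes unless the first context is released before the second is requested; once it is, every reader eventually completes regardless of pool size.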
2020-01-17f2fs: remove unneeded check for error allocating bio_post_read_ctxEric Biggers
Since allocating an object from a mempool never fails when __GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check for failure to allocate a bio_post_read_ctx is unnecessary. Remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17f2fs: convert inline_dir early before starting renameJaegeuk Kim
If we hit an error during rename, we'll get two dentries in different directories. Chao added a check for room in inline_dir, which can avoid needless inversion. This should be done under inode_lock(&old_dir). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17f2fs: fix memleak of kobjectChao Yu
If kobject_init_and_add() fails, the caller needs to invoke kobject_put() to release the kobject explicitly. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17f2fs: fix to add swap extent correctlyChao Yu
As Youling reported on the mailing list:

https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/
https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/

There is a test case that can corrupt an f2fs image:
- dd if=/dev/zero of=/swapfile bs=1M count=4096
- chmod 600 /swapfile
- mkswap /swapfile
- swapon --discard /swapfile

The root cause is that f2fs_swap_activate() intends to return zero to setup_swap_extents() to enable SWP_FS mode (the swap file goes through the fs). In this flow, setup_swap_extents() sets up the swap extent with a wrong block address range, which results in discard_swap() erasing incorrect addresses.

Because f2fs_swap_activate() has pinned the swapfile, its data block addresses will not change, so it is safe to let swap handle IO through the raw device. We can therefore get rid of SWP_FS mode and initialize the swap extents inside f2fs_swap_activate(); this way, a later discard_swap() trims the right address range.

Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17f2fs: run fsck when getting bad inode during GCJaegeuk Kim
This is to avoid infinite GC when trying to disable checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17f2fs: support data compressionChao Yu
This patch tries to support compression in f2fs.

- A new term, cluster, is defined as the basic unit of compression; a file can be divided into multiple clusters logically. One cluster includes 4 << n (n >= 0) logical pages, the compression size is also the cluster size, and each cluster can be compressed or not.

- In the cluster metadata layout, one special flag is used to indicate whether a cluster is a compressed one or a normal one. For a compressed cluster, the following metadata maps the cluster to [1, 4 << n - 1] physical blocks, in which f2fs stores data including the compress header and the compressed data.

- In order to eliminate write amplification during overwrite, F2FS only supports compression on write-once files; data can be compressed only when all logical blocks in the file are valid and the cluster compress ratio is lower than the specified threshold.

- To enable compression on a regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext

Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .                .                      .
  .         Compressed Cluster       .        .        Normal Cluster            .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                           .
       .                                                           .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+

Changelog:

20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().

20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().

20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().

20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.

20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
  Now f2fs supports processing post read work in multiple workqueue, it shows low performance due to schedule overhead of multiple workqueue executing orderly.

20190921
- compress: support buffered overwrite
  C: compress cluster flag
  V: valid block address
  N: NEW_ADDR

  One cluster contains 4 blocks

  before overwrite   after overwrite
  - VVVV      ->     CVNN
  - CVNN      ->     VVVV
  - CVNN      ->     CVNN
  - CVNN      ->     CVVV
  - CVVV      ->     CVNN
  - CVVV      ->     CVVV

20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm. note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.

20191101
- apply fixes from Jaegeuk

20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity

20191216
- apply fixes from Jaegeuk

20200117
- fix to avoid NULL pointer dereference

[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks

Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
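The cluster arithmetic from the description above can be worked through in a few lines. This is a sketch with hypothetical helper names, not f2fs code; it reads the range "[1, 4 << n - 1]" as "one to (cluster size minus one) blocks", which matches the layout diagram where the compress flag occupies one of the four slots. That reading is an assumption.

```python
def cluster_pages(n: int) -> int:
    """Number of logical pages in one cluster: 4 << n, with n >= 0."""
    if n < 0:
        raise ValueError("n must be >= 0")
    return 4 << n


def compressed_block_range(n: int) -> tuple:
    """Assumed inclusive range of physical blocks behind a compressed cluster.

    One slot of the cluster is taken by the compress flag, so at most
    cluster_pages(n) - 1 blocks remain for the header and compressed data.
    """
    return (1, cluster_pages(n) - 1)


def fits_compressed_cluster(n: int, blocks_after_compression: int) -> bool:
    """True if the compressed payload fits the range above."""
    low, high = compressed_block_range(n)
    return low <= blocks_after_compression <= high
```

For the smallest cluster (n = 0) this gives four logical pages and at most three physical blocks, so a cluster whose compressed form still needs four blocks would not qualify under this reading.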
2020-01-15f2fs: free sysfs kobjectJaegeuk Kim
Detected kmemleak. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15f2fs: declare nested quota_sem and remove unnecessary semsJaegeuk Kim
1.
f2fs_quota_sync
 -> down_read(&sbi->quota_sem)
 -> dquot_writeback_dquots
 -> f2fs_dquot_commit
 -> down_read(&sbi->quota_sem)

2.
f2fs_quota_sync
 -> down_read(&sbi->quota_sem)
 -> f2fs_write_data_pages
 -> f2fs_write_single_data_page
 -> down_write(&F2FS_I(inode)->i_sem)

f2fs_mkdir
 -> f2fs_do_add_link
 -> down_write(&F2FS_I(inode)->i_sem)
 -> f2fs_init_inode_metadata
 -> f2fs_new_node_page
 -> dquot_alloc_inode
 -> f2fs_dquot_mark_dquot_dirty
 -> down_read(&sbi->quota_sem)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15f2fs: don't put new_page twice in f2fs_renameJaegeuk Kim
In f2fs_rename(), new_page is gone after f2fs_set_link(), but the code tries to put it again when the whiteout fails and it jumps to put_out_dir. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15f2fs: set I_LINKABLE early to avoid wrong access by vfsJaegeuk Kim
This patch moves setting I_LINKABLE early in rename2(whiteout) to avoid the below warning.

[ 3189.163385] WARNING: CPU: 3 PID: 59523 at fs/inode.c:358 inc_nlink+0x32/0x40
[ 3189.246979] Call Trace:
[ 3189.248707]  f2fs_init_inode_metadata+0x2d6/0x440 [f2fs]
[ 3189.251399]  f2fs_add_inline_entry+0x162/0x8c0 [f2fs]
[ 3189.254010]  f2fs_add_dentry+0x69/0xe0 [f2fs]
[ 3189.256353]  f2fs_do_add_link+0xc5/0x100 [f2fs]
[ 3189.258774]  f2fs_rename2+0xabf/0x1010 [f2fs]
[ 3189.261079]  vfs_rename+0x3f8/0xaa0
[ 3189.263056]  ? tomoyo_path_rename+0x44/0x60
[ 3189.265283]  ? do_renameat2+0x49b/0x550
[ 3189.267324]  do_renameat2+0x49b/0x550
[ 3189.269316]  __x64_sys_renameat2+0x20/0x30
[ 3189.271441]  do_syscall_64+0x5a/0x230
[ 3189.273410]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3189.275848] RIP: 0033:0x7f270b4d9a49

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15f2fs: don't keep META_MAPPING pages used for moving verity file blocksEric Biggers
META_MAPPING is used to move blocks for both encrypted and verity files. So the META_MAPPING invalidation condition in do_checkpoint() should consider verity too, not just encrypt. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15f2fs: introduce private biosetChao Yu
In a low memory scenario, we can allocate multiple bios without submitting any of them.

- f2fs_write_checkpoint()
 - block_operations()
  - f2fs_sync_node_pages()
   step 1) flush cold nodes, allocate new bio from mempool
   - bio_alloc()
    - mempool_alloc()
   step 2) flush hot nodes, allocate a bio from mempool
   - bio_alloc()
    - mempool_alloc()
   step 3) flush warm nodes, be stuck in below call path
   - bio_alloc()
    - mempool_alloc()
     - loop to wait for a mempool element to be released, as we only reserved memory for two bio allocations; however, the two bios allocated above may never be submitted.

So we need to avoid using the default bioset. In this patch we introduce a private bioset, in which we enlarge the mempool element count to the total number of log headers, so that we can make sure we have enough backup memory in the pool when allocating/holding multiple bios.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15f2fs: cleanup duplicate stats for atomic filesSahitya Tummala
Remove the duplicate sbi->aw_cnt stats counter that tracks the number of atomic files currently opened (it also sometimes shows an incorrect value). Use the more reliable sbi->atomic_files to show in the stats. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15f2fs: Check write pointer consistency of non-open zonesShin'ichiro Kawasaki
To catch f2fs bugs in write pointer handling code for zoned block devices, check write pointers of non-open zones that current segments do not point to. Do this check at mount time, after the fsync data recovery and the current segments' write pointer consistency fix. Or, when fsync data recovery is disabled by mount option, do the check when there is no fsync data.

Check two items, comparing write pointers with the valid block maps in SIT.

The first item is a check for zones with no valid blocks. When there are no valid blocks in a zone, the write pointer should be at the start of the zone. If not, the next write operation to the zone will cause an unaligned write error. If the write pointer is not at the zone start, reset the write pointer to place it at the zone start.

The second item is a check between the write pointer position and the last valid block in the zone. It is unexpected that the last valid block position is beyond the write pointer. In such a case, report it as a bug. A fix is not required for such a zone, because the zone is not selected for the next write operation until the zone gets discarded.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
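The two checks for non-open zones can be summarised as a small decision function. This is a sketch under stated assumptions, not the f2fs implementation: the function name, the string results, and the representation of the SIT valid map as a list of block addresses are all hypothetical, and "beyond the write pointer" is taken to mean an address greater than or equal to the write pointer (the write pointer names the next block to be written).

```python
def check_non_open_zone(zone_start: int, write_pointer: int, valid_blocks: list) -> str:
    """Classify one non-open zone.

    valid_blocks holds the absolute addresses of blocks marked valid for
    this zone (a stand-in for the SIT valid block map).
    """
    if not valid_blocks:
        # Item 1: an empty zone must have its write pointer at the zone start,
        # otherwise the next write to the zone would be unaligned.
        if write_pointer == zone_start:
            return "ok"
        return "reset_write_pointer"
    # Item 2: no valid block may sit at or past the write pointer.
    if max(valid_blocks) >= write_pointer:
        return "report_bug"
    return "ok"
```

Only the first case leads to a repair; the second is reported and left alone, matching the reasoning that such a zone is not picked for writing until it has been discarded.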
2020-01-15f2fs: Check write pointer consistency of open zonesShin'ichiro Kawasaki
On sudden f2fs shutdown, write pointers of zoned block devices can go further, but f2fs metadata keeps current segments at positions before the write operations. After remounting the f2fs, this inconsistency causes write operations that are not at the write pointers, and an "Unaligned write command" error is reported.

To avoid the error, compare current segments with the write pointers of the open zones the current segments point to, during the mount operation. If the write pointer position is not aligned with the current segment position, assign a new zone to the current segment. Also check that the newly assigned zone has its write pointer at the zone start. If not, reset the write pointer of the zone.

Perform the consistency check during fsync recovery. To avoid losing the fsync data, do the check after the fsync data gets restored and before the checkpoint commit, which flushes data at current segment positions. To avoid conflicts with kworker's dirty data/node flush, do the fix within SBI_POR_DOING protection.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-12-12f2fs: set GFP_NOFS when moving inline dentriesJaegeuk Kim
Otherwise, it can cause circular locking dependency reported by mm. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-12-12f2fs: should avoid recursive filesystem opsJaegeuk Kim
We need to use GFP_NOFS, since we did f2fs_lock_op(). Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-12-12f2fs: keep quota data on write_begin failureJaegeuk Kim
This patch avoids some unnecessary locks for quota files when write_begin fails. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-12-10f2fs: call f2fs_balance_fs outside of locked pageJaegeuk Kim
Otherwise, we can hit deadlock by waiting for the locked page in move_data_block in GC.

 Thread A                     Thread B
 - do_page_mkwrite
  - f2fs_vm_page_mkwrite
   - lock_page
                              - f2fs_balance_fs
                               - mutex_lock(gc_mutex)
                               - f2fs_gc
                                - do_garbage_collect
                                 - ra_data_block
                                  - grab_cache_page
   - f2fs_balance_fs
    - mutex_lock(gc_mutex)

Fixes: 39a8695824510 ("f2fs: refactor ->page_mkwrite() flow")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-12-09f2fs: preallocate DIO blocks when forcing buffered_ioJaegeuk Kim
The previous preallocation and DIO decision was as below.

                              allow_outplace_dio              !allow_outplace_dio
f2fs_force_buffered_io   (*) No_Prealloc / Buffered_IO       Prealloc / Buffered_IO
!f2fs_force_buffered_io      No_Prealloc / DIO               Prealloc / DIO

But Javier reported Case (*), where a zoned device bypassed preallocation but fell back to buffered writes in f2fs_direct_IO(), resulting in stale data being read.

In order to fix the issue, we actually need to preallocate blocks whenever we fall back to buffered IO, like this. No change is made in the other cases.

                              allow_outplace_dio              !allow_outplace_dio
f2fs_force_buffered_io   (*) Prealloc / Buffered_IO          Prealloc / Buffered_IO
!f2fs_force_buffered_io      No_Prealloc / DIO               Prealloc / DIO

Reported-and-tested-by: Javier Gonzalez <javier@javigon.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
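The two decision tables in this message reduce to a pair of boolean expressions. The sketch below encodes them as plain functions for comparison; the function names are hypothetical, and the two arguments stand in for the results of f2fs_force_buffered_io() and allow_outplace_dio() rather than calling any kernel code.

```python
def decide_before(force_buffered_io: bool, allow_outplace_dio: bool) -> tuple:
    """Old behaviour: preallocation depended only on allow_outplace_dio."""
    prealloc = not allow_outplace_dio
    io_path = "Buffered_IO" if force_buffered_io else "DIO"
    return ("Prealloc" if prealloc else "No_Prealloc", io_path)


def decide_after(force_buffered_io: bool, allow_outplace_dio: bool) -> tuple:
    """Fixed behaviour: also preallocate whenever we fall back to buffered IO."""
    prealloc = force_buffered_io or not allow_outplace_dio
    io_path = "Buffered_IO" if force_buffered_io else "DIO"
    return ("Prealloc" if prealloc else "No_Prealloc", io_path)
```

Evaluating both functions over the four input combinations shows that only the (*) cell, forced buffered IO with out-of-place DIO allowed, changes between the tables.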
2019-12-09btrfs: add Kconfig dependency for BLAKE2BDavid Sterba
Because the BLAKE2B code went through a different tree, it was not available at the time the btrfs part was merged. Now that the Kconfig symbol exists, add it to the list. Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-08Merge tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French:
 "Nine cifs/smb3 fixes:

   - one fix for stable (oops during oplock break)

   - two timestamp fixes including important one for updating mtime at close to avoid stale metadata caching issue on dirty files (also improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the wire)

   - two fixes for "modefromsid" mount option for file create (now allows mode bits to be set more atomically and accurately on create by adding "sd_context" on create when modefromsid specified on mount)

   - two fixes for multichannel found in testing this week against different servers

   - two small cleanup patches"

* tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: improve check for when we send the security descriptor context on create
  smb3: fix mode passed in on create for modetosid mount option
  cifs: fix possible uninitialized access and race on iface_list
  cifs: Fix lookup of SMB connections on multichannel
  smb3: query attributes on file close
  smb3: remove unused flag passed into close functions
  cifs: remove redundant assignment to pointer pneg_ctxt
  fs: cifs: Fix atime update check vs mtime
  CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
2019-12-08Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc vfs cleanups from Al Viro:
 "No common topic, just three cleanups".

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make __d_alloc() static
  fs/namespace: add __user to open_tree and move_mount syscalls
  fs/fnctl: fix missing __user in fcntl_rw_hint()
2019-12-07Merge tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap fixes from Darrick Wong:
 "Fix a race condition and a use-after-free error:

   - Fix a UAF when reporting writeback errors

   - Fix a race condition when handling page uptodate on fragmented file with blocksize < pagesize"

* tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: stop using ioend after it's been freed in iomap_finish_ioend()
  iomap: fix sub-page uptodate handling
2019-12-07Merge tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong:
 "Fix a couple of resource management errors and a hang:

   - fix a crash in the log setup code when log mounting fails

   - fix a hang when allocating space on the realtime device

   - fix a block leak when freeing space on the realtime device"

* tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix mount failure crash on invalid iclog memory access
  xfs: don't check for AG deadlock for realtime files in bunmapi
  xfs: fix realtime file data space leak
2019-12-07Merge tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linuxLinus Torvalds
Pull orangefs update from Mike Marshall:
 "orangefs: posix open permission checking...

  Orangefs has no open, and orangefs checks file permissions on each file access. Posix requires that file permissions be checked on open and nowhere else. Orangefs-through-the-kernel needs to seem posix compliant.

  The VFS opens files, even if the filesystem provides no method. We can see if a file was successfully opened for read and or for write by looking at file->f_mode.

  When writes are flowing from the page cache, file is no longer available. We can trust the VFS to have checked file->f_mode before writing to the page cache.

  The mode of a file might change between when it is opened and IO commences, or it might be created with an arbitrary mode.

  We'll make sure we don't hit EACCES during the IO stage by using UID 0"

[ This is "posixish", but not a great solution in the long run, since a proper secure network server shouldn't really trust the client like this. But proper and secure POSIX behavior requires an open method and a resulting cookie for IO of some kind, or similar. - Linus ]

* tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: posix open permission checking...
2019-12-07Merge tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "This is a relatively quiet cycle for nfsd, mainly various bugfixes. Possibly most interesting is Trond's fixes for some callback races that were due to my incomplete understanding of rpc client shutdown. Unfortunately at the last minute I've started noticing a new intermittent failure to send callbacks. As the logic seems basically correct, I'm leaving Trond's patches in for now, and hope to find a fix in the next week so I don't have to revert those patches" * tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits) nfsd: depend on CRYPTO_MD5 for legacy client tracking NFSD fixing possible null pointer derefering in copy offload nfsd: check for EBUSY from vfs_rmdir/vfs_unink. nfsd: Ensure CLONE persists data and metadata changes to the target file SUNRPC: Fix backchannel latency metrics nfsd: restore NFSv3 ACL support nfsd: v4 support requires CRYPTO_SHA256 nfsd: Fix cld_net->cn_tfm initialization lockd: remove __KERNEL__ ifdefs sunrpc: remove __KERNEL__ ifdefs race in exportfs_decode_fh() nfsd: Drop LIST_HEAD where the variable it declares is never used. nfsd: document callback_wq serialization of callback code nfsd: mark cb path down on unknown errors nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback() nfsd: minor 4.1 callback cleanup SUNRPC: Fix svcauth_gss_proxy_init() SUNRPC: Trace gssproxy upcall results sunrpc: fix crash when cache_head become valid before update nfsd: remove private bin2hex implementation ...
2019-12-07Merge tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - NFSv4.2 now supports cross device offloaded copy (i.e. offloaded copy of a file from one source server to a different target server).

   - New RDMA tracepoints for debugging congestion control and Local Invalidate WRs.

  Bugfixes and cleanups

   - Drop the NFSv4.1 session slot if nfs4_delegreturn_prepare waits for layoutreturn

   - Handle bad/dead sessions correctly in nfs41_sequence_process()

   - Various bugfixes to the delegation return operation.

   - Various bugfixes pertaining to delegations that have been revoked.

   - Cleanups to the NFS timespec code to avoid unnecessary conversions between timespec and timespec64.

   - Fix unstable RDMA connections after a reconnect

   - Close race between waking an RDMA sender and posting a receive

   - Wake pending RDMA tasks if connection fails

   - Fix MR list corruption, and clean up MR usage

   - Fix another RPCSEC_GSS issue with MIC buffer space"

* tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
  SUNRPC: Capture completion of all RPC tasks
  SUNRPC: Fix another issue with MIC buffer space
  NFS4: Trace lock reclaims
  NFS4: Trace state recovery operation
  NFSv4.2 fix memory leak in nfs42_ssc_open
  NFSv4.2 fix kfree in __nfs42_copy_file_range
  NFS: remove duplicated include from nfs4file.c
  NFSv4: Make _nfs42_proc_copy_notify() static
  NFS: Fallocate should use the nfs4_fattr_bitmap
  NFS: Return -ETXTBSY when attempting to write to a swapfile
  fs: nfs: sysfs: Remove NULL check before kfree
  NFS: remove unneeded semicolon
  NFSv4: add declaration of current_stateid
  NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn
  NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()
  nfsv4: Move NFSPROC4_CLNT_COPY_NOTIFY to end of list
  SUNRPC: Avoid RPC delays when exiting suspend
  NFS: Add a tracepoint in nfs_fh_to_dentry()
  NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done()
  NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn
  ...
2019-12-07smb3: improve check for when we send the security descriptor context on createSteve French
We had cases in the previous patch where we were sending the security descriptor context on SMB3 open (file create) in cases when we hadn't mounted with the "modefromsid" mount option. Add a check for that mount flag before calling ad_sd_context in open init. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-12-07pipe: don't use 'pipe_wait() for basic pipe IOLinus Torvalds
pipe_wait() may be simple, but since it relies on the pipe lock, it means that we have to do the wakeup while holding the lock. That's unfortunate, because the very first thing the waked entity will want to do is to get the pipe lock for itself. So get rid of the pipe_wait() usage by simply releasing the pipe lock, doing the wakeup (if required) and then using wait_event_interruptible() to wait on the right condition instead. wait_event_interruptible() handles races on its own by comparing the wakeup condition before and after adding itself to the wait queue, so you can use an optimistic unlocked condition for it. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-07pipe: remove 'waiting_writers' merging logicLinus Torvalds
This code is ancient, and goes back to when we only had a single page for the pipe buffers. The exact history is hidden in the mists of time (ie "before git", and in fact predates the BK repository too). At that long-ago point in time, it actually helped to try to merge big back-and-forth pipe reads and writes, and not limit pipe reads to the single pipe buffer in length just because that was all we had at a time. However, since then we've expanded the pipe buffers to multiple pages, and this logic really doesn't seem to make sense. And a lot of it is somewhat questionable (ie "hmm, the user asked for a non-blocking read, but we see that there's a writer pending, so let's wait anyway to get the extra data that the writer will have"). But more importantly, it makes the "go to sleep" logic much less obvious, and considering the wakeup issues we've had, I want to make for less of those kinds of things. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-07pipe: fix and clarify pipe read wakeup logicLinus Torvalds
This is the read side version of the previous commit: it simplifies the logic to only wake up waiting writers when necessary, and makes sure to use a synchronous wakeup. This time not so much for GNU make jobserver reasons (that pipe never fills up), but simply to get the writer going quickly again. A bit less verbose commentary this time, if only because I assume that the write side commentary isn't going to be ignored if you touch this code. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-07pipe: fix and clarify pipe write wakeup logicLinus Torvalds
The pipe rework ends up having been extra painful, partly because of actual bugs with ordering and caching of the pipe state, but also because of subtle performance issues.

In particular, the pipe rework caused the kernel build to inexplicably slow down.

The reason turns out to be that the GNU make jobserver (which limits the parallelism of the build) uses a pipe to implement a "token" system: a parallel submake will read a character from the pipe to get the job token before starting a new job, and will write a character back to the pipe when it is done. The overall job limit is thus easily controlled by just writing the appropriate number of initial token characters into the pipe.

But to work well, that really means that the old behavior of write wakeups being synchronous (WF_SYNC) is very important - when the pipe writer wakes up a reader, we want the reader to actually get scheduled immediately. Otherwise you lose the parallelism of the build.

The pipe rework lost that synchronous wakeup on write, and we had clearly all forgotten the reasons and rules for it.

This rewrites the pipe write wakeup logic to do the required sync wakeups, but also clarifies the logic and avoids extraneous wakeups.

It also ends up adding a number of comments about what it does and why, so that we hopefully don't end up forgetting about this next time we change this code.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
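The token scheme described in this message can be tried out with an ordinary pipe. The following is a minimal sketch of the idea only, not GNU make's implementation: the helper names are hypothetical, the token byte is arbitrary, and the read end is made non-blocking purely so the example never hangs (a real jobserver client blocks in read until a token is available).

```python
import os


def make_jobserver(job_limit: int) -> tuple:
    """Create a pipe preloaded with one token byte per allowed job."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"+" * job_limit)
    os.set_blocking(read_fd, False)  # demo only: report "no token" instead of sleeping
    return read_fd, write_fd


def try_acquire(read_fd: int):
    """Take one token, or return None if every token is currently in use."""
    try:
        return os.read(read_fd, 1)
    except BlockingIOError:
        return None


def release(write_fd: int, token: bytes) -> None:
    """Return a token to the pipe once the job is done."""
    os.write(write_fd, token)
```

With a limit of two, two acquisitions succeed and a third finds the pipe empty until one of the holders writes its token back. That write is the wakeup the commit cares about: the sooner the waiting reader runs after it, the less build parallelism is lost.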
2019-12-07pipe: fix poll/select race introduced by the pipe reworkLinus Torvalds
The kernel wait queues have a basic rule to them: you add yourself to the wait-queue first, and then you check the things that you're going to wait on. That avoids the races with the event you're waiting for. The same goes for poll/select logic: the "poll_wait()" goes first, and then you check the things you're polling for. Of course, if you use locking, the ordering doesn't matter since the lock will serialize with anything that changes the state you're looking at. That's not the case here, though. So move the poll_wait() first in pipe_poll(), before you start looking at the pipe state. Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length") Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
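The ordering rule above can be shown with a deterministic toy model. This is a single-threaded sketch, not the kernel's wait queue or poll code: the class and function names are hypothetical, and the "race" is simulated by firing the event at a fixed point between the two steps of the poller.

```python
class Source:
    """Toy event source with a ready flag and a list of registered waiters."""

    def __init__(self):
        self.ready = False
        self.waiters = []

    def signal(self) -> list:
        """Mark the source ready and wake whoever is registered right now."""
        self.ready = True
        woken, self.waiters = self.waiters, []
        return woken


def poll_misses_event(register_first: bool) -> bool:
    """Return True if the poller neither sees the event nor gets woken."""
    src = Source()
    woken = []
    if register_first:
        src.waiters.append("poller")   # step 1: add ourselves to the wait queue
        woken.extend(src.signal())     # the event fires in the window
        seen = src.ready               # step 2: check the state
    else:
        seen = src.ready               # step 1: check the state (still not ready)
        woken.extend(src.signal())     # the event fires in the window
        src.waiters.append("poller")   # step 2: register, too late to be woken
    return not seen and "poller" not in woken
```

Checking first and registering second loses the event in this interleaving; registering first either sees the new state or is on the queue when the wakeup happens.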
2019-12-07nfsd: depend on CRYPTO_MD5 for legacy client trackingPatrick Steinhardt
The legacy client tracking infrastructure of nfsd makes use of MD5 to derive a client's recovery directory name. As the nfsd module doesn't declare any dependency on CRYPTO_MD5, though, it may fail to allocate the hash if the kernel was compiled without it. As a result, generation of client recovery directories will fail with the following error: NFSD: unable to generate recoverydir name The explicit dependency on CRYPTO_MD5 was removed as redundant back in 6aaa67b5f3b9 (NFSD: Remove redundant "select" clauses in fs/Kconfig 2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5. This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit df486a25900f (NFS: Fix the selection of security flavours in Kconfig) at a later point. Fix the issue by adding back an explicit dependency on CRYPTO_MD5. Fixes: df486a25900f (NFS: Fix the selection of security flavours in Kconfig) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-07NFSD fixing possible null pointer derefering in copy offloadOlga Kornievskaia
Static checker revealed possible error path leading to possible NULL pointer dereferencing. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: e0639dc5805a: ("NFSD introduce async copy feature") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-06pipe: Fix iteration end check in fuse_dev_splice_write()David Howells
Fix the iteration end check in fuse_dev_splice_write(). The iterator position can only be compared with == or != since wrappage may be involved. Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
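Why only == and != are safe can be seen by walking a ring whose counters wrap. The sketch below assumes free-running 32-bit head and tail counters that are masked with ring_size - 1 to find a slot; the function name and the 32-bit width are assumptions made for illustration, and this is not the fuse or pipe code itself.

```python
MASK32 = 0xFFFFFFFF  # assumed counter width for this illustration


def slots_visited(tail: int, head: int, ring_size: int, use_less_than: bool) -> list:
    """Walk from tail to head and return the ring slots that were visited.

    ring_size must be a power of two so that masking selects a slot.
    """
    visited = []
    idx = tail
    while (idx < head) if use_less_than else (idx != head):
        visited.append(idx & (ring_size - 1))
        idx = (idx + 1) & MASK32  # the counter wraps instead of growing forever
    return visited
```

When neither counter has wrapped, both comparisons visit the same slots. Once head has wrapped past zero while tail has not, the `<` test is false immediately and the loop skips every occupied slot, whereas the `!=` test still visits all of them.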
2019-12-06pipe: fix incorrect caching of pipe state over pipe_wait()Linus Torvalds
Similarly to commit 8f868d68d335 ("pipe: Fix missing mask update after pipe_wait()") this fixes a case where the pipe rewrite ended up caching the pipe state incorrectly over a pipe lock drop event. It wasn't quite as obvious, because you needed to splice data from a pipe to a file, which is a fairly unusual operation, but it's completely wrong. Make sure we load the pipe head/tail/size information only after we've waited for there to be data in the pipe. While in that file, also make one of the splice helper functions use the canonical argument order for pipe_empty(). That's syntactic - pipe emptiness is just that head and tail are equal, and thus mixing up head and tail doesn't really matter. It's still wrong, though. Reported-by: David Sterba <dsterba@suse.cz> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-06smb3: fix mode passed in on create for modetosid mount optionSteve French
When using the special SID to store the mode bits in an ACE (See http://technet.microsoft.com/en-us/library/hh509017(v=ws.10).aspx) which is enabled with mount parm "modefromsid" we were not passing in the mode via SMB3 create (although chmod was enabled). SMB3 create allows a security descriptor context to be passed in (which is more atomic and thus preferable to setting the mode bits after create via a setinfo). This patch enables setting the mode bits on create when using modefromsid mount option. In addition it fixes an endian error in the definition of the Control field flags in the SMB3 security descriptor. It also makes the ACE type of the special SID better match the documentation (and behavior of servers which use this to store mode bits in SMB3 ACLs). Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-12-06Merge tag 'for-linus-20191205' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block and io_uring updates from Jens Axboe: "I wasn't expecting this to be so big, and if I was, I would have used separate branches for this. Going forward I'll be doing separate branches for the current tree, just like for the next kernel version tree. In any case, this contains: - Series from Christoph that fixes an inherent race condition with zoned devices and revalidation. - null_blk zone size fix (Damien) - Fix for a regression in this merge window that caused busy spins by sending empty disk uevents (Eric) - Fix for a regression in this merge window for bfq stats (Hou) - Fix for io_uring creds allocation failure handling (me) - io_uring -ERESTARTSYS send/recvmsg fix (me) - Series that fixes the need for applications to retain state across async request punts for io_uring. This one is a bit larger than I would have hoped, but I think it's important we get this fixed for 5.5. - connect(2) improvement for io_uring, handling EINPROGRESS instead of having applications needing to poll for it (me) - Have io_uring use a hash for poll requests instead of an rbtree. This turned out to work much better in practice, so I think we should make the switch now. For some workloads, even with a fair amount of cancellations, the insertion sort is just too expensive. (me) - Various little io_uring fixes (me, Jackie, Pavel, LimingWu) - Fix for brd unaligned IO, and a warning for the future (Ming) - Fix for a bio integrity data leak (Justin) - bvec_iter_advance() improvement (Pavel) - Xen blkback page unmap fix (SeongJae) The major items in here are all well tested, and on the liburing side we continue to add regression and feature test cases. 
We're up to 50 topic cases now, each with anywhere from 1 to more than 10 cases in each" * tag 'for-linus-20191205' of git://git.kernel.dk/linux-block: (33 commits) block: fix memleak of bio integrity data io_uring: fix a typo in a comment bfq-iosched: Ensure bio->bi_blkg is valid before using it io_uring: hook all linked requests via link_list io_uring: fix error handling in io_queue_link_head io_uring: use hash table for poll command lookups io-wq: clear node->next on list deletion io_uring: ensure deferred timeouts copy necessary data io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED brd: warn on un-aligned buffer brd: remove max_hw_sectors queue limit xen/blkback: Avoid unmapping unmapped grant pages io_uring: handle connect -EINPROGRESS like -EAGAIN block: set the zone size in blk_revalidate_disk_zones atomically block: don't handle bio based drivers in blk_revalidate_disk_zones block: allocate the zone bitmaps lazily block: replace seq_zones_bitmap with conv_zones_bitmap block: simplify blkdev_nr_zones block: remove the empty line at the end of blk-zoned.c ...
2019-12-06Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs d_inode/d_flags memory ordering fixes from Al Viro: "Fallout from tree-wide audit for ->d_inode/->d_flags barriers use. Basically, the problem is that negative pinned dentries require careful treatment - unless ->d_lock is locked or parent is held at least shared, another thread can make them positive right under us. Most of the uses turned out to be safe - the main surprises as far as filesystems are concerned were - race in dget_parent() fastpath, that might end up with the caller observing the returned dentry _negative_, due to insufficient barriers. It is positive in memory, but we could end up seeing the wrong value of ->d_inode in CPU cache. Fixed. - manual checks that result of lookup_one_len_unlocked() is positive (and rejection of negatives). Again, insufficient barriers (we might end up with inconsistent observed values of ->d_inode and ->d_flags). Fixed by switching to a new primitive that does the checks itself and returns ERR_PTR(-ENOENT) instead of a negative dentry. That way we get rid of boilerplate converting negatives into ERR_PTR(-ENOENT) in the callers and have a single place to deal with the barrier-related mess - inside fs/namei.c rather than in every caller out there. The guts of pathname resolution *do* need to be careful - the race found by Ritesh is real, as well as several similar races. Fortunately, it turns out that we can take care of that with fairly local changes in there. The tree-wide audit had not been fun, and I hate the idea of repeating it. I think the right approach would be to annotate the places where we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse catch regressions. 
But I'm still not sure what would be the least invasive way of doing that and it's clearly the next cycle fodder" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/namei.c: fix missing barriers when checking positivity fix dget_parent() fastpath race new helper: lookup_positive_unlocked() fs/namei.c: pull positivity check into follow_managed()
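The shape of the "new primitive" described above (do the positivity check in one place and return ERR_PTR(-ENOENT) instead of a negative dentry) can be sketched in userspace. Everything below is a hypothetical stand-in: err_ptr(), is_err() and ptr_err() imitate the kernel's error-pointer helpers, struct entry is not struct dentry, and the memory barriers that motivated the real change are not modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Hypothetical entry: inode == NULL models a negative entry. */
struct entry {
	const char *name;
	const void *inode;
};

/* Raw lookup: may hand back a negative entry, or NULL if unknown. */
static struct entry *lookup_raw(struct entry *tbl, size_t n, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}

/* Positive-only lookup: the check lives here once, not in every caller. */
static struct entry *lookup_positive(struct entry *tbl, size_t n,
				     const char *name)
{
	struct entry *e = lookup_raw(tbl, n, name);

	if (e == NULL || e->inode == NULL)
		return err_ptr(-ENOENT);
	return e;
}
```

Callers then only test is_err() on the result and never see a negative entry, which is the boilerplate reduction the pull message refers to.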
2019-12-05Merge branch 'next.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull autofs updates from Al Viro: "autofs misuses checks for ->d_subdirs emptiness; the cursors are in the same lists, resulting in false negatives. It's not needed anyway, since autofs maintains counter in struct autofs_info, containing 0 for removed ones, 1 for live symlinks and 1 + number of children for live directories, which is precisely what we need for those checks. This series switches to use of that counter and untangles the crap around its uses (it need not be atomic and there's a bunch of completely pointless "defensive" checks). This fell out of dcache_readdir work; the main point is to get rid of ->d_subdirs abuses in there. I've more followup cleanups, but I hadn't run those by Ian yet, so they can go next cycle" * 'next.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: autofs: don't bother with atomics for ino->count autofs_dir_rmdir(): check ino->count for deciding whether it's empty... autofs: get rid of pointless checks around ->count handling autofs_clear_leaf_automount_flags(): use ino->count instead of ->d_subdirs
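The counter convention described above (0 for removed, 1 for a live symlink, 1 + number of children for a live directory) is enough to answer "is this directory empty?" without walking a child list. A minimal sketch, assuming a plain int counter; struct node_info and the helper names are hypothetical and not the autofs code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node bookkeeping following the convention above. */
struct node_info {
	int count;	/* 0: removed, 1: live leaf, 1 + children: live dir */
};

/* Mark a node live with no children. */
static void node_init_live(struct node_info *ino)
{
	ino->count = 1;
}

/* Account for a child being added under a live directory. */
static void node_add_child(struct node_info *dir)
{
	assert(dir->count >= 1);
	dir->count++;
}

/* Account for a child being removed from a live directory. */
static void node_remove_child(struct node_info *dir)
{
	assert(dir->count > 1);
	dir->count--;
}

/* A live directory is empty exactly when only its own reference remains. */
static bool live_dir_is_empty(const struct node_info *dir)
{
	return dir->count == 1;
}
```

Because the counter is only changed under whatever lock already protects the directory, a plain int suffices in this sketch, which matches the remark that it need not be atomic.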
2019-12-05Merge branch 'pipe-rework' (patches from David Howells)Linus Torvalds
Merge two fixes for the pipe rework from David Howells: "Here are a couple of patches to fix bugs syzbot found in the pipe changes: - An assertion check will sometimes trip when polling a pipe because the ring size and indices used are approximate and may be being changed simultaneously. An equivalent approximate calculation was done previously, but without the assertion check, so I've just dropped the check. To make it accurate, the pipe mutex would need to be taken or the spin lock could be used - but usage of the spinlock would need to be rolled out into splice, iov_iter and other places for that. - The index mask and the max_usage values cannot be cached across pipe_wait() as F_SETPIPE_SZ could have been called during the wait. This can cause pipe_write() to break" * pipe-rework: pipe: Fix missing mask update after pipe_wait() pipe: Remove assertion from pipe_poll()
2019-12-05pipe: Fix missing mask update after pipe_wait()David Howells
Fix pipe_write() to not cache the ring index mask and max_usage as their values are invalidated by calling pipe_wait() because the latter function drops the pipe lock, thereby allowing F_SETPIPE_SZ to change them. Without this, pipe_write() may subsequently miscalculate the array indices and pipe fullness, leading to an oops like the following: BUG: KASAN: slab-out-of-bounds in pipe_write+0xc25/0xe10 fs/pipe.c:481 Write of size 8 at addr ffff8880771167a8 by task syz-executor.3/7987 ... CPU: 1 PID: 7987 Comm: syz-executor.3 Not tainted 5.4.0-rc2-syzkaller #0 ... Call Trace: pipe_write+0xc25/0xe10 fs/pipe.c:481 call_write_iter include/linux/fs.h:1895 [inline] new_sync_write+0x3fd/0x7e0 fs/read_write.c:483 __vfs_write+0x94/0x110 fs/read_write.c:496 vfs_write+0x18a/0x520 fs/read_write.c:558 ksys_write+0x105/0x220 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __x64_sys_write+0x6e/0xb0 fs/read_write.c:620 do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe This is not a problem for pipe_read() as the mask is recalculated on each pass of the loop, after pipe_wait() has been called. Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length") Reported-by: syzbot+838eb0878ffd51f27c41@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Cc: Eric Biggers <ebiggers@kernel.org> [ Changed it to use a temporary variable 'mask' to avoid long lines -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
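Why a cached mask goes stale can be shown with a userspace sketch. The names below (struct ring, wait_and_resize(), next_slot()) are hypothetical; wait_and_resize() merely stands in for a wait during which the lock is dropped and the ring is resized, as F_SETPIPE_SZ could do. It is an illustration of the failure mode, not the kernel's pipe_write().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ring whose size may change while the lock is dropped. */
struct ring {
	unsigned int head;	/* free-running write position */
	unsigned int ring_size;	/* always a power of two */
	int *bufs;
};

/*
 * Stand-in for a wait that drops the lock: here the ring is replaced by
 * one of a different size. Returns 0 on success, -1 on allocation failure.
 */
static int wait_and_resize(struct ring *r, unsigned int new_size)
{
	int *nb = calloc(new_size, sizeof(*nb));

	if (nb == NULL)
		return -1;
	free(r->bufs);
	r->bufs = nb;
	r->ring_size = new_size;
	return 0;
}

/*
 * Slot index for the next write. The mask is derived from ring_size at
 * the point of use, so it can never be carried across a wait.
 */
static unsigned int next_slot(const struct ring *r)
{
	unsigned int mask = r->ring_size - 1;

	assert((r->ring_size & mask) == 0);	/* power-of-two check */
	return r->head & mask;
}
```

A mask computed before the wait (15 for a 16-slot ring) would still yield index 13 after the ring shrinks to 4 slots, which is out of bounds; recomputing it yields index 1.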
2019-12-05pipe: Remove assertion from pipe_poll()David Howells
An assertion check was added to pipe_poll() to make sure that the ring occupancy isn't seen to overflow the ring size. However, since no locks are held when the three values are read, it is possible for F_SETPIPE_SZ to intervene and muck up the calculation, thereby causing the oops. Fix this by simply removing the assertion and accepting that the calculation might be approximate. Note that the previous code also had a similar issue, though there was no assertion check, since the occupancy counter and the ring size were not read with a lock held, so it's possible that the poll check might have malfunctioned then too. Also wake up all the waiters so that they can reissue their checks if there was a competing read or write. Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length") Reported-by: syzbot+d37abaade33a934f16f2@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-05Merge tag 'gfs2-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds
Pull GFS2 updates from Andreas Gruenbacher: "Bob's extensive filesystem withdrawal and recovery testing: - don't write log headers after file system withdraw - clean up iopen glock mess in gfs2_create_inode - close timing window with GLF_INVALIDATE_IN_PROGRESS - abort gfs2_freeze if io error is seen - don't loop forever in gfs2_freeze if withdrawn - fix infinite loop in gfs2_ail1_flush on io error - introduce function gfs2_withdrawn - fix glock reference problem in gfs2_trans_remove_revoke Filesystems with a block size smaller than the page size: - fix end-of-file handling in gfs2_page_mkwrite - improve mmap write vs. punch_hole consistency Other: - remove active journal side effect from gfs2_write_log_header - multi-block allocations in gfs2_page_mkwrite Minor cleanups and coding style fixes: - remove duplicate call from gfs2_create_inode - make gfs2_log_shutdown static - make gfs2_fs_parameters static - some whitespace cleanups - removed unnecessary semicolon" * tag 'gfs2-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Don't write log headers after file system withdraw gfs2: Remove duplicate call from gfs2_create_inode gfs2: clean up iopen glock mess in gfs2_create_inode gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS gfs2: Abort gfs2_freeze if io error is seen gfs2: Don't loop forever in gfs2_freeze if withdrawn gfs2: fix infinite loop in gfs2_ail1_flush on io error gfs2: Introduce function gfs2_withdrawn gfs2: fix glock reference problem in gfs2_trans_remove_revoke gfs2: make gfs2_log_shutdown static gfs2: Remove active journal side effect from gfs2_write_log_header gfs2: Fix end-of-file handling in gfs2_page_mkwrite gfs2: Multi-block allocations in gfs2_page_mkwrite gfs2: Improve mmap write vs. punch_hole consistency gfs2: make gfs2_fs_parameters static gfs2: Some whitespace cleanups gfs2: removed unnecessary semicolon
2019-12-05Merge tag 'ceph-for-5.5-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The two highlights are a set of improvements to how rbd read-only mappings are handled and a conversion to the new mount API (slightly complicated by the fact that we had a common option parsing framework that called out into rbd and the filesystem instead of them calling into it). Also included a few scattered fixes and a MAINTAINERS update for rbd, adding Dongsheng as a reviewer" * tag 'ceph-for-5.5-rc1' of git://github.com/ceph/ceph-client: libceph, rbd, ceph: convert to use the new mount API rbd: ask for a weaker incompat mask for read-only mappings rbd: don't query snapshot features rbd: remove snapshot existence validation code rbd: don't establish watch for read-only mappings rbd: don't acquire exclusive lock for read-only mappings rbd: disallow read-write partitions on images mapped read-only rbd: treat images mapped read-only seriously rbd: introduce RBD_DEV_FLAG_READONLY rbd: introduce rbd_is_snap() ceph: don't leave ino field in ceph_mds_request_head uninitialized ceph: tone down loglevel on ceph_mdsc_build_path warning rbd: update MAINTAINERS info ceph: fix geting random mds from mdsmap rbd: fix spelling mistake "requeueing" -> "requeuing" ceph: make several helper accessors take const pointers libceph: drop unnecessary check from dispatch() in mon_client.c
2019-12-05Merge tag 'fuse-update-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds
Pull fuse update from Miklos Szeredi: - Fix a regression introduced in the last release - Fix a number of issues with validating data coming from userspace - Some cleanups in virtiofs * tag 'fuse-update-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix Kconfig indentation fuse: fix leak of fuse_io_priv virtiofs: Use completions while waiting for queue to be drained virtiofs: Do not send forget request "struct list_head" element virtiofs: Use a common function to send forget virtiofs: Fix old-style declaration fuse: verify nlink fuse: verify write return fuse: verify attributes
2019-12-05iomap: stop using ioend after it's been freed in iomap_finish_ioend()Zorro Lang
This patch fixes the following KASAN report. The @ioend has been freed by bio_put(), but iomap_finish_ioend() still tries to access its data. [20563.631624] BUG: KASAN: use-after-free in iomap_finish_ioend+0x58c/0x5c0 [20563.638319] Read of size 8 at addr fffffc0c54a36928 by task kworker/123:2/22184 [20563.647107] CPU: 123 PID: 22184 Comm: kworker/123:2 Not tainted 5.4.0+ #1 [20563.653887] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019 [20563.664499] Workqueue: xfs-conv/sda5 xfs_end_io [xfs] [20563.669547] Call trace: [20563.671993] dump_backtrace+0x0/0x370 [20563.675648] show_stack+0x1c/0x28 [20563.678958] dump_stack+0x138/0x1b0 [20563.682455] print_address_description.isra.9+0x60/0x378 [20563.687759] __kasan_report+0x1a4/0x2a8 [20563.691587] kasan_report+0xc/0x18 [20563.694985] __asan_report_load8_noabort+0x18/0x20 [20563.699769] iomap_finish_ioend+0x58c/0x5c0 [20563.703944] iomap_finish_ioends+0x110/0x270 [20563.708396] xfs_end_ioend+0x168/0x598 [xfs] [20563.712823] xfs_end_io+0x1e0/0x2d0 [xfs] [20563.716834] process_one_work+0x7f0/0x1ac8 [20563.720922] worker_thread+0x334/0xae0 [20563.724664] kthread+0x2c4/0x348 [20563.727889] ret_from_fork+0x10/0x18 [20563.732941] Allocated by task 83403: [20563.736512] save_stack+0x24/0xb0 [20563.739820] __kasan_kmalloc.isra.9+0xc4/0xe0 [20563.744169] kasan_slab_alloc+0x14/0x20 [20563.747998] slab_post_alloc_hook+0x50/0xa8 [20563.752173] kmem_cache_alloc+0x154/0x330 [20563.756185] mempool_alloc_slab+0x20/0x28 [20563.760186] mempool_alloc+0xf4/0x2a8 [20563.763845] bio_alloc_bioset+0x2d0/0x448 [20563.767849] iomap_writepage_map+0x4b8/0x1740 [20563.772198] iomap_do_writepage+0x200/0x8d0 [20563.776380] write_cache_pages+0x8a4/0xed8 [20563.780469] iomap_writepages+0x4c/0xb0 [20563.784463] xfs_vm_writepages+0xf8/0x148 [xfs] [20563.788989] do_writepages+0xc8/0x218 [20563.792658] __writeback_single_inode+0x168/0x18f8 [20563.797441] writeback_sb_inodes+0x370/0xd30 [20563.801703] 
wb_writeback+0x2d4/0x1270 [20563.805446] wb_workfn+0x344/0x1178 [20563.808928] process_one_work+0x7f0/0x1ac8 [20563.813016] worker_thread+0x334/0xae0 [20563.816757] kthread+0x2c4/0x348 [20563.819979] ret_from_fork+0x10/0x18 [20563.825028] Freed by task 22184: [20563.828251] save_stack+0x24/0xb0 [20563.831559] __kasan_slab_free+0x10c/0x180 [20563.835648] kasan_slab_free+0x10/0x18 [20563.839389] slab_free_freelist_hook+0xb4/0x1c0 [20563.843912] kmem_cache_free+0x8c/0x3e8 [20563.847745] mempool_free_slab+0x20/0x28 [20563.851660] mempool_free+0xd4/0x2f8 [20563.855231] bio_free+0x33c/0x518 [20563.858537] bio_put+0xb8/0x100 [20563.861672] iomap_finish_ioend+0x168/0x5c0 [20563.865847] iomap_finish_ioends+0x110/0x270 [20563.870328] xfs_end_ioend+0x168/0x598 [xfs] [20563.874751] xfs_end_io+0x1e0/0x2d0 [xfs] [20563.878755] process_one_work+0x7f0/0x1ac8 [20563.882844] worker_thread+0x334/0xae0 [20563.886584] kthread+0x2c4/0x348 [20563.889804] ret_from_fork+0x10/0x18 [20563.894855] The buggy address belongs to the object at fffffc0c54a36900 which belongs to the cache bio-1 of size 248 [20563.906844] The buggy address is located 40 bytes inside of 248-byte region [fffffc0c54a36900, fffffc0c54a369f8) [20563.918485] The buggy address belongs to the page: [20563.923269] page:ffffffff82f528c0 refcount:1 mapcount:0 mapping:fffffc8e4ba31900 index:0xfffffc0c54a33300 [20563.932832] raw: 17ffff8000000200 ffffffffa3060100 0000000700000007 fffffc8e4ba31900 [20563.940567] raw: fffffc0c54a33300 0000000080aa0042 00000001ffffffff 0000000000000000 [20563.948300] page dumped because: kasan: bad access detected [20563.955345] Memory state around the buggy address: [20563.960129] fffffc0c54a36800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc [20563.967342] fffffc0c54a36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [20563.974554] >fffffc0c54a36900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [20563.981766] ^ [20563.986288] fffffc0c54a36980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fc [20563.993501] fffffc0c54a36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [20564.000713] ================================================================== Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205703 Signed-off-by: Zorro Lang <zlang@redhat.com> Fixes: 9cd0ed63ca514 ("iomap: enhance writeback error message") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
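The general pattern behind this class of fix (copy out whatever is still needed before the object is released, and touch nothing in it afterwards) can be sketched in userspace. This is an illustration under assumptions, not the actual patch: struct ioend_like, struct report and finish_and_report() are hypothetical, and free() stands in for dropping the last reference to the containing bio.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object freed together with its bio. */
struct ioend_like {
	long offset;
	int error;
};

/* Values that must outlive the object, e.g. for an error message. */
struct report {
	long offset;
	int error;
};

/*
 * Copy the fields that are still needed, then release the object.
 * Nothing dereferences 'io' after free(), which is the ordering a
 * use-after-free fix has to restore.
 */
static struct report finish_and_report(struct ioend_like *io)
{
	struct report rep;

	assert(io != NULL);
	rep.offset = io->offset;
	rep.error = io->error;
	free(io);	/* stand-in for dropping the last reference */
	return rep;
}
```

The caller works only with the returned copy, so any later logging cannot read freed memory.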