summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-11-30btrfs: don't access the bio directly in the direct I/O codeChristoph Hellwig
Just use bio_for_each_segment_all to iterate over all segments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: don't access the bio directly in the raid5/6 codeChristoph Hellwig
Just use bio_for_each_segment_all to iterate over all segments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: use bio iterators for the decompression handlersChristoph Hellwig
Pass the full bio to the decompression routines and use bio iterators to iterate over the data in the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: Ensure proper sector alignment for btrfs_free_reserved_data_spaceJeff Mahoney
This fixes the WARN_ON on BTRFS_I(inode)->reserved_extents in btrfs_destroy_inode and the WARN_ON on nonzero delalloc bytes on umount with qgroups enabled. I was able to reproduce this by setting up a small (~500kb) quota limit and writing a file one byte at a time until I hit the limit. The warnings would all hit on umount. The root cause is that we would reserve a block-sized range in both the reservation and the quota in btrfs_check_data_free_space, but if we encountered a problem (like e.g. EDQUOT), we would only release the single byte in the qgroup reservation. That caused an iotree state split, which increased the number of outstanding extents, in turn disallowing releasing the metadata reservation. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30Btrfs: abort transaction if fill_holes() failsJosef Bacik
At this point we will have dropped extent entries from the file, so if we fail to insert the new hole entries then we are leaving the fs in a corrupt state (albeit an easily fixed one). Abort the transaciton if this happens so we can avoid corrupting the fs. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30Btrfs: fix file extent corruptionJosef Bacik
In order to do hole punching we have a block reserve to hold the reservation we need to drop the extents in our range. Since we could end up dropping a lot of extents we set rsv->failfast so we can just loop around again and drop the remaining of the range. Unfortunately we unconditionally fill the hole extents in and start from the last extent we encountered, which we may or may not have dropped. So this can result in overlapping file extent entries, which can be tripped over in a variety of ways, either by hitting BUG_ON(!ret) in fill_holes() after the search, or in btrfs_set_item_key_safe() in btrfs_drop_extent() at a later time by an unrelated task. Fix this by only setting drop_end to the last extent we did actually drop. This way our holes are filled in properly for the range that we did drop, and the rest of the range that remains to be dropped is actually dropped. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: increment ctx->pos for every emitted or skipped dirent in readdirJeff Mahoney
If we process the last item in the leaf and hit an I/O error while reading the next leaf, we return -EIO without having adjusted the position. Since we have emitted dirents, getdents() will return the byte count to the user instead of the error. Subsequent callers will emit the last successful dirent again, and return -EIO again, with the same result. Callers loop forever. Instead, if we always increment ctx->pos after emitting or skipping the dirent, we'll be sure that we won't hit the same one again. When we go to process the next leaf, we won't have emitted any dirents and the -EIO will be returned to the user properly. We also don't need to track if we've emitted a dirent already or if we've changed the position yet. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove old tree_root dirent processing in btrfs_real_readdir()Jeff Mahoney
Commit 3de4586c527 (Btrfs: Allow subvolumes and snapshots anywhere in the directory tree) introduced the current system of placing snapshots in the directory tree. It also introduced the behavior of creating the snapshot and then creating the directory entries for it. We've kept this code around for compatibility reasons, but it turns out that no file systems with the old tree_root based snapshots can be mounted on newer (>= 2009) kernels anyway. About a month after the above commit, commit 2a7108ad89e (Btrfs: rev the disk format for the inode compat and csum selection changes) landed, changing the superblock magic number. As a result, we know that we'll never encounter tree_root-based dirents or have to deal with skipping our own snapshot dirents. Since that also means that we're now only iterating over DIR_INDEX items, which only contain one directory entry per leaf item, we don't need to loop over the leaf item contents anymore either. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: Call kunmap if zlib_inflateInit2 failsNick Terrell
If zlib_inflateInit2 fails, the input page is never unmapped. Add a call to kunmap when it fails. Signed-off-by: Nick Terrell <nickrterrell@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: store and load values of stripes_min/stripes_max in balance status itemDavid Sterba
The balance status item contains currently known filter values, but the stripes filter was unintentionally not among them. This would mean, that interrupted and automatically restarted balance does not apply the stripe filters. Fixes: dee32d0ac3719ef8d640efaf0884111df444730f CC: stable@vger.kernel.org # 4.4+ Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove redundant check of btrfs_iget return valueChristophe JAILLET
'btrfs_iget()' can not return NULL, so this test can be removed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: change btrfs_csum_final result param type to u8Domagoj Tršan
csum member of struct btrfs_super_block has array type of u8. It makes sense that function btrfs_csum_final should be also declared to accept u8 *. I changed the declaration of method void btrfs_csum_final(u32 crc, char *result); to void btrfs_csum_final(u32 crc, u8 *result); Signed-off-by: Domagoj Tršan <domagoj.trsan@gmail.com> [ changed cast to u8 at several call sites ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30Btrfs: adjust len of writes if following a preallocated extentLiu Bo
If we have |0--hole--4095||4096--preallocate--12287| instead of using preallocated space, a 8K direct write will just create a new 8K extent and it'll end up with |0--new extent--8191||8192--preallocate--12287| It's because we find a hole em and then go to create a new 8K extent directly without adjusting @len. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: return early from failed memory allocations in ioctl handlersShailendra Verma
There is no need to call kfree() if memdup_user() fails, as no memory was allocated and the error in the error-valued pointer should be returned. Signed-off-by: Shailendra Verma <shailendra.v@samsung.com> [ edit subject ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: add optimized version of eb to eb copyDavid Sterba
Using copy_extent_buffer is suitable for copying betwenn buffers from an arbitrary offset and deals with page boundaries. This is not necessary when doing a full extent_buffer-to-extent_buffer copy. We can utilize the copy_page helper as well. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove constant parameter to memset_extent_buffer and rename itDavid Sterba
The only memset we do is to 0, so sink the parameter to the function and simplify all calls. Rename the function to reflect the behaviour. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: use specialized page copying helpers in btrfs_clone_extent_bufferDavid Sterba
The copy_page is usually optimized and can be faster than memcpy. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: use new helpers to set uuids in ebDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: introduce helpers for updating eb uuidsDavid Sterba
The fsid and chunk tree uuid are always located in the first page, we don't need the to use write_extent_buffer. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: delete unused member from superblockDavid Sterba
__bdev' has never been used since 0b86a832a1f38abec695864ec2eaedc9d2383f1b (2008). Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove trivial helper btrfs_find_tree_blockDavid Sterba
During the time, the function has been shrunk to the point that it just calls find_extent_buffer, just passing the parameters. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: reada, remove pointless BUG_ON check for fs_infoDavid Sterba
We dereference fs_info several times, besides that post-mount functions should never see a NULL fs_info. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: reada, remove pointless BUG_ON in reada_find_extentDavid Sterba
The lock is held, we make the same lookup that previously failed with EEXIST and we don't insert NULL pointers. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: reada, sink start parameter to btree_readahead_hookDavid Sterba
Originally, the eb and start were passed separately in case eb is NULL. Since the readahead has been refactored in 4.6, this is not true anymore and we can get rid of the parameter. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: reada, remove unused parameter from __readahead_hookDavid Sterba
'start' is not used since "btrfs: reada: Pass reada_extent into __readahead_hook directly" (6e39dbe8b9e55280c). Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: reada, cleanup remove unneeded variable in __readahead_hookDavid Sterba
We can't touch the eb directly in case the function is called with a non-zero error, so we can read the eb level when needed. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: rename helper macros for qgroup and aux data castsDavid Sterba
The helpers are not meant to be generic, the name is misleading. Convert them to static inlines for type checking. Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove stale comment from btrfs_statfsDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove unused headers, statfs.hDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: remove useless commentsXiaoguang Wang
Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely") Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: make block group flags in balance printks human-readableAdam Borowski
They're not even documented anywhere, letting users with no recourse but to RTFS. It's no big burden to output the bitfield as words. Also, display unknown flags as hex. Signed-off-by: Adam Borowski <kilobyte@angband.pl> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30Btrfs: deal with existing encompassing extent map in btrfs_get_extent()Omar Sandoval
My QEMU VM was seeing inexplicable I/O errors that I tracked down to errors coming from the qcow2 virtual drive in the host system. The qcow2 file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT. Every once in awhile, pread() or pwrite() would return EEXIST, which makes no sense. This turned out to be a bug in btrfs_get_extent(). Commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where two threads race on adding the same extent map to an inode's extent map tree. However, if the added em is merged with an adjacent em in the extent tree, then we'll end up with an existing extent that is not identical to but instead encompasses the extent we tried to add. When we call merge_extent_mapping() to find the nonoverlapping part of the new em, the arithmetic overflows because there is no such thing. We then end up trying to add a bogus em to the em_tree, which results in a EEXIST that can bubble all the way up to userspace. Fix it by extending the identical extent map special case. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30btrfs: add necessary comments about tickets_idWang Xiaoguang
Tickets_id's name may result in some misunderstandings, it just indicates the next ticket will be handled and is not stored per ticket. Fixes: ce12965 ("btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress") Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-29btrfs: cleanup: use already calculated value in ↵Wang Xiaoguang
btrfs_should_throttle_delayed_refs() Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-29btrfs: don't abuse REQ_OP_* flags for btrfs_map_blockChristoph Hellwig
btrfs_map_block supports different types of mappings, which to a large extent resemble block layer operations. But they don't always do, and currently btrfs dangerously overlays it's own flag over the block layer flags. This is just asking for a conflict, so introduce a different map flags enum inside of btrfs instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-26fix default_file_splice_read()Al Viro
Botched calculation of number of pages. As the result, we were dropping pieces when doing splice to pipe from e.g. 9p. Reported-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-25Btrfs: fix qgroup rescan worker initializationFilipe Manana
We were setting the qgroup_rescan_running flag to true only after the rescan worker started (which is a task run by a queue). So if a user space task starts a rescan and immediately after asks to wait for the rescan worker to finish, this second call might happen before the rescan worker task starts running, in which case the rescan wait ioctl returns immediatley, not waiting for the rescan worker to finish. This was making the fstest btrfs/022 fail very often. Fixes: d2c609b834d6 (btrfs: properly track when rescan worker is running) Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com>
2016-11-23Merge tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Anna Schumaker: "Most of these fix regressions or races, but there is one patch for stable that Arnd sent me Stable bugfix: - Hide array-bounds warning Bugfixes: - Keep a reference on lock states while checking - Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state - Don't call close if the open stateid has already been cleared - Fix CLOSE rases with OPEN - Fix a regression in DELEGRETURN" * tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.x: hide array-bounds warning NFSv4.1: Keep a reference on lock states while checking NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state NFSv4: Don't call close if the open stateid has already been cleared NFSv4: Fix CLOSE races with OPEN NFSv4.1: Fix a regression in DELEGRETURN
2016-11-23Btrfs: fix emptiness check for dirtied extent buffers at check_leaf()Filipe Manana
We can not simply use the owner field from an extent buffer's header to get the id of the respective tree when the extent buffer is from a relocation tree. When we create the root for a relocation tree we leave (on purpose) the owner field with the same value as the subvolume's tree root (we do this at ctree.c:btrfs_copy_root()). So we must ignore extent buffers from relocation trees, which have the BTRFS_HEADER_FLAG_RELOC flag set, because otherwise we will always consider the extent buffer as not being the root of the tree (the root of original subvolume tree is always different from the root of the respective relocation tree). This lead to assertion failures when running with the integrity checker enabled (CONFIG_BTRFS_FS_CHECK_INTEGRITY=y) such as the following: [ 643.393409] BTRFS critical (device sdg): corrupt leaf, non-root leaf's nritems is 0: block=38506496, root=260, slot=0 [ 643.397609] BTRFS info (device sdg): leaf 38506496 total ptrs 0 free space 3995 [ 643.407075] assertion failed: 0, file: fs/btrfs/disk-io.c, line: 4078 [ 643.408425] ------------[ cut here ]------------ [ 643.409112] kernel BUG at fs/btrfs/ctree.h:3419! [ 643.409773] invalid opcode: 0000 [#1] PREEMPT SMP [ 643.410447] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq ppdev psmouse acpi_cpufreq parport_pc evdev parport tpm_tis tpm_tis_core pcspkr serio_raw i2c_piix4 sg tpm i2c_core button processor loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring scsi_mod virtio e1000 floppy [ 643.414356] CPU: 11 PID: 32726 Comm: btrfs Not tainted 4.8.0-rc8-btrfs-next-35+ #1 [ 643.414356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 643.414356] task: ffff880145e95b00 task.stack: ffff88014826c000 [ 643.414356] RIP: 0010:[<ffffffffa0352759>] [<ffffffffa0352759>] assfail.constprop.41+0x1c/0x1e [btrfs] [ 643.414356] RSP: 0018:ffff88014826fa28 EFLAGS: 00010292 [ 643.414356] RAX: 0000000000000039 RBX: ffff88014e2d7c38 RCX: 0000000000000001 [ 643.414356] RDX: ffff88023f4d2f58 RSI: ffffffff81806c63 RDI: 00000000ffffffff [ 643.414356] RBP: ffff88014826fa28 R08: 0000000000000001 R09: 0000000000000000 [ 643.414356] R10: ffff88014826f918 R11: ffffffff82f3c5ed R12: ffff880172910000 [ 643.414356] R13: ffff880233992230 R14: ffff8801a68a3310 R15: fffffffffffffff8 [ 643.414356] FS: 00007f9ca305e8c0(0000) GS:ffff88023f4c0000(0000) knlGS:0000000000000000 [ 643.414356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 643.414356] CR2: 00007f9ca3071000 CR3: 000000015d01b000 CR4: 00000000000006e0 [ 643.414356] Stack: [ 643.414356] ffff88014826fa50 ffffffffa02d655a 000000000000000a ffff88014e2d7c38 [ 643.414356] 0000000000000000 ffff88014826faa8 ffffffffa02b72f3 ffff88014826fab8 [ 643.414356] 00ffffffa03228e4 0000000000000000 0000000000000000 ffff8801bbd4e000 [ 643.414356] Call Trace: [ 643.414356] [<ffffffffa02d655a>] btrfs_mark_buffer_dirty+0xdf/0xe5 [btrfs] [ 643.414356] [<ffffffffa02b72f3>] btrfs_copy_root+0x18a/0x1d1 [btrfs] [ 643.414356] [<ffffffffa0322921>] create_reloc_root+0x72/0x1ba [btrfs] [ 643.414356] [<ffffffffa03267c2>] btrfs_init_reloc_root+0x7b/0xa7 [btrfs] [ 643.414356] [<ffffffffa02d9e44>] record_root_in_trans+0xdf/0xed [btrfs] [ 643.414356] [<ffffffffa02db04e>] btrfs_record_root_in_trans+0x50/0x6a [btrfs] [ 643.414356] [<ffffffffa030ad2b>] create_subvol+0x472/0x773 [btrfs] [ 643.414356] [<ffffffffa030b406>] btrfs_mksubvol+0x3da/0x463 [btrfs] [ 643.414356] [<ffffffffa030b406>] ? btrfs_mksubvol+0x3da/0x463 [btrfs] [ 643.414356] [<ffffffff810781ac>] ? preempt_count_add+0x65/0x68 [ 643.414356] [<ffffffff811a6e97>] ? __mnt_want_write+0x62/0x77 [ 643.414356] [<ffffffffa030b55d>] btrfs_ioctl_snap_create_transid+0xce/0x187 [btrfs] [ 643.414356] [<ffffffffa030b67d>] btrfs_ioctl_snap_create+0x67/0x81 [btrfs] [ 643.414356] [<ffffffffa030ecfd>] btrfs_ioctl+0x508/0x20dd [btrfs] [ 643.414356] [<ffffffff81293e39>] ? __this_cpu_preempt_check+0x13/0x15 [ 643.414356] [<ffffffff81155eca>] ? handle_mm_fault+0x976/0x9ab [ 643.414356] [<ffffffff81091300>] ? arch_local_irq_save+0x9/0xc [ 643.414356] [<ffffffff8119a2b0>] vfs_ioctl+0x18/0x34 [ 643.414356] [<ffffffff8119a8e8>] do_vfs_ioctl+0x581/0x600 [ 643.414356] [<ffffffff814b9552>] ? entry_SYSCALL_64_fastpath+0x5/0xa8 [ 643.414356] [<ffffffff81093fe9>] ? trace_hardirqs_on_caller+0x17b/0x197 [ 643.414356] [<ffffffff8119a9be>] SyS_ioctl+0x57/0x79 [ 643.414356] [<ffffffff814b9565>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 643.414356] [<ffffffff81091b08>] ? trace_hardirqs_off_caller+0x3f/0xaa [ 643.414356] Code: 89 83 88 00 00 00 31 c0 5b 41 5c 41 5d 5d c3 55 89 f1 48 c7 c2 98 bc 35 a0 48 89 fe 48 c7 c7 05 be 35 a0 48 89 e5 e8 13 46 dd e0 <0f> 0b 55 89 f1 48 c7 c2 9f d3 35 a0 48 89 fe 48 c7 c7 7a d5 35 [ 643.414356] RIP [<ffffffffa0352759>] assfail.constprop.41+0x1c/0x1e [btrfs] [ 643.414356] RSP <ffff88014826fa28> [ 643.468267] ---[ end trace 6a1b3fb1a9d7d6e3 ]--- This can be easily reproduced by running xfstests with the integrity checker enabled. Fixes: 1ba98d086fe3 (Btrfs: detect corruption when non-root leaf has zero item) Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2016-11-23Btrfs: fix BUG_ON in btrfs_mark_buffer_dirtyLiu Bo
This can only happen with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y. Commit 1ba98d0 ("Btrfs: detect corruption when non-root leaf has zero item") assumes that a leaf is its root when leaf->bytenr == btrfs_root_bytenr(root), however, we should not use btrfs_root_bytenr(root) since it's mainly got updated during committing transaction. So the check can fail when doing COW on this leaf while it is a root. This changes to use "if (leaf == btrfs_root_node(root))" instead, just like how we check whether leaf is a root in __btrfs_cow_block(). Fixes: 1ba98d086fe3 (Btrfs: detect corruption when non-root leaf has zero item) Cc: stable@vger.kernel.org # 4.8+ Reported-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Filipe Manana <fdmanana@suse.com>
2016-11-22NFSv4.x: hide array-bounds warningArnd Bergmann
A correct bugfix introduced a harmless warning that shows up with gcc-7: fs/nfs/callback.c: In function 'nfs_callback_up': fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds] What happens here is that the 'minorversion == 0' check tells the compiler that we assume minorversion can be something other than 0, but when CONFIG_NFS_V4_1 is disabled that would be invalid and result in an out-of-bounds access. The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this really can't happen, which makes the code slightly smaller and also avoids the warning. The bugfix that introduced the warning is marked for stable backports, we want this one backported to the same releases. Fixes: 98b0f80c2396 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net") Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-21NFSv4.1: Keep a reference on lock states while checkingBenjamin Coddington
While walking the list of lock_states, keep a reference on each nfs4_lock_state to be checked, otherwise the lock state could be removed while the check performs TEST_STATEID and possible FREE_STATEID. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-11-19Merge tag 'ext4_for_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A security fix (so a maliciously corrupted file system image won't panic the kernel) and some fixes for CONFIG_VMAP_STACK" * tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: sanity check the block and cluster size at mount time fscrypto: don't use on-stack buffer for key derivation fscrypto: don't use on-stack buffer for filename encryption
2016-11-19ext4: sanity check the block and cluster size at mount timeTheodore Ts'o
If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-19fscrypto: don't use on-stack buffer for key derivationEric Biggers
With the new (in 4.9) option to use a virtually-mapped stack (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for the scatterlist crypto API because they may not be directly mappable to struct page. get_crypt_info() was using a stack buffer to hold the output from the encryption operation used to derive the per-file key. Fix it by using a heap buffer. This bug could most easily be observed in a CONFIG_DEBUG_SG kernel because this allowed the BUG in sg_set_buf() to be triggered. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-19fscrypto: don't use on-stack buffer for filename encryptionEric Biggers
With the new (in 4.9) option to use a virtually-mapped stack (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for the scatterlist crypto API because they may not be directly mappable to struct page. For short filenames, fname_encrypt() was encrypting a stack buffer holding the padded filename. Fix it by encrypting the filename in-place in the output buffer, thereby making the temporary buffer unnecessary. This bug could most easily be observed in a CONFIG_DEBUG_SG kernel because this allowed the BUG in sg_set_buf() to be triggered. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-19Btrfs: remove rb_node field from the delayed ref node structureFilipe Manana
After the last big change in the delayed references code that was needed for the last qgroups rework, the red black tree node field of struct btrfs_delayed_ref_node is no longer used, so just remove it, this helps us save some memory (since struct rb_node is 24 bytes on x86_64) for these structures. Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-11-19Btrfs: remove unused code when creating and merging reloc treesFilipe Manana
In commit 5bc7247ac47c (Btrfs: fix broken nocow after balance) we started abusing the rtransid and otransid fields of root items from relocation trees to fix some issues with nodatacow mode. However later in commit ba8b0289333a (Btrfs: do not reset last_snapshot after relocation) we dropped the code that made use of those fields but did not remove the code that sets those fields. So just remove them to avoid confusion. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com>
2016-11-19Btrfs: fix relocation incorrectly dropping data referencesFilipe Manana
During relocation of a data block group we create a relocation tree for each fs/subvol tree by making a snapshot of each tree using btrfs_copy_root() and the tree's commit root, and then setting the last snapshot field for the fs/subvol tree's root to the value of the current transaction id minus 1. However this can lead to relocation later dropping references that it did not create if we have qgroups enabled, leaving the filesystem in an inconsistent state that keeps aborting transactions. Lets consider the following example to explain the problem, which requires qgroups to be enabled. We are relocating data block group Y, we have a subvolume with id 258 that has a root at level 1, that subvolume is used to store directory entries for snapshots and we are currently at transaction 3404. When committing transaction 3404, we have a pending snapshot and therefore we call btrfs_run_delayed_items() at transaction.c:create_pending_snapshot() in order to create its dentry at subvolume 258. This results in COWing leaf A from root 258 in order to add the dentry. Note that leaf A also contains file extent items referring to extents from some other block group X (we are currently relocating block group Y). Later on, still at create_pending_snapshot() we call qgroup_account_snapshot(), which switches the commit root for root 258 when it calls switch_commit_roots(), so now the COWed version of leaf A, lets call it leaf A', is accessible from the commit root of tree 258. At the end of qgroup_account_snapshot(), we call record_root_in_trans() with 258 as its argument, which results in btrfs_init_reloc_root() being called, which in turn calls relocation.c:create_reloc_root() in order to create a relocation tree associated to root 258, which results in assigning the value of 3403 (which is the current transaction id minus 1 = 3404 - 1) to the last_snapshot field of root 258. When creating the relocation tree root at ctree.c:btrfs_copy_root() we add a shared reference for leaf A', corresponding to the relocation tree's root, when we call btrfs_inc_ref() against the COWed root (a copy of the commit root from tree 258), which is at level 1. So at this point leaf A' has 2 references, one normal reference corresponding to root 258 and one shared reference corresponding to the root of the relocation tree. Transaction 3404 finishes its commit and transaction 3405 is started by relocation when calling merge_reloc_root() for the relocation tree associated to root 258. In the meanwhile leaf A' is COWed again, in response to some filesystem operation, when we are still at transaction 3405. However when we COW leaf A', at ctree.c:update_ref_for_cow(), we call btrfs_block_can_be_shared() in order to figure out if other trees refer to the leaf and if any such trees exists, add a full back reference to leaf A' - but btrfs_block_can_be_shared() incorrectly returns false because the following condition is false: btrfs_header_generation(buf) <= btrfs_root_last_snapshot(&root->root_item) which evaluates to 3404 <= 3403. So after leaf A' is COWed, it stays with only one reference, corresponding to the shared reference we created when we called btrfs_copy_root() to create the relocation tree's root and btrfs_inc_ref() ends up not being called for leaf A' nor we end up setting the flag BTRFS_BLOCK_FLAG_FULL_BACKREF in leaf A'. This results in not adding shared references for the extents from block group X that leaf A' refers to with its file extent items. Later, after merging the relocation root we do a call to to btrfs_drop_snapshot() in order to delete the relocation tree. This ends up calling do_walk_down() when path->slots[1] points to leaf A', which results in calling btrfs_lookup_extent_info() to get the number of references for leaf A', which is 1 at this time (only the shared reference exists) and this value is stored at wc->refs[0]. After this walk_up_proc() is called when wc->level is 0 and path->nodes[0] corresponds to leaf A'. Because the current level is 0 and wc->refs[0] is 1, it does call btrfs_dec_ref() against leaf A', which results in removing the single references that the extents from block group X have which are associated to root 258 - the expectation was to have each of these extents with 2 references - one reference for root 258 and one shared reference related to the root of the relocation tree, and so we would drop only the shared reference (because leaf A' was supposed to have the flag BTRFS_BLOCK_FLAG_FULL_BACKREF set). This leaves the filesystem in an inconsistent state as we now have file extent items in a subvolume tree that point to extents from block group X without references in the extent tree. So later on when we try to decrement the references for these extents, for example due to a file unlink operation, truncate operation or overwriting ranges of a file, we fail because the expected references do not exist in the extent tree. This leads to warnings and transaction aborts like the following: [ 588.965795] ------------[ cut here ]------------ [ 588.965815] WARNING: CPU: 2 PID: 2479 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x432/0x5b0 [btrfs] [ 588.965816] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs xfs libcrc32c ppdev acpi_cpufreq button tpm_tis e1000 i2c_piix4 pcspkr parport_pc parport tpm qemu_fw_cfg joydev btrfs xor raid6_pq sr_mod cdrom ata_generic virtio_scsi ata_piix virtio_pci bochs_drm virtio_ring drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio ttm serio_raw drm floppy sg [ 588.965831] CPU: 2 PID: 2479 Comm: kworker/u8:7 Not tainted 4.7.3-3-default-fdm+ #1 [ 588.965832] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 588.965844] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 588.965845] 0000000000000000 ffff8802263bfa28 ffffffff813af542 0000000000000000 [ 588.965847] 0000000000000000 ffff8802263bfa68 ffffffff81081e8b 0000065900000000 [ 588.965848] ffff8801db2af000 000000012bbe2000 0000000000000000 ffff880215703b48 [ 588.965849] Call Trace: [ 588.965852] [<ffffffff813af542>] dump_stack+0x63/0x81 [ 588.965854] [<ffffffff81081e8b>] __warn+0xcb/0xf0 [ 588.965855] [<ffffffff81081f7d>] warn_slowpath_null+0x1d/0x20 [ 588.965863] [<ffffffffa0175042>] lookup_inline_extent_backref+0x432/0x5b0 [btrfs] [ 588.965865] [<ffffffff81143220>] ? trace_clock_local+0x10/0x30 [ 588.965867] [<ffffffff8114c5df>] ? rb_reserve_next_event+0x6f/0x460 [ 588.965875] [<ffffffffa0175215>] insert_inline_extent_backref+0x55/0xd0 [btrfs] [ 588.965882] [<ffffffffa017531f>] __btrfs_inc_extent_ref.isra.55+0x8f/0x240 [btrfs] [ 588.965890] [<ffffffffa017acea>] __btrfs_run_delayed_refs+0x74a/0x1260 [btrfs] [ 588.965892] [<ffffffff810cb046>] ? cpuacct_charge+0x86/0xa0 [ 588.965900] [<ffffffffa017e74f>] btrfs_run_delayed_refs+0x9f/0x2c0 [btrfs] [ 588.965908] [<ffffffffa017ea04>] delayed_ref_async_start+0x94/0xb0 [btrfs] [ 588.965918] [<ffffffffa01c799a>] btrfs_scrubparity_helper+0xca/0x350 [btrfs] [ 588.965928] [<ffffffffa01c7c5e>] btrfs_extent_refs_helper+0xe/0x10 [btrfs] [ 588.965930] [<ffffffff8109b323>] process_one_work+0x1f3/0x4e0 [ 588.965931] [<ffffffff8109b658>] worker_thread+0x48/0x4e0 [ 588.965932] [<ffffffff8109b610>] ? process_one_work+0x4e0/0x4e0 [ 588.965934] [<ffffffff810a1659>] kthread+0xc9/0xe0 [ 588.965936] [<ffffffff816f2f1f>] ret_from_fork+0x1f/0x40 [ 588.965937] [<ffffffff810a1590>] ? kthread_worker_fn+0x170/0x170 [ 588.965938] ---[ end trace 34e5232c933a1749 ]--- [ 588.966187] ------------[ cut here ]------------ [ 588.966196] WARNING: CPU: 2 PID: 2479 at fs/btrfs/extent-tree.c:2966 btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs] [ 588.966196] BTRFS: Transaction aborted (error -5) [ 588.966197] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs xfs libcrc32c ppdev acpi_cpufreq button tpm_tis e1000 i2c_piix4 pcspkr parport_pc parport tpm qemu_fw_cfg joydev btrfs xor raid6_pq sr_mod cdrom ata_generic virtio_scsi ata_piix virtio_pci bochs_drm virtio_ring drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio ttm serio_raw drm floppy sg [ 588.966206] CPU: 2 PID: 2479 Comm: kworker/u8:7 Tainted: G W 4.7.3-3-default-fdm+ #1 [ 588.966207] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 588.966217] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 588.966217] 0000000000000000 ffff8802263bfc98 ffffffff813af542 ffff8802263bfce8 [ 588.966219] 0000000000000000 ffff8802263bfcd8 ffffffff81081e8b 00000b96345ee000 [ 588.966220] ffffffffa021ae1c ffff880215703b48 00000000000005fe ffff8802345ee000 [ 588.966221] Call Trace: [ 588.966223] [<ffffffff813af542>] dump_stack+0x63/0x81 [ 588.966224] [<ffffffff81081e8b>] __warn+0xcb/0xf0 [ 588.966225] [<ffffffff81081eff>] warn_slowpath_fmt+0x4f/0x60 [ 588.966233] [<ffffffffa017e93c>] btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs] [ 588.966241] [<ffffffffa017ea04>] delayed_ref_async_start+0x94/0xb0 [btrfs] [ 588.966250] [<ffffffffa01c799a>] btrfs_scrubparity_helper+0xca/0x350 [btrfs] [ 588.966259] [<ffffffffa01c7c5e>] btrfs_extent_refs_helper+0xe/0x10 [btrfs] [ 588.966260] [<ffffffff8109b323>] process_one_work+0x1f3/0x4e0 [ 588.966261] [<ffffffff8109b658>] worker_thread+0x48/0x4e0 [ 588.966263] [<ffffffff8109b610>] ? process_one_work+0x4e0/0x4e0 [ 588.966264] [<ffffffff810a1659>] kthread+0xc9/0xe0 [ 588.966265] [<ffffffff816f2f1f>] ret_from_fork+0x1f/0x40 [ 588.966267] [<ffffffff810a1590>] ? kthread_worker_fn+0x170/0x170 [ 588.966268] ---[ end trace 34e5232c933a174a ]--- [ 588.966269] BTRFS: error (device sda2) in btrfs_run_delayed_refs:2966: errno=-5 IO failure [ 588.966270] BTRFS info (device sda2): forced readonly This was happening often on openSUSE and SLE systems using btrfs as the root filesystem (with its default layout where multiple subvolumes are used) where balance happens in the background triggered by a cron job and snapshots are automatically created before/after package installations, upgrades and removals. The issue could be triggered simply by running the following loop on the first system boot post installation: while true; do zypper -n in nfs-kernel-server zypper -n rm nfs-kernel-server done (If we were fast enough and made that loop before the cron job triggered a balance operation and the balance finished) So fix by setting the last_snapshot field of the root to the value of the generation of its commit root. Like this btrfs_block_can_be_shared() behaves correctly for the case where the relocation root is created during a transaction commit and for the case where it's created before a transaction commit. Fixes: 6426c7ad697d (btrfs: qgroup: Fix qgroup accounting when creating snapshot) Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com>
2016-11-18NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_stateBenjamin Coddington
Now that we're doing TEST_STATEID in nfs4_reclaim_open_state(), we can have a NFS4ERR_OLD_STATEID returned from nfs41_open_expired() . Instead of marking state recovery as failed, mark the state for recovery again. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>