summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-02-25Merge branch 'x86/debug' into core/objtool, to pick up frame pointer fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Assorted fixes - xattr one from this cycle, the rest - stable fodder" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/pnode.c: treat zero mnt_group_id-s as unequal affs_do_readpage_ofs(): just use kmap_atomic() around memcpy() xattr handlers: plug a lock leak in simple_xattr_list fs: allow no_seek_end_llseek to actually seek
2016-02-23Merge tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Stable bugfixes: - Fix nfs_size_to_loff_t - NFSv4: Fix a dentry leak on alias use Other bugfixes: - Don't schedule a layoutreturn if the layout segment can be freed immediately. - Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode - rpcrdma_bc_receive_call() should init rq_private_buf.len - fix stateid handling for the NFS v4.2 operations - pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page - fix panic in gss_pipe_downcall() in fips mode - Fix a race between layoutget and pnfs_destroy_layout - Fix a race between layoutget and bulk recalls" * tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout auth_gss: fix panic in gss_pipe_downcall() in fips mode pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page nfs4: fix stateid handling for the NFS v4.2 operations NFSv4: Fix a dentry leak on alias use xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.len pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode pNFS: Fix pnfs_mark_matching_lsegs_return() nfs: fix nfs_size_to_loff_t
2016-02-22NFSv4.x/pnfs: Fix a race between layoutget and bulk recallsTrond Myklebust
Replace another case where the layout 'plh_block_lgets' can trigger infinite loops in send_layoutget(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-22NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layoutTrond Myklebust
If the server reboots while there is a layoutget outstanding, then the call to pnfs_choose_layoutget_stateid() will fail with an EAGAIN error, which causes an infinite loop in send_layoutget(). The reason why we never break out of the loop is that the layout 'plh_block_lgets' field is never cleared. Fix is to replace plh_block_lgets with NFS_LAYOUT_INVALID_STID, which can be reset after a new layoutget. Fixes: ab7d763e477c5 ("pNFS: Ensure nfs4_layoutget_prepare returns...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This is unusually large, partly due to the EFI fixes that prevent accidental deletion of EFI variables through efivarfs that may brick machines. These fixes are somewhat involved to maintain compatibility with existing install methods and other usage modes, while trying to turn off the 'rm -rf' bricking vector. Other fixes are for large page ioremap()s and for non-temporal user-memcpy()s" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix vmalloc_fault() to handle large pages properly hpet: Drop stale URLs x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable lib/ucs2_string: Correct ucs2 -> utf8 conversion efi: Add pstore variables to the deletion whitelist efi: Make efivarfs entries immutable by default efi: Make our variable validation list include the guid efi: Do variable name validation tests in utf8 efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version lib/ucs2_string: Add ucs2 -> utf8 helper functions
2016-02-20fs/pnode.c: treat zero mnt_group_id-s as unequalMaxim Patlasov
propagate_one(m) calculates "type" argument for copy_tree() like this: > if (m->mnt_group_id == last_dest->mnt_group_id) { > type = CL_MAKE_SHARED; > } else { > type = CL_SLAVE; > if (IS_MNT_SHARED(m)) > type |= CL_MAKE_SHARED; > } The "type" argument then governs clone_mnt() behavior with respect to flags and mnt_master of new mount. When we iterate through a slave group, it is possible that both current "m" and "last_dest" are not shared (although, both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison above erroneously makes new mount shared and sets its mnt_master to last_source->mnt_master. The patch fixes the problem by handling zero mnt_group_id-s as though they are unequal. The similar problem exists in the implementation of "else" clause above when we have to ascend upward in the master/slave tree by calling: > last_source = last_source->mnt_master; > last_dest = last_source->mnt_parent; proper number of times. The last step is governed by "n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if both are zero. The patch fixes this case in the same way as the former one. [AV: don't open-code an obvious helper...] Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-20affs_do_readpage_ofs(): just use kmap_atomic() around memcpy()Al Viro
It forgets kunmap() on a failure exit, but there's really no point keeping the page kmapped at all - after all, what we are doing is a bunch of memcpy() into the parts of page, so kmap_atomic()/kunmap_atomic() just around those memcpy() is enough. Spotted-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-20xattr handlers: plug a lock leak in simple_xattr_listMateusz Guzik
The code could leak xattrs->lock on error. Problem introduced with 786534b92f3ce68f4 "tmpfs: listxattr should include POSIX ACL xattrs". Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-20fs: allow no_seek_end_llseek to actually seekWouter van Kesteren
The user-visible impact of the issue is for example that without this patch sensors-detect breaks when trying to seek in /dev/cpu/0/cpuid. '~0ULL' is a 'unsigned long long' that when converted to a loff_t, which is signed, gets turned into -1. later in vfs_setpos we have 'if (offset > maxsize)', which makes it always return EINVAL. Fixes: b25472f9b961 ("new helpers: no_seek_end_llseek{,_size}()") Signed-off-by: Wouter van Kesteren <woutershep@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-19Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "Miscellaneous ext4 bug fixes for v4.5" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix crashes in dioread_nolock mode ext4: fix bh->b_state corruption ext4: fix memleak in ext4_readdir() ext4: remove unused parameter "newblock" in convert_initialized_extent() ext4: don't read blocks from disk after extents being swapped ext4: fix potential integer overflow ext4: add a line break for proc mb_groups display ext4: ioctl: fix erroneous return value ext4: fix scheduling in atomic on group checksum failure ext4 crypto: move context consistency check to ext4_file_open() ext4 crypto: revalidate dentry after adding or removing the key
2016-02-19Merge branch 'for-linus-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "My for-linus-4.5 branch has a btrfs DIO error passing fix. I know how much you love DIO, so I'm going to suggest against reading it. We'll follow up with a patch to drop the error arg from dio_end_io in the next merge window." * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix direct IO requests not reporting IO error to user space
2016-02-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: slab: free kmem_cache_node after destroy sysfs file ipc/shm: handle removed segments gracefully in shm_mmap() MAINTAINERS: update Kselftest Framework mailing list devm_memremap_release(): fix memremap'd addr handling mm/hugetlb.c: fix incorrect proc nr_hugepages value mm, x86: fix pte_page() crash in gup_pte_range() fsnotify: turn fsnotify reaper thread into a workqueue job Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" mm: fix regression in remap_file_pages() emulation thp, dax: do not try to withdraw pgtable from non-anon VMA
2016-02-19ext4: fix crashes in dioread_nolock modeJan Kara
Competing overwrite DIO in dioread_nolock mode will just overwrite pointer to io_end in the inode. This may result in data corruption or extent conversion happening from IO completion interrupt because we don't properly set buffer_defer_completion() when unlocked DIO races with locked DIO to unwritten extent. Since unlocked DIO doesn't need io_end for anything, just avoid allocating it and corrupting pointer from inode for locked DIO. A cleaner fix would be to avoid these games with io_end pointer from the inode but that requires more intrusive changes so we leave that for later. Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-19ext4: fix bh->b_state corruptionJan Kara
ext4 can update bh->b_state non-atomically in _ext4_get_block() and ext4_da_get_block_prep(). Usually this is fine since bh is just a temporary storage for mapping information on stack but in some cases it can be fully living bh attached to a page. In such case non-atomic update of bh->b_state can race with an atomic update which then gets lost. Usually when we are mapping bh and thus updating bh->b_state non-atomically, nobody else touches the bh and so things work out fine but there is one case to especially worry about: ext4_finish_bio() uses BH_Uptodate_Lock on the first bh in the page to synchronize handling of PageWriteback state. So when blocksize < pagesize, we can be atomically modifying bh->b_state of a buffer that actually isn't under IO and thus can race e.g. with delalloc trying to map that buffer. The result is that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock. Fix the problem by always updating bh->b_state bits atomically. CC: stable@vger.kernel.org Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-18fsnotify: turn fsnotify reaper thread into a workqueue jobJeff Layton
We don't require a dedicated thread for fsnotify cleanup. Switch it over to a workqueue job instead that runs on the system_unbound_wq. In the interest of not thrashing the queued job too often when there are a lot of marks being removed, we delay the reaper job slightly when queueing it, to allow several to gather on the list. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"Jeff Layton
This reverts commit c510eff6beba ("fsnotify: destroy marks with call_srcu instead of dedicated thread"). Eryu reported that he was seeing some OOM kills kick in when running a testcase that adds and removes inotify marks on a file in a tight loop. The above commit changed the code to use call_srcu to clean up the marks. While that does (in principle) work, the srcu callback job is limited to cleaning up entries in small batches and only once per jiffy. It's easily possible to overwhelm that machinery with too many call_srcu callbacks, and Eryu's reproduer did just that. There's also another potential problem with using call_srcu here. While you can obviously sleep while holding the srcu_read_lock, the callbacks run under local_bh_disable, so you can't sleep there. It's possible when putting the last reference to the fsnotify_mark that we'll end up putting a chain of references including the fsnotify_group, uid, and associated keys. While I don't see any obvious ways that that could occurs, it's probably still best to avoid using call_srcu here after all. This patch reverts the above patch. A later patch will take a different approach to eliminated the dedicated thread here. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reported-by: Eryu Guan <guaneryu@gmail.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Cc: Jan Kara <jack@suse.com> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18Merge branch 'x86/urgent' into x86/asm, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A collection of fixes from the past few weeks that should go into 4.5. This contains: - Overflow fix for sysfs discard show function from Alan. - A stacking limit init fix for max_dev_sectors, so we don't end up artificially capping some use cases. From Keith. - Have blk-mq proper end unstarted requests on a dying queue, instead of pushing that to the driver. From Keith. - NVMe: - Update to Kconfig description for NVME_SCSI, since it was vague and having it on is important for some SUSE distros. From Christoph. - Set of fixes from Keith, around surprise removal. Also kills the no-merge flag, so it supports merging. - Set of fixes for lightnvm from Matias, Javier, and Wenwei. - Fix null_blk oops when asked for lightnvm, but not available. From Matias. - Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails if interrupted by a signal. - Two floppy fixes from Jiri, fixing signal handling and blocking open. - A use-after-free fix for O_DIRECT, from Mike Krinkin. - A block module ref count fix from Roman Pen. - An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini. - Smaller reallo fix for xen-blkfront from Bob Liu. - Removal of an unused struct member in the deadline IO scheduler, from Tahsin. - Also from Tahsin, properly initialize inode struct members associated with cgroup writeback, if enabled. - From Tejun, ensure that we keep the superblock pinned during cgroup writeback" * 'for-linus' of git://git.kernel.dk/linux-block: (25 commits) blk: fix overflow in queue_discard_max_hw_show writeback: initialize inode members that track writeback history writeback: keep superblock pinned during cgroup writeback association switches bio: return EINTR if copying to user space got interrupted NVMe: Rate limit nvme IO warnings NVMe: Poll device while still active during remove NVMe: Requeue requests on suspended queues NVMe: Allow request merges NVMe: Fix io incapable return values blk-mq: End unstarted requests on dying queue block: Initialize max_dev_sectors to 0 null_blk: oops when initializing without lightnvm block: fix module reference leak on put_disk() call for cgroups throttle nvme: fix Kconfig description for BLK_DEV_NVME_SCSI kernel/fs: fix I/O wait not accounted for RW O_DSYNC floppy: refactor open() flags handling lightnvm: allow to force mm initialization lightnvm: check overflow and correct mlc pairs lightnvm: fix request intersection locking in rrpc lightnvm: warn if irqs are disabled in lock laddr ...
2016-02-17pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_pageKinglong Mee
unreferenced object 0xffffc90000abf000 (size 16900): comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280 [<ffffffff811b9c2a>] __vmalloc+0x4a/0x50 [<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver] [<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver] [<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4] [<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4] [<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4] [<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0 [<ffffffff81228f1d>] do_fsync+0x3d/0x70 [<ffffffff812291d0>] SyS_fsync+0x10/0x20 [<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76 [<ffffffffffffffff>] 0xffffffffffffffff v2, add missing include header Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17nfs4: fix stateid handling for the NFS v4.2 operationsChristoph Hellwig
The newly added NFS v4.2 operations (ALLOCATE, DEALLOCATE, SEEK and CLONE) use a helper called nfs42_set_rw_stateid to select a stateid that is sent to the server. But they don't set the inode and state fields in the nfs4_exception structure, and this don't partake in the stateid recovery protocol. Because of this they will simply return errors insted of trying to recover a stateid when the server return a BAD_STATEID error. Additionally CLONE has the problem that it operates on two files and thus two stateids, and thus needs to call the exception handler twice to recover stateids. While we're at it stop grabbing an addititional reference to the open context in all these operations - having the file open guarantees that the open context won't go away. All this can be produces with the generic/168 and generic/170 tests in xfstests which stress the CLONE stateid handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-17NFSv4: Fix a dentry leak on alias useBenjamin Coddington
In the case where d_add_unique() finds an appropriate alias to use it will have already incremented the reference count. An additional dget() to swap the open context's dentry is unnecessary and will leak a reference. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: 275bb307865a3 ("NFSv4: Move dentry instantiation into the NFSv4-...") Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-16writeback: initialize inode members that track writeback historyTahsin Erdogan
inode struct members that track cgroup writeback information should be reinitialized when inode gets allocated from kmem_cache. Otherwise, their values remain and get used by the new inode. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Tejun Heo <tj@kernel.org> Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching") Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-16Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "A small set of cifs fixes. I am still reviewing some more, recently submitted SMB3 fixes, but these three are small and safe and ready now" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix erroneous return value cifs: fix potential overflow in cifs_compose_mount_options cifs: remove redundant check for null string pointer
2016-02-16writeback: keep superblock pinned during cgroup writeback association switchesTejun Heo
If cgroup writeback is in use, an inode is associated with a cgroup for writeback. If the inode's main dirtier changes to another cgroup, the association gets updated asynchronously. Nothing was pinning the superblock while such switches are in progress and superblock could go away while async switching is pending or in progress leading to crashes like the following. kernel BUG at fs/jbd2/transaction.c:319! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51 Hardware name: Google Google, BIOS Google 01/01/2011 Workqueue: events inode_switch_wbs_work_fn task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000 RIP: 0010:[<ffffffff803e6922>] [<ffffffff803e6922>] start_this_handle+0x382/0x3e0 RSP: 0018:ffff880209267c30 EFLAGS: 00010202 ... Call Trace: [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190 [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70 [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0 [<ffffffff8035338b>] evict+0xbb/0x190 [<ffffffff80354190>] iput+0x130/0x190 [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0 [<ffffffff80279819>] process_one_work+0x129/0x300 [<ffffffff80279b16>] worker_thread+0x126/0x480 [<ffffffff8027ed14>] kthread+0xc4/0xe0 [<ffffffff809771df>] ret_from_fork+0x3f/0x70 Fix it by bumping s_active while cgroup association switching is in flight. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Tahsin Erdogan <tahsin@google.com> Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com Fixes: d10c80955265 ("writeback: implement foreign cgroup inode bdi_writeback switching") Cc: stable@vger.kernel.org #v4.5+ Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-16Merge tag 'efi-urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: * Prevent accidental deletion of EFI variables through efivarfs that may brick machines. We use a whitelist of known-safe variables to allow things like installing distributions to work out of the box, and instead restrict vendor-specific variable deletion by making non-whitelist variables immutable (Peter Jones) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16ext4: fix memleak in ext4_readdir()Kirill Tkhai
When ext4_bread() fails, fname_crypto_str remains allocated after return. Fix that. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> CC: Dmitry Monakhov <dmonakhov@virtuozzo.com>
2016-02-16Btrfs: fix direct IO requests not reporting IO error to user spaceFilipe Manana
If a bio for a direct IO request fails, we were not setting the error in the parent bio (the main DIO bio), making us not return the error to user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO() return the number of bytes issued for IO and not the error a bio created and submitted by btrfs_submit_direct() got from the block layer. This essentially happens because when we call: dio_end_io(dio_bio, bio->bi_error); It does not set dio_bio->bi_error to the value of the second argument. So just add this missing assignment in endio callbacks, just as we do in the error path at btrfs_submit_direct() when we fail to clone the dio bio or allocate its private object. This follows the convention of what is done with other similar APIs such as bio_endio() where the caller is responsible for setting the bi_error field in the bio it passes as an argument to bio_endio(). This was detected by the new generic test cases in xfstests: 271, 272, 276 and 278. Which essentially setup a dm error target, then load the error table, do a direct IO write and unload the error table. They expect the write to fail with -EIO, which was not getting reported when testing against btrfs. Cc: stable@vger.kernel.org # 4.3+ Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio") Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-02-15pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomodeTrond Myklebust
When setting the layout return mode, we must always also set the NFS_LAYOUT_RETURN_REQUESTED flag to ensure that we send a layoutreturn. Otherwise pnfs_error_mark_layout_for_return() could set the mode, but fail to send the layoutreturn because another is already in flight. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-15pNFS: Fix pnfs_mark_matching_lsegs_return()Trond Myklebust
We don't need to schedule a layoutreturn if the layout segment can be freed immediately. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-14Merge tag 'tty-4.5-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are a number of small tty and serial driver fixes for 4.5-rc4 that resolve some reported issues. One of them got reverted as it wasn't correct based on testing, and all have been in linux-next for a while" * tag 'tty-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "8250: uniphier: allow modular build with 8250 console" pty: make sure super_block is still valid in final /dev/tty close pty: fix possible use after free of tty->driver_data tty: Add support for PCIe WCH382 2S multi-IO card serial/omap: mark wait_for_xmitr as __maybe_unused serial: omap: Prevent DoS using unprivileged ioctl(TIOCSRS485) 8250: uniphier: allow modular build with 8250 console tty: Drop krefs for interrupted tty lock
2016-02-12Merge branch 'for-linus-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This has a few fixes from Filipe, along with a readdir fix from Dave that we've been testing for some time" * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: properly set the termination value of ctx->pos in readdir Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl Btrfs: remove no longer used function extent_read_full_page_nolock() Btrfs: fix page reading in extent_same ioctl leading to csum errors Btrfs: fix invalid page accesses in extent_same (dedup) ioctl
2016-02-12Merge tag 'xfs-fixes-for-linus-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fix from Dve Chinner: "This contains a fix for an endian conversion issue in new CRC validation in log recovery that was discovered on a ppc64 platform" * tag 'xfs-fixes-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: fix endianness error when checking log block crc on big endian platforms
2016-02-12ext4: remove unused parameter "newblock" in convert_initialized_extent()Eryu Guan
The "newblock" parameter is not used in convert_initialized_extent(), remove it. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-12ext4: don't read blocks from disk after extents being swappedEryu Guan
I notice ext4/307 fails occasionally on ppc64 host, reporting md5 checksum mismatch after moving data from original file to donor file. The reason is that move_extent_per_page() calls __block_write_begin() and block_commit_write() to write saved data from original inode blocks to donor inode blocks, but __block_write_begin() not only maps buffer heads but also reads block content from disk if the size is not block size aligned. At this time the physical block number in mapped buffer head is pointing to the donor file not the original file, and that results in reading wrong data to page, which get written to disk in following block_commit_write call. This also can be reproduced by the following script on 1k block size ext4 on x86_64 host: mnt=/mnt/ext4 donorfile=$mnt/donor testfile=$mnt/testfile e4compact=~/xfstests/src/e4compact rm -f $donorfile $testfile # reserve space for donor file, written by 0xaa and sync to disk to # avoid EBUSY on EXT4_IOC_MOVE_EXT xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile # create test file written by 0xbb xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile # compute initial md5sum md5sum $testfile | tee md5sum.txt # drop cache, force e4compact to read data from disk echo 3 > /proc/sys/vm/drop_caches # test defrag echo "$testfile" | $e4compact -i -v -f $donorfile # check md5sum md5sum -c md5sum.txt Fix it by creating & mapping buffer heads only but not reading blocks from disk, because all the data in page is guaranteed to be up-to-date in mext_page_mkuptodate(). Cc: stable@vger.kernel.org Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-12ext4: fix potential integer overflowInsu Yun
Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data), integer overflow could be happened. Therefore, need to fix integer overflow sanitization. Cc: stable@vger.kernel.org Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-12ext4: add a line break for proc mb_groups displayHuaitong Han
This patch adds a line break for proc mb_groups display. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-02-11ext4: ioctl: fix erroneous return valueAnton Protopopov
The ext4_ioctl_setflags() function which is used in the ioctls EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR may return the positive value EPERM instead of -EPERM in case of error. This bug was introduced by a recent commit 9b7365fc. The following program can be used to illustrate the wrong behavior: #include <sys/types.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #define FS_IOC_GETFLAGS _IOR('f', 1, long) #define FS_IOC_SETFLAGS _IOW('f', 2, long) #define FS_IMMUTABLE_FL 0x00000010 int main(void) { int fd; long flags; fd = open("file", O_RDWR|O_CREAT, 0600); if (fd < 0) err(1, "open"); if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) err(1, "ioctl: FS_IOC_GETFLAGS"); flags |= FS_IMMUTABLE_FL; if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0) err(1, "ioctl: FS_IOC_SETFLAGS"); warnx("ioctl returned no error"); return 0; } Running it gives the following result: $ strace -e ioctl ./test ioctl(3, FS_IOC_GETFLAGS, 0x7ffdbd8bfd38) = 0 ioctl(3, FS_IOC_SETFLAGS, 0x7ffdbd8bfd38) = 1 test: ioctl returned no error +++ exited with 0 +++ Running the program on a kernel with the bug fixed gives the proper result: $ strace -e ioctl ./test ioctl(3, FS_IOC_GETFLAGS, 0x7ffdd2768258) = 0 ioctl(3, FS_IOC_SETFLAGS, 0x7ffdd2768258) = -1 EPERM (Operation not permitted) test: ioctl: FS_IOC_SETFLAGS: Operation not permitted +++ exited with 1 +++ Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-11ext4: fix scheduling in atomic on group checksum failureJan Kara
When block group checksum is wrong, we call ext4_error() while holding group spinlock from ext4_init_block_bitmap() or ext4_init_inode_bitmap() which results in scheduling while in atomic. Fix the issue by calling ext4_error() later after dropping the spinlock. CC: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-02-11btrfs: properly set the termination value of ctx->pos in readdirDavid Sterba
The value of ctx->pos in the last readdir call is supposed to be set to INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a larger value, then it's LLONG_MAX. There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++" overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a 64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before the increment. We can get to that situation like that: * emit all regular readdir entries * still in the same call to readdir, bump the last pos to INT_MAX * next call to readdir will not emit any entries, but will reach the bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX Normally this is not a problem, but if we call readdir again, we'll find 'pos' set to LLONG_MAX and the unconditional increment will overflow. The report from Victor at (http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging print shows that pattern: Overflow: e Overflow: 7fffffff Overflow: 7fffffffffffffff PAX: size overflow detected in function btrfs_real_readdir fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0; context: dir_context; CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015 ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48 ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78 ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8 Call Trace: [<ffffffff81742f0f>] dump_stack+0x4c/0x7f [<ffffffff811cb706>] report_size_overflow+0x36/0x40 [<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0 [<ffffffff811dafc8>] iterate_dir+0xa8/0x150 [<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70 [<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0 Overflow: 1a [<ffffffff811db070>] ? iterate_dir+0x150/0x150 [<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83 The jump from 7fffffff to 7fffffffffffffff happens when new dir entries are not yet synced and are processed from the delayed list. Then the code could go to the bump section again even though it might not emit any new dir entries from the delayed list. The fix avoids entering the "bump" section again once we've finished emitting the entries, both for synced and delayed entries. References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284 Reported-by: Victor <services@swwu.com> CC: stable@vger.kernel.org Signed-off-by: David Sterba <dsterba@suse.com> Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-02-10cifs: fix erroneous return valueAnton Protopopov
The setup_ntlmv2_rsp() function may return positive value ENOMEM instead of -ENOMEM in case of kmalloc failure. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-10cifs: fix potential overflow in cifs_compose_mount_optionsInsu Yun
In worst case, "ip=" + sb_mountdata + ipv6 can be copied into mountdata. Therefore, for safe, it is better to add more size when allocating memory. Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-10cifs: remove redundant check for null string pointerColin Ian King
server_RFC1001_name is declared as a RFC1001_NAME_LEN_WITH_NULL sized char array in struct TCP_Server_Info so the null pointer check on server_RFC1001_name is redundant and can be removed. Detected with smatch: fs/cifs/connect.c:2982 ip_rfc1001_connect() warn: this array is probably non-NULL. 'server->server_RFC1001_name' Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-02-10efi: Make efivarfs entries immutable by defaultPeter Jones
"rm -rf" is bricking some peoples' laptops because of variables being used to store non-reinitializable firmware driver data that's required to POST the hardware. These are 100% bugs, and they need to be fixed, but in the mean time it shouldn't be easy to *accidentally* brick machines. We have to have delete working, and picking which variables do and don't work for deletion is quite intractable, so instead make everything immutable by default (except for a whitelist), and make tools that aren't quite so broad-spectrum unset the immutable flag. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-02-10efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad versionPeter Jones
Translate EFI's UCS-2 variable names to UTF-8 instead of just assuming all variable names fit in ASCII. Signed-off-by: Peter Jones <pjones@redhat.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-02-08ext4 crypto: move context consistency check to ext4_file_open()Theodore Ts'o
In the case where the per-file key for the directory is cached, but root does not have access to the key needed to derive the per-file key for the files in the directory, we allow the lookup to succeed, so that lstat(2) and unlink(2) can suceed. However, if a program tries to open the file, it will get an ENOKEY error. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-07ext4 crypto: revalidate dentry after adding or removing the keyTheodore Ts'o
Add a validation check for dentries for encrypted directory to make sure we're not caching stale data after a key has been added or removed. Also check to make sure that status of the encryption key is updated when readdir(2) is executed. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-08xfs: fix endianness error when checking log block crc on big endian platformsDarrick J. Wong
Since the checksum function and the field are both __le32, don't perform endian conversion when comparing the two. This fixes mount failures on ppc64. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-02-06pty: make sure super_block is still valid in final /dev/tty closeHerton R. Krzesinski
Considering current pty code and multiple devpts instances, it's possible to umount a devpts file system while a program still has /dev/tty opened pointing to a previosuly closed pty pair in that instance. In the case all ptmx and pts/N files are closed, umount can be done. If the program closes /dev/tty after umount is done, devpts_kill_index will use now an invalid super_block, which was already destroyed in the umount operation after running ->kill_sb. This is another "use after free" type of issue, but now related to the allocated super_block instance. To avoid the problem (warning at ida_remove and potential crashes) for this specific case, I added two functions in devpts which grabs additional references to the super_block, which pty code now uses so it makes sure the super block structure is still valid until pty shutdown is done. I also moved the additional inode references to the same functions, which also covered similar case with inode being freed before /dev/tty final close/shutdown. Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Cc: stable@vger.kernel.org # 2.6.29+ Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "22 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT radix-tree: fix oops after radix_tree_iter_retry MAINTAINERS: trim the file triggers for ABI/API dax: dirty inode only if required thp: make deferred_split_scan() work again mm: replace vma_lock_anon_vma with anon_vma_lock_read/write ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup um: asm/page.h: remove the pte_high member from struct pte_t mm, hugetlb: don't require CMA for runtime gigantic pages mm/hugetlb: fix gigantic page initialization/allocation mm: downgrade VM_BUG in isolate_lru_page() to warning mempolicy: do not try to queue pages from !vma_migratable() mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress vmstat: make vmstat_update deferrable mm, vmstat: make quiet_vmstat lighter mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT memblock: don't mark memblock_phys_mem_size() as __init dump_stack: avoid potential deadlocks mm: validate_mm browse_rb SMP race condition m32r: fix build failure due to SMP and MMU ...