Age | Commit message (Collapse) | Author |
|
This branch simplifies and clarifies the dcache lookup, and allows us to
do certain nice optimizations when comparing dentries. It also cleans
up the interface to __d_lookup_rcu(), especially around passing the
inode information around.
* dentry-cleanups:
vfs: make it possible to access the dentry hash/len as one 64-bit entry
vfs: move dentry name length comparison from dentry_cmp() into callers
vfs: do the careful dentry name access for all dentry_cmp cases
vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu
vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
|
|
This teaches vfs_fstat() to use the appropriate f[get|put]_light
functions, allowing it to avoid some unnecessary locking for the common
case.
More noticeably, it also cleans up and simplifies the "getname_flags()"
function, which now relies on the architecture strncpy_from_user() doing
all the user access checks properly, instead of hacking around the fact
that on x86 it didn't use to do it right (see commit 92ae03f2ef99: "x86:
merge 32/64-bit versions of 'strncpy_from_user()' and speed it up").
* vfs-cleanups:
VFS: make vfs_fstat() use f[get|put]_light()
VFS: clean up and simplify getname_flags()
x86: make word-at-a-time strncpy_from_user clear bytes at the end
|
|
This makes cp_new_stat() a bit more readable, and avoids having to
memset() the whole structure just to fill in a couple of padding fields.
This is another result of me looking at code generation of functions
that show up high on certain kernel profiles, and just going "Oh, let's
just clean that up".
Architectures that don't supply the #define to fill just the padding
fields will still fall back to memset().
* stat-cleanups:
vfs: don't force a big memset of stat data just to clear padding fields
vfs: de-crapify "cp_new_stat()" function
|
|
This series sanitizes the interface to unmap_vma(). The crazy interface
annoyed me no end when I was looking at unmap_single_vma(), which we can
spend quite a lot of time in (especially with loads that have a lot of
small fork/exec's: shell scripts etc).
Moving the nr_accounted calculations to where they belong at least
clarifies things a little. I hope to come back to look at the
performance of this later, but if/when I get back to it I at least don't
have to see the crazy interfaces any more.
* vm-cleanups:
vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces
vm: simplify unmap_vmas() calling convention
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6
Pull PA-RISC fixes from James Bottomley:
"This is a set of three bug fixes that gets parisc running again on
systems with PA1.1 processors.
Two fix regressions introduced in 2.6.39 and one fixes a prefetch bug
that only affects PA7300LC processors. We also have another pending
fix to do with the sectional arrangement of vmlinux.lds, but there's a
query on it during testing on one particular system type, so I'll hold
off sending it in for now."
* tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
[PARISC] fix panic on prefetch(NULL) on PA7300LC
[PARISC] fix crash in flush_icache_page_asm on PA1.1
[PARISC] fix PA1.1 oops on boot
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 linker bug workarounds from Peter Anvin.
GNU ld-2.22.52.0.[12] (*) has an unfortunate bug where it incorrectly
turns certain relocation entries absolute. Section-relative symbols
that are part of otherwise empty sections are silently changed them to
absolute. We rely on section-relative symbols staying section-relative,
and actually have several sections in the linker script solely for this
purpose.
See for example
http://sourceware.org/bugzilla/show_bug.cgi?id=14052
We could just black-list the buggy linker, but it appears that it got
shipped in at least F17, and possibly other distros too, so it's sadly
not some rare unusual case.
This backports the workaround from the x86/trampoline branch, and as
Peter says: "This is not a minimal fix, not at all, but it is a tested
code base."
* 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, relocs: When printing an error, say relative or absolute
x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
x86, realmode: 16-bit real-mode code support for relocs tool
(*) That's a manly release numbering system. Stupid, sure. But manly.
|
|
Pull block layer fixes from Jens Axboe:
"A few small, but important fixes. Most of them are marked for stable
as well
- Fix failure to release a semaphore on error path in mtip32xx.
- Fix crashable condition in bio_get_nr_vecs().
- Don't mark end-of-disk buffers as mapped, limit it to i_size.
- Fix for build problem with CONFIG_BLOCK=n on arm at least.
- Fix for a buffer overlow on UUID partition printing.
- Trivial removal of unused variables in dac960."
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix buffer overflow when printing partition UUIDs
Fix blkdev.h build errors when BLOCK=n
bio allocation failure due to bio_get_nr_vecs()
block: don't mark buffers beyond end of disk as mapped
mtip32xx: release the semaphore on an error path
dac960: Remove unused variables from DAC960_CreateProcEntries()
|
|
Pull one more networking bug-fix from David Miller:
"One last straggler.
Eric Dumazet's pktgen unload oops fix was not entirely complete, but
all the cases should be handled properly now.... fingers crossed."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
pktgen: fix module unload for good
|
|
Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows
a flurry of hundreds of warnings at kernel/res_counter.c:96, where
res_counter_uncharge_locked() does WARN_ON(counter->usage < val).
The first trace of each flurry implicates __mem_cgroup_cancel_charge()
of mc.precharge, and an audit of mc.precharge handling points to
mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850e806
("memcg: avoid THP split in task migration").
Checking !mc.precharge is good everywhere else, when a single page is to
be charged; but here the "mc.precharge -= HPAGE_PMD_NR" likely to
follow, is liable to result in underflow (a lot can change since the
precharge was estimated).
Simply check against HPAGE_PMD_NR: there's probably a better
alternative, trying precharge for more, splitting if unsuccessful; but
this one-liner is safer for now - no kernel/res_counter.c:96 warnings
seen in 26 hours.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the relocs tool throws an error, let the error message say if it
is an absolute or relative symbol. This should make it a lot more
clear what action the programmer needs to take and should help us find
the reason if additional symbol bugs show up.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
|
|
GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from
section-relative to absolute if they are in a section of zero length.
This turns the symbols __init_begin and __init_end into absolute
symbols. Let the relocs program know that those should be treated as
relative symbols.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
|
|
A new option is added to the relocs tool called '--realmode'.
This option causes the generation of 16-bit segment relocations
and 32-bit linear relocations for the real-mode code. When
the real-mode code is moved to the low-memory during kernel
initialization, these relocation entries can be used to
relocate the code properly.
In the assembly code 16-bit segment relocations must be relative
to the 'real_mode_seg' absolute symbol. Linear relocations must be
relative to a symbol prefixed with 'pa_'.
16-bit segment relocation is used to load cs:ip in 16-bit code.
Linear relocations are used in the 32-bit code for relocatable
data references. They are declared in the linker script of the
real-mode code.
The relocs tool is moved to arch/x86/tools/relocs.c, and added new
target archscripts that can be used to build scripts needed building
an architecture. be compiled before building the arch/x86 tree.
[ hpa: accelerating this because it detects invalid absolute
relocations, a serious bug in binutils 2.22.52.0.x which currently
produces bad kernels. ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull a dm fix from Alasdair G Kergon:
"A fix to the thin provisioning userspace interface."
* tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
dm thin: fix table output when pool target disables discard passdown internally
|
|
When the thin pool target clears the discard_passdown parameter
internally, it incorrectly changes the table line reported to userspace.
This breaks dumb string comparisons on these table lines in generic
userspace device-mapper library code and leads to tables being reloaded
repeatedly when nothing is actually meant to be changing.
This patch corrects this by no longer changing the table line when
discard passdown was disabled.
We can still tell when discard passdown is overridden by looking for the
message "Discard unsupported by data device (sdX): Disabling discard passdown."
This automatic detection is also moved from the 'load' to the 'resume'
so that it is re-evaluated should the properties of underlying devices
change.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
Pull one more md bugfix from NeilBrown:
"Fix bug in recent fix to RAID10.
Without this patch, recovery will crash"
* tag 'md-3.4-fixes' of git://neil.brown.name/md:
md/raid10: fix transcription error in calc_sectors conversion.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tile tree bugfix from Chris Metcalf:
"This fixes a security vulnerability (and correctness bug) in tilegx"
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tilegx: enable SYSCALL_WRAPPERS support
|
|
The old code was
sector_div(stride, fc);
the new code was
sector_dir(size, conf->near_copies);
'size' is right (the stride various wasn't really needed), but
'fc' means 'far_copies', and that is an important difference.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Merge misc fixes from Andrew Morton.
* emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches)
frv: delete incorrect task prototypes causing compile fail
slub: missing test for partial pages flush work in flush_all()
fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
|
|
Instead of doing the i_mode calculations at proc_fd_instantiate() time,
move them into tid_fd_revalidate(), which is where the other inode state
(notably uid/gid information) is updated too.
Otherwise we'll end up with stale i_mode information if an fd is re-used
while the dentry still hangs around. Not that anything really *cares*
(symlink permissions don't really matter), but Tetsuo Handa noticed that
the owner read/write bits don't always match the state of the
readability of the file descriptor, and we _used_ to get this right a
long time ago in a galaxy far, far away.
Besides, aside from fixing an ugly detail (that has apparently been this
way since commit 61a28784028e: "proc: Remove the hard coded inode
numbers" in 2006), this removes more lines of code than it adds. And it
just makes sense to update i_mode in the same place we update i_uid/gid.
Al Viro correctly points out that we could just do the inode fill in the
inode iops ->getattr() function instead. However, that does require
somewhat slightly more invasive changes, and adds yet *another* lookup
of the file descriptor. We need to do the revalidate() for other
reasons anyway, and have the file descriptor handy, so we might as well
fill in the information at this point.
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
commit c57b5468406 (pktgen: fix crash at module unload) did a very poor
job with list primitives.
1) list_splice() arguments were in the wrong order
2) list_splice(list, head) has undefined behavior if head is not
initialized.
3) We should use the list_splice_init() variant to clear pktgen_threads
list.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some discussion with the glibc mailing lists revealed that this was
necessary for 64-bit platforms with MIPS-like sign-extension rules
for 32-bit values. The original symptom was that passing (uid_t)-1 to
setreuid() was failing in programs linked -pthread because of the "setxid"
mechanism for passing setxid-type function arguments to the syscall code.
SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with
proper sign-extension and is thus the appropriate fix for this problem.
On other platforms (s390, powerpc, sparc64, and mips) this was fixed
in 2.6.28.6. The general issue is tracked as CVE-2009-0029.
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull a machine check recovery fix from Tony Luck.
I really don't like how the MCE code does some of the things it does,
but this does seem to be an improvement.
* tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
x86/mce: Only restart instruction after machine check recovery if it is safe
|
|
Commit 41101809a865 ("fork: Provide weak arch_release_[task_struct|
thread_info] functions") in -tip highlights a problem in the frv arch,
where it has needles prototypes for alloc_task_struct_node and
free_task_struct. This now shows up as:
kernel/fork.c:120:66: error: static declaration of 'alloc_task_struct_node' follows non-static declaration
kernel/fork.c:127:51: error: static declaration of 'free_task_struct' follows non-static declaration
since that commit turned them into real functions. Since arch/frv does
does not define define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR (i.e. it just
uses the generic ones) it shouldn't list these at all.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I found some kernel messages such as:
SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects.
Pid: 6143, comm: mdadm Tainted: G O 3.4.0-rc6+ #75
Call Trace:
kmem_cache_destroy+0x328/0x400
free_conf+0x2d/0xf0 [raid456]
stop+0x41/0x60 [raid456]
md_stop+0x1a/0x60 [md_mod]
do_md_stop+0x74/0x470 [md_mod]
md_ioctl+0xff/0x11f0 [md_mod]
blkdev_ioctl+0xd8/0x7a0
block_ioctl+0x3b/0x40
do_vfs_ioctl+0x96/0x560
sys_ioctl+0x91/0xa0
system_call_fastpath+0x16/0x1b
Then using kmemleak I found these messages:
unreferenced object 0xffff8800b6db7380 (size 112):
comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s)
hex dump (first 32 bytes):
01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff .....N..........
ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff .........@J.....
backtrace:
kmemleak_alloc+0x21/0x50
kmem_cache_alloc+0xeb/0x1b0
kmem_cache_open+0x2f1/0x430
kmem_cache_create+0x158/0x320
setup_conf+0x649/0x770 [raid456]
run+0x68b/0x840 [raid456]
md_run+0x529/0x940 [md_mod]
do_md_run+0x18/0xc0 [md_mod]
md_ioctl+0xba8/0x11f0 [md_mod]
blkdev_ioctl+0xd8/0x7a0
block_ioctl+0x3b/0x40
do_vfs_ioctl+0x96/0x560
sys_ioctl+0x91/0xa0
system_call_fastpath+0x16/0x1b
This bug was introduced by commit a8364d5555b ("slub: only IPI CPUs that
have per cpu obj to flush"), which did not include checks for per cpu
partial pages being present on a cpu.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
map_files/ entries are never supposed to be executed, still curious
minds might try to run them, which leads to the following deadlock
======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc4-24406-g841e6a6 #121 Not tainted
-------------------------------------------------------
bash/1556 is trying to acquire lock:
(&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1
but task is already holding lock:
(&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sig->cred_guard_mutex){+.+.+.}:
validate_chain+0x444/0x4f4
__lock_acquire+0x387/0x3f8
lock_acquire+0x12b/0x158
__mutex_lock_common+0x56/0x3a9
mutex_lock_killable_nested+0x40/0x45
lock_trace+0x24/0x59
proc_map_files_lookup+0x5a/0x165
__lookup_hash+0x52/0x73
do_lookup+0x276/0x2b1
walk_component+0x3d/0x114
do_last+0xfc/0x540
path_openat+0xd3/0x306
do_filp_open+0x3d/0x89
do_sys_open+0x74/0x106
sys_open+0x21/0x23
tracesys+0xdd/0xe2
-> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
check_prev_add+0x6a/0x1ef
validate_chain+0x444/0x4f4
__lock_acquire+0x387/0x3f8
lock_acquire+0x12b/0x158
__mutex_lock_common+0x56/0x3a9
mutex_lock_nested+0x40/0x45
do_lookup+0x267/0x2b1
walk_component+0x3d/0x114
link_path_walk+0x1f9/0x48f
path_openat+0xb6/0x306
do_filp_open+0x3d/0x89
open_exec+0x25/0xa0
do_execve_common+0xea/0x2f9
do_execve+0x43/0x45
sys_execve+0x43/0x5a
stub_execve+0x6c/0xc0
This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
and when do_lookup happens we try to grab task->signal->cred_guard_mutex
again in lock_trace.
Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
and in proc_map_files_readdir() instead of lock_trace(), the caller must
be CAP_SYS_ADMIN granted anyway.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The reset date of the ST Micro version of PL031 is 2000-01-01. The
correct weekday for 2000-01-01 is saturday, but pl031 is initialized to
sunday. This may lead to alarm malfunction, so configure the correct
wday if RTC_DR indicates reset.
Signed-off-by: Rajkumar Kasirajan <rajkumar.kasirajan@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Mattias Wallin <mattias.wallin@stericsson.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull ARM fixes from Russell King:
"Small set of fixes again."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path
ARM: 7418/1: LPAE: fix access flag setup in mem_type_table
ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS
ARM: 7417/1: vfp: ensure preemption is disabled when enabling VFP access
|
|
Pull two networking fixes from David S. Miller:
1) Thanks to Willy Tarreau and Eric Dumazet, we've unlocked a bug that's
been present in do_tcp_sendpages() since that function was written in
2002.
When we block to wait for memory we have to unconditionally try and
push out pending TCP data, otherwise we can block for an unreasonably
long amount of time.
2) Fix deadlock in e1000, fixes kernel bugzilla 43132
From Tushar Dave.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
e1000: Prevent reset task killing itself.
tcp: do_tcp_sendpages() must try to push data out on oom conditions
|
|
Commit 1cc0c998fdf2 ("ACPI: Fix D3hot v D3cold confusion") introduced a
bug in __acpi_bus_set_power() and changed the behavior of
acpi_pci_set_power_state() in such a way that it generally doesn't work
as expected if PCI_D3hot is passed to it as the second argument.
First off, if ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) is passed to
__acpi_bus_set_power() and the explicit_set flag is set for the D3cold
state, the function will try to execute AML method called "_PS4", which
doesn't exist.
Fix this by adding a check to ensure that the name of the AML method
to execute for transitions to ACPI_STATE_D3_COLD is correct in
__acpi_bus_set_power(). Also make sure that the explicit_set flag
for ACPI_STATE_D3_COLD will be set if _PS3 is present and modify
acpi_power_transition() to avoid accessing power resources for
ACPI_STATE_D3_COLD, because they don't exist.
Second, if PCI_D3hot is passed to acpi_pci_set_power_state() as the
target state, the function will request a transition to
ACPI_STATE_D3_HOT instead of ACPI_STATE_D3. However,
ACPI_STATE_D3_HOT is now only marked as supported if the _PR3 AML
method is defined for the given device, which is rare. This causes
problems to happen on systems where devices were successfully put
into ACPI D3 by pci_set_power_state(PCI_D3hot) which doesn't work
now. In particular, some unused graphics adapters are not turned
off as a result.
To fix this issue restore the old behavior of
acpi_pci_set_power_state(), which is to request a transition to
ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) if either PCI_D3hot or
PCI_D3cold is passed to it as the argument.
This approach is not ideal, because generally power should not
be removed from devices if PCI_D3hot is the target power state,
but since this behavior is relied on, we have no choice but to
restore it at the moment and spend more time on designing a
better solution in the future.
References: https://bugzilla.kernel.org/show_bug.cgi?id=43228
Reported-by: rocko <rockorequin@hotmail.com>
Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Reported-and-tested-by: Peter <lekensteyn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Killing reset task while adapter is resetting causes deadlock.
Only kill reset task if adapter is not resetting.
Ref bug #43132 on bugzilla.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.
Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.
The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.
After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.
The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.
This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Acked-by: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull two more target-core updates from Nicholas Bellinger:
"The first patch addresses a SPC-2 reservations RELEASE bug in a
special (iscsi specific) multi-ISID setup case that was allowing the
same initiator to be able to incorrect release it's own reservation on
a different SCSI path with enforce_pr_isid=1 operation. This bug was
caught by Bernhard Kohl.
The second patch is to address a bug with FILEIO backends where the
incorrect number of blocks for READ_CAPACITY was being reported after
an underlying device-mapper block_device size change. This patch uses
now i_size_read() in fd_get_blocks() for FILEIO backends with an
underlying block_device, instead of trying to determine this value at
setup time during fd_create_virtdevice(). (hch CC'ed)
Both are CC'ed to stable."
* '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix bug in handling of FILEIO + block_device resize ops
target: Fix SPC-2 RELEASE bug for multi-session iSCSI client setups
|
|
This patch fixes a bug in the handling of FILEIO w/ underlying block_device
resize operations where the original fd_dev->fd_dev_size was incorrectly being
used in fd_get_blocks() for READ_CAPACITY response payloads.
This patch avoids using fd_dev->fd_dev_size for FILEIO devices with
an underlying block_device, and instead changes fd_get_blocks() to
get the sector count directly from i_size_read() as recommended by hch.
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
|
Pull slave-dmaengine fixes fromVinod Koul:
"fixes of cylic dma usages in slave dma drivers"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: fix cyclic dma usage
dmaengine: pl330: dont complete descriptor for cyclic dma
|
|
Pull last minute virtio fixes from Michael S. Tsirkin:
"Here are a couple of last minute virtio fixes for 3.4. Hope it's not
too late yes - I might have tried too hard to make sure the fix is
well tested.
Fixes are by Amit and myself. One fixes module removal and one
suspend of a VM, the last one the handling of out of memory condition.
They are thus very low risk as most people never hit these paths, but
do fix very annoying problems for people that do use the feature.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_net: invoke softirqs after __napi_schedule
virtio: balloon: let host know of updated balloon size before module removal
virtio: console: tell host of open ports after resume from s3/s4
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM: SoC fixes from Olof Johansson:
"I will stop trying to predict when we're done with fixes for a
release.
Here's another small batch of three patches for arm-soc:
- A fix for a boot time WARN_ON() due to irq domain conversion on
PRIMA2
- Fix for a regression in Tegra SMP spinup code due to swapped
register offsets
- Fixed config dependency for mv_cesa crypto driver to avoid build
breakage"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: PRIMA2: fix irq domain size and IRQ mask of internal interrupt controller
crypto: mv_cesa requires on CRYPTO_HASH to build
ARM: tegra: Fix flow controller accesses
|
|
Pull two md fixes from NeilBrown:
"One fixes a bug in the new raid10 resize code so is relevant to 3.4
only.
The other fixes a bug in the use of md by dm-raid, so is relevant to
any kernel with dm-raid support"
* tag 'md-3.4-fixes' of git://neil.brown.name/md:
MD: Add del_timer_sync to mddev_suspend (fix nasty panic)
md/raid10: set dev_sectors properly when resizing devices in array.
|
|
'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf, x86 and scheduler updates from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing: Do not enable function event with enable
perf stat: handle ENXIO error for perf_event_open
perf: Turn off compiler warnings for flex and bison generated files
perf stat: Fix case where guest/host monitoring is not supported by kernel
perf build-id: Fix filename size calculation
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
x86: Fix section annotation of acpi_map_cpu2node()
x86/microcode: Ensure that module is only loaded on supported Intel CPUs
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
|
|
Commit ff9a184c ("ARM: 7400/1: vfp: clear fpscr length and stride bits
on entry to sig handler") flushes the VFP state prior to entering a
signal handler so that a VFP operation inside the handler will trap and
force a restore of ABI-compliant registers. Reflushing and disabling VFP
on the sigreturn path is predicated on the saved thread state indicating
that VFP was used by the handler -- however for SMP platforms this is
only set on context-switch, making the check unreliable and causing VFP
register corruption in userspace since the register values are not
necessarily those restored from the sigframe.
This patch unconditionally flushes the VFP state after a signal handler.
Since we already perform the flush before the handler and the flushing
itself happens lazily, the redundant flush when VFP is not used by the
handler is essentially a nop.
Reported-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
A zero value for prot_sect in the memory types table implies that
section mappings should never be created for the memory type in question.
This is checked for in alloc_init_section().
With LPAE, we set a bit to mask access flag faults for kernel mappings.
This breaks the aforementioned (!prot_sect) check in alloc_init_section().
This patch fixes this bug by first checking for a non-zero
prot_sect before setting the PMD_SECT_AF flag.
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
__napi_schedule might raise softirq but nothing
causes do_softirq to trigger, so it does not in fact
run. As a result,
the error message "NOHZ: local_softirq_pending 08"
sometimes occurs during boot of a KVM guest when the network service is
started and we are oom:
...
Bringing up loopback interface: [ OK ]
Bringing up interface eth0:
Determining IP information for eth0...NOHZ: local_softirq_pending 08
done.
[ OK ]
...
Further, receive queue processing might get delayed
indefinitely until some interrupt triggers:
virtio_net expected napi to be run immediately.
One way to cause do_softirq to be executed is by
invoking local_bh_enable(). As __napi_schedule is
normally called from bh or irq context, this
seems to make sense: disable bh before __napi_schedule
and enable afterwards.
In fact it's a very complicated way of calling do_softirq(),
and works since this function is only used when we are not
in interrupt context. It's not hot at all, in any ideal scenario.
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Tested-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
When the balloon module is removed, we deflate the balloon, reclaiming
all the pages that were given to the host. However, we don't update the
config values for the new balloon size, resulting in the host showing
outdated balloon values.
The size update is done after each leak and fill operation, only the
module removal case was left out.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
If a port was open before going into one of the sleep states, the port
can continue normal operation after restore. However, the host has to
be told that the guest side of the connection is open to restore
pre-suspend state.
This wasn't noticed so far due to a bug in qemu that was fixed recently
(which marked the guest-side connection as always open).
CC: stable@vger.kernel.org # Only for 3.3
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
the old codes will cause 3.4 kernel warning as irq domain size is wrong:
------------[ cut here ]------------
WARNING: at kernel/irq/irqdomain.c:74 irq_domain_legacy_revmap+0x24/0x48()
Modules linked in:
[<c0013f50>] (unwind_backtrace+0x0/0xf8) from [<c001e7d8>] (warn_slowpath_common+0x54/0x64)
[<c001e7d8>] (warn_slowpath_common+0x54/0x64) from [<c001e804>] (warn_slowpath_null+0x1c/0x24)
[<c001e804>] (warn_slowpath_null+0x1c/0x24) from [<c005c3c4>] (irq_domain_legacy_revmap+0x24/0x48)
[<c005c3c4>] (irq_domain_legacy_revmap+0x24/0x48) from [<c005c704>] (irq_create_mapping+0x20/0x120)
[<c005c704>] (irq_create_mapping+0x20/0x120) from [<c005c880>] (irq_create_of_mapping+0x7c/0xf0)
[<c005c880>] (irq_create_of_mapping+0x7c/0xf0) from [<c01a6c48>] (irq_of_parse_and_map+0x2c/0x34)
[<c01a6c48>] (irq_of_parse_and_map+0x2c/0x34) from [<c01a6c68>] (of_irq_to_resource+0x18/0x74)
[<c01a6c68>] (of_irq_to_resource+0x18/0x74) from [<c01a6ce8>] (of_irq_count+0x24/0x34)
[<c01a6ce8>] (of_irq_count+0x24/0x34) from [<c01a7220>] (of_device_alloc+0x58/0x158)
[<c01a7220>] (of_device_alloc+0x58/0x158) from [<c01a735c>] (of_platform_device_create_pdata+0x3c/0x80)
[<c01a735c>] (of_platform_device_create_pdata+0x3c/0x80) from [<c01a7468>] (of_platform_bus_create+0xc8/0x190)
[<c01a7468>] (of_platform_bus_create+0xc8/0x190) from [<c01a74cc>] (of_platform_bus_create+0x12c/0x190)
---[ end trace 1b75b31a2719ed32 ]---
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
Use del_timer_sync to remove timer before mddev_suspend finishes.
We don't want a timer going off after an mddev_suspend is called. This is
especially true with device-mapper, since it can call the destructor function
immediately following a suspend. This results in the removal (kfree) of the
structures upon which the timer depends - resulting in a very ugly panic.
Therefore, we add a del_timer_sync to mddev_suspend to prevent this.
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
raid10 stores dev_sectors in 'conf' separately from the one in
'mddev' because it can have a very significant effect on block
addressing and so need to be updated carefully.
However raid10_resize isn't updating it at all!
To update it correctly, we need to make sure it is a proper
multiple of the chunksize taking various details of the layout
in to account.
This calculation is currently done in setup_conf. So split it
out from there and call it from raid10_resize as well.
Then set conf->dev_sectors properly.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Pull kvm powerpc fixes from Marcelo Tosatti:
"Urgent KVM PPC updates, quoting Alexander Graf:
There are a few bugs in 3.4 that really should be fixed before
people can be all happy and fuzzy about KVM on PowerPC. These fixes
are:
* fix POWER7 bare metal with PR=y
* fix deadlock on HV=y book3s_64 mode in low memory cases
* fix invalid MMU scope of PR=y mode on book3s_64, possibly eading
to memory corruption"
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Fix bug leading to deadlock in guest HPT updates
powerpc/kvm: Fix VSID usage in 64-bit "PR" KVM
KVM: PPC: Book3S: PR: Fix hsrr code
KVM: PPC: Fix PR KVM on POWER7 bare metal
KVM: PPC: Book3S: PR: Handle EMUL_ASSIST
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A few last-minute regression fixes for 3.4 final kernel. All trivial,
and Cc'ed to stable kernel."
* tag 'sound-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: wm8994: Fix AIF2ADC power down
ALSA: hda/idt - Fix power-map for speaker-pins with some HP laptops
ASoC: cs42l73: Sync digital mixer kcontrols to allow for 0dB
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc
Pull remoteproc fix from Ohad Ben-Cohen:
"Fix a nasty off-by-one remoteproc bug which leaks memory when a remote
processor is shut down and, on certain circumstances, can indirectly
prevent it from being reloaded."
* tag 'rproc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
remoteproc: fix off-by-one bug in __rproc_free_vrings
|