summaryrefslogtreecommitdiff
path: root/kernel/events
AgeCommit message (Collapse)Author
2014-05-19perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()Peter Zijlstra
Alexander noticed that we use RCU iteration on rb->event_list but do not use list_{add,del}_rcu() to add,remove entries to that list, nor do we observe proper grace periods when re-using the entries. Merge ring_buffer_detach() into ring_buffer_attach() such that attaching to the NULL buffer is detaching. Furthermore, ensure that between any 'detach' and 'attach' of the same event we observe the required grace period, but only when strictly required. In effect this means that only ioctl(.request = PERF_EVENT_IOC_SET_OUTPUT) will wait for a grace period, while the normal initial attach and final detach will not be delayed. This patch should, I think, do the right thing under all circumstances, the 'normal' cases all should never see the extra grace period, but the two cases: 1) PERF_EVENT_IOC_SET_OUTPUT on an event which already has a ring_buffer set, will now observe the required grace period between removing itself from the old and attaching itself to the new buffer. This case is 'simple' in that both buffers are present in perf_event_set_output() one could think an unconditional synchronize_rcu() would be sufficient; however... 2) an event that has a buffer attached, the buffer is destroyed (munmap) and then the event is attached to a new/different buffer using PERF_EVENT_IOC_SET_OUTPUT. This case is more complex because the buffer destruction does: ring_buffer_attach(.rb = NULL) followed by the ioctl() doing: ring_buffer_attach(.rb = foo); and we still need to observe the grace period between these two calls due to us reusing the event->rb_entry list_head. In order to make 2 happen we use Paul's latest cond_synchronize_rcu() call. Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140507123526.GD13658@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-19perf: Prevent false warning in perf_swevent_addJiri Olsa
The perf cpu offline callback takes down all cpu context events and releases swhash->swevent_hlist. This could race with task context software event being just scheduled on this cpu via perf_swevent_add while cpu hotplug code already cleaned up event's data. The race happens in the gap between the cpu notifier code and the cpu being actually taken down. Note that only cpu ctx events are terminated in the perf cpu hotplug code. It's easily reproduced with: $ perf record -e faults perf bench sched pipe while putting one of the cpus offline: # echo 0 > /sys/devices/system/cpu/cpu1/online Console emits following warning: WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0() Modules linked in: CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G W 3.14.0+ #256 Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008 0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005 0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046 ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001 Call Trace: [<ffffffff81665a23>] dump_stack+0x4f/0x7c [<ffffffff8104732c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff8104737a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8110fb3d>] perf_swevent_add+0x18d/0x1a0 [<ffffffff811162ae>] event_sched_in.isra.75+0x9e/0x1f0 [<ffffffff8111646a>] group_sched_in+0x6a/0x1f0 [<ffffffff81083dd5>] ? sched_clock_local+0x25/0xa0 [<ffffffff811167e6>] ctx_sched_in+0x1f6/0x450 [<ffffffff8111757b>] perf_event_sched_in+0x6b/0xa0 [<ffffffff81117a4b>] perf_event_context_sched_in+0x7b/0xc0 [<ffffffff81117ece>] __perf_event_task_sched_in+0x43e/0x460 [<ffffffff81096f1e>] ? put_lock_stats.isra.18+0xe/0x30 [<ffffffff8107b3c8>] finish_task_switch+0xb8/0x100 [<ffffffff8166a7de>] __schedule+0x30e/0xad0 [<ffffffff81172dd2>] ? pipe_read+0x3e2/0x560 [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70 [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70 [<ffffffff8166b464>] preempt_schedule_irq+0x44/0x70 [<ffffffff816707f0>] retint_kernel+0x20/0x30 [<ffffffff8109e60a>] ? lockdep_sys_exit+0x1a/0x90 [<ffffffff812a4234>] lockdep_sys_exit_thunk+0x35/0x67 [<ffffffff81679321>] ? sysret_check+0x5/0x56 Fixing this by tracking the cpu hotplug state and displaying the WARN only if current cpu is initialized properly. Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: stable@vger.kernel.org Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-19perf: Limit perf_event_attr::sample_period to 63 bitsPeter Zijlstra
Vince reported that using a large sample_period (one with bit 63 set) results in wreckage since while the sample_period is fundamentally unsigned (negative periods don't make sense) the way we implement things very much rely on signed logic. So limit sample_period to 63 bits to avoid tripping over this. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@git.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-07perf: Fix perf_event_init_context()Peter Zijlstra
perf_pin_task_context() can return NULL but perf_event_init_context() assumes it will not, correct this. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/20140505171428.GU26782@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-07perf: Fix race in removing an eventPeter Zijlstra
When removing a (sibling) event we do: raw_spin_lock_irq(&ctx->lock); perf_group_detach(event); raw_spin_unlock_irq(&ctx->lock); <hole> perf_remove_from_context(event); raw_spin_lock_irq(&ctx->lock); ... raw_spin_unlock_irq(&ctx->lock); Now, assuming the event is a sibling, it will be 'unreachable' for things like ctx_sched_out() because that iterates the groups->siblings, and we just unhooked the sibling. So, if during <hole> we get ctx_sched_out(), it will miss the event and not call event_sched_out() on it, leaving it programmed on the PMU. The subsequent perf_remove_from_context() call will find the ctx is inactive and only call list_del_event() to remove the event from all other lists. Hereafter we can proceed to free the event; while still programmed! Close this hole by moving perf_group_detach() inside the same ctx->lock region(s) perf_remove_from_context() has. The condition on inherited events only in __perf_event_exit_task() is likely complete crap because non-inherited events are part of groups too and we're tearing down just the same. But leave that for another patch. Most-likely-Fixes: e03a9a55b4e ("perf: Change close() semantics for group events") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu> Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-05Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM changes from Russell King: - Perf updates from Will Deacon: - Support for Qualcomm Krait processors (run perf on your phone!) - Support for Cortex-A12 (run perf stat on your FPGA!) - Support for perf_sample_event_took, allowing us to automatically decrease the sample rate if we can't handle the PMU interrupts quickly enough (run perf record on your FPGA!). - Basic uprobes support from David Long: This patch series adds basic uprobes support to ARM. It is based on patches developed earlier by Rabin Vincent. That approach of adding hooks into the kprobes instruction parsing code was not well received. This approach separates the ARM instruction parsing code in kprobes out into a separate set of functions which can be used by both kprobes and uprobes. Both kprobes and uprobes then provide their own semantic action tables to process the results of the parsing. - ARMv7M (microcontroller) updates from Uwe Kleine-König - OMAP DMA updates (recently added Vinod's Ack even though they've been sitting in linux-next for a few months) to reduce the reliance of omap-dma on the code in arch/arm. - SA11x0 changes from Dmitry Eremin-Solenikov and Alexander Shiyan - Support for Cortex-A12 CPU - Align support for ARMv6 with ARMv7 so they can cooperate better in a single zImage. - Addition of first AT_HWCAP2 feature bits for ARMv8 crypto support. - Removal of IRQ_DISABLED from various ARM files - Improved efficiency of virt_to_page() for single zImage - Patch from Ulf Hansson to permit runtime PM callbacks to be available for AMBA devices for suspend/resume as well. - Finally kill asm/system.h on ARM. * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (89 commits) dmaengine: omap-dma: more consolidation of CCR register setup dmaengine: omap-dma: move IRQ handling to omap-dma dmaengine: omap-dma: move register read/writes into omap-dma.c ARM: omap: dma: get rid of 'p' allocation and clean up ARM: omap: move dma channel allocation into plat-omap code ARM: omap: dma: get rid of errata global ARM: omap: clean up DMA register accesses ARM: omap: remove almost-const variables ARM: omap: remove references to disable_irq_lch dmaengine: omap-dma: cleanup errata 3.3 handling dmaengine: omap-dma: provide register read/write functions dmaengine: omap-dma: use cached CCR value when enabling DMA dmaengine: omap-dma: move barrier to omap_dma_start_desc() dmaengine: omap-dma: move clnk_ctrl setting to preparation functions dmaengine: omap-dma: improve efficiency loading C.SA/C.EI/C.FI registers dmaengine: omap-dma: consolidate clearing channel status register dmaengine: omap-dma: move CCR buffering disable errata out of the fast path dmaengine: omap-dma: provide register definitions dmaengine: omap-dma: consolidate setup of CCR dmaengine: omap-dma: consolidate setup of CSDP ...
2014-04-03Merge branch 'for-3.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot updates for cgroup: - The biggest one is cgroup's conversion to kernfs. cgroup took after the long abandoned vfs-entangled sysfs implementation and made it even more convoluted over time. cgroup's internal objects were fused with vfs objects which also brought in vfs locking and object lifetime rules. Naturally, there are places where vfs rules don't fit and nasty hacks, such as credential switching or lock dance interleaving inode mutex and cgroup_mutex with object serial number comparison thrown in to decide whether the operation is actually necessary, needed to be employed. After conversion to kernfs, internal object lifetime and locking rules are mostly isolated from vfs interactions allowing shedding of several nasty hacks and overall simplification. This will also allow implmentation of operations which may affect multiple cgroups which weren't possible before as it would have required nesting i_mutexes. - Various simplifications including dropping of module support, easier cgroup name/path handling, simplified cgroup file type handling and task_cg_lists optimization. - Prepatory changes for the planned unified hierarchy, which is still a patchset away from being actually operational. The dummy hierarchy is updated to serve as the default unified hierarchy. Controllers which aren't claimed by other hierarchies are associated with it, which BTW was what the dummy hierarchy was for anyway. - Various fixes from Li and others. This pull request includes some patches to add missing slab.h to various subsystems. This was triggered xattr.h include removal from cgroup.h. cgroup.h indirectly got included a lot of files which brought in xattr.h which brought in slab.h. There are several merge commits - one to pull in kernfs updates necessary for converting cgroup (already in upstream through driver-core), others for interfering changes in the fixes branch" * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits) cgroup: remove useless argument from cgroup_exit() cgroup: fix spurious lockdep warning in cgroup_exit() cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c cgroup: break kernfs active_ref protection in cgroup directory operations cgroup: fix cgroup_taskset walking order cgroup: implement CFTYPE_ONLY_ON_DFL cgroup: make cgrp_dfl_root mountable cgroup: drop const from @buffer of cftype->write_string() cgroup: rename cgroup_dummy_root and related names cgroup: move ->subsys_mask from cgroupfs_root to cgroup cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}() cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root cgroup: reorganize cgroup bootstrapping cgroup: relocate setting of CGRP_DEAD cpuset: use rcu_read_lock() to protect task_cs() cgroup_freezer: document freezer_fork() subtleties cgroup: update cgroup_transfer_tasks() to either succeed or fail cgroup: drop task_lock() protection around task->cgroups cgroup: update how a newly forked task gets associated with css_set ...
2014-03-18uprobes: allow ignoring of probe hitsDavid A. Long
Allow arches to decided to ignore a probe hit. ARM will use this to only call handlers if the conditions to execute a conditionally executed instruction are satisfied. Signed-off-by: David A. Long <dave.long@linaro.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2014-02-27perf: Optimize group_sched_in()Peter Zijlstra
Use the ctx pmu instead of the event pmu. When a group leader is a software event but the group contains hardware events, the entire group is on the hardware PMU. Using the hardware PMU for the transaction makes most sense since that's the most expensive one to programm (and software PMUs generally don't have TXN support anyway). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-sctoo9t2f3nn2c9g568928q3@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27perf: Remove redundant PMU assignmentMark Rutland
Currently perf_branch_stack_sched_in iterates over the set of pmus, checks that each pmu has a flush_branch_stack callback, then overwrites the pmu before calling the callback. This is either redundant or broken. In systems with a single hw pmu, pmu == cpuctx->ctx.pmu, and thus the assignment is redundant. In systems with multiple hw pmus (i.e. multiple pmus with task_ctx_nr == perf_hw_context) the pmus share the same perf_cpu_context. Thus the assignment can cause one of the pmus to flush its branch stack repeatedly rather than causing each of the pmus to flush their branch stacks. Worse still, if only some pmus have the callback the assignment can result in a branch to NULL. This patch removes the redundant assignment. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1392054264-23570-3-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27perf: Fix prototype of find_pmu_context()Mark Rutland
For some reason find_pmu_context() is defined as returning void * rather than a __percpu struct perf_cpu_context *. As all the requisite types are defined in advance there's no reason to keep it that way. This patch modifies the prototype of pmu_find_context to return a __percpu struct perf_cpu_context *. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1392054264-23570-2-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27Merge branch 'perf/urgent' into perf/coreIngo Molnar
Merge the latest fixes before queueing up new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27perf: Fix hotplug splatPeter Zijlstra
Drew Richardson reported that he could make the kernel go *boom* when hotplugging while having perf events active. It turned out that when you have a group event, the code in __perf_event_exit_context() fails to remove the group siblings from the context. We then proceed with destroying and freeing the event, and when you re-plug the CPU and try and add another event to that CPU, things go *boom* because you've still got dead entries there. Reported-by: Drew Richardson <drew.richardson@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-21perf/x86: Warn to early_printk() in case irq_work is too slowPeter Zijlstra
On Mon, Feb 10, 2014 at 08:45:16AM -0800, Dave Hansen wrote: > The reason I coded this up was that NMIs were firing off so fast that > nothing else was getting a chance to run. With this patch, at least the > printk() would come out and I'd have some idea what was going on. It will start spewing to early_printk() (which is a lot nicer to use from NMI context too) when it fails to queue the IRQ-work because its already enqueued. It does have the false-positive for when two CPUs trigger the warn concurrently, but that should be rare and some extra clutter on the early printk shouldn't be a problem. Cc: hpa@zytor.com Cc: tglx@linutronix.de Cc: dzickus@redhat.com Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: mingo@kernel.org Fixes: 6a02ad66b2c4 ("perf/x86: Push the duration-logging printk() to IRQ context") Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140211150116.GO27965@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-13cgroup: drop @skip_css from cgroup_taskset_for_each()Tejun Heo
If !NULL, @skip_css makes cgroup_taskset_for_each() skip the matching css. The intention of the interface is to make it easy to skip css's (cgroup_subsys_states) which already match the migration target; however, this is entirely unnecessary as migration taskset doesn't include tasks which are already in the target cgroup. Drop @skip_css from cgroup_taskset_for_each(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Daniel Borkmann <dborkman@redhat.com>
2014-02-11cgroup: improve css_from_dir() into css_tryget_from_dir()Tejun Heo
css_from_dir() returns the matching css (cgroup_subsys_state) given a dentry and subsystem. The function doesn't pin the css before returning and requires the caller to be holding RCU read lock or cgroup_mutex and handling pinning on the caller side. Given that users of the function are likely to want to pin the returned css (both existing users do) and that getting and putting css's are very cheap, there's no reason for the interface to be tricky like this. Rename css_from_dir() to css_tryget_from_dir() and make it try to pin the found css and return it only if pinning succeeded. The callers are updated so that they no longer do RCU locking and pinning around the function and just use the returned css. This will also ease converting cgroup to kernfs. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
2014-02-09perf/x86: Push the duration-logging printk() to IRQ contextPeter Zijlstra
Calling printk() from NMI context is bad (TM), so move it to IRQ context. This also avoids the problem where the printk() time is measured by the generic NMI duration goo and triggers a second warning. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/n/tip-75dv35xf6dhhmeb7nq6fua31@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-08cgroup: clean up cgroup_subsys names and initializationTejun Heo
cgroup_subsys is a bit messier than it needs to be. * The name of a subsys can be different from its internal identifier defined in cgroup_subsys.h. Most subsystems use the matching name but three - cpu, memory and perf_event - use different ones. * cgroup_subsys_id enums are postfixed with _subsys_id and each cgroup_subsys is postfixed with _subsys. cgroup.h is widely included throughout various subsystems, it doesn't and shouldn't have claim on such generic names which don't have any qualifier indicating that they belong to cgroup. * cgroup_subsys->subsys_id should always equal the matching cgroup_subsys_id enum; however, we require each controller to initialize it and then BUG if they don't match, which is a bit silly. This patch cleans up cgroup_subsys names and initialization by doing the followings. * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each cgroup_subsys with _cgrp_subsys. * With the above, renaming subsys identifiers to match the userland visible names doesn't cause any naming conflicts. All non-matching identifiers are renamed to match the official names. cpu_cgroup -> cpu mem_cgroup -> memory perf -> perf_event * controllers no longer need to initialize ->subsys_id and ->name. They're generated in cgroup core and set automatically during boot. * Redundant cgroup_subsys declarations removed. * While updating BUG_ON()s in cgroup_init_early(), convert them to WARN()s. BUGging that early during boot is stupid - the kernel can't print anything, even through serial console and the trap handler doesn't even link stack frame properly for back-tracing. This patch doesn't introduce any behavior changes. v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs classid handling into core"). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Ingo Molnar <mingo@redhat.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Thomas Graf <tgraf@suug.ch>
2014-01-22Merge tag 'trace-3.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a new feature to ftrace, namely the trace event triggers by Tom Zanussi. A trigger is a way to enable an action when an event is hit. The actions are: o trace on/off - enable or disable tracing o snapshot - save the current trace buffer in the snapshot o stacktrace - dump the current stack trace to the ringbuffer o enable/disable events - enable or disable another event Namhyung Kim added updates to the tracing uprobes code. Having the uprobes add support for fetch methods. The rest are various bug fixes with the new code, and minor ones for the old code" * tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (38 commits) tracing: Fix buggered tee(2) on tracing_pipe tracing: Have trace buffer point back to trace_array ftrace: Fix synchronization location disabling and freeing ftrace_ops ftrace: Have function graph only trace based on global_ops filters ftrace: Synchronize setting function_trace_op with ftrace_trace_function tracing: Show available event triggers when no trigger is set tracing: Consolidate event trigger code tracing: Fix counter for traceon/off event triggers tracing: Remove double-underscore naming in syscall trigger invocations tracing/kprobes: Add trace event trigger invocations tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT tracing/uprobes: Add @+file_offset fetch method uprobes: Allocate ->utask before handler_chain() for tracing handlers tracing/uprobes: Add support for full argument access methods tracing/uprobes: Fetch args before reserving a ring buffer tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg() tracing/probes: Implement 'memory' fetch method for uprobes tracing/probes: Add fetch{,_size} member into deref fetch method tracing/probes: Move 'symbol' fetch method to kprobes tracing/probes: Implement 'stack' fetch method for uprobes ...
2014-01-16Merge branch 'perf/urgent' into perf/coreIngo Molnar
Pick up the latest fixes, refresh the development tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12perf: Introduce a flag to enable close-on-exec in perf_event_open()Yann Droneaud
Unlike recent modern userspace API such as: epoll_create1 (EPOLL_CLOEXEC), eventfd (EFD_CLOEXEC), fanotify_init (FAN_CLOEXEC), inotify_init1 (IN_CLOEXEC), signalfd (SFD_CLOEXEC), timerfd_create (TFD_CLOEXEC), or the venerable general purpose open (O_CLOEXEC), perf_event_open() syscall lack a flag to atomically set FD_CLOEXEC (eg. close-on-exec) flag on file descriptor it returns to userspace. The present patch adds a PERF_FLAG_FD_CLOEXEC flag to allow perf_event_open() syscall to atomically set close-on-exec. Having this flag will enable userspace to remove the file descriptor from the list of file descriptors being inherited across exec, without the need to call fcntl(fd, F_SETFD, FD_CLOEXEC) and the associated race condition between the current thread and another thread calling fork(2) then execve(2). Links: - Secure File Descriptor Handling (Ulrich Drepper, 2008) http://udrepper.livejournal.com/20407.html - Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) http://danwalsh.livejournal.com/53603.html - Notes in DMA buffer sharing: leak and security hole http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/dma-buf-sharing.txt?id=v3.13-rc3#n428 Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/8c03f54e1598b1727c19706f3af03f98685d9fe6.1388952061.git.ydroneaud@opteya.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12perf/x86: Fix active_entry initializationStephane Eranian
This patch fixes a problem with the initialization of the struct perf_event active_entry field. It is defined inside an anonymous union and was initialized in perf_event_alloc() using INIT_LIST_HEAD(). However at that time, we do not know whether the event is going to use active_entry or hlist_entry (SW). Or at last, we don't want to make that determination there. The problem is that hlist and list_head are not initialized the same way. One is okay with NULL (from kzmalloc), the other needs to pointers to point to self. This patch resolves this problem by dropping the union. This will avoid problems later on, if someone starts using active_entry or hlist_entry without verifying that they actually overlap. This also solves the initialization problem. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: ak@linux.intel.com Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: vincent.weaver@maine.edu Cc: maria.n.dimakopoulou@gmail.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389176153-3128-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-02uprobes: Allocate ->utask before handler_chain() for tracing handlersOleg Nesterov
uprobe_trace_print() and uprobe_perf_print() need to pass the additional info to call_fetch() methods, currently there is no simple way to do this. current->utask looks like a natural place to hold this info, but we need to allocate it before handler_chain(). This is a bit unfortunate, perhaps we will find a better solution later, but this is simple and should work right now. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2013-12-17perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the periodPeter Zijlstra
Vince Weaver reports that, on all architectures apart from ARM, PERF_EVENT_IOC_PERIOD doesn't actually update the period until the next event fires. This is counter-intuitive behaviour and is better dealt with in the core code. This patch ensures that the period is forcefully reset when dealing with such a request in the core code. A subsequent patch removes the equivalent hack from the ARM back-end. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1385560479-11014-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-17perf: Disable all pmus on unthrottling and reschedulingAlexander Shishkin
Currently, only one PMU in a context gets disabled during unthrottling and event_sched_{out,in}(), however, events in one context may belong to different pmus, which results in PMUs being reprogrammed while they are still enabled. This means that mixed PMU use [which is rare in itself] resulted in potentially completely unreliable results: corrupted events, bogus results, etc. This patch temporarily disables PMUs that correspond to each event in the context while these events are being modified. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: http://lkml.kernel.org/r/1387196256-8030-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-11perf: Optimize ring-buffer write by depending on control dependenciesPeter Zijlstra
Remove a full barrier from the ring-buffer write path by relying on a control dependency to order a LOAD -> STORE scenario. Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-8alv40z6ikk57jzbaobnxrjl@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-27perf: Add active_entry list head to struct perf_eventStephane Eranian
This patch adds a new field to the struct perf_event. It is intended to be used to chain events which are active (enabled). It helps in the hardware layer for PMUs which do not have actual counter restrictions, i.e., free running read-only counters. Active events are chained as opposed to being tracked via the counter they use. To save space we use a union with hlist_entry as both are mutually exclusive (suggested by Jiri Olsa). Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: acme@redhat.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1384275531-10892-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-21Merge branch 'uprobes/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core Pull uprobes cleanups from Oleg Nesterov. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-20uprobes: Document xol_area and arch_uprobe->insn/ixolOleg Nesterov
Document xol_area and arch_uprobe. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-11-20uprobes: Cleanup !CONFIG_UPROBES decls, unexport xol_areaOleg Nesterov
1. Don't include asm/uprobes.h unconditionally, we only need it if CONFIG_UPROBES. 2. Move the definition of "struct xol_area" into uprobes.c. Perhaps we should simply kill struct uprobes_state, it buys nothing. 3. Kill the dummy definition of uprobe_get_swbp_addr(), nobody except handle_swbp() needs it. 4. Purely cosmetic, but move the decl of uprobe_get_swbp_addr() up, close to other __weak helpers. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-11-20uprobes: Don't assume that arch_uprobe->insn/ixol is u8[MAX_UINSN_BYTES]Oleg Nesterov
arch_uprobe should be opaque as much as possible to the generic code, but currently it assumes that insn/ixol must be u8[] of the known size. Remove this unnecessary dependency, we can use "&" and and sizeof() with the same effect. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-11-20uprobes: Add uprobe_task->dup_xol_work/dup_xol_addrOleg Nesterov
uprobe_task->vaddr is a bit strange. The generic code uses it only to pass the additional argument to arch_uprobe_pre_xol(), and since it is always equal to instruction_pointer() this looks even more strange. And both utask->vaddr and and utask->autask have the same scope, they only have the meaning when the task executes the probed insn out-of-line, so it is safe to reuse both in UTASK_RUNNING state. This all means that logically ->vaddr belongs to arch_uprobe_task and we should probably move it there, arch_uprobe_pre_xol() can record instruction_pointer() itself. OTOH, it is also used by uprobe_copy_process() and dup_xol_work() for another purpose, this doesn't look clean and doesn't allow to move this member into arch_uprobe_task. This patch adds the union with 2 anonymous structs into uprobe_task. The first struct is autask + vaddr, this way we "almost" move vaddr into autask. The second struct has 2 new members for uprobe_copy_process() paths: ->dup_xol_addr which can be used instead ->vaddr, and ->dup_xol_work which can be used to avoid kmalloc() and simplify the code. Note that this union will likely have another member(s), we need something like "private_data_for_handlers" so that the tracing handlers could use it to communicate with call_fetch() methods. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-11-19perf: Remove fragile swevent hlist optimizationPeter Zijlstra
Currently we only allocate a single cpu hashtable for per-cpu swevents; do away with this optimization for it is fragile in the face of things like perf_pmu_migrate_context(). The easiest thing is to make sure all CPUs are consistent wrt state. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130913111447.GN31370@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13list: introduce list_next_entry() and list_prev_entry()Oleg Nesterov
Add two trivial helpers list_next_entry() and list_prev_entry(), they can have a lot of users including list.h itself. In fact the 1st one is already defined in events/core.c and bnx2x_sp.c, so the patch simply moves the definition to list.h. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-12Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "As a first remark I'd like to note that the way to build perf tooling has been simplified and sped up, in the future it should be enough for you to build perf via: cd tools/perf/ make install (ie without the -j option.) The build system will figure out the number of CPUs and will do a parallel build+install. The various build system inefficiencies and breakages Linus reported against the v3.12 pull request should now be resolved - please (re-)report any remaining annoyances or bugs. Main changes on the perf kernel side: * Performance optimizations: . perf ring-buffer code optimizations, by Peter Zijlstra . perf ring-buffer code optimizations, by Oleg Nesterov . x86 NMI call-stack processing optimizations, by Peter Zijlstra . perf context-switch optimizations, by Peter Zijlstra . perf sampling speedups, by Peter Zijlstra . x86 Intel PEBS processing speedups, by Peter Zijlstra * Enhanced hardware support: . for Intel Ivy Bridge-EP uncore PMUs, by Zheng Yan . for Haswell transactions, by Andi Kleen, Peter Zijlstra * Core perf events code enhancements and fixes by Oleg Nesterov: . for uprobes, if fork() is called with pending ret-probes . for uprobes platform support code * New ABI details by Andi Kleen: . Report x86 Haswell TSX transaction abort cost as weight Main changes on the perf tooling side (some of these tooling changes utilize the above kernel side changes): * 'perf report/top' enhancements: . Convert callchain children list to rbtree, greatly reducing the time taken for callchain processing, from Namhyung Kim. . Add new COMM infrastructure, further improving histogram processing, from Frédéric Weisbecker, one fix from Namhyung Kim. . Add /proc/kcore based live-annotation improvements, including build-id cache support, multi map 'call' instruction navigation fixes, kcore address validation, objdump workarounds. From Adrian Hunter. . Show progress on histogram collapsing, that can take a long time, from Namhyung Kim. . Add --max-stack option to limit callchain stack scan in 'top' and 'report', improving callchain processing when reducing the stack depth is an option, from Waiman Long. . Add new option --ignore-vmlinux for perf top, from Willy Tarreau. * 'perf trace' enhancements: . 'perf trace' now can can use a 'perf probe' dynamic tracepoints to hook into the userspace -> kernel pathname copy so that it can map fds to pathnames without reading /proc/pid/fd/ symlinks. From Arnaldo Carvalho de Melo. . Show VFS path associated with fd in live sessions, using a 'vfs_getname' 'perf probe' created dynamic tracepoint or by looking at /proc/pid/fd, from Arnaldo Carvalho de Melo. . Add 'trace' beautifiers for lots of syscall arguments, from Arnaldo Carvalho de Melo. . Implement more compact 'trace' output by suppressing zeroed args, from Arnaldo Carvalho de Melo. . Show thread COMM by default in 'trace', from Arnaldo Carvalho de Melo. . Add option to show full timestamp in 'trace', from David Ahern. . Add 'record' command in 'trace', to record raw_syscalls:*, from David Ahern. . Add summary option to dump syscall statistics in 'trace', from David Ahern. . Improve error messages in 'trace', providing hints about system configuration steps needed for using it, from Ramkumar Ramachandra. . 'perf trace' now emits hints as to why tracing is not possible, helping the user to setup the system to allow tracing in the desired permission granularity, telling if the problem is due to debugfs not being mounted or with not enough permission for !root, /proc/sys/kernel/perf_event_paranoit value, etc. From Arnaldo Carvalho de Melo. * 'perf record' enhancements: . Check maximum frequency rate for record/top, emitting better error messages, from Jiri Olsa. . 'perf record' code cleanups, from David Ahern. . Improve write_output error message in 'perf record', from Adrian Hunter. . Allow specifying B/K/M/G unit to the --mmap-pages arguments, from Jiri Olsa. . Fix command line callchain attribute tests to handle the new -g/--call-chain semantics, from Arnaldo Carvalho de Melo. * 'perf kvm' enhancements: . Disable live kvm command if timerfd is not supported, from David Ahern. . Fix detection of non-core features, from David Ahern. * 'perf list' enhancements: . Add usage to 'perf list', from David Ahern. . Show error in 'perf list' if tracepoints not available, from Pekka Enberg. * 'perf probe' enhancements: . Support "$vars" meta argument syntax for local variables, allowing asking for all possible variables at a given probe point to be collected when it hits, from Masami Hiramatsu. * 'perf sched' enhancements: . Address the root cause of that 'perf sched' stack initialization build slowdown, by programmatically setting a big array after moving the global variable back to the stack. Fix from Adrian Hunter. * 'perf script' enhancements: . Set up output options for in-stream attributes, from Adrian Hunter. . Print addr by default for BTS in 'perf script', from Adrian Juntmer * 'perf stat' enhancements: . Improved messages when doing profiling in all or a subset of CPUs using a workload as the session delimitator, as in: 'perf stat --cpu 0,2 sleep 10s' from Arnaldo Carvalho de Melo. . Add units to nanosec-based counters in 'perf stat', from David Ahern. . Remove bogus info when using 'perf stat' -e cycles/instructions, from Ramkumar Ramachandra. * 'perf lock' enhancements: . 'perf lock' fixes and cleanups, from Davidlohr Bueso. * 'perf test' enhancements: . Fixup PERF_SAMPLE_TRANSACTION handling in sample synthesizing and 'perf test', from Adrian Hunter. . Clarify the "sample parsing" test entry, from Arnaldo Carvalho de Melo. . Consider PERF_SAMPLE_TRANSACTION in the "sample parsing" test, from Arnaldo Carvalho de Melo. . Memory leak fixes in 'perf test', from Felipe Pena. * 'perf bench' enhancements: . Change the procps visible command-name of invididual benchmark tests plus cleanups, from Ingo Molnar. * Generic perf tooling infrastructure/plumbing changes: . Separating data file properties from session, code reorganization from Jiri Olsa. . Fix version when building out of tree, as when using one of these: $ make help | grep perf perf-tar-src-pkg - Build perf-3.12.0.tar source tarball perf-targz-src-pkg - Build perf-3.12.0.tar.gz source tarball perf-tarbz2-src-pkg - Build perf-3.12.0.tar.bz2 source tarball perf-tarxz-src-pkg - Build perf-3.12.0.tar.xz source tarball $ from David Ahern. . Enhance option parse error message, showing just the help lines of the options affected, from Namhyung Kim. . libtraceevent updates from upstream trace-cmd repo, from Steven Rostedt. . Always use perf_evsel__set_sample_bit to set sample_type, from Adrian Hunter. . Memory and mmap leak fixes from Chenggang Qin. . Assorted build fixes for from David Ahern and Jiri Olsa. . Speed up and prettify the build system, from Ingo Molnar. . Implement addr2line directly using libbfd, from Roberto Vitillo. . Separate the GTK support in a separate libperf-gtk.so DSO, that is only loaded when --gtk is specified, from Namhyung Kim. . perf bash completion fixes and improvements from Ramkumar Ramachandra. . Support for Openembedded/Yocto -dbg packages, from Ricardo Ribalda Delgado. And lots and lots of other fixes and code reorganizations that did not make it into the list, see the shortlog, diffstat and the Git log for details!" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (300 commits) uprobes: Fix the memory out of bound overwrite in copy_insn() uprobes: Fix the wrong usage of current->utask in uprobe_copy_process() perf tools: Remove unneeded include perf record: Remove post_processing_offset variable perf record: Remove advance_output function perf record: Refactor feature handling into a separate function perf trace: Don't relookup fields by name in each sample perf tools: Fix version when building out of tree perf evsel: Ditch evsel->handler.data field uprobes: Export write_opcode() as uprobe_write_opcode() uprobes: Introduce arch_uprobe->ixol uprobes: Kill module_init() and module_exit() uprobes: Move function declarations out of arch perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes perf: Factor out strncpy() in perf_event_mmap_event() tools/perf: Add required memory barriers perf: Fix arch_perf_out_copy_user default perf: Update a stale comment perf: Optimize perf_output_begin() -- address calculation ...
2013-11-09uprobes: Fix the memory out of bound overwrite in copy_insn()Oleg Nesterov
1. copy_insn() doesn't look very nice, all calculations are confusing and it is not immediately clear why do we read the 2nd page first. 2. The usage of inode->i_size is wrong on 32-bit machines. 3. "Instruction at end of binary" logic is simply wrong, it doesn't handle the case when uprobe->offset > inode->i_size. In this case "bytes" overflows, and __copy_insn() writes to the memory outside of uprobe->arch.insn. Yes, uprobe_register() checks i_size_read(), but this file can be truncated after that. All i_size checks are racy, we do this only to catch the obvious mistakes. Change copy_insn() to call __copy_insn() in a loop, simplify and fix the bytes/nbytes calculations. Note: we do not care if we read extra bytes after inode->i_size if we got the valid page. This is fine because the task gets the same page after page-fault, and arch_uprobe_analyze_insn() can't know how many bytes were actually read anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-09uprobes: Fix the wrong usage of current->utask in uprobe_copy_process()Oleg Nesterov
Commit aa59c53fd459 "uprobes: Change uprobe_copy_process() to dup xol_area" has a stupid typo, we need to setup t->utask->vaddr but the code wrongly uses current->utask. Even with this bug dup_xol_work() works "in practice", but only because get_unmapped_area(NULL, TASK_SIZE - PAGE_SIZE) likely returns the same address every time. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-07Merge tag 'driver-core-3.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / sysfs patches from Greg KH: "Here's the big driver core / sysfs update for 3.13-rc1. There's lots of dev_groups updates for different subsystems, as they all get slowly migrated over to the safe versions of the attribute groups (removing userspace races with the creation of the sysfs files.) Also in here are some kobject updates, devres expansions, and the first round of Tejun's sysfs reworking to enable it to be used by other subsystems as a backend for an in-kernel filesystem. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (83 commits) sysfs: rename sysfs_assoc_lock and explain what it's about sysfs: use generic_file_llseek() for sysfs_file_operations sysfs: return correct error code on unimplemented mmap() mdio_bus: convert bus code to use dev_groups device: Make dev_WARN/dev_WARN_ONCE print device as well as driver name sysfs: separate out dup filename warning into a separate function sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c sysfs: remove unused sysfs_get_dentry() prototype sysfs: honor bin_attr.attr.ignore_lockdep sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr devres: restore zeroing behavior of devres_alloc() sysfs: fix sysfs_write_file for bin file input: gameport: convert bus code to use dev_groups input: serio: remove bus usage of dev_attrs input: serio: use DEVICE_ATTR_RO() i2o: convert bus code to use dev_groups memstick: convert bus code to use dev_groups tifm: convert bus code to use dev_groups virtio: convert bus code to use dev_groups ipack: convert bus code to use dev_groups ...
2013-11-06uprobes: Export write_opcode() as uprobe_write_opcode()Oleg Nesterov
set_swbp() and set_orig_insn() are __weak, but this is pointless because write_opcode() is static. Export write_opcode() as uprobe_write_opcode() for the upcoming arm port, this way it can actually override set_swbp() and use __opcode_to_mem_arm(bpinsn) instead if UPROBE_SWBP_INSN. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-06uprobes: Introduce arch_uprobe->ixolOleg Nesterov
Currently xol_get_insn_slot() assumes that we should simply copy arch_uprobe->insn[] which is (ignoring arch_uprobe_analyze_insn) just the copy of the original insn. This is not true for arm which needs to create another insn to execute it out-of-line. So this patch simply adds the new member, ->ixol into the union. This doesn't make any difference for x86 and powerpc, but arm can divorce insn/ixol and initialize the correct xol insn in arch_uprobe_analyze_insn(). Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-06uprobes: Kill module_init() and module_exit()Oleg Nesterov
Turn module_init() into __initcall() and kill module_exit(). This code can't be compiled as a module so these module_*() calls only add the confusion, especially if arch-dependant code needs its own initialization hooks. Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-06perf: Factor out strncpy() in perf_event_mmap_event()Oleg Nesterov
While this is really minor, but strncpy() does the unnecessary zero-padding till the end of tmp[16] and it is called every time we are going to use the string literal. Turn these strncpy()'s into the single strlcpy() under the new label, saves 72 bytes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131017182417.GA17753@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Fix arch_perf_out_copy_user defaultPeter Zijlstra
The arch_perf_output_copy_user() default of __copy_from_user_inatomic() returns bytes not copied, while all other argument functions given DEFINE_OUTPUT_COPY() return bytes copied. Since copy_from_user_nmi() is the odd duck out by returning bytes copied where all other *copy_{to,from}* functions return bytes not copied, change it over and ammend DEFINE_OUTPUT_COPY() to expect bytes not copied. Oddly enough DEFINE_OUTPUT_COPY() already returned bytes not copied while expecting its worker functions to return bytes copied. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: will.deacon@arm.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131030201622.GR16117@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Update a stale commentPeter Zijlstra
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-9s5mze78gmlz19agt39i8rii@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Optimize perf_output_begin() -- address calculationPeter Zijlstra
Rewrite the handle address calculation code to be clearer. Saves 8 bytes on x86_64-defconfig. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-3trb2n2henb9m27tncef3ag7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Optimize perf_output_begin() -- lost_event casePeter Zijlstra
Avoid touching the lost_event and sample_data cachelines twince. Its not like we end up doing less work, but it might help to keep all accesses to these cachelines in one place. Due to code shuffle, this looses 4 bytes on x86_64-defconfig. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-zfxnc58qxj0eawdoj31hhupv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Optimize perf_output_begin()Peter Zijlstra
There's no point in re-doing the memory-barrier when we fail the cmpxchg(). Also placing it after the space reservation loop makes it clearer it only separates the userpage->tail read from the data stores. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-c19u6egfldyx86tpyc3zgkw9@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Add unlikely() to the ring-buffer codePeter Zijlstra
Add unlikely() annotations to 'slow' paths: When having a sampling event but no output buffer; you have bigger issues -- also the bail is still faster than actually doing the work. When having a sampling event but a control page only buffer, you have bigger issues -- again the bail is still faster than actually doing work. Optimize for the case where you're not loosing events -- again, not doing the work is still faster but make sure that when you have to actually do work its as fast as possible. The typical watermark is 1/2 the buffer size, so most events will not take this path. Shrinks perf_output_begin() by 16 bytes on x86_64-defconfig. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-wlg3jew3qnutm8opd0hyeuwn@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06perf: Simplify the ring-buffer codePeter Zijlstra
By using CIRC_SPACE() we can obviate the need for perf_output_space(). Shrinks the size of perf_output_begin() by 17 bytes on x86_64-defconfig. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: james.hogan@imgtec.com Cc: Vince Weaver <vince@deater.net> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Anton Blanchard <anton@samba.org> Link: http://lkml.kernel.org/n/tip-vtb0xb0llebmsdlfn1v5vtfj@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-04Merge branch 'perf/urgent' into perf/core to fix conflictsIngo Molnar
Conflicts: tools/perf/bench/numa.c Signed-off-by: Ingo Molnar <mingo@kernel.org>