diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-10-03 12:47:28 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-10-03 12:47:28 -0700 |
commit | 12b7bcb43e6ea834ab2f5dc52d971e379a0ca109 (patch) | |
tree | 65218ee4792a1bae88feb75e615c5ec4e602c7a2 /tools/perf/builtin-annotate.c | |
parent | 00bcf5cdd6c0e2e92ce3dd852ca68a3b779fa4ec (diff) | |
parent | 41aad2a6d4fcdda8d73c9739daf7a9f3f49499d6 (diff) |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"The main kernel side changes were:
- uprobes enhancements (Masami Hiramatsu)
- Uncore group events enhancements (David Carrillo-Cisneros)
- x86 Intel: Add support for Skylake server uncore PMUs (Kan Liang)
- x86 Intel: LBR cleanups and enhancements, for better branch
annotation tracking (Peter Zijlstra)
- x86 Intel: Add support for PTWRITE and power event tracing
(Alexander Shishkin)
- ... various fixes, cleanups and smaller enhancements.
Lots of tooling changes - a couple of highlights:
- Support event group view with hierarchy mode in 'perf top' and
'perf report' (Namhyung Kim)
e.g.:
$ perf record -e '{cycles,instructions}' make
$ perf report --hierarchy --stdio
...
# Overhead Command / Shared Object / Symbol
# ...................... ..................................
...
25.74% 27.18%sh
19.96% 24.14%libc-2.24.so
9.55% 14.64%[.] __strcmp_sse2
1.54% 0.00%[.] __tfind
1.07% 1.13%[.] _int_malloc
0.95% 0.00%[.] __strchr_sse2
0.89% 1.39%[.] __tsearch
0.76% 0.00%[.] strlen
- Add branch stack / basic block info to 'perf annotate --stdio',
where for each branch, we add an asm comment after the instruction
with information on how often it was taken and predicted. See
example with color output at:
http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png
(Peter Zijlstra)
- Add support for using symbols in address filters with Intel PT and
ARM CoreSight (hardware assisted tracing facilities) (Adrian
Hunter, Mathieu Poirier)
- Add support for interacting with Coresight PMU ETMs/PTMs, that are
IP blocks to perform hardware assisted tracing on a ARM CPU core
(Mathieu Poirier)
- Support generating cross arch probes, i.e. if you specify a vmlinux
file for different arch than the one in the host machine,
$ perf probe --definition function_name args
will generate the probe definition string needed to append to the
target machine /sys/kernel/debug/tracing/kprobes_events file, using
scripting (Masami Hiramatsu).
- Allow configuring the default 'perf report -s' sort order in
~/.perfconfig, for instance, "sym,dso" may be more fitting for
kernel developers. (Arnaldo Carvalho de Melo)
- ... plus lots of other changes, refactorings, features and fixes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (149 commits)
perf tests: Add dwarf unwind test for powerpc
perf probe: Match linkage name with mangled name
perf probe: Fix to cut off incompatible chars from group name
perf probe: Skip if the function address is 0
perf probe: Ignore the error of finding inline instance
perf intel-pt: Fix decoding when there are address filters
perf intel-pt: Enable decoder to handle TIP.PGD with missing IP
perf intel-pt: Read address filter from AUXTRACE_INFO event
perf intel-pt: Record address filter in AUXTRACE_INFO event
perf intel-pt: Add a helper function for processing AUXTRACE_INFO
perf intel-pt: Fix missing error codes processing auxtrace_info
perf intel-pt: Add support for recording the max non-turbo ratio
perf intel-pt: Fix snapshot overlap detection decoder errors
perf probe: Increase debug level of SDT debug messages
perf record: Add support for using symbols in address filters
perf symbols: Add dso__last_symbol()
perf record: Fix error paths
perf record: Rename label 'out_symbol_exit'
perf script: Fix vanished idle symbols
perf evsel: Add support for address filters
...
Diffstat (limited to 'tools/perf/builtin-annotate.c')
-rw-r--r-- | tools/perf/builtin-annotate.c | 111 |
1 files changed, 108 insertions, 3 deletions
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 9c1034d81b4f..ebb628332a6e 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -30,6 +30,7 @@ #include "util/tool.h" #include "util/data.h" #include "arch/common.h" +#include "util/block-range.h" #include <dlfcn.h> #include <linux/bitmap.h> @@ -46,6 +47,103 @@ struct perf_annotate { DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); }; +/* + * Given one basic block: + * + * from to branch_i + * * ----> * + * | + * | block + * v + * * ----> * + * from to branch_i+1 + * + * where the horizontal are the branches and the vertical is the executed + * block of instructions. + * + * We count, for each 'instruction', the number of blocks that covered it as + * well as count the ratio each branch is taken. + * + * We can do this without knowing the actual instruction stream by keeping + * track of the address ranges. We break down ranges such that there is no + * overlap and iterate from the start until the end. + * + * @acme: once we parse the objdump output _before_ processing the samples, + * we can easily fold the branch.cycles IPC bits in. + */ +static void process_basic_block(struct addr_map_symbol *start, + struct addr_map_symbol *end, + struct branch_flags *flags) +{ + struct symbol *sym = start->sym; + struct annotation *notes = sym ? symbol__annotation(sym) : NULL; + struct block_range_iter iter; + struct block_range *entry; + + /* + * Sanity; NULL isn't executable and the CPU cannot execute backwards + */ + if (!start->addr || start->addr > end->addr) + return; + + iter = block_range__create(start->addr, end->addr); + if (!block_range_iter__valid(&iter)) + return; + + /* + * First block in range is a branch target. + */ + entry = block_range_iter(&iter); + assert(entry->is_target); + entry->entry++; + + do { + entry = block_range_iter(&iter); + + entry->coverage++; + entry->sym = sym; + + if (notes) + notes->max_coverage = max(notes->max_coverage, entry->coverage); + + } while (block_range_iter__next(&iter)); + + /* + * Last block in rage is a branch. + */ + entry = block_range_iter(&iter); + assert(entry->is_branch); + entry->taken++; + if (flags->predicted) + entry->pred++; +} + +static void process_branch_stack(struct branch_stack *bs, struct addr_location *al, + struct perf_sample *sample) +{ + struct addr_map_symbol *prev = NULL; + struct branch_info *bi; + int i; + + if (!bs || !bs->nr) + return; + + bi = sample__resolve_bstack(sample, al); + if (!bi) + return; + + for (i = bs->nr - 1; i >= 0; i--) { + /* + * XXX filter against symbol + */ + if (prev) + process_basic_block(prev, &bi[i].from, &bi[i].flags); + prev = &bi[i].to; + } + + free(bi); +} + static int perf_evsel__add_sample(struct perf_evsel *evsel, struct perf_sample *sample, struct addr_location *al, @@ -72,6 +170,12 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel, return 0; } + /* + * XXX filtered samples can still have branch entires pointing into our + * symbol and are missed. + */ + process_branch_stack(sample->branch_stack, al, sample); + sample->period = 1; sample->weight = 1; @@ -204,8 +308,6 @@ static int __cmd_annotate(struct perf_annotate *ann) struct perf_evsel *pos; u64 total_nr_samples; - machines__set_symbol_filter(&session->machines, symbol__annotate_init); - if (ann->cpu_list) { ret = perf_session__cpu_bitmap(session, ann->cpu_list, ann->cpu_bitmap); @@ -367,7 +469,10 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused) if (annotate.session == NULL) return -1; - symbol_conf.priv_size = sizeof(struct annotation); + ret = symbol__annotation_init(); + if (ret < 0) + goto out_delete; + symbol_conf.try_vmlinux_path = true; ret = symbol__init(&annotate.session->header.env); |