diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-22 13:59:43 -0800 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-02-22 13:59:43 -0800 |
| commit | 3a36281a17199737b468befb826d4a23eb774445 (patch) | |
| tree | 18bbdfcb0c30215883809f248d0334b7ea84d5ba /tools/perf/util/intel-pt-decoder | |
| parent | 7c70f3a7488d2fa62d32849d138bf2b8420fe788 (diff) | |
| parent | 3027ce36ccbae74f2e7c1afbfc3f69fee0c2a996 (diff) | |
Merge tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"New features:
- Support instruction latency in 'perf report', with both memory
latency (weight) and instruction latency information, users can
locate expensive load instructions and understand time spent in
different stages.
- Extend 'perf c2c' to display the number of loads which were blocked
by data or address conflict.
- Add 'perf stat' support for L2 topdown events in systems such as
Intel's Sapphire rapids server.
- Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a
sort key, for instance:
perf report --stdio --sort=comm,symbol,code_page_size
- New 'perf daemon' command to run long running sessions while
providing a way to control the enablement of events without
restarting a traditional 'perf record' session.
- Enable counting events for BPF programs in 'perf stat' just like
for other targets (tid, cgroup, cpu, etc), e.g.:
# perf stat -e ref-cycles,cycles -b 254 -I 1000
1.487903822 115,200 ref-cycles
1.487903822 86,012 cycles
2.489147029 80,560 ref-cycles
2.489147029 73,784 cycles
^C
The example above counts 'cycles' and 'ref-cycles' of BPF program
of id 254. It is similar to bpftool-prog-profile command, but more
flexible.
- Support the new layout for PERF_RECORD_MMAP2 to carry the DSO
build-id using infrastructure generalised from the eBPF subsystem,
removing the need for traversing the perf.data file to collect
build-ids at the end of 'perf record' sessions and helping with
long running sessions where binaries can get replaced in updates,
leading to possible mis-resolution of symbols.
- Support filtering by hex address in 'perf script'.
- Support DSO filter in 'perf script', like in other perf tools.
- Add namespaces support to 'perf inject'
- Add support for SDT (Dtrace Style Markers) events on ARM64.
perf record:
- Fix handling of eventfd() when draining a buffer in 'perf record'.
- Improvements to the generation of metadata events for pre-existing
threads (mmaps, comm, etc), speeding up the work done at the start
of system wide or per CPU 'perf record' sessions.
Hardware tracing:
- Initial support for tracing KVM with Intel PT.
- Intel PT fixes for IPC
- Support Intel PT PSB (synchronization packets) events.
- Automatically group aux-output events to overcome --filter syntax.
- Enable PERF_SAMPLE_DATA_SRC on ARMs SPE.
- Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0.
perf annotate TUI:
- Fix handling of 'k' ("show line number") hotkey
- Fix jump parsing for C++ code.
perf probe:
- Add protection to avoid endless loop.
cgroups:
- Avoid reading cgroup mountpoint multiple times, caching it.
- Fix handling of cgroup v1/v2 in mixed hierarchy.
Symbol resolving:
- Add OCaml symbol demangling.
- Further fixes for handling PE executables when using perf with Wine
and .exe/.dll files.
- Fix 'perf unwind' DSO handling.
- Resolve symbols against debug file first, to deal with artifacts
related to LTO.
- Fix gap between kernel end and module start on powerpc.
Reporting tools:
- The DSO filter shouldn't show samples in unresolved maps.
- Improve debuginfod support in various tools.
build ids:
- Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test'
entry for that case.
perf test:
- Support for PERF_SAMPLE_WEIGHT_STRUCT.
- Add test case for PERF_SAMPLE_CODE_PAGE_SIZE.
- Shell based tests for 'perf daemon's commands ('start', 'stop,
'reconfig', 'list', etc).
- ARM cs-etm 'perf test' fixes.
- Add parse-metric memory bandwidth testcase.
Compiler related:
- Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used
with -fpatchable-function-entry.
- Fix ARM64 build with gcc 11's -Wformat-overflow.
- Fix unaligned access in sample parsing test.
- Fix printf conversion specifier for IP addresses on arm64, s390 and
powerpc.
Arch specific:
- Support exposing Performance Monitor Counter SPRs as part of
extended regs on powerpc.
- Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn
DDR, fix imx8mm ones.
- Fix common and uarch events for ARM64's A76 and Ampere eMag"
* tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (148 commits)
perf buildid-cache: Don't skip 16-byte build-ids
perf buildid-cache: Add test for 16-byte build-id
perf symbol: Remove redundant libbfd checks
perf test: Output the sub testing result in cs-etm
perf test: Suppress logs in cs-etm testing
perf tools: Fix arm64 build error with gcc-11
perf intel-pt: Add documentation for tracing virtual machines
perf intel-pt: Split VM-Entry and VM-Exit branches
perf intel-pt: Adjust sample flags for VM-Exit
perf intel-pt: Allow for a guest kernel address filter
perf intel-pt: Support decoding of guest kernel
perf machine: Factor out machine__idle_thread()
perf machine: Factor out machines__find_guest()
perf intel-pt: Amend decoder to track the NR flag
perf intel-pt: Retain the last PIP packet payload as is
perf intel_pt: Add vmlaunch and vmresume as branches
perf script: Add branch types for VM-Entry and VM-Exit
perf auxtrace: Automatically group aux-output events
perf test: Fix unaligned access in sample parsing test
perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing
...
Diffstat (limited to 'tools/perf/util/intel-pt-decoder')
6 files changed, 301 insertions, 70 deletions
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 697513f35154..8c59677bee13 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -24,6 +24,13 @@ #include "intel-pt-decoder.h" #include "intel-pt-log.h" +#define BITULL(x) (1ULL << (x)) + +/* IA32_RTIT_CTL MSR bits */ +#define INTEL_PT_CYC_ENABLE BITULL(1) +#define INTEL_PT_CYC_THRESHOLD (BITULL(22) | BITULL(21) | BITULL(20) | BITULL(19)) +#define INTEL_PT_CYC_THRESHOLD_SHIFT 19 + #define INTEL_PT_BLK_SIZE 1024 #define BIT63 (((uint64_t)1 << 63)) @@ -55,6 +62,7 @@ enum intel_pt_pkt_state { INTEL_PT_STATE_TIP_PGD, INTEL_PT_STATE_FUP, INTEL_PT_STATE_FUP_NO_TIP, + INTEL_PT_STATE_FUP_IN_PSB, INTEL_PT_STATE_RESAMPLE, }; @@ -73,6 +81,7 @@ static inline bool intel_pt_sample_time(enum intel_pt_pkt_state pkt_state) case INTEL_PT_STATE_TIP_PGD: case INTEL_PT_STATE_FUP: case INTEL_PT_STATE_FUP_NO_TIP: + case INTEL_PT_STATE_FUP_IN_PSB: return false; default: return true; @@ -112,13 +121,14 @@ struct intel_pt_decoder { bool have_last_ip; bool in_psb; bool hop; - bool hop_psb_fup; bool leap; + bool nr; + bool next_nr; enum intel_pt_param_flags flags; uint64_t pos; uint64_t last_ip; uint64_t ip; - uint64_t cr3; + uint64_t pip_payload; uint64_t timestamp; uint64_t tsc_timestamp; uint64_t ref_timestamp; @@ -167,6 +177,8 @@ struct intel_pt_decoder { uint64_t sample_tot_cyc_cnt; uint64_t base_cyc_cnt; uint64_t cyc_cnt_timestamp; + uint64_t ctl; + uint64_t cyc_threshold; double tsc_to_cyc; bool continuous_period; bool overflow; @@ -189,6 +201,7 @@ struct intel_pt_decoder { int no_progress; int stuck_ip_prd; int stuck_ip_cnt; + uint64_t psb_ip; const unsigned char *next_buf; size_t next_len; unsigned char temp_buf[INTEL_PT_PKT_MAX_SZ]; @@ -204,6 +217,14 @@ static uint64_t intel_pt_lower_power_of_2(uint64_t x) return x << i; } +static uint64_t intel_pt_cyc_threshold(uint64_t ctl) +{ + if (!(ctl & INTEL_PT_CYC_ENABLE)) + return 0; + + return (ctl & INTEL_PT_CYC_THRESHOLD) >> INTEL_PT_CYC_THRESHOLD_SHIFT; +} + static void intel_pt_setup_period(struct intel_pt_decoder *decoder) { if (decoder->period_type == INTEL_PT_PERIOD_TICKS) { @@ -245,12 +266,15 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params) decoder->flags = params->flags; + decoder->ctl = params->ctl; decoder->period = params->period; decoder->period_type = params->period_type; decoder->max_non_turbo_ratio = params->max_non_turbo_ratio; decoder->max_non_turbo_ratio_fp = params->max_non_turbo_ratio; + decoder->cyc_threshold = intel_pt_cyc_threshold(decoder->ctl); + intel_pt_setup_period(decoder); decoder->mtc_shift = params->mtc_period; @@ -481,6 +505,28 @@ static inline void intel_pt_update_in_tx(struct intel_pt_decoder *decoder) decoder->tx_flags = decoder->packet.payload & INTEL_PT_IN_TX; } +static inline void intel_pt_update_pip(struct intel_pt_decoder *decoder) +{ + decoder->pip_payload = decoder->packet.payload; +} + +static inline void intel_pt_update_nr(struct intel_pt_decoder *decoder) +{ + decoder->next_nr = decoder->pip_payload & 1; +} + +static inline void intel_pt_set_nr(struct intel_pt_decoder *decoder) +{ + decoder->nr = decoder->pip_payload & 1; + decoder->next_nr = decoder->nr; +} + +static inline void intel_pt_set_pip(struct intel_pt_decoder *decoder) +{ + intel_pt_update_pip(decoder); + intel_pt_set_nr(decoder); +} + static int intel_pt_bad_packet(struct intel_pt_decoder *decoder) { intel_pt_clear_tx_flags(decoder); @@ -1218,6 +1264,7 @@ static int intel_pt_walk_tip(struct intel_pt_decoder *decoder) decoder->continuous_period = false; decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; decoder->state.type |= INTEL_PT_TRACE_END; + intel_pt_update_nr(decoder); return 0; } if (err == INTEL_PT_RETURN) @@ -1225,6 +1272,8 @@ static int intel_pt_walk_tip(struct intel_pt_decoder *decoder) if (err) return err; + intel_pt_update_nr(decoder); + if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) { if (decoder->pkt_state == INTEL_PT_STATE_TIP_PGD) { decoder->pge = false; @@ -1337,6 +1386,7 @@ static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder) decoder->state.from_ip = decoder->ip; decoder->state.to_ip = decoder->last_ip; decoder->ip = decoder->last_ip; + intel_pt_update_nr(decoder); return 0; } @@ -1461,6 +1511,7 @@ static int intel_pt_overflow(struct intel_pt_decoder *decoder) { intel_pt_log("ERROR: Buffer overflow\n"); intel_pt_clear_tx_flags(decoder); + intel_pt_set_nr(decoder); decoder->timestamp_insn_cnt = 0; decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC; decoder->overflow = true; @@ -1735,18 +1786,14 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder) break; case INTEL_PT_PIP: - decoder->cr3 = decoder->packet.payload & (BIT63 - 1); + intel_pt_set_pip(decoder); break; case INTEL_PT_FUP: decoder->pge = true; if (decoder->packet.count) { intel_pt_set_last_ip(decoder); - if (decoder->hop) { - /* Act on FUP at PSBEND */ - decoder->ip = decoder->last_ip; - decoder->hop_psb_fup = true; - } + decoder->psb_ip = decoder->last_ip; } break; @@ -1761,6 +1808,9 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder) break; case INTEL_PT_CYC: + intel_pt_calc_cyc_timestamp(decoder); + break; + case INTEL_PT_VMCS: case INTEL_PT_MNT: case INTEL_PT_PAD: @@ -1835,6 +1885,7 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder) decoder->pge = false; decoder->continuous_period = false; decoder->state.type |= INTEL_PT_TRACE_END; + intel_pt_update_nr(decoder); return 0; case INTEL_PT_TIP_PGE: @@ -1850,6 +1901,7 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder) } decoder->state.type |= INTEL_PT_TRACE_BEGIN; intel_pt_mtc_cyc_cnt_pge(decoder); + intel_pt_set_nr(decoder); return 0; case INTEL_PT_TIP: @@ -1860,10 +1912,11 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder) intel_pt_set_ip(decoder); decoder->state.to_ip = decoder->ip; } + intel_pt_update_nr(decoder); return 0; case INTEL_PT_PIP: - decoder->cr3 = decoder->packet.payload & (BIT63 - 1); + intel_pt_update_pip(decoder); break; case INTEL_PT_MTC: @@ -1922,21 +1975,27 @@ static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, in return HOP_IGNORE; case INTEL_PT_TIP_PGD: - if (!decoder->packet.count) + if (!decoder->packet.count) { + intel_pt_set_nr(decoder); return HOP_IGNORE; + } intel_pt_set_ip(decoder); decoder->state.type |= INTEL_PT_TRACE_END; decoder->state.from_ip = 0; decoder->state.to_ip = decoder->ip; + intel_pt_update_nr(decoder); return HOP_RETURN; case INTEL_PT_TIP: - if (!decoder->packet.count) + if (!decoder->packet.count) { + intel_pt_set_nr(decoder); return HOP_IGNORE; + } intel_pt_set_ip(decoder); decoder->state.type = INTEL_PT_INSTRUCTION; decoder->state.from_ip = decoder->ip; decoder->state.to_ip = 0; + intel_pt_update_nr(decoder); return HOP_RETURN; case INTEL_PT_FUP: @@ -1959,26 +2018,23 @@ static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, in return HOP_RETURN; case INTEL_PT_PSB: + decoder->state.psb_offset = decoder->pos; + decoder->psb_ip = 0; decoder->last_ip = 0; decoder->have_last_ip = true; - decoder->hop_psb_fup = false; *err = intel_pt_walk_psbend(decoder); if (*err == -EAGAIN) return HOP_AGAIN; if (*err) return HOP_RETURN; - if (decoder->hop_psb_fup) { - decoder->hop_psb_fup = false; - decoder->state.type = INTEL_PT_INSTRUCTION; - decoder->state.from_ip = decoder->ip; - decoder->state.to_ip = 0; - return HOP_RETURN; + decoder->state.type = INTEL_PT_PSB_EVT; + if (decoder->psb_ip) { + decoder->state.type |= INTEL_PT_INSTRUCTION; + decoder->ip = decoder->psb_ip; } - if (decoder->cbr != decoder->cbr_seen) { - decoder->state.type = 0; - return HOP_RETURN; - } - return HOP_IGNORE; + decoder->state.from_ip = decoder->psb_ip; + decoder->state.to_ip = 0; + return HOP_RETURN; case INTEL_PT_BAD: case INTEL_PT_PAD: @@ -2012,8 +2068,151 @@ static int intel_pt_hop_trace(struct intel_pt_decoder *decoder, bool *no_tip, in } } +struct intel_pt_psb_info { + struct intel_pt_pkt fup_packet; + bool fup; + int after_psbend; +}; + +/* Lookahead and get the FUP packet from PSB+ */ +static int intel_pt_psb_lookahead_cb(struct intel_pt_pkt_info *pkt_info) +{ + struct intel_pt_psb_info *data = pkt_info->data; + + switch (pkt_info->packet.type) { + case INTEL_PT_PAD: + case INTEL_PT_MNT: + case INTEL_PT_TSC: + case INTEL_PT_TMA: + case INTEL_PT_MODE_EXEC: + case INTEL_PT_MODE_TSX: + case INTEL_PT_MTC: + case INTEL_PT_CYC: + case INTEL_PT_VMCS: + case INTEL_PT_CBR: + case INTEL_PT_PIP: + if (data->after_psbend) { + data->after_psbend -= 1; + if (!data->after_psbend) + return 1; + } + break; + + case INTEL_PT_FUP: + if (data->after_psbend) + return 1; + if (data->fup || pkt_info->packet.count == 0) + return 1; + data->fup_packet = pkt_info->packet; + data->fup = true; + break; + + case INTEL_PT_PSBEND: + if (!data->fup) + return 1; + /* Keep going to check for a TIP.PGE */ + data->after_psbend = 6; + break; + + case INTEL_PT_TIP_PGE: + /* Ignore FUP in PSB+ if followed by TIP.PGE */ + if (data->after_psbend) + data->fup = false; + return 1; + + case INTEL_PT_PTWRITE: + case INTEL_PT_PTWRITE_IP: + case INTEL_PT_EXSTOP: + case INTEL_PT_EXSTOP_IP: + case INTEL_PT_MWAIT: + case INTEL_PT_PWRE: + case INTEL_PT_PWRX: + case INTEL_PT_BBP: + case INTEL_PT_BIP: + case INTEL_PT_BEP: + case INTEL_PT_BEP_IP: + if (data->after_psbend) { + data->after_psbend -= 1; + if (!data->after_psbend) + return 1; + break; + } + return 1; + + case INTEL_PT_OVF: + case INTEL_PT_BAD: + case INTEL_PT_TNT: + case INTEL_PT_TIP_PGD: + case INTEL_PT_TIP: + case INTEL_PT_PSB: + case INTEL_PT_TRACESTOP: + default: + return 1; + } + + return 0; +} + +static int intel_pt_psb(struct intel_pt_decoder *decoder) +{ + int err; + + decoder->last_ip = 0; + decoder->psb_ip = 0; + decoder->have_last_ip = true; + intel_pt_clear_stack(&decoder->stack); + err = intel_pt_walk_psbend(decoder); + if (err) + return err; + decoder->state.type = INTEL_PT_PSB_EVT; + decoder->state.from_ip = decoder->psb_ip; + decoder->state.to_ip = 0; + return 0; +} + +static int intel_pt_fup_in_psb(struct intel_pt_decoder *decoder) +{ + int err; + + if (decoder->ip != decoder->last_ip) { + err = intel_pt_walk_fup(decoder); + if (!err || err != -EAGAIN) + return err; + } + + decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; + err = intel_pt_psb(decoder); + if (err) { + decoder->pkt_state = INTEL_PT_STATE_ERR3; + return -ENOENT; + } + + return 0; +} + +static bool intel_pt_psb_with_fup(struct intel_pt_decoder *decoder, int *err) +{ + struct intel_pt_psb_info data = { .fup = false }; + + if (!decoder->branch_enable || !decoder->pge) + return false; + + intel_pt_pkt_lookahead(decoder, intel_pt_psb_lookahead_cb, &data); + if (!data.fup) + return false; + + decoder->packet = data.fup_packet; + intel_pt_set_last_ip(decoder); + decoder->pkt_state = INTEL_PT_STATE_FUP_IN_PSB; + + *err = intel_pt_fup_in_psb(decoder); + + return true; +} + static int intel_pt_walk_trace(struct intel_pt_decoder *decoder) { + int last_packet_type = INTEL_PT_PAD; bool no_tip = false; int err; @@ -2022,6 +2221,12 @@ static int intel_pt_walk_trace(struct intel_pt_decoder *decoder) if (err) return err; next: + if (decoder->cyc_threshold) { + if (decoder->sample_cyc && last_packet_type != INTEL_PT_CYC) + decoder->sample_cyc = false; + last_packet_type = decoder->packet.type; + } + if (decoder->hop) { switch (intel_pt_hop_trace(decoder, &no_tip, &err)) { case HOP_IGNORE: @@ -2055,6 +2260,7 @@ next: case INTEL_PT_TIP_PGE: { decoder->pge = true; intel_pt_mtc_cyc_cnt_pge(decoder); + intel_pt_set_nr(decoder); if (decoder->packet.count == 0) { intel_pt_log_at("Skipping zero TIP.PGE", decoder->pos); @@ -2120,27 +2326,17 @@ next: break; case INTEL_PT_PSB: - decoder->last_ip = 0; - decoder->have_last_ip = true; - intel_pt_clear_stack(&decoder->stack); - err = intel_pt_walk_psbend(decoder); + decoder->state.psb_offset = decoder->pos; + decoder->psb_ip = 0; + if (intel_pt_psb_with_fup(decoder, &err)) + return err; + err = intel_pt_psb(decoder); if (err == -EAGAIN) goto next; - if (err) - return err; - /* - * PSB+ CBR will not have changed but cater for the - * possibility of another CBR change that gets caught up - * in the PSB+. - */ - if (decoder->cbr != decoder->cbr_seen) { - decoder->state.type = 0; - return 0; - } - break; + return err; case INTEL_PT_PIP: - decoder->cr3 = decoder->packet.payload & (BIT63 - 1); + intel_pt_update_pip(decoder); break; case INTEL_PT_MTC: @@ -2351,6 +2547,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder) uint64_t current_ip = decoder->ip; intel_pt_set_ip(decoder); + decoder->psb_ip = decoder->ip; if (current_ip) intel_pt_log_to("Setting IP", decoder->ip); @@ -2378,7 +2575,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder) break; case INTEL_PT_PIP: - decoder->cr3 = decoder->packet.payload & (BIT63 - 1); + intel_pt_set_pip(decoder); break; case INTEL_PT_MODE_EXEC: @@ -2497,7 +2694,7 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder) break; case INTEL_PT_PIP: - decoder->cr3 = decoder->packet.payload & (BIT63 - 1); + intel_pt_set_pip(decoder); break; case INTEL_PT_MODE_EXEC: @@ -2522,18 +2719,18 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder) break; case INTEL_PT_PSB: + decoder->state.psb_offset = decoder->pos; + decoder->psb_ip = 0; decoder->last_ip = 0; decoder->have_last_ip = true; intel_pt_clear_stack(&decoder->stack); err = intel_pt_walk_psb(decoder); if (err) return err; - if (decoder->ip) { - /* Do not have a sample */ - decoder->state.type = 0; - return 0; - } - break; + decoder->state.type = INTEL_PT_PSB_EVT; + decoder->state.from_ip = decoder->psb_ip; + decoder->state.to_ip = 0; + return 0; case INTEL_PT_TNT: case INTEL_PT_PSBEND: @@ -2577,7 +2774,7 @@ static int intel_pt_sync_ip(struct intel_pt_decoder *decoder) intel_pt_log("Scanning for full IP\n"); err = intel_pt_walk_to_ip(decoder); - if (err) + if (err || ((decoder->state.type & INTEL_PT_PSB_EVT) && !decoder->ip)) return err; /* In hop mode, resample to get the to_ip as an "instruction" sample */ @@ -2689,10 +2886,10 @@ static int intel_pt_sync(struct intel_pt_decoder *decoder) decoder->continuous_period = false; decoder->have_last_ip = false; decoder->last_ip = 0; + decoder->psb_ip = 0; decoder->ip = 0; intel_pt_clear_stack(&decoder->stack); -leap: err = intel_pt_scan_for_psb(decoder); if (err) return err; @@ -2704,8 +2901,11 @@ leap: if (err) return err; + decoder->state.type = INTEL_PT_PSB_EVT; /* Only PSB sample */ + decoder->state.from_ip = decoder->psb_ip; + decoder->state.to_ip = 0; + if (decoder->ip) { - decoder->state.type = 0; /* Do not have a sample */ /* * In hop mode, resample to get the PSB FUP ip as an * "instruction" sample. @@ -2714,14 +2914,6 @@ leap: decoder->pkt_state = INTEL_PT_STATE_RESAMPLE; else decoder->pkt_state = INTEL_PT_STATE_IN_SYNC; - } else if (decoder->leap) { - /* - * In leap mode, only PSB+ is decoded, so keeping leaping to the - * next PSB until there is an ip. - */ - goto leap; - } else { - return intel_pt_sync_ip(decoder); } return 0; @@ -2783,6 +2975,9 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder) if (err == -EAGAIN) err = intel_pt_walk_trace(decoder); break; + case INTEL_PT_STATE_FUP_IN_PSB: + err = intel_pt_fup_in_psb(decoder); + break; case INTEL_PT_STATE_RESAMPLE: err = intel_pt_resample(decoder); break; @@ -2797,6 +2992,7 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder) decoder->state.from_ip = decoder->ip; intel_pt_update_sample_time(decoder); decoder->sample_tot_cyc_cnt = decoder->tot_cyc_cnt; + intel_pt_set_nr(decoder); } else { decoder->state.err = 0; if (decoder->cbr != decoder->cbr_seen) { @@ -2811,14 +3007,30 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder) } if (intel_pt_sample_time(decoder->pkt_state)) { intel_pt_update_sample_time(decoder); - if (decoder->sample_cyc) + if (decoder->sample_cyc) { decoder->sample_tot_cyc_cnt = decoder->tot_cyc_cnt; + decoder->state.flags |= INTEL_PT_SAMPLE_IPC; + decoder->sample_cyc = false; + } } + /* + * When using only TSC/MTC to compute cycles, IPC can be + * sampled as soon as the cycle count changes. + */ + if (!decoder->have_cyc) + decoder->state.flags |= INTEL_PT_SAMPLE_IPC; } + /* Let PSB event always have TSC timestamp */ + if ((decoder->state.type & INTEL_PT_PSB_EVT) && decoder->tsc_timestamp) + decoder->sample_timestamp = decoder->tsc_timestamp; + + decoder->state.from_nr = decoder->nr; + decoder->state.to_nr = decoder->next_nr; + decoder->nr = decoder->next_nr; + decoder->state.timestamp = decoder->sample_timestamp; decoder->state.est_timestamp = intel_pt_est_timestamp(decoder); - decoder->state.cr3 = decoder->cr3; decoder->state.tot_insn_cnt = decoder->tot_insn_cnt; decoder->state.tot_cyc_cnt = decoder->sample_tot_cyc_cnt; diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h index 8645fc265481..d9e62a7f6f0e 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h @@ -17,6 +17,7 @@ #define INTEL_PT_ABORT_TX (1 << 1) #define INTEL_PT_ASYNC (1 << 2) #define INTEL_PT_FUP_IP (1 << 3) +#define INTEL_PT_SAMPLE_IPC (1 << 4) enum intel_pt_sample_type { INTEL_PT_BRANCH = 1 << 0, @@ -31,6 +32,7 @@ enum intel_pt_sample_type { INTEL_PT_TRACE_BEGIN = 1 << 9, INTEL_PT_TRACE_END = 1 << 10, INTEL_PT_BLK_ITEMS = 1 << 11, + INTEL_PT_PSB_EVT = 1 << 12, }; enum intel_pt_period_type { @@ -199,10 +201,11 @@ struct intel_pt_blk_items { struct intel_pt_state { enum intel_pt_sample_type type; + bool from_nr; + bool to_nr; int err; uint64_t from_ip; uint64_t to_ip; - uint64_t cr3; uint64_t tot_insn_cnt; uint64_t tot_cyc_cnt; uint64_t timestamp; @@ -213,6 +216,7 @@ struct intel_pt_state { uint64_t pwre_payload; uint64_t pwrx_payload; uint64_t cbr_payload; + uint64_t psb_offset; uint32_t cbr; uint32_t flags; enum intel_pt_insn_op insn_op; @@ -243,6 +247,7 @@ struct intel_pt_params { void *data; bool return_compression; bool branch_enable; + uint64_t ctl; uint64_t period; enum intel_pt_period_type period_type; unsigned max_non_turbo_ratio; diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c index fb8a3558d3d5..2f6cc7eea251 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c @@ -43,6 +43,17 @@ static void intel_pt_insn_decoder(struct insn *insn, switch (insn->opcode.bytes[0]) { case 0xf: switch (insn->opcode.bytes[1]) { + case 0x01: + switch (insn->modrm.bytes[0]) { + case 0xc2: /* vmlaunch */ + case 0xc3: /* vmresume */ + op = INTEL_PT_OP_VMENTRY; + branch = INTEL_PT_BR_INDIRECT; + break; + default: + break; + } + break; case 0x05: /* syscall */ case 0x34: /* sysenter */ op = INTEL_PT_OP_SYSCALL; @@ -213,6 +224,7 @@ const char *branch_name[] = { [INTEL_PT_OP_INT] = "Int", [INTEL_PT_OP_SYSCALL] = "Syscall", [INTEL_PT_OP_SYSRET] = "Sysret", + [INTEL_PT_OP_VMENTRY] = "VMentry", }; const char *intel_pt_insn_name(enum intel_pt_insn_op op) @@ -267,6 +279,9 @@ int intel_pt_insn_type(enum intel_pt_insn_op op) case INTEL_PT_OP_SYSRET: return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN | PERF_IP_FLAG_SYSCALLRET; + case INTEL_PT_OP_VMENTRY: + return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | + PERF_IP_FLAG_VMENTRY; default: return 0; } diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h index 95a1eb0141ff..c2861cfdd768 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h @@ -24,6 +24,7 @@ enum intel_pt_insn_op { INTEL_PT_OP_INT, INTEL_PT_OP_SYSCALL, INTEL_PT_OP_SYSRET, + INTEL_PT_OP_VMENTRY, }; enum intel_pt_insn_branch { diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c index 4ce109993e74..02a3395d6ce3 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c @@ -16,8 +16,6 @@ #define BIT63 ((uint64_t)1 << 63) -#define NR_FLAG BIT63 - #if __BYTE_ORDER == __BIG_ENDIAN #define le16_to_cpu bswap_16 #define le32_to_cpu bswap_32 @@ -106,9 +104,7 @@ static int intel_pt_get_pip(const unsigned char *buf, size_t len, packet->type = INTEL_PT_PIP; memcpy_le64(&payload, buf + 2, 6); - packet->payload = payload >> 1; - if (payload & 1) - packet->payload |= NR_FLAG; + packet->payload = payload; return 8; } @@ -719,10 +715,10 @@ int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf, name, (unsigned)(payload >> 1) & 1, (unsigned)payload & 1); case INTEL_PT_PIP: - nr = packet->payload & NR_FLAG ? 1 : 0; - payload &= ~NR_FLAG; + nr = packet->payload & INTEL_PT_VMX_NR_FLAG ? 1 : 0; + payload &= ~INTEL_PT_VMX_NR_FLAG; ret = snprintf(buf, buf_len, "%s 0x%llx (NR=%d)", - name, payload, nr); + name, payload >> 1, nr); return ret; case INTEL_PT_PTWRITE: return snprintf(buf, buf_len, "%s 0x%llx IP:0", name, payload); diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h index 17ca9b56d72f..996090cb84f6 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h +++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h @@ -21,6 +21,8 @@ #define INTEL_PT_PKT_MAX_SZ 16 +#define INTEL_PT_VMX_NR_FLAG 1 + enum intel_pt_pkt_type { INTEL_PT_BAD, INTEL_PT_PAD, |
