<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/tools/perf, branch bh1745</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.etezian.org/cgit.cgi/linux.git/atom?h=bh1745</id>
<link rel='self' href='https://git.etezian.org/cgit.cgi/linux.git/atom?h=bh1745'/>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/'/>
<updated>2023-02-24T01:09:35+00:00</updated>
<entry>
<title>Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2023-02-24T01:09:35+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-24T01:09:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=3822a7c40997dc86b1458766a3f146d62393f084'/>
<id>urn:sha1:3822a7c40997dc86b1458766a3f146d62393f084</id>
<content type='text'>
Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() &amp; fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove -&gt;rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
</content>
</entry>
<entry>
<title>perf tests stat_all_metrics: Change true workload to sleep workload for system wide check</title>
<updated>2023-02-17T20:22:56+00:00</updated>
<author>
<name>Kajol Jain</name>
<email>kjain@linux.ibm.com</email>
</author>
<published>2023-02-15T09:38:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=f9fa0778ee7349a9aa3d2ea10e9f2ab843a0b44e'/>
<id>urn:sha1:f9fa0778ee7349a9aa3d2ea10e9f2ab843a0b44e</id>
<content type='text'>
Testcase stat_all_metrics.sh fails in powerpc:

98: perf all metrics test : FAILED!

Logs with verbose:

  [command]# ./perf test 98 -vv
   98: perf all metrics test                                           :
   --- start ---
  test child forked, pid 13262
  Testing BRU_STALL_CPI
  Testing COMPLETION_STALL_CPI
   ----
  Testing TOTAL_LOCAL_NODE_PUMPS_P23
  Metric 'TOTAL_LOCAL_NODE_PUMPS_P23' not printed in:
  Error:
  Invalid event (hv_24x7/PM_PB_LNS_PUMP23,chip=3/) in per-thread mode, enable system wide with '-a'.
  Testing TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01
  Metric 'TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01' not printed in:
  Error:
  Invalid event (hv_24x7/PM_PB_RTY_LNS_PUMP01,chip=3/) in per-thread mode, enable system wide with '-a'.
   ----

Based on above logs, we could see some of the hv-24x7 metric events
fails, and logs suggest to run the metric event with -a option.  This
change happened after the commit a4b8cfcabb1d90ec ("perf stat: Delay
metric parsing"), which delayed the metric parsing phase and now before
metric parsing phase perf tool identifies, whether target is system-wide
or not. With this change, perf_event_open will fails with workload
monitoring for uncore events as expected.

The perf all metric test case fails as some of the hv-24x7 metric events
may need bigger workload with system wide monitoring to get the data.
Fix this issue by changing current system wide check from true workload
to sleep 0.01 workload.

Result with the patch changes in powerpc:

  98: perf all metrics test : Ok

Fixes: a4b8cfcabb1d90ec ("perf stat: Delay metric parsing")
Suggested-by: Ian Rogers &lt;irogers@google.com&gt;
Reviewed-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Signed-off-by: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Tested-by: Disha Goel &lt;disgoel@linux.ibm.com&gt;
Tested-by: Ian Rogers &lt;irogers@google.com&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Nageswara R Sastry &lt;rnsastry@linux.ibm.com&gt;
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230215093827.124921-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc</title>
<updated>2023-02-17T20:20:42+00:00</updated>
<author>
<name>Athira Rajeev</name>
<email>atrajeev@linux.vnet.ibm.com</email>
</author>
<published>2023-02-16T06:12:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=cf26e043c2a9213805d7ea9e8cf3e1d7166a62a4'/>
<id>urn:sha1:cf26e043c2a9213805d7ea9e8cf3e1d7166a62a4</id>
<content type='text'>
Power10 Performance Monitoring Unit (PMU) provides events to understand
stall cycles of different pipeline stages.  These events along with
completed instructions provides useful metrics for application tuning.

Patch implements the JSON changes to collect counter statistics to
present the high level CPI stall breakdown metrics. New metric group is
named as "CPI_STALL_RATIO" and this new metric group presents these
stall metrics:

- DISPATCHED_CPI ( Dispatch stall cycles per insn )
- ISSUE_STALL_CPI ( Issue stall cycles per insn )
- EXECUTION_STALL_CPI ( Execution stall cycles per insn )
- COMPLETION_STALL_CPI ( Completition stall cycles per insn )

To avoid multipling of events, PM_RUN_INST_CMPL event has been modified
to use PMC5(performance monitoring counter5) instead of PMC4. This
change is needed, since completion stall event is using PMC4.

Usage example:

 ./perf stat --metric-no-group -M CPI_STALL_RATIO &lt;workload&gt;

 Performance counter stats for 'workload':

    63,056,817,982      PM_CMPL_STALL                    #     0.28 COMPLETION_STALL_CPI
 1,743,988,038,896      PM_ISSUE_STALL                   #     7.73 ISSUE_STALL_CPI
   225,597,495,030      PM_RUN_INST_CMPL                 #     6.18 DISPATCHED_CPI
                                                  #    37.48 EXECUTION_STALL_CPI
 1,393,916,546,654      PM_DISP_STALL_CYC
 8,455,376,836,463      PM_EXEC_STALL

"--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled
in all group for more accuracy.

Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Acked-by: Ian Rogers &lt;irogers@google.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Disha Goel &lt;disgoel@linux.ibm.com&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kajol Jain &lt;kjain@linux.ibm.com&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.vnet.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nageswara R Sastry &lt;rnsastry@linux.ibm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230216061240.18067-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf intel-pt: Synthesize cycle events</title>
<updated>2023-02-17T14:02:44+00:00</updated>
<author>
<name>Steinar H. Gunderson</name>
<email>sesse@google.com</email>
</author>
<published>2022-03-22T08:24:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=7e55b95651d88e60368087c243525a0d97d43d3d'/>
<id>urn:sha1:7e55b95651d88e60368087c243525a0d97d43d3d</id>
<content type='text'>
There is no good reason why we cannot synthesize "cycle" events from
Intel PT just as we can synthesize "instruction" events, in particular
when CYC packets are available. This enables using PT to getting much
more accurate cycle profiles than regular sampling (record -e cycles)
when the work last for very short periods (&lt;10 ms).  Thus, add support
for this, based off of the existing IPC calculation framework. The new
option to --itrace is "y" (for cYcles), as c was taken for calls. Cycle
and instruction events can be synthesized together, and are by default.

The only real caveat is that CYC packets are only emitted whenever some
other packet is, which in practice is when a branch instruction is
encountered (and not even all branches). Thus, even at no subsampling
(e.g. --itrace=y0ns), it is impossible to get more accuracy than a
single basic block, and all cycles spent executing that block will get
attributed to the branch instruction that ends the packet.  Thus, one
cannot know whether the cycles came from e.g. a specific load, a
mispredicted branch, or something else. When subsampling (which is the
default), the cycle events will get smeared out even more, but will
still be generally useful to attribute cycle counts to functions.

Reviewed-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Signed-off-by: Steinar H. Gunderson &lt;sesse@google.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20220322082452.1429091-1-sesse@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf c2c: Add report option to show false sharing in adjacent cachelines</title>
<updated>2023-02-16T12:33:45+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@intel.com</email>
</author>
<published>2023-02-14T07:58:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=1470a108a60e8c0c4d19da10117c9b98f0078654'/>
<id>urn:sha1:1470a108a60e8c0c4d19da10117c9b98f0078654</id>
<content type='text'>
Many platforms have feature of adjacent cachelines prefetch, when it is
enabled, for data in RAM of 2 cachelines (2N and 2N+1) granularity, if
one is fetched to cache, the other one could likely be fetched too,
which sort of extends the cacheline size to double, thus the false
sharing could happens in adjacent cachelines.

0Day has captured performance changed related with this [1], and some
commercial software explicitly makes its hot global variables 128 bytes
aligned (2 cache lines) to avoid this kind of extended false sharing.

So add an option "--double-cl" for 'perf c2c report' to show false
sharing in double cache line granularity, which acts just like the
cacheline size is doubled. There is no change to c2c record. The
hardware events of shared cacheline are still per cacheline, and this
option just changes the granularity of how events are grouped and
displayed.

In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
on old kernel):

  ----------------------------------------------------------------------
     26       31        2        0        0        0  0xffff888103ec6000
  ----------------------------------------------------------------------
   35.48%   50.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff8133148b   1153   66    971   3748   74  [k] get_mem_cgroup_from_mm
    6.45%    0.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff813396e4    570    0   1531    879   75  [k] mem_cgroup_charge
   25.81%   50.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81331472    949   70    593   3359   74  [k] get_mem_cgroup_from_mm
   19.35%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81339686   1352    0   1073   1022   74  [k] mem_cgroup_charge
    9.68%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff813396d6   1401    0    863    768   74  [k] mem_cgroup_charge
    3.23%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81333106    618    0    804     11    9  [k] uncharge_batch

The offset 0x10 and 0x54 used to displayed in 2 groups, and now they are
listed together to give users a hint of extended false sharing.

[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Committer notes:

Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx

Removed -a, leaving just as --double-cl, as this probably is not used so
frequently and perhaps will be even auto-detected if we manage to record
the MSR where this is configured.

Reviewed-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Reviewed-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Signed-off-by: Feng Tang &lt;feng.tang@intel.com&gt;
Tested-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Acked-by: Joe Mario &lt;jmario@redhat.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Xing Zhengjun &lt;zhengjun.xing@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf record: Fix segfault with --overwrite and --max-size</title>
<updated>2023-02-15T12:58:55+00:00</updated>
<author>
<name>Yang Jihong</name>
<email>yangjihong1@huawei.com</email>
</author>
<published>2023-02-15T12:23:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=91621be65d6812cd74b2ea09573ff9ee0cbf5666'/>
<id>urn:sha1:91621be65d6812cd74b2ea09573ff9ee0cbf5666</id>
<content type='text'>
When --overwrite and --max-size options of perf record are used
together, a segmentation fault occurs. The following is an example:

  # perf record -e sched:sched* --overwrite --max-size 1K -a -- sleep 1
  [ perf record: Woken up 1 times to write data ]
  perf: Segmentation fault
  Obtained 12 stack frames.
  ./perf/perf(+0x197673) [0x55f99710b673]
  /lib/x86_64-linux-gnu/libc.so.6(+0x3ef0f) [0x7fa45f3cff0f]
  ./perf/perf(+0x8eb40) [0x55f997002b40]
  ./perf/perf(+0x1f6882) [0x55f99716a882]
  ./perf/perf(+0x794c2) [0x55f996fed4c2]
  ./perf/perf(+0x7b7c7) [0x55f996fef7c7]
  ./perf/perf(+0x9074b) [0x55f99700474b]
  ./perf/perf(+0x12e23c) [0x55f9970a223c]
  ./perf/perf(+0x12e54a) [0x55f9970a254a]
  ./perf/perf(+0x7db60) [0x55f996ff1b60]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fa45f3b2c86]
  ./perf/perf(+0x7dfe9) [0x55f996ff1fe9]
  Segmentation fault (core dumped)

backtrace of the core file is as follows:

  (gdb) bt
  #0  record__bytes_written (rec=0x55f99755a200 &lt;record&gt;) at builtin-record.c:234
  #1  record__output_max_size_exceeded (rec=0x55f99755a200 &lt;record&gt;) at builtin-record.c:242
  #2  record__write (map=0x0, size=12816, bf=0x55f9978da2e0, rec=0x55f99755a200 &lt;record&gt;) at builtin-record.c:263
  #3  process_synthesized_event (tool=tool@entry=0x55f99755a200 &lt;record&gt;, event=event@entry=0x55f9978da2e0, sample=sample@entry=0x0, machine=machine@entry=0x55f997893658) at builtin-record.c:618
  #4  0x000055f99716a883 in __perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 &lt;record&gt;, process=process@entry=0x55f997002aa0 &lt;process_synthesized_event&gt;, evlist=0x55f9978928b0, machine=machine@entry=0x55f997893658,
      from=from@entry=0) at util/synthetic-events.c:1895
  #5  0x000055f99716a91f in perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 &lt;record&gt;, process=process@entry=0x55f997002aa0 &lt;process_synthesized_event&gt;, evlist=&lt;optimized out&gt;, machine=machine@entry=0x55f997893658)
      at util/synthetic-events.c:1905
  #6  0x000055f996fed4c3 in record__synthesize (tail=tail@entry=true, rec=0x55f99755a200 &lt;record&gt;) at builtin-record.c:1997
  #7  0x000055f996fef7c8 in __cmd_record (argc=argc@entry=2, argv=argv@entry=0x7ffc67551260, rec=0x55f99755a200 &lt;record&gt;) at builtin-record.c:2802
  #8  0x000055f99700474c in cmd_record (argc=&lt;optimized out&gt;, argv=0x7ffc67551260) at builtin-record.c:4258
  #9  0x000055f9970a223d in run_builtin (p=0x55f997564d88 &lt;commands+264&gt;, argc=10, argv=0x7ffc67551260) at perf.c:330
  #10 0x000055f9970a254b in handle_internal_command (argc=10, argv=0x7ffc67551260) at perf.c:384
  #11 0x000055f996ff1b61 in run_argv (argcp=&lt;synthetic pointer&gt;, argv=&lt;synthetic pointer&gt;) at perf.c:428
  #12 main (argc=&lt;optimized out&gt;, argv=0x7ffc67551260) at perf.c:562

The reason is that record__bytes_written accesses the freed memory rec-&gt;thread_data,
The process is as follows:
  __cmd_record
    -&gt; record__free_thread_data
      -&gt; zfree(&amp;rec-&gt;thread_data)         // free rec-&gt;thread_data
    -&gt; record__synthesize
      -&gt; perf_event__synthesize_id_index
        -&gt; process_synthesized_event
          -&gt; record__write
            -&gt; record__bytes_written      // access rec-&gt;thread_data

We add a member variable "thread_bytes_written" in the struct "record"
to save the data size written by the threads.

Fixes: 6d57581659f72299 ("perf record: Add support for limit perf output file size")
Signed-off-by: Yang Jihong &lt;yangjihong1@huawei.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Jiwei Sun &lt;jiwei.sun@windriver.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/CAM9d7ci_TRrqBQVQNW8=GwakUr7SsZpYxaaty-S4bxF8zJWyqw@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf stat: Avoid merging/aggregating metric counts twice</title>
<updated>2023-02-09T21:28:08+00:00</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2023-02-09T06:44:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=37f322cd58d81a9d46456531281c908de9ef6e42'/>
<id>urn:sha1:37f322cd58d81a9d46456531281c908de9ef6e42</id>
<content type='text'>
The added perf_stat_merge_counters combines uncore counters. When
metrics are enabled, the counts are merged into a metric_leader via the
stat-shadow saved_value logic. As the leader now is passed an aggregated
count, it leads to all counters being added together twice and counts
appearing approximately doubled in metrics.

This change disables the saved_value merging of counts for evsels that
are merged. It is recommended that later changes remove the saved_value
entirely as the two layers of aggregation in the code is confusing.

Fixes: 942c5593393d9418 ("perf stat: Add perf_stat_merge_counters()")
Reported-by: Perry Taylor &lt;perry.taylor@intel.com&gt;
Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Cc: Florian Fischer &lt;florian.fischer@muhq.space&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Xing Zhengjun &lt;zhengjun.xing@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20230209064447.83733-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf tools: Fix perf tool build error in util/pfm.c</title>
<updated>2023-02-08T14:07:47+00:00</updated>
<author>
<name>Thomas Richter</name>
<email>tmricht@linux.ibm.com</email>
</author>
<published>2023-02-07T14:04:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=6a5558f1166473f741de33c32ffb161d7f7732cb'/>
<id>urn:sha1:6a5558f1166473f741de33c32ffb161d7f7732cb</id>
<content type='text'>
I have downloaded linux-next and build the perf tool using

  # make LIBPFM4=1

to have libpfm4 support built into perf. The build fails:

 # make LIBPFM4=1
....
INSTALL libbpf_headers
  CC      util/pfm.o
util/pfm.c: In function ‘print_libpfm_event’:
util/pfm.c:189:9: error: too many arguments to function ‘print_cb-&gt;print_event’
  189 |         print_cb-&gt;print_event(print_state,
      |         ^~~~~~~~
util/pfm.c:220:25: error: too many arguments to function ‘print_cb-&gt;print_event’
  220 |                         print_cb-&gt;print_event(print_state,

The build error is caused by commit d9dc8874d6ce46cc ("perf pmu-events:
Remove now unused event and metric variables") which changes the
function prototype of

  struct print_callbacks {
      ...
      void (*print_event)(...);  --&gt; last two parameters removed.
  };

but does not adjust the usage of this function prototype in util/pfm.c.
In file util/pfm.c function print_event() is still invoked with 13
parameters instead of 11. The compile fails.

When I adjust the file util/pfm.c as in this patch, the build works file.
Please check this patch for correctness, I have just fixed the compile
issue.

Fixes: d9dc8874d6ce46cc ("perf pmu-events: Remove now unused event and metric variables")
Signed-off-by: Thomas Richter &lt;tmricht@linux.ibm.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Tested-by: Ian Rogers &lt;irogers@google.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Sumanth Korikkar &lt;sumanthk@linux.ibm.com&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: egorenar@linux.ibm.com
Cc: linux-kernel-next@vger.kernel.org
Link: https://lore.kernel.org/r/20230207140447.1827741-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf tools: Fix auto-complete on aarch64</title>
<updated>2023-02-08T13:38:10+00:00</updated>
<author>
<name>Yicong Yang</name>
<email>yangyicong@hisilicon.com</email>
</author>
<published>2023-02-07T03:50:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=ffd1240e8f0814262ceb957dbe961f6e0aef1e7a'/>
<id>urn:sha1:ffd1240e8f0814262ceb957dbe961f6e0aef1e7a</id>
<content type='text'>
On aarch64 CPU related events are not under event_source/devices/cpu/events,
they're under event_source/devices/armv8_pmuv3_0/events on my machine.
Using current auto-complete script will generate below error:

  [root@localhost bin]# perf stat -e
  ls: cannot access '/sys/bus/event_source/devices/cpu/events': No such file or directory

Fix this by not testing /sys/bus/event_source/devices/cpu/events on
aarch64 machine.

Fixes: 74cd5815d9af6e6c ("perf tool: Improve bash command line auto-complete for multiple events with comma")
Reviewed-by: James Clark &lt;james.clark@arm.com&gt;
Signed-off-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Cc: prime.zeng@hisilicon.com
Link: https://lore.kernel.org/r/20230207035057.43394-1-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf lock contention: Support old rw_semaphore type</title>
<updated>2023-02-08T13:35:17+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-02-07T00:24:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.etezian.org/cgit.cgi/linux.git/commit/?id=1bece1351c653c3d36bf761513e21ac8428449b4'/>
<id>urn:sha1:1bece1351c653c3d36bf761513e21ac8428449b4</id>
<content type='text'>
The old kernel has a different type of the owner field in rwsem.  We can
check it using bpf_core_type_matches() builtin in clang but it also
needs its own version check since it's available on recent versions.

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230207002403.63590-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
</feed>
