Perf
From the 百问网 (100ask) Embedded Linux wiki
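Latest revision as of 16:00, Thursday, 12 December 2019
Introduction
perf[1] is a Linux user-space performance analysis tool built on the kernel's perf_events subsystem. It collects system performance data such as hardware counter values, software events and tracepoints.
Installation
Building and installing manually
Installing with Buildroot
Installing with Yocto
In Poky, perf is one of the packages recommended (RRECOMMENDS) by the profiling-tools package group, packagegroup-core-tools-profile. The relevant part of that recipe is: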
RRECOMMENDS_${PN} = "\
${PERF} \
trace-cmd \
blktrace \
${PROFILE_TOOLS_X} \
${PROFILE_TOOLS_SYSTEMD} \
"
...
PERF = "perf"
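The recipe above only makes perf available. An image still has to pull it in. The sketch below does that from the build configuration. It assumes a Poky build directory created with oe-init-build-env, and a release that still uses the underscore override syntax, as the RRECOMMENDS_${PN} excerpt does. On Yocto 3.4 (Honister) and later, write IMAGE_INSTALL:append instead. The helper name add_perf_to_local_conf is hypothetical. As an alternative, EXTRA_IMAGE_FEATURES += "tools-profile" pulls in the whole profiling package group shown above, perf included.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add perf to a Yocto image via conf/local.conf.
# Assumes the pre-Honister override syntax (IMAGE_INSTALL_append); on
# Yocto 3.4 (Honister) and later, use 'IMAGE_INSTALL:append = " perf"'.
# The leading space inside the quotes matters: _append does not add one.
add_perf_to_local_conf() {
    local conf="${1:-conf/local.conf}"
    local line='IMAGE_INSTALL_append = " perf"'
    # Append the line only once, so running the helper twice is harmless.
    # grep -qxF: quiet, whole-line, fixed-string match.
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Usage (in the build directory, after sourcing oe-init-build-env),
# then rebuild whichever image you use, e.g. bitbake core-image-minimal:
#   add_perf_to_local_conf
```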
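Using perf on Android
- simpleperf[2] is the Android equivalent of perf. It is installed by default (/system/xbin/simpleperf) and can be used with all packages available for Android™.
- It supports fewer options than perf: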
Board $> simpleperf --help
Usage: simpleperf [common options] subcommand [args_for_subcommand]
common options:
-h/--help Print this help information.
--log <severity> Set the minimum severity of logging. Possible severities
include verbose, debug, warning, info, error, fatal.
Default is info.
--version Print version of simpleperf.
subcommands:
debug-unwind Debug/test offline unwinding.
dump dump perf record file
help print help information for simpleperf
kmem collect kernel memory allocation information
list list available event types
record record sampling info in perf.data
report report sampling information in perf.data
report-sample report raw sample information in perf.data
stat gather performance counter information
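simpleperf is usually driven from a host PC over adb, using the record and report subcommands listed above. The sketch below rests on a few assumptions:
- adb is on the host PATH.
- A single device is connected.
- The PID of the process to profile is already known.

The helper name, the default 10-second duration and the /data/local/tmp output path are illustrative choices. The -p, --duration, -o and -i options are standard simpleperf options, but check simpleperf record --help on your Android version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: profile one Android process with simpleperf over adb.
# Assumes: adb on the host PATH, one device connected, simpleperf on the device PATH.
simpleperf_profile_pid() {
    local pid="$1"
    local secs="${2:-10}"                    # sampling duration in seconds
    local out="/data/local/tmp/perf.data"    # a writable location on the device

    # Sample the target process for $secs seconds and store the samples in $out.
    adb shell simpleperf record -p "$pid" --duration "$secs" -o "$out" || return 1
    # Print the per-symbol breakdown, computed on the device.
    adb shell simpleperf report -i "$out"
}

# Usage: simpleperf_profile_pid 1234 5    # 1234 = PID of the process to profile
```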
Getting started
Check that perf is installed on the board:
Board $> which perf
/usr/bin/perf
Perf commands
Running perf with no arguments prints its usage and the most commonly used subcommands:
usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
The most commonly used perf commands are:
annotate Reads perf.data (created by perf record) and displays annotated code
archive Creates archive with object files with build-ids found in perf.data file
bench General framework for benchmark suites
buildid-cache Manages build-id cache.
buildid-list Lists the buildids in a perf.data file
c2c Shared Data C2C/HITM Analyzer.
config Gets and sets variables in a configuration file.
data Data file related processing
diff Reads perf.data files and displays the differential profile
evlist Lists the event names in a perf.data file
ftrace simple wrapper for kernel's ftrace functionality
inject Filters to augment the events stream with additional information
kallsyms Searches running kernel for symbols
kmem Tool to trace/measure kernel memory properties
kvm Tool to trace/measure kvm guest os
list Lists all symbolic event types
lock Analyzes lock events
mem Profiles memory accesses
record Runs a command and records its profile into perf.data
report Reads perf.data (created by perf record) and displays the profile
sched Tool to trace/measure scheduler properties (latencies)
script Reads perf.data (created by perf record) and displays trace output
stat Runs a command and gathers performance counter statistics
test Runs sanity tests.
timechart Tool to visualize total system behavior during a workload
top System profiling tool.
probe Defines new dynamic tracepoints
See 'perf COMMAND -h' for more information on a specific command.
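- perf top (Linux kernel documentation[3]): displays live CPU usage by sampling the cycles event. By default, entries are per symbol, sorted by sample count in descending order: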
Board $> perf top
40.62% [kernel] [k] v7_dma_inv_range
18.65% [kernel] [k] _raw_spin_unlock_irqrestore
17.01% [kernel] [k] arch_cpu_idle
8.27% [kernel] [k] v7_dma_clean_range
5.00% [kernel] [k] rcu_idle_exit
1.70% [kernel] [k] cpu_startup_entry
0.52% [kernel] [k] trace_graph_return
0.48% [kernel] [k] finish_task_switch
0.48% libc-2.18.so [.] memcpy
0.47% [kernel] [k] trace_graph_entry
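- This means the CPU spent about 40.62% of the sampled time in the function v7_dma_inv_range and 18.65% in _raw_spin_unlock_irqrestore.
- For more information and examples, see perf.wiki.kernel.org[4].
- The results can also be sorted by other keys:
usage: perf top [<options>]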
-s, --sort <key[,key2...]>
sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer to the main page for the complete list.
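With --sort dso (for example perf top -s dso), samples are grouped by shared object instead of by symbol. Each line's overhead is therefore approximately the sum of the per-symbol overheads that belong to that object.

The sketch below reproduces that arithmetic on saved perf top style output, to show what the sort key changes. It is only an illustration: perf itself does this directly with -s dso. The function name and the assumed line layout (percentage, object, [k]/[.], symbol) are hypothetical. Applied to the ten lines above, it attributes about 92.72% to [kernel] and 0.48% to libc-2.18.so.

```bash
#!/usr/bin/env bash
# Hypothetical illustration: sum per-symbol overheads by shared object (DSO),
# i.e. roughly what "perf top --sort dso" reports.
# Assumed input lines: "<pct>% <dso> <[k]|[.]> <symbol>", e.g. copied from perf top.
sum_overhead_by_dso() {
    awk '
        $1 ~ /%$/ {                    # keep only sample lines
            pct = $1; sub(/%$/, "", pct)
            total[$2] += pct           # field 2 is the shared object
        }
        END {
            for (dso in total) printf "%.2f%% %s\n", total[dso], dso
        }
    ' | sort -rn                       # largest overhead first
}
```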
- perf stat (Linux kernel documentation[5]): runs a command and gathers event counts:
Board $> perf stat hello_world_example
User space example: hello world from STMicroelectronics
10 9 8 7 6 5 4 3 2 1 0
User space example: goodbye from STMicroelectronics
Performance counter stats for 'hello_world_example':
4.328249 task-clock (msec) # 0.000 CPUs utilized
11 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
38 page-faults # 0.009 M/sec
2710036 cycles # 0.626 GHz
640856 instructions # 0.24 insn per cycle
75644 branches # 17.477 M/sec
21764 branch-misses # 28.77% of all branches
11.109859338 seconds time elapsed
- For more information and examples, see perf.wiki.kernel.org[6].
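The derived values in the perf stat output are ratios of the raw counts:
- "insn per cycle" is instructions / cycles (640856 / 2710036 ≈ 0.24).
- The branch-miss percentage is branch-misses / branches (21764 / 75644 ≈ 28.77%).

For scripting, perf stat -x, prints one CSV line per event to stderr. Without -I, -A or --per-thread, the count is the first field and the event name the third. The sketch below recomputes both ratios from such CSV. The function name is hypothetical, and the capture command in the comment is an assumed typical invocation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive IPC and branch-miss ratio from "perf stat -x," CSV.
# Assumed field layout (no -I, -A or --per-thread): value,unit,event,...
# Possible capture on the board (perf stat writes its report to stderr):
#   perf stat -x, -e cycles,instructions,branches,branch-misses \
#       -- hello_world_example 2> stat.csv
stat_ratios() {
    awk -F, '
        $1 ~ /^[0-9.]+$/ {          # skip comments and "<not counted>" lines
            ev = $3
            sub(/:.*/, "", ev)      # drop event modifiers such as ":u"
            count[ev] = $1 + 0
        }
        END {
            if (count["cycles"] > 0)
                printf "IPC: %.2f\n", count["instructions"] / count["cycles"]
            if (count["branches"] > 0)
                printf "branch-miss ratio: %.2f%%\n", 100 * count["branch-misses"] / count["branches"]
        }
    '
}

# Usage: stat_ratios < stat.csv
```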
- perf list (Linux kernel documentation[7]): lists the supported symbolic event types:
Board $> perf list
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-store-misses [Hardware cache event]
iTLB-load-misses [Hardware cache event]
armv7_cortex_a7/br_immed_retired/ [Kernel PMU event]
armv7_cortex_a7/br_mis_pred/ [Kernel PMU event]
armv7_cortex_a7/br_pred/ [Kernel PMU event]
armv7_cortex_a7/br_return_retired/ [Kernel PMU event]
armv7_cortex_a7/bus_access/ [Kernel PMU event]
armv7_cortex_a7/bus_cycles/ [Kernel PMU event]
armv7_cortex_a7/cid_write_retired/ [Kernel PMU event]
armv7_cortex_a7/cpu_cycles/ [Kernel PMU event]
armv7_cortex_a7/exc_return/ [Kernel PMU event]
armv7_cortex_a7/exc_taken/ [Kernel PMU event]
armv7_cortex_a7/inst_retired/ [Kernel PMU event]
armv7_cortex_a7/inst_spec/ [Kernel PMU event]
armv7_cortex_a7/l1d_cache/ [Kernel PMU event]
armv7_cortex_a7/l1d_cache_refill/ [Kernel PMU event]
armv7_cortex_a7/l1d_cache_wb/ [Kernel PMU event]
armv7_cortex_a7/l1d_tlb_refill/ [Kernel PMU event]
armv7_cortex_a7/l1i_cache/ [Kernel PMU event]
armv7_cortex_a7/l1i_cache_refill/ [Kernel PMU event]
armv7_cortex_a7/l1i_tlb_refill/ [Kernel PMU event]
armv7_cortex_a7/l2d_cache/ [Kernel PMU event]
armv7_cortex_a7/l2d_cache_refill/ [Kernel PMU event]
armv7_cortex_a7/l2d_cache_wb/ [Kernel PMU event]
armv7_cortex_a7/ld_retired/ [Kernel PMU event]
armv7_cortex_a7/mem_access/ [Kernel PMU event]
armv7_cortex_a7/memory_error/ [Kernel PMU event]
armv7_cortex_a7/pc_write_retired/ [Kernel PMU event]
armv7_cortex_a7/st_retired/ [Kernel PMU event]
armv7_cortex_a7/sw_incr/ [Kernel PMU event]
armv7_cortex_a7/ttbr_write_retired/ [Kernel PMU event]
armv7_cortex_a7/unaligned_ldst_retired/ [Kernel PMU event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
mem:<addr>[/len][:access] [Hardware breakpoint]
alarmtimer:alarmtimer_cancel [Tracepoint event]
alarmtimer:alarmtimer_fired [Tracepoint event]
alarmtimer:alarmtimer_start [Tracepoint event]
alarmtimer:alarmtimer_suspend [Tracepoint event]
asoc:snd_soc_bias_level_done [Tracepoint event]
asoc:snd_soc_bias_level_start [Tracepoint event]
asoc:snd_soc_dapm_connected [Tracepoint event]
asoc:snd_soc_dapm_done [Tracepoint event]
asoc:snd_soc_dapm_path [Tracepoint event]
asoc:snd_soc_dapm_start [Tracepoint event]
asoc:snd_soc_dapm_walk_done [Tracepoint event]
asoc:snd_soc_dapm_widget_event_done [Tracepoint event]
asoc:snd_soc_dapm_widget_event_start [Tracepoint event]
...
xhci-hcd:xhci_inc_enq [Tracepoint event]
xhci-hcd:xhci_queue_trb [Tracepoint event]
xhci-hcd:xhci_ring_alloc [Tracepoint event]
xhci-hcd:xhci_ring_expansion [Tracepoint event]
xhci-hcd:xhci_ring_free [Tracepoint event]
xhci-hcd:xhci_setup_addressable_virt_device [Tracepoint event]
xhci-hcd:xhci_setup_device [Tracepoint event]
xhci-hcd:xhci_setup_device_slot [Tracepoint event]
xhci-hcd:xhci_stop_device [Tracepoint event]
xhci-hcd:xhci_urb_dequeue [Tracepoint event]
xhci-hcd:xhci_urb_enqueue [Tracepoint event]
xhci-hcd:xhci_urb_giveback [Tracepoint event]
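perf list also accepts a filter, for example:
- perf list hw
- perf list sw
- perf list cache
- perf list tracepoint
- a glob such as perf list 'xhci-hcd:*'

Tracepoint events are named subsystem:event, so the set of instrumented kernel subsystems can be extracted from the output. The small sketch below reads perf list output on stdin. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the tracepoint subsystems found in "perf list" output.
# Tracepoint lines look like "  xhci-hcd:xhci_ring_free   [Tracepoint event]".
tracepoint_subsystems() {
    grep '\[Tracepoint event\]' |
        awk '{ split($1, part, ":"); print part[1] }' |   # keep the part before ":"
        sort -u
}

# Usage on the board: perf list tracepoint | tracepoint_subsystems
```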
- perf record (Linux kernel documentation[8]): records events for later reporting:
Board $> perf record hello_world_example
User space example: hello world from STMicroelectronics
10 9 8 7 6 5 4 3 2 1 0
User space example: goodbye from STMicroelectronics
[ perf record: Woken up 1 time to write data ]
[ perf record: Captured and wrote 0.004 MB perf.data (28 samples) ]
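- The events to record can be selected with the -e option, using any event listed by perf list. For more information, options and examples, see perf.wiki.kernel.org[9].
- By default, events are written to the file perf.data. To use a different output file name, add the -o, --output <file> option.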
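Combining the two options, the sketch below records only a chosen event into a named file. That file can later be read back with perf report -i. The helper name and the file name in the usage line are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: record a single event for one command into a named file.
# -e selects the event (any name from "perf list"), -o sets the output file,
# and "--" separates perf options from the profiled command and its arguments.
record_event() {
    local event="$1" out="$2"
    shift 2
    perf record -e "$event" -o "$out" -- "$@"
}

# Usage on the board (event name taken from the perf list output above):
#   record_event branch-misses hello.data hello_world_example
#   perf report -i hello.data
```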
- perf report (Linux kernel documentation[10]): breaks the recorded events down by process, function, and so on:
Example after previous command "perf record hello_world_example"
Board $> perf report
Samples: 28 of event 'cycles:ppp', Event count (approx.):2737925
Overhead Command Shared Object Symbol
12.66% hello_world_exa ld-2.26.so [.] _dl_relocate_object
11.71% hello_world_exa [kernel.kallsyms] [k] filemap_map_pages
10.65% hello_world_exa [kernel.kallsyms] [k] n_tty_write
6.43% hello_world_exa [kernel.kallsyms] [k] percpu_counter_add_batch
6.43% hello_world_exa ld-2.26.so [.] sbrk
6.24% hello_world_exa [kernel.kallsyms] [k] cpu_v7_set_pte_ext
5.56% hello_world_exa [kernel.kallsyms] [k] alloc_set_pte
5.56% hello_world_exa libc-2.26.so [.] __sbrk
5.37% hello_world_exa [kernel.kallsyms] [k] __vma_link_file
5.32% hello_world_exa [kernel.kallsyms] [k] __fput
5.32% hello_world_exa [kernel.kallsyms] [k] ldsem_up_read
5.32% hello_world_exa [kernel.kallsyms] [k] unmap_page_range
5.32% hello_world_exa libc-2.26.so [.] printf
5.24% hello_world_exa [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
2.23% hello_world_exa [kernel.kallsyms] [k] perf_event_mmap
0.48% hello_world_exa [kernel.kallsyms] [k] perf_output_begin
0.13% perf [kernel.kallsyms] [k] perf_event_exec
By default, perf report reads perf.data as its input file. To use a different input file name, add the -i, --input <file> option. For more information and examples, see perf.wiki.kernel.org[11].
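In the perf report output, the [k] marker tags kernel-space symbols and [.] tags user-space ones. Summing the Overhead column per marker therefore gives a quick kernel/user split. For the hello_world_example report above, this comes to about 70.00% kernel and 29.97% user space.

The sketch below assumes the same text columns as that output, which perf report --stdio prints. It also assumes a command name without spaces. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split perf report overhead into kernel ([k]) and user ([.]) space.
# Assumed columns: <pct>% <command> <shared object> <[k]|[.]> <symbol>
# (the command name must not contain spaces, or the field positions shift).
kernel_user_split() {
    awk '
        $1 ~ /%$/ {                    # keep only sample lines, skip "#" headers
            pct = $1; sub(/%$/, "", pct)
            if ($4 == "[k]") kernel += pct
            else if ($4 == "[.]") user += pct
        }
        END { printf "kernel: %.2f%%\nuser: %.2f%%\n", kernel, user }
    '
}

# Usage on the board: perf report -i perf.data --stdio | kernel_user_split
```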
- perf bench (Linux kernel documentation[12]): runs various microbenchmarks:
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
mem: Memory access benchmarks
futex: Futex stressing benchmarks
all: All benchmarks
Example of getting memcpy benchmark for 100MB:
Board $> perf bench mem memcpy --size 100MB
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 100MB bytes ...
1.426138 GB/sec
- For more information and examples, see perf.wiki.kernel.org[13].
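To see how memcpy throughput varies with the copy size, the same benchmark can be run in a loop, keeping only the result line. In the sketch below, the function name and the list of sizes are illustrative. perf may print the rate as MB/sec or GB/sec depending on its magnitude, so the sketch matches any unit ending in B/sec rather than only the GB/sec shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "perf bench mem memcpy" for several sizes and print
# one "<size> <rate> <unit>" line per run.
memcpy_bench_sizes() {
    local size rate
    for size in "$@"; do
        # Keep only the result line, e.g. "       1.426138 GB/sec".
        rate="$(perf bench mem memcpy --size "$size" | awk '/B\/sec/ { print $1 " " $2; exit }')"
        printf '%s %s\n' "$size" "${rate:-n/a}"
    done
}

# Usage on the board: memcpy_bench_sizes 1MB 10MB 100MB
```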
Advanced usage
- Traces recorded by perf can be visualized as Flame Graphs[14].
- The flame graph is generated with the FlameGraph tool suite[15].
- Install the FlameGraph tool suite on the host PC:
PC $> cd <your_local_path>
PC $> git clone https://github.com/brendangregg/FlameGraph.git
PC $> cd FlameGraph
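- Generating a flame graph from perf data
- perf record must be run with the -g option so that call graphs (stack traces) are recorded.
Example of generating and viewing a flame graph:
- Run perf record on the board (-a records system-wide), then convert the data with perf script: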
Board $> perf record -a -g top
Board $> perf script > perf_top.out
- Copy perf_top.out to the host PC (into the FlameGraph directory).
- On the host PC, fold the stacks with the stackcollapse-perf.pl script:
PC $> ./stackcollapse-perf.pl perf_top.out > out.top_folded
- Render an SVG (Scalable Vector Graphics) file with flamegraph.pl:
PC $> ./flamegraph.pl out.top_folded > top.svg
- View the SVG, for example in a web browser:
PC $> firefox top.svg
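The two host-side steps can be chained. stackcollapse-perf.pl and flamegraph.pl both read from a file argument or from stdin, so the folded stacks can be piped straight into the renderer without an intermediate file. The sketch below assumes that FlameGraph was cloned as above and that perf_top.out has been copied to the host. The FLAMEGRAPH_DIR variable and the function name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "perf script" output into a flame graph SVG on the host.
# FLAMEGRAPH_DIR points at the FlameGraph clone (default: ./FlameGraph).
make_flamegraph() {
    local in="$1" svg="$2"
    local fg="${FLAMEGRAPH_DIR:-./FlameGraph}"
    [ -r "$in" ] || { echo "cannot read $in" >&2; return 1; }
    # Fold the stacks, then render; pipefail makes a failure in either step visible.
    ( set -o pipefail
      "$fg/stackcollapse-perf.pl" "$in" | "$fg/flamegraph.pl" > "$svg" )
}

# Usage: make_flamegraph perf_top.out top.svg && firefox top.svg
```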
References
[1] https://perf.wiki.kernel.org/index.php/Main_Page
[2] https://source.android.com/devices/tech/debug/eval_perf
[3] https://github.com/STMicroelectronics/linux/blob/v4.19-stm32mp/tools/perf/Documentation/perf-top.txt
[4] https://perf.wiki.kernel.org/index.php/Tutorial#Live_analysis_with_perf_top
[5] https://github.com/STMicroelectronics/linux/blob/v4.19-stm32mp/tools/perf/Documentation/perf-stat.txt
[6] https://perf.wiki.kernel.org/index.php/Tutorial#Counting_with_perf_stat
[7] https://github.com/STMicroelectronics/linux/blob/v4.19-stm32mp/tools/perf/Documentation/perf-list.txt
[8] https://github.com/STMicroelectronics/linux/blob/v4.19-stm32mp/tools/perf/Documentation/perf-record.txt
[9] https://perf.wiki.kernel.org/index.php/Tutorial#Sampling_with_perf_record
[10] https://github.com/STMicroelectronics/linux/blob/v4.19-stm32mp/tools/perf/Documentation/perf-report.txt
[11] https://perf.wiki.kernel.org/index.php/Tutorial#Sample_analysis_with_perf_report
[12] https://github.com/STMicroelectronics/linux/blob/v4.19-stm32mp/tools/perf/Documentation/perf-bench.txt
[13] https://perf.wiki.kernel.org/index.php/Tutorial#Benchmarking_with_perf_bench
[14] http://www.brendangregg.com/flamegraphs.html
[15] https://github.com/brendangregg/FlameGraph