inferno/lib.rs
//! Inferno is a set of tools that let you produce [flame graphs] from performance profiles of
//! your application. It's a port of parts of Brendan Gregg's original [flamegraph toolkit] that
//! aims to improve the performance of the original flamegraph tools and provide programmatic
//! access to them to facilitate integration with _other_ tools (like [not-perf]).
//!
//! Inferno, like the original flame graph toolkit, consists of two "stages": stack collapsing and
//! plotting. In the original Perl implementations, these were represented by the `stackcollapse-*`
//! binaries and `flamegraph.pl` respectively. In Inferno, collapsing is available through the
//! [`collapse`] module and the `inferno-collapse-*` binaries, and plotting can be found in the
//! [`flamegraph`] module and the `inferno-flamegraph` binary.
//!
//! # Command-line use
//!
//! ## Collapsing stacks
//!
//! Most sampling profilers (as opposed to [tracing profilers]) work by repeatedly recording the
//! state of the [call stack]. The stack can be sampled based on a fixed sampling interval, based
//! on [hardware or software events], or some combination of the two. In the end, you get a series
//! of [stack traces], each of which represents a snapshot of where the program was at a particular
//! point in time.
//!
//! Given enough of these snapshots, you can get a pretty good idea of where your program is
//! spending its time by looking at which functions appear in many of the traces. To ease this
//! analysis, we want to "collapse" the stack traces so that if a particular trace occurs more
//! than once, we keep it just _once_ along with a count of how many times we've seen it. This
//! is what the various collapsing tools do! You'll sometimes see the resulting tuples of stack +
//! count called a "folded stack trace".
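//!
//! To make the idea concrete, here is a minimal, std-only sketch of collapsing. It is not
//! Inferno's implementation: `fold_stacks` is a hypothetical helper, and it assumes each sample
//! has already been parsed into a list of frame names ordered from outermost caller to innermost
//! callee.
//!
//! ```rust
//! use std::collections::BTreeMap;
//!
//! /// Folds raw stack samples into `frame;frame;frame count` lines.
//! ///
//! /// Hypothetical helper for illustration only; not part of Inferno's API.
//! fn fold_stacks(samples: &[Vec<&str>]) -> Vec<String> {
//!     // A BTreeMap keeps the folded stacks sorted, so the output is deterministic.
//!     let mut counts: BTreeMap<String, usize> = BTreeMap::new();
//!     for frames in samples {
//!         // Join the frames with `;` to form the key identifying this stack,
//!         // then bump the number of times we have seen it.
//!         *counts.entry(frames.join(";")).or_insert(0) += 1;
//!     }
//!     counts
//!         .into_iter()
//!         .map(|(stack, count)| format!("{} {}", stack, count))
//!         .collect()
//! }
//! ```
//!
//! Folding the three samples `main;parse;read_line`, `main;compute`, and `main;parse;read_line`
//! with this sketch yields the two lines `main;compute 1` and `main;parse;read_line 2`.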
//!
//! Since profiling tools produce stack traces in a myriad of different formats, and the flame
//! graph plotter expects input in a particular folded stack trace format, each profiler needs a
//! separate collapse implementation. While the original Perl implementation supports _lots_ of
//! profilers, Inferno currently only supports four: the widely used [`perf`] tool (specifically
//! the output from `perf script`), [DTrace], [sample], and [VTune]. Support for xdebug is
//! [hopefully coming soon], and [`bpftrace`] should get [native support] before too long.
//!
//! Inferno supports profiles from applications written in any language, but we'll walk through an
//! example with a Rust program. To profile a Rust application, you would first set
//!
//! ```toml
//! [profile.release]
//! debug = true
//! ```
//!
//! in your `Cargo.toml` so that your profile will have useful function names and such included.
//! Then compile with `--release` and run your favorite performance profiler:
//!
//! ### perf (Linux)
//!
//! ```console
//! # perf record --call-graph dwarf target/release/mybin
//! $ perf script | inferno-collapse-perf > stacks.folded
//! ```
//!
//! For more advanced uses, see Brendan Gregg's excellent [perf examples] page.
//!
//! Note: For larger binaries (like Firefox), `perf script` can be significantly slowed down by
//! the poor performance of the `addr2line` tool. Starting with perf version 6.12, you can use an
//! alternative `addr2line` implementation (via `perf script --addr2line=/path/to/addr2line`);
//! the recommended one is the Rust implementation from the [Gimli project].
//!
//! ### DTrace (macOS)
//!
//! ```console
//! $ target/release/mybin &
//! $ pid=$!
//! # dtrace -x ustackframes=100 -n "profile-97 /pid == $pid/ { @[ustack()] = count(); } tick-60s { exit(0); }" -o out.user_stacks
//! $ cat out.user_stacks | inferno-collapse-dtrace > stacks.folded
//! ```
//!
//! For more advanced uses, see also upstream FlameGraph's [DTrace examples].
//! You may also be interested in something like [NodeJS's ustack helper].
//!
//! ### sample (macOS)
//!
//! ```console
//! $ target/release/mybin &
//! $ pid=$!
//! $ sample $pid 30 -file sample.txt
//! $ inferno-collapse-sample sample.txt > stacks.folded
//! ```
//!
//! ### VTune (Windows and Linux)
//!
//! ```console
//! $ amplxe-cl -collect hotspots -r resultdir -- target/release/mybin
//! $ amplxe-cl -R top-down -call-stack-mode all -column=\"CPU Time:Self\",\"Module\" -report-out result.csv -filter \"Function Stack\" -format csv -csv-delimiter comma -r resultdir
//! $ inferno-collapse-vtune result.csv > stacks.folded
//! ```
//!
//! ## Producing a flame graph
//!
//! Once you have a folded stack file, you're ready to produce the flame graph SVG image. To do so,
//! simply provide the folded stack file to `inferno-flamegraph`, and it will print the resulting
//! SVG. Following on from the example above:
//!
//! ```console
//! $ cat stacks.folded | inferno-flamegraph > profile.svg
//! ```
//!
//! And then open `profile.svg` in your viewer of choice.
//!
//! ## Differential flame graphs
//!
//! You can debug CPU performance regressions with the help of differential flame graphs.
//! They let you easily visualize the differences between two profiles taken before and
//! after a code change. See Brendan Gregg's [differential flame graphs] blog post for a great
//! writeup. To create one, first pass the two folded stack files to `inferno-diff-folded`,
//! then send the output to `inferno-flamegraph`. For example:
//!
//! ```console
//! $ inferno-diff-folded folded1 folded2 | inferno-flamegraph > diff2.svg
//! ```
//!
//! The flame graph is colored by how sample counts changed: red where a frame has more samples
//! in the second profile, blue where it has fewer. The frame widths are based on the second
//! folded profile. This can be confusing if stack frames disappear entirely, so it makes the
//! most sense to also create a differential based on the first profile's widths, with the hues
//! switched. To do this, reverse the order of the input files and pass the `--negate` flag to
//! `inferno-flamegraph` like this:
//!
//! ```console
//! $ inferno-diff-folded folded2 folded1 | inferno-flamegraph --negate > diff1.svg
//! ```
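//!
//! As a rough illustration of the comparison behind a differential flame graph, the std-only
//! sketch below computes a per-stack change in sample count between two folded profiles. The
//! names `parse_folded` and `diff_folded` are hypothetical, this is not how
//! `inferno-diff-folded` is implemented, and it does not reproduce that tool's output format.
//!
//! ```rust
//! use std::collections::BTreeMap;
//!
//! /// Parses folded lines of the form `frame;frame;frame count` into a map.
//! /// Lines without a trailing numeric count are skipped.
//! fn parse_folded(input: &str) -> BTreeMap<&str, i64> {
//!     let mut counts = BTreeMap::new();
//!     for line in input.lines() {
//!         // Split on the last space, since frame names may themselves contain spaces.
//!         if let Some((stack, count)) = line.trim_end().rsplit_once(' ') {
//!             if let Ok(count) = count.parse::<i64>() {
//!                 *counts.entry(stack).or_insert(0) += count;
//!             }
//!         }
//!     }
//!     counts
//! }
//!
//! /// Returns `(stack, delta)` pairs sorted by stack, where `delta = after - before`.
//! /// A positive delta corresponds to red (more samples), a negative one to blue.
//! fn diff_folded<'a>(before: &'a str, after: &'a str) -> Vec<(&'a str, i64)> {
//!     let before = parse_folded(before);
//!     let after = parse_folded(after);
//!     let mut deltas: BTreeMap<&str, i64> = BTreeMap::new();
//!     for (stack, count) in &before {
//!         *deltas.entry(*stack).or_insert(0) -= *count;
//!     }
//!     for (stack, count) in &after {
//!         *deltas.entry(*stack).or_insert(0) += *count;
//!     }
//!     deltas.into_iter().collect()
//! }
//! ```
//!
//! With `main;a 10` and `main;b 5` before, and `main;a 4` and `main;c 7` after, the deltas are
//! `-6` for `main;a`, `-5` for `main;b`, and `7` for `main;c`. Note that `main;b` has no samples
//! in the second profile, so it would have no width there; this is the "disappearing frame" case
//! that the second, negated graph is meant to reveal.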
//!
//! # Feature flags
//!
//! All features below are enabled by default:
//!
//! - `cli`: Also builds the `inferno` command-line tools.
//! - `multithreaded`: Enables multithreaded stack collapsing.
//! - `nameattr`: Allows customizing and adding attributes to the SVG produced by [`flamegraph`].
//!   See the `--nameattr` option of the flamegraph CLI.
//!
//! # Development
//!
//! This crate was initially developed through [a series of live coding sessions]. If you want to
//! contribute to the code, that may be a good way to learn why it's all designed the way it is!
//!
//! [flame graphs]: http://www.brendangregg.com/flamegraphs.html
//! [flamegraph toolkit]: https://github.com/brendangregg/FlameGraph
//! [not-perf]: https://github.com/nokia/not-perf
//! [tracing profilers]: https://danluu.com/perf-tracing/
//! [call stack]: https://en.wikipedia.org/wiki/Call_stack
//! [hardware or software events]: https://perf.wiki.kernel.org/index.php/Tutorial#Events
//! [stack traces]: https://en.wikipedia.org/wiki/Stack_trace
//! [`perf`]: https://perf.wiki.kernel.org/index.php/Main_Page
//! [DTrace]: https://www.joyent.com/dtrace
//! [hopefully coming soon]: https://twitter.com/DanielLockyer/status/1094605231155900416
//! [native support]: https://github.com/jonhoo/inferno/issues/51#issuecomment-466732304
//! [`bpftrace`]: https://github.com/iovisor/bpftrace
//! [perf examples]: http://www.brendangregg.com/perf.html
//! [DTrace examples]: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#DTrace
//! [NodeJS's ustack helper]: http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/
//! [a series of live coding sessions]: https://www.youtube.com/watch?v=jTpK-bNZiA4&list=PLqbS7AVVErFimAvMW-kIJUwxpPvcPBCsz
//! [differential flame graphs]: http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html
//! [sample]: https://gist.github.com/loderunner/36724cc9ee8db66db305#profiling-with-sample
//! [VTune]: https://software.intel.com/en-us/vtune-amplifier-help-command-line-interface
//! [Gimli project]: https://github.com/gimli-rs/addr2line

#![cfg_attr(doc, warn(rustdoc::all))]
#![cfg_attr(doc, allow(rustdoc::missing_doc_code_examples))]
#![deny(missing_docs)]
#![warn(unreachable_pub)]
#![allow(clippy::disallowed_names)]

/// Stack collapsing for various input formats.
///
/// See the [crate-level documentation] for details.
///
/// [crate-level documentation]: ../index.html
pub mod collapse;

/// Tool for creating the output required to generate differential flame graphs.
///
/// See the [crate-level documentation] for details.
///
/// [crate-level documentation]: ../index.html
pub mod differential;

/// Tools for producing flame graphs from folded stack traces.
///
/// See the [crate-level documentation] for details.
///
/// [crate-level documentation]: ../index.html
pub mod flamegraph;