1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
//! Heuristics for correcting instruction pointers based on the CPU architecture.
use crate;
const SIGILL: u32 = 4;
const SIGBUS: u32 = 10;
const SIGSEGV: u32 = 11;
/// Helper to work with instruction addresses.
///
/// Directly symbolicated stack traces may show the wrong calling symbols, as the stack frame's
/// return addresses point a few bytes past the original call site, which may place the address
/// within a different symbol entirely.
///
/// The most useful function is [`caller_address`], which applies some heuristics to determine the
/// call site of a function call based on the return address.
///
/// # Examples
///
/// ```
/// use symbolic_common::{Arch, InstructionInfo};
///
/// const SIGSEGV: u32 = 11;
///
/// let caller_address = InstructionInfo::new(Arch::Arm64, 0x1337)
/// .is_crashing_frame(false)
/// .signal(Some(SIGSEGV))
/// .ip_register_value(Some(0x4242))
/// .caller_address();
///
/// assert_eq!(caller_address, 0x1330);
/// ```
///
/// # Background
///
/// When *calling* a function, it is necessary for the *called* function to know where it should
/// return to upon completion. To support this, a *return address* is supplied as part of the
/// standard function call semantics. This return address specifies the instruction that the called
/// function should jump to upon completion of its execution.
///
/// When a crash reporter generates a backtrace, it first collects the thread state of all active
/// threads, including the **actual** current execution address. The reporter then iterates over
/// those threads, walking backwards to find calling frames – what it's actually finding during this
/// process are the **return addresses**. The actual address of the call instruction is not recorded
/// anywhere. The only address available is the address at which execution should resume after
/// function return.
///
/// To make things more complicated, there is no guarantee that a return address be set to exactly
/// one instruction after the call. It's entirely proper for a function to remove itself from the
/// call stack by setting a different return address entirely. This is why you never see
/// `objc_msgSend` in your backtrace unless you actually crash inside of `objc_msgSend`. When
/// `objc_msgSend` jumps to a method's implementation, it leaves its caller's return address in
/// place, and `objc_msgSend` itself disappears from the stack trace. In the case of `objc_msgSend`,
/// the loss of that information is of no great importance, but it's hardly the only function that
/// elides its own code from the return address.
///
/// # Heuristics
///
/// To resolve this particular issue, it is necessary for the symbolication implementor to apply a
/// per-architecture heuristics to the return addresses, and thus derive the **likely** address of
/// the actual calling instruction. There is a high probability of correctness, but absolutely no
/// guarantee.
///
/// This derived address **should** be used as the symbolication address, but **should not** replace
/// the return address in the crash report. This derived address is a best guess, and if you replace
/// the return address in the report, the end-user will have lost access to the original canonical
/// data from which they could have made their own assessment.
///
/// These heuristics must not be applied to frame #0 on any thread. The first frame of all threads
/// contains the actual register state of that thread at the time that it crashed (if it's the
/// crashing thread), or at the time it was suspended (if it is a non-crashing thread). These
/// heuristics should only be applied to frames *after* frame #0 – that is, starting with frame #1.
///
/// Additionally, these heuristics assume that your symbolication implementation correctly handles
/// addresses that occur within an instruction, rather than directly at the start of a valid
/// instruction. This should be the case for any reasonable implementation, but is something to be
/// aware of when deploying these changes.
///
/// ## x86 and x86-64
///
/// x86 uses variable-width instruction encodings; subtract one byte from the return address to
/// derive an address that should be within the calling instruction. This will provide an address
/// within a calling instruction found directly prior to the return address.
///
/// ## ARMv6 and ARMv7
///
/// - **Step 1:** Strip the low order thumb bit from the return address. ARM uses the low bit to
/// inform the processor that it should enter thumb mode when jumping to the return address. Since
/// all instructions are at least 2 byte aligned, an actual instruction address will never have
/// the low bit set.
///
/// - **Step 2:** Subtract 2 Bytes. 32-bit ARM instructions are either 2 or 4 bytes long, depending
/// on the use of thumb. This will place the symbolication address within the likely calling
/// instruction. All ARM64 instructions are 4 bytes long; subtract 4 bytes from the return address
/// to derive the likely address of the calling instruction.
///
/// # More Information
///
/// The above information was taken and slightly updated from the now-gone *PLCrashReporter Wiki*.
/// An old copy can still be found in the [internet archive].
///
/// [internet archive]: https://web.archive.org/web/20161012225323/https://opensource.plausible.coop/wiki/display/PLCR/Automated+Crash+Report+Analysis
/// [`caller_address`]: struct.InstructionInfo.html#method.caller_address