Skip to content

Commit d906ff2

Browse files
authored
fix(stackwalking): Use debug info to inform scanning (#1913)
In an effort to reduce the number of "hallucinated" stack frames, #1651 introduced a sanity check for stack scanning into the minidump processor: if we have CFI for a module and the stack scanner would produce a frame that is not covered by that CFI, the frame is discarded. However, this made the processor too strict—if the CFI for the module is poor or incomplete, this may lead to us discarding valid frames. With this PR, we now _additionally_ use debug information in the form of symcaches to inform stack scanning. The new logic is: if we have both CFI and debug info for a module, but neither covers the potential frame's instruction address, then the frame is treated as invalid. Obviously this involves pulling symcache computation into minidump stackwalking, which previously didn't need to know about symcaches or debug info at all. Performance-wise I believe this to be no problem—symcache computation is cached and would have happened during the symbolication step anyway, now it just happens earlier. But it does leave us in a somewhat awkward spot where we compute debug info during stackwalking but then don't actually fill it in. We may want to rethink this in the future. Fixes INGEST-838.
1 parent 75a6fef commit d906ff2

File tree

9 files changed

+2693
-46
lines changed

9 files changed

+2693
-46
lines changed

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,11 @@
11
# Changelog
22

3+
## Unreleased
4+
5+
### Bug Fixes
6+
7+
- fix(stackwalking): Use debug info to inform scanning by @loewenheim in [#1905](https://github.com/getsentry/symbolicator/pull/1905)
8+
39
## 26.3.1
410

511
### New Features ✨

crates/symbolicator-native/src/symbolication/process_minidump.rs

Lines changed: 132 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,8 @@ use symbolicator_sources::{ObjectId, ObjectType, SourceConfig};
2424
use tokio::sync::Notify;
2525

2626
use crate::caches::cficaches::{CfiCacheActor, CfiModuleInfo, FetchCfiCache, FetchedCfiCache};
27+
use crate::caches::derived::DerivedCache;
28+
use crate::caches::symcaches::{FetchSymCache, OwnedSymCache, SymCacheActor};
2729
use crate::interface::{
2830
CompleteObjectInfo, CompletedSymbolicationResponse, ProcessMinidump, RawFrame, RawStacktrace,
2931
Registers, RewriteRules, SymbolicateStacktraces, SystemInfo,
@@ -156,6 +158,13 @@ struct SymbolicatorSymbolProvider {
156158
object_type: ObjectType,
157159
/// The actor used for fetching CFI.
158160
cficache_actor: CfiCacheActor,
161+
/// The actor used for fetching symcaches.
162+
///
163+
/// Note that debug information is currently _not_ used to symbolicate frames
164+
/// during stackwalking (which `rust-minidump` supports in principle). Rather,
165+
/// we use it as a sanity check for `rust-minidump`'s stackwalker. See the
166+
/// implementation of `fill_symbol` for details.
167+
symcache_actor: SymCacheActor,
159168
/// The debug ID of the first module in the minidump
160169
/// (by address).
161170
///
@@ -177,6 +186,7 @@ impl SymbolicatorSymbolProvider {
177186
scope: Scope,
178187
sources: Arc<[SourceConfig]>,
179188
cficache_actor: CfiCacheActor,
189+
symcache_actor: SymCacheActor,
180190
object_type: ObjectType,
181191
first_module_debug_id: Option<DebugId>,
182192
rewrite_first_module: RewriteRules,
@@ -185,13 +195,37 @@ impl SymbolicatorSymbolProvider {
185195
scope,
186196
sources,
187197
cficache_actor,
198+
symcache_actor,
188199
object_type,
189200
first_module_debug_id,
190201
rewrite_first_module,
191202
cficaches: Default::default(),
192203
}
193204
}
194205

206+
fn object_id_from_module(&self, module: &dyn Module) -> ObjectId {
207+
let code_file = non_empty_file_name(&module.code_file());
208+
let mut debug_file = module.debug_file().as_deref().and_then(non_empty_file_name);
209+
210+
// Rewrite the first module's debug file according to the configured rewrite rules.
211+
if module.debug_identifier() == self.first_module_debug_id
212+
&& let Some(new_debug_file) = debug_file
213+
.as_ref()
214+
.and_then(|debug_file| self.rewrite_first_module.rewrite(debug_file))
215+
{
216+
debug_file = Some(new_debug_file);
217+
}
218+
219+
ObjectId {
220+
code_id: module.code_identifier(),
221+
code_file,
222+
debug_id: module.debug_identifier(),
223+
debug_file,
224+
debug_checksum: None,
225+
object_type: self.object_type,
226+
}
227+
}
228+
195229
/// Fetches CFI for the given module, parses it into a `SymbolFile`, and stores it internally.
196230
async fn load_cfi_module(&self, module: &(dyn Module + Sync)) -> Arc<FetchedCfiCache> {
197231
let key = LookupKey::new(module);
@@ -220,27 +254,7 @@ impl SymbolicatorSymbolProvider {
220254

221255
let sources = self.sources.clone();
222256
let scope = self.scope.clone();
223-
224-
let code_file = non_empty_file_name(&module.code_file());
225-
let mut debug_file = module.debug_file().as_deref().and_then(non_empty_file_name);
226-
227-
// Rewrite the first module's debug file according to the configured rewrite rules.
228-
if module.debug_identifier() == self.first_module_debug_id
229-
&& let Some(new_debug_file) = debug_file
230-
.as_ref()
231-
.and_then(|debug_file| self.rewrite_first_module.rewrite(debug_file))
232-
{
233-
debug_file = Some(new_debug_file);
234-
}
235-
236-
let identifier = ObjectId {
237-
code_id: module.code_identifier(),
238-
code_file,
239-
debug_id: module.debug_identifier(),
240-
debug_file,
241-
debug_checksum: None,
242-
object_type: self.object_type,
243-
};
257+
let identifier = self.object_id_from_module(module);
244258

245259
let cficache = self
246260
.cficache_actor
@@ -266,51 +280,121 @@ impl SymbolicatorSymbolProvider {
266280
}
267281
cficache
268282
}
269-
}
270283

271-
#[async_trait]
272-
impl SymbolProvider for SymbolicatorSymbolProvider {
273-
async fn fill_symbol(
284+
/// Fetches debug info for the given module and parses it into a `SymCache`.
285+
async fn load_symcache(&self, module: &(dyn Module + Sync)) -> DerivedCache<OwnedSymCache> {
286+
let sources = self.sources.clone();
287+
let scope = self.scope.clone();
288+
let identifier = self.object_id_from_module(module);
289+
290+
self.symcache_actor
291+
.fetch(FetchSymCache {
292+
object_type: self.object_type,
293+
identifier,
294+
sources,
295+
scope,
296+
})
297+
// NOTE: this `bind_hub` is important!
298+
// `load_symcache` is being called concurrently from `rust-minidump` via
299+
// `join_all`. We do need proper isolation of any async task that might
300+
// manipulate any Sentry scope.
301+
.bind_hub(Hub::new_from_top(Hub::current()))
302+
.await
303+
}
304+
305+
/// Try to validate the given frame's instruction address against the CFI for the module
306+
/// (if we have it).
307+
///
308+
/// Returns `false` if we have CFI for the module, but that CFI doesn't cover the frame's
309+
/// instruction address. Returns `true` in any other case.
310+
///
311+
/// This helps the `rust-minidump`'s stack-walker to make better decisions.
312+
/// If the successfully loaded CFI unwind info can not find the potential instruction
313+
/// the passed instruction is most likely not a valid instruction from this module.
314+
/// But since the instruction maps into the range of this module, we know it cannot
315+
/// be part of a different module either -> it's not a valid instruction.
316+
///
317+
/// See: <https://github.com/rust-minidump/rust-minidump/blob/32e01a0e54d987025aa64486ebc4154f7f6b16d2/minidump-unwind/src/lib.rs#L865-L877>
318+
async fn check_frame_by_cfi(
274319
&self,
275320
module: &(dyn Module + Sync),
276321
frame: &mut (dyn FrameSymbolizer + Send),
277-
) -> Result<(), FillSymbolError> {
278-
// This function is being called for every context frame,
322+
) -> bool {
323+
// This function is being called (through `fill_symbol`) for every context frame,
279324
// regardless if any stack walking happens.
280325
// In contrast, `walk_frame` below will be skipped in case the minidump
281326
// does not contain any actionable stack memory that would allow stack walking.
282327
// Loading this module here means we will be able to backfill a possibly missing
283328
// debug_id/file.
284329
let cfi_module = self.load_cfi_module(module).await;
285330

286-
// Try to validate the given instruction against the CFI module contents.
287-
//
288-
// This helps the `rust-minidump`'s stack-walker to make better decisions.
289-
// Returning an error here will aggressively create frames.
290-
// To indicate a that a potential instruction is not valid, we can return success
291-
// here but also not fill in the symbol name.
292-
//
293-
// If the successfully loaded CFI unwind info can not find the potential instruction
294-
// the passed instruction is most likely not a valid instruction from this module.
295-
// But since the instruction maps into the range of this module, we know it cannot
296-
// be part of a different module either -> it's not a valid instruction.
297-
// Returning success in this case, means the frame will be skipped.
298-
//
299-
// See: https://github.com/rust-minidump/rust-minidump/blob/32e01a0e54d987025aa64486ebc4154f7f6b16d2/minidump-unwind/src/lib.rs#L865-L877
300331
if let Ok(Some(cached)) = &cfi_module.cache {
301332
let cfi = &cached.0.cfi_stack_info;
302333
let instruction = frame.get_instruction() - module.base_address();
303334

304335
if !cfi.is_empty() && cfi.get(instruction).is_none() {
305336
// We definitely do have CFI information loaded, but the given instruction does not
306337
// map to any stack frame info.
307-
return Ok(());
338+
return false;
308339
}
309340
}
310341

311-
// In any other case, always return an error here to signal that we have no useful symbol information to
312-
// contribute. Doing nothing would stop the stack scanning prematurely.
313-
Err(FillSymbolError {})
342+
true
343+
}
344+
345+
/// Try to validate the given frame's instruction address against the debug information
346+
/// for the module (if we have it).
347+
///
348+
/// Returns `false` if we have debug info for the module, but that debug info doesn't cover the frame's
349+
/// instruction address. Returns `true` in any other case.
350+
///
351+
/// This helps the `rust-minidump`'s stack-walker to make better decisions.
352+
/// If the successfully loaded symcache can not find the potential instruction
353+
/// the passed instruction is most likely not a valid instruction from this module.
354+
/// But since the instruction maps into the range of this module, we know it cannot
355+
/// be part of a different module either -> it's not a valid instruction.
356+
///
357+
/// See: <https://github.com/rust-minidump/rust-minidump/blob/32e01a0e54d987025aa64486ebc4154f7f6b16d2/minidump-unwind/src/lib.rs#L865-L877>
358+
async fn check_frame_by_symbol_info(
359+
&self,
360+
module: &(dyn Module + Sync),
361+
frame: &mut (dyn FrameSymbolizer + Send),
362+
) -> bool {
363+
let symcache = self.load_symcache(module).await;
364+
365+
if let Ok(cached) = &symcache.cache {
366+
let instruction = frame.get_instruction() - module.base_address();
367+
368+
if cached.get().lookup(instruction).next().is_none() {
369+
// We definitely do have symbol information loaded, but the given instruction does not
370+
// map to any symbol info.
371+
return false;
372+
}
373+
}
374+
375+
true
376+
}
377+
}
378+
379+
#[async_trait]
380+
impl SymbolProvider for SymbolicatorSymbolProvider {
381+
async fn fill_symbol(
382+
&self,
383+
module: &(dyn Module + Sync),
384+
frame: &mut (dyn FrameSymbolizer + Send),
385+
) -> Result<(), FillSymbolError> {
386+
if self.check_frame_by_cfi(module, frame).await
387+
|| self.check_frame_by_symbol_info(module, frame).await
388+
{
389+
// If either CFI or symbol info does not reject the frame, return an error here to signal that
390+
// we have no useful symbol information to contribute. Doing nothing would stop the stack scanning prematurely.
391+
Err(FillSymbolError {})
392+
} else {
393+
// To indicate a that a potential instruction is not valid, we can return success
394+
// here but also not fill in the symbol name. This will cause `rust-minidump`'s stack
395+
// scanner to disregard the frame.
396+
Ok(())
397+
}
314398
}
315399

316400
async fn walk_frame(
@@ -359,6 +443,7 @@ fn object_info_from_minidump_module(ty: ObjectType, module: &MinidumpModule) ->
359443

360444
async fn stackwalk(
361445
cficaches: CfiCacheActor,
446+
symcaches: SymCacheActor,
362447
minidump: &Minidump,
363448
scope: Scope,
364449
sources: Arc<[SourceConfig]>,
@@ -387,6 +472,7 @@ async fn stackwalk(
387472
scope,
388473
sources,
389474
cficaches,
475+
symcaches,
390476
ty,
391477
first_module_debug_id,
392478
rewrite_first_module,
@@ -540,6 +626,7 @@ impl SymbolicationActor {
540626
duration,
541627
} = stackwalk(
542628
self.cficaches.clone(),
629+
self.symcaches.clone(),
543630
&minidump,
544631
scope.clone(),
545632
sources.clone(),

crates/symbolicator-native/src/symbolication/symbolicate.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ use super::native::{get_relative_caller_addr, symbolicate_native_frame};
3232
pub struct SymbolicationActor {
3333
demangle_cache: DemangleCache,
3434
pub(crate) objects: ObjectsActor,
35-
symcaches: SymCacheActor,
35+
pub(crate) symcaches: SymCacheActor,
3636
pub(crate) cficaches: CfiCacheActor,
3737
ppdb_caches: PortablePdbCacheActor,
3838
pub(crate) sourcefiles_cache: Arc<SourceFilesCache>,

crates/symbolicator-native/tests/integration/process_minidump.rs

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,39 @@ async fn test_minidump_macos() {
4848
assert_snapshot!(res);
4949
}
5050

51+
/// Verifies that debug information is used to guide stack scanning.
52+
///
53+
/// The minidump being tested has frame pointers disabled, so in the absence of (usable) CFI,
54+
/// there is no choice but to scan. Left to its own devices, the stack scanner can produce a lot
55+
/// of spurious frames, so we want to use both CFI and debug info that we have available to guide
56+
/// it. The heuristic is as follows: if stack scanning would produce an instruction address
57+
/// * falling into some module,
58+
/// * for which we have CFI or debug info,
59+
/// * which doesn't cover that instruction address,
60+
/// then that address is probably bogus and we discard the frame.
61+
///
62+
/// The CFI half of this heuristic was implemented in
63+
/// https://github.com/getsentry/symbolicator/pull/1651.
64+
/// https://github.com/getsentry/symbolicator/pull/1913 added the debug info part.
65+
///
66+
/// This test simulates an actual customer situation: we have both debug info and CFI
67+
/// for the minidump, but the CFI is truncated/poor. If we only used CFI to constrain the
68+
/// stack scanner, all frames in `sentry_example` would be rejected. However, since we also use
69+
/// the (good) debug info to check the frames, they are retained.
70+
///
71+
/// The `dyld` frames in the stacktrace are found by scanning, and since we have no CFI or
72+
/// debug info at all for that module, they're all accepted.
73+
#[tokio::test]
74+
async fn test_minidump_macos_broken_cfi() {
75+
let res = stackwalk_minidump("macos-no-fp.dmp").await;
76+
let crashed_thread = res
77+
.stacktraces
78+
.iter()
79+
.find(|st| st.is_requesting.unwrap_or_default())
80+
.unwrap();
81+
assert_snapshot!(crashed_thread);
82+
}
83+
5184
#[tokio::test]
5285
async fn test_minidump_linux() {
5386
let res = stackwalk_minidump("linux.dmp").await;

0 commit comments

Comments
 (0)