Skip to content

fix(stackwalking): Use debug info to inform scanning#1913

Merged
loewenheim merged 10 commits intomasterfrom
sebastian/scanning
Apr 1, 2026
Merged

fix(stackwalking): Use debug info to inform scanning#1913
loewenheim merged 10 commits intomasterfrom
sebastian/scanning

Conversation

@loewenheim
Copy link
Copy Markdown
Contributor

In an effort to reduce the number of "hallucinated" stack frames, #1651 introduced a sanity check for stack scanning into the minidump processor: if we have CFI for a module and the stack scanner would produce a frame that is not covered by that CFI, the frame is discarded.

However, this made the processor too strict—if the CFI for the module is poor or incomplete, this may lead to us discarding valid frames. With this PR, we now additionally use debug information in the form of symcaches to inform stack scanning. The new logic is: if we have both CFI and debug info for a module, but neither covers the potential frame's instruction address, then the frame is treated as invalid.

Obviously this involves pulling symcache computation into minidump stackwalking, which previously didn't need to know about symcaches or debug info at all. Performance-wise I believe this to be no problem—symcache computation is cached and would have happened during the symbolication step anyway, now it just happens earlier. But it does leave us in a somewhat awkward spot where we compute debug info during stackwalking but then don't actually fill it in. We may want to rethink this in the future.

Fixes INGEST-838.

@loewenheim loewenheim requested a review from a team as a code owner March 30, 2026 14:02
@linear-code
Copy link
Copy Markdown

linear-code bot commented Mar 30, 2026

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Missing emptiness check on symcache causes false frame rejection
    • Added a check using symcache.functions().next().is_some() to verify the symcache contains functions before using lookup results to reject frames.
  • ✅ Fixed: Duplicated ObjectId construction logic risks future divergence
    • Extracted the duplicated ObjectId construction logic into a shared build_object_id() helper method used by both load_cfi_module and load_symcache.

Create PR

Or push these changes by commenting:

@cursor push 8a5eae0402
Preview (8a5eae0402)
diff --git a/crates/symbolicator-native/src/symbolication/process_minidump.rs b/crates/symbolicator-native/src/symbolication/process_minidump.rs
--- a/crates/symbolicator-native/src/symbolication/process_minidump.rs
+++ b/crates/symbolicator-native/src/symbolication/process_minidump.rs
@@ -203,6 +203,30 @@
         }
     }
 
+    /// Constructs an `ObjectId` for the given module, applying the first module rewrite rules if applicable.
+    fn build_object_id(&self, module: &dyn Module) -> ObjectId {
+        let code_file = non_empty_file_name(&module.code_file());
+        let mut debug_file = module.debug_file().as_deref().and_then(non_empty_file_name);
+
+        // Rewrite the first module's debug file according to the configured rewrite rules.
+        if module.debug_identifier() == self.first_module_debug_id
+            && let Some(new_debug_file) = debug_file
+                .as_ref()
+                .and_then(|debug_file| self.rewrite_first_module.rewrite(debug_file))
+        {
+            debug_file = Some(new_debug_file);
+        }
+
+        ObjectId {
+            code_id: module.code_identifier(),
+            code_file,
+            debug_id: module.debug_identifier(),
+            debug_file,
+            debug_checksum: None,
+            object_type: self.object_type,
+        }
+    }
+
     /// Fetches CFI for the given module, parses it into a `SymbolFile`, and stores it internally.
     async fn load_cfi_module(&self, module: &(dyn Module + Sync)) -> Arc<FetchedCfiCache> {
         let key = LookupKey::new(module);
@@ -231,28 +255,8 @@
 
         let sources = self.sources.clone();
         let scope = self.scope.clone();
+        let identifier = self.build_object_id(module);
 
-        let code_file = non_empty_file_name(&module.code_file());
-        let mut debug_file = module.debug_file().as_deref().and_then(non_empty_file_name);
-
-        // Rewrite the first module's debug file according to the configured rewrite rules.
-        if module.debug_identifier() == self.first_module_debug_id
-            && let Some(new_debug_file) = debug_file
-                .as_ref()
-                .and_then(|debug_file| self.rewrite_first_module.rewrite(debug_file))
-        {
-            debug_file = Some(new_debug_file);
-        }
-
-        let identifier = ObjectId {
-            code_id: module.code_identifier(),
-            code_file,
-            debug_id: module.debug_identifier(),
-            debug_file,
-            debug_checksum: None,
-            object_type: self.object_type,
-        };
-
         let cficache = self
             .cficache_actor
             .fetch(FetchCfiCache {
@@ -282,28 +286,8 @@
     async fn load_symcache(&self, module: &(dyn Module + Sync)) -> DerivedCache<OwnedSymCache> {
         let sources = self.sources.clone();
         let scope = self.scope.clone();
+        let identifier = self.build_object_id(module);
 
-        let code_file = non_empty_file_name(&module.code_file());
-        let mut debug_file = module.debug_file().as_deref().and_then(non_empty_file_name);
-
-        // Rewrite the first module's debug file according to the configured rewrite rules.
-        if module.debug_identifier() == self.first_module_debug_id
-            && let Some(new_debug_file) = debug_file
-                .as_ref()
-                .and_then(|debug_file| self.rewrite_first_module.rewrite(debug_file))
-        {
-            debug_file = Some(new_debug_file);
-        }
-
-        let identifier = ObjectId {
-            code_id: module.code_identifier(),
-            code_file,
-            debug_id: module.debug_identifier(),
-            debug_file,
-            debug_checksum: None,
-            object_type: self.object_type,
-        };
-
         self.symcache_actor
             .fetch(FetchSymCache {
                 object_type: self.object_type,
@@ -364,9 +348,14 @@
 
         // As above, but with symbol/debug information this time.
         if let Ok(cached) = &symcache_module.cache {
+            let symcache = cached.get();
             let instruction = frame.get_instruction() - module.base_address();
 
-            if cached.get().lookup(instruction).next().is_none() {
+            // Check if the symcache actually contains any functions. An empty symcache
+            // (e.g., from a stripped binary) should not be used to reject frames.
+            if symcache.functions().next().is_some()
+                && symcache.lookup(instruction).next().is_none()
+            {
                 // We definitely do have symbol information loaded, but the given instruction does not
                 // map to any symbol info.
                 valid_by_symbol_info = false;

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

// We definitely do have symbol information loaded, but the given instruction does not
// map to any symbol info.
valid_by_symbol_info = false;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing emptiness check on symcache causes false frame rejection

Medium Severity

The symcache lookup path has no emptiness guard, unlike the parallel CFI path. For CFI, empty data is wrapped in Option::None (which doesn't match Ok(Some(cached))), and there's an additional !cfi.is_empty() check. The symcache path has neither protection — a successfully loaded but symbolless symcache (e.g., from a stripped binary found via ObjectPurpose::Debug) will match if let Ok(cached) and lookup(instruction).next().is_none() will return true, falsely setting valid_by_symbol_info to false. Combined with valid_by_cfi = false, this causes valid frames to be discarded — the very problem the PR aims to fix.

Additional Locations (1)
Fix in Cursor Fix in Web

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really worried about this case.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Copy link
Copy Markdown

@olksdr olksdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can add tests for this to avoid regressions in the future?
Maybe add fixtures you used for debugging write a test over it?

@loewenheim
Copy link
Copy Markdown
Contributor Author

I added a test based on a minidump created with the example in sentry-native.

@loewenheim loewenheim merged commit d906ff2 into master Apr 1, 2026
25 checks passed
@loewenheim loewenheim deleted the sebastian/scanning branch April 1, 2026 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants