Copilot AI commented Sep 24, 2025

This PR adds a new Python script tools/analyze_stacks.py that helps developers analyze stack traces to identify interesting ones and filter out mundane (idle) stacks during debugging sessions.

Problem

When debugging multi-threaded applications, developers often encounter many stack traces where most threads are idle - waiting on synchronization primitives, thread pools, or other blocking operations. These idle stacks create noise that makes it difficult to identify the truly interesting stacks that show active work or potential issues.

For example, the following stack traces are all mundane (showing futex waits in thread pools):

#0  __futex_abstimed_wait_common64 (...) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (...) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (...) at ./nptl/futex-internal.c:139
#3  0x00007fa3f3a61bdf in do_futex_wait (...) at ./nptl/sem_waitcommon.c:111
#4  0x00007fa3f3a61c78 in __new_sem_wait_slow64 (...) at ./nptl/sem_waitcommon.c:183
#5  0x00007fa3f410488d in non-virtual thunk to CPooledThreadWrapper::run() () from /opt/HPCCSystems/lib/libjlib.so
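Frames in this gdb format can be reduced to bare function names before any filtering is applied. The following is a minimal sketch, not the code from analyze_stacks.py; the regular expression and the helper name `frame_function` are assumptions made for illustration.

```python
import re

# Hypothetical pattern: "#N", an optional "0x... in" address prefix, then the
# function name up to the whitespace that precedes the argument list "(".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(.+?)\s+\(")


def frame_function(line):
    """Return (frame_index, function_name) for a gdb frame line, else None."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


sample = [
    "#0  __futex_abstimed_wait_common64 (...) at ./nptl/futex-internal.c:57",
    "#3  0x00007fa3f3a61bdf in do_futex_wait (...) at ./nptl/sem_waitcommon.c:111",
    "#5  0x00007fa3f410488d in non-virtual thunk to CPooledThreadWrapper::run() () "
    "from /opt/HPCCSystems/lib/libjlib.so",
]
parsed = [frame_function(line) for line in sample]
```

Lines that are not frames (thread headers, blank lines) return None, so a caller can use them as stack separators.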

Solution

The new analyze_stacks.py script parses the stack traces and then applies two tiers of filtering, one contextual and one simple:

  1. Parses stack traces from various debugging tools (gdb, eu-stack, etc.)

  2. Contextual filtering to avoid false positives - requires BOTH:

    • Low-level wait primitives (top ~4 frames): mutex_lock, futex_wait, pthread_cond_wait, etc.
    • High-level idle context (anywhere in stack): CPooledThreadWrapper::run, RoxieQueue::wait, etc.
  3. Simple patterns for always-idle functions: start_thread, epoll_wait, PerfTracer

  4. Provides flexible output to show either interesting stacks (default) or filtered mundane stacks
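The filtering steps above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the pattern lists are copied from the examples in this description, matching is by plain substring, and simple patterns are checked against the innermost frame only, because the description does not say which frames they apply to. All names (`is_mundane`, `TOP_FRAMES`, and so on) are hypothetical.

```python
# Hypothetical pattern lists, taken from the examples in this description.
WAIT_PRIMITIVES = ("mutex_lock", "futex_wait", "pthread_cond_wait")
IDLE_CONTEXTS = ("CPooledThreadWrapper::run", "RoxieQueue::wait")
ALWAYS_IDLE = ("start_thread", "epoll_wait", "PerfTracer")
TOP_FRAMES = 4  # "top ~4 frames" in the description


def _any_match(patterns, functions):
    """True if any pattern is a substring of any function name."""
    return any(p in f for p in patterns for f in functions)


def is_mundane(functions):
    """Classify a stack given function names ordered from frame #0 outwards."""
    # Simple tier (assumption: checked against the innermost frame only).
    if _any_match(ALWAYS_IDLE, functions[:1]):
        return True
    # Contextual tier: a wait primitive near the top AND an idle context
    # anywhere in the stack must both be present.
    has_wait = _any_match(WAIT_PRIMITIVES, functions[:TOP_FRAMES])
    has_idle = _any_match(IDLE_CONTEXTS, functions)
    return has_wait and has_idle


pool_wait = ["mutex_lock", "Waiter::wait", "CPooledThreadWrapper::run"]
real_work = ["mutex_lock", "acquireLock", "processImportantData"]
results = (is_mundane(pool_wait), is_mundane(real_work))
```

Requiring both conditions is what keeps a stack that blocks on a lock during real work from being dropped.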

Key Improvement: Avoids False Positives

The contextual filtering prevents marking interesting stacks as mundane just because they use synchronization primitives:

  • Mundane: mutex_lock -> Waiter::wait -> CPooledThreadWrapper::run (thread pool waiting)
  • Interesting: mutex_lock -> acquireLock -> processImportantData (active work)

Usage

# Show only interesting stacks (filters out idle ones)
python3 tools/analyze_stacks.py stacktrace.txt

# Show detailed analysis with pattern matches
python3 tools/analyze_stacks.py stacktrace.txt --verbose

# Show the filtered out (mundane) stacks instead
python3 tools/analyze_stacks.py stacktrace.txt --show-filtered

# Use with pipes
cat stacktrace.txt | python3 tools/analyze_stacks.py -

# Use custom filter patterns (supports both simple and contextual patterns)
python3 tools/analyze_stacks.py stacktrace.txt --config custom_filters.json
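This description does not show the layout of custom_filters.json. Purely as an illustration, a config that carries both pattern types might look like the JSON embedded in the sketch below; the keys `simple`, `contextual`, `top` and `context`, and the loader `load_filters`, are hypothetical and are not taken from the script.

```python
import json

# Hypothetical config layout; the real schema used by analyze_stacks.py is not
# shown in this PR.
EXAMPLE_CONFIG = """
{
  "simple": ["epoll_wait", "PerfTracer"],
  "contextual": [
    {"top": ["futex_wait", "mutex_lock"],
     "context": ["CPooledThreadWrapper::run"]}
  ]
}
"""


def load_filters(text):
    """Parse the hypothetical config and check each contextual rule has both halves."""
    config = json.loads(text)
    simple = list(config.get("simple", []))
    contextual = []
    for rule in config.get("contextual", []):
        if not rule.get("top") or not rule.get("context"):
            raise ValueError("contextual rule needs both 'top' and 'context'")
        contextual.append((tuple(rule["top"]), tuple(rule["context"])))
    return simple, contextual


simple_patterns, contextual_rules = load_filters(EXAMPLE_CONFIG)
```

Pairing the top-of-stack patterns with the context patterns in one rule keeps the "both must match" requirement explicit in the ignore list itself.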

Integration

This tool complements existing HPCC Platform debugging infrastructure:

  • Works with output from the existing doperf performance tool
  • Processes eu-stack output format
  • Handles gdb backtrace format
  • Can be integrated into analysis pipelines

The script classifies every stack in the problem case (futex waits in thread pools) as mundane, while preserving stacks that use synchronization primitives during productive work.



Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Write a python script that analyzes stacks, and helps to weed out the interesting ones, or ignore the mundane (e.g. idle) ones. This is a sample: #0 __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0,..." to "Add Python stack trace analysis tool for debugging" on Sep 24, 2025
Copilot AI requested a review from jakesmith September 24, 2025 17:41
Owner

jakesmith left a comment


@copilot - this isn't a sufficient solution. There may be interesting stacks waiting on locks/mutexes/wait conditions.
What makes a stack non-interesting is the combination of the top of the stack and some intermediate call higher up that shows the wait is benign and the thread is truly idle.
So the scheme needs to represent this information in the ignore list.

Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
Author

Copilot AI commented Sep 24, 2025

@copilot - this isn't a sufficient solution. There may be interesting stacks waiting on locks/mutexes/wait conditions.
What makes them non-interesting is a combination of the top of the stack and some...

I've redesigned the filtering logic to use contextual patterns that require BOTH low-level wait primitives AND high-level idle context. This prevents false positives where interesting stacks waiting on locks/mutexes get incorrectly filtered out.

The new approach uses two pattern types:

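  • Contextual patterns: Match only when both a low-level wait primitive such as mutex_lock (top frames) AND an idle context such as CPooledThreadWrapper::run (anywhere in the stack) are present
  • Simple patterns: Match functions that are always idle (such as start_thread, epoll_wait)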

For example, a stack with mutex_lock -> processImportantData is now preserved as interesting, while mutex_lock -> Waiter::wait -> CPooledThreadWrapper::run is correctly filtered as mundane.

Commit d9f94ed implements this improvement.

Copilot AI changed the title from "Add Python stack trace analysis tool for debugging" to "Add Python stack trace analysis tool with contextual filtering for debugging" on Sep 24, 2025
Copilot AI requested a review from jakesmith September 24, 2025 18:00