[fix] [broker] broker log a full thread dump when a deadlock is detected in healthcheck every time #9

Open
yyj8 wants to merge 2 commits into master from bugfix_healthcheck_deadlocked_log_print
Conversation

yyj8 (Owner) commented Jun 15, 2024

The broker logs a full thread dump every time the healthcheck detects a deadlock.

Fixes #xyz

Main Issue: apache#22915

PIP: #xyz

Motivation

Currently, the broker logs a full thread dump every time the healthcheck detects a deadlock.

The expected behavior is to log a full thread dump the first time a deadlock is detected, and after that only once per interval, as set by the following constant:

// org.apache.pulsar.broker.admin.impl.BrokersBase.java
private static final long LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED = 600000L;
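
The expected behavior described above can be sketched in isolation. This is a minimal illustration, not code from the PR: the class `ThreadDumpThrottle` and its method `shouldLogFullDump` are hypothetical names, the clock value is passed in so the logic can be checked without waiting, and the `synchronized` keyword is an added assumption that the PR's code does not contain.

```java
// Hypothetical helper, not part of Pulsar: isolates the "first detection,
// then at most once per interval" decision so it can be tested with fake clocks.
final class ThreadDumpThrottle {
    // Same value as LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED (10 minutes).
    static final long INTERVAL_MS = 600_000L;

    // 0 means "no full thread dump has been logged yet".
    private long lastFullDumpMillis = 0L;

    /** Returns true when a full thread dump should be logged at nowMillis. */
    synchronized boolean shouldLogFullDump(long nowMillis) {
        if (lastFullDumpMillis == 0L || nowMillis - lastFullDumpMillis > INTERVAL_MS) {
            // Remember when the full dump was logged so later calls are throttled.
            lastFullDumpMillis = nowMillis;
            return true;
        }
        // Within the interval: the caller would log only the thread names.
        return false;
    }
}
```

With one shared `ThreadDumpThrottle`, the first call returns true, calls within the next 600000 ms return false, and the first call after that window returns true again.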

Modifications

  1. In class org.apache.pulsar.broker.admin.impl.BrokersBase, the field threadDumpLoggedTimestamp is now declared static. This avoids re-initializing it to 0 every time the interface is called.

  2. Method checkDeadlockedThreads, before and after the change:

(1) before:

private void checkDeadlockedThreads() {
    ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
    long[] threadIds = threadBean.findDeadlockedThreads();
    if (threadIds != null && threadIds.length > 0) {
        ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadIds, false, false);
        String threadNames = Arrays.stream(threadInfos)
                .map(threadInfo -> threadInfo.getThreadName() + "(tid=" + threadInfo.getThreadId() + ")")
                .collect(Collectors.joining(", "));
        if (System.currentTimeMillis() - threadDumpLoggedTimestamp
                > LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED) {
            threadDumpLoggedTimestamp = System.currentTimeMillis();
            LOG.error("Deadlocked threads detected. {}\n{}", threadNames,
                    ThreadDumpUtil.buildThreadDiagnosticString());
        } else {
            LOG.error("Deadlocked threads detected. {}", threadNames);
        }
        throw new IllegalStateException("Deadlocked threads detected. " + threadNames);
    }
}

(2) after:

private void checkDeadlockedThreads() {
    ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
    long[] threadIds = threadBean.findDeadlockedThreads();
    if (threadIds != null && threadIds.length > 0) {
        ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadIds, false, false);
        String threadNames = Arrays.stream(threadInfos)
                .map(threadInfo -> threadInfo.getThreadName() + "(tid=" + threadInfo.getThreadId() + ")")
                .collect(Collectors.joining(", "));
        if ((System.currentTimeMillis() - threadDumpLoggedTimestamp
                > LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED) ||
                threadDumpLoggedTimestamp == 0) {
            threadDumpLoggedTimestamp = System.currentTimeMillis();
            LOG.error("Deadlocked threads detected. {}\n{}", threadNames,
                    ThreadDumpUtil.buildThreadDiagnosticString());
        } else {
            LOG.error("Deadlocked threads detected. {}", threadNames);
        }
        throw new IllegalStateException("Deadlocked threads detected. " + threadNames);
    }
}
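
The effect of the static keyword in modification 1 can be illustrated with a small sketch. It assumes, as the description implies, that a new object handles each call to the interface. The classes `InstanceFieldResource` and `StaticFieldResource` and their method `logsFullDump` are hypothetical stand-ins, not Pulsar code, and the current time is passed in as a parameter for the sake of the example.

```java
// Hypothetical stand-ins for a resource class that is created per call; not Pulsar code.
final class InstanceFieldResource {
    // Instance field: every new object starts again at 0.
    private long threadDumpLoggedTimestamp = 0L;

    boolean logsFullDump(long nowMillis, long intervalMillis) {
        if (nowMillis - threadDumpLoggedTimestamp > intervalMillis) {
            threadDumpLoggedTimestamp = nowMillis;
            return true;
        }
        return false;
    }
}

final class StaticFieldResource {
    // Static field: shared by all instances, so the value survives across calls.
    private static long threadDumpLoggedTimestamp = 0L;

    boolean logsFullDump(long nowMillis, long intervalMillis) {
        if (nowMillis - threadDumpLoggedTimestamp > intervalMillis
                || threadDumpLoggedTimestamp == 0L) {
            threadDumpLoggedTimestamp = nowMillis;
            return true;
        }
        return false;
    }
}
```

With the instance field, two fresh objects called one second apart both report a full dump, because each one compares the current time against 0. With the static field, the second object sees the timestamp stored by the first and reports only the short message until the interval has elapsed.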

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:
#9
