[fix] [broker] broker log a full thread dump when a deadlock is detected in healthcheck every time #22916

Merged
lhotari merged 2 commits into apache:master from yyj8:bugfix_healthcheck_deadlocked_log_print on Jun 20, 2024
yyj8 (Contributor) commented Jun 15, 2024

Fixes #22915

Motivation

The broker logs a full thread dump every time the health check detects a deadlock.

Our expectation is:
The first time a deadlock is detected, a full thread dump is printed. After that, a full thread dump is printed at most once per interval, as set by the following constant:

// org.apache.pulsar.broker.admin.impl.BrokersBase.java
private static final long LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED = 600000L;

Modifications

  1. In class org.apache.pulsar.broker.admin.impl.BrokersBase, the field threadDumpLoggedTimestamp is now declared with the static keyword, so that it is no longer re-initialized to 0 every time the health check endpoint is called.

  2. Method checkDeadlockedThreads, before and after the modification:

(1) Before:

private void checkDeadlockedThreads() {
    ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
    long[] threadIds = threadBean.findDeadlockedThreads();
    if (threadIds != null && threadIds.length > 0) {
        ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadIds, false, false);
        String threadNames = Arrays.stream(threadInfos)
                .map(threadInfo -> threadInfo.getThreadName() + "(tid=" + threadInfo.getThreadId() + ")").collect(
                        Collectors.joining(", "));
        if (System.currentTimeMillis() - threadDumpLoggedTimestamp
                > LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED) {
            threadDumpLoggedTimestamp = System.currentTimeMillis();
            LOG.error("Deadlocked threads detected. {}\n{}", threadNames,
                    ThreadDumpUtil.buildThreadDiagnosticString());
        } else {
            LOG.error("Deadlocked threads detected. {}", threadNames);
        }
        throw new IllegalStateException("Deadlocked threads detected. " + threadNames);
    }
}

(2) After:

private void checkDeadlockedThreads() {
    ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
    long[] threadIds = threadBean.findDeadlockedThreads();
    if (threadIds != null && threadIds.length > 0) {
        ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadIds, false, false);
        String threadNames = Arrays.stream(threadInfos)
                .map(threadInfo -> threadInfo.getThreadName() + "(tid=" + threadInfo.getThreadId() + ")").collect(
                        Collectors.joining(", "));
        if ((System.currentTimeMillis() - threadDumpLoggedTimestamp
                > LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED) ||
                threadDumpLoggedTimestamp == 0) {
            threadDumpLoggedTimestamp = System.currentTimeMillis();
            LOG.error("Deadlocked threads detected. {}\n{}", threadNames,
                    ThreadDumpUtil.buildThreadDiagnosticString());
        } else {
            LOG.error("Deadlocked threads detected. {}", threadNames);
        }
        throw new IllegalStateException("Deadlocked threads detected. " + threadNames);
    }
}
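
To illustrate why the static keyword matters here, the following is a minimal standalone sketch, not the Pulsar code itself. It assumes that each health check call is served by a fresh object, which is what resets a per-instance timestamp to 0. The class names ThreadDumpThrottleSketch, PerInstance and Shared, the method shouldLogFullDump, and the simulated clock values are hypothetical and exist only for this example.

```java
class ThreadDumpThrottleSketch {
    // Same value as LOG_THREADDUMP_INTERVAL_WHEN_DEADLOCK_DETECTED in the PR (10 minutes).
    static final long INTERVAL_MS = 600000L;

    // Hypothetical stand-in for the behavior before the fix: the timestamp is an
    // instance field, so every new object starts again from 0.
    static class PerInstance {
        private long lastDumpTimestamp;

        boolean shouldLogFullDump(long nowMs) {
            if (nowMs - lastDumpTimestamp > INTERVAL_MS) {
                lastDumpTimestamp = nowMs;
                return true;
            }
            return false;
        }
    }

    // Hypothetical stand-in for the behavior after the fix: the timestamp is static,
    // so it is shared by all calls. The check-then-set below is not atomic; this is a
    // sketch of the throttling idea, not a thread-safe implementation.
    static class Shared {
        private static volatile long lastDumpTimestamp;

        static boolean shouldLogFullDump(long nowMs) {
            if (lastDumpTimestamp == 0 || nowMs - lastDumpTimestamp > INTERVAL_MS) {
                lastDumpTimestamp = nowMs;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        long nowMs = 1000000L; // simulated clock, in milliseconds
        for (int call = 0; call < 3; call++) {
            // A fresh PerInstance per call models a new object per request (assumption).
            boolean perInstance = new PerInstance().shouldLogFullDump(nowMs);
            boolean shared = Shared.shouldLogFullDump(nowMs);
            System.out.println("call " + call + ": per-instance=" + perInstance + ", shared=" + shared);
            nowMs += 1000L; // next health check one second later
        }
    }
}
```

Run on its own, the per-instance variant reports true on all three calls, while the shared variant reports true only on the first call and false on the two calls that follow within the interval.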

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:
yyj8#9

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jun 15, 2024
@yyj8 yyj8 requested a review from hanmz June 18, 2024 01:49
lhotari (Member) left a comment

LGTM

@lhotari lhotari merged commit ca64505 into apache:master Jun 20, 2024
lhotari pushed a commit that referenced this pull request Jun 20, 2024
…ted in healthcheck every time (#22916)

(cherry picked from commit ca64505)
lhotari pushed a commit that referenced this pull request Jun 20, 2024
…ted in healthcheck every time (#22916)

(cherry picked from commit ca64505)
lhotari pushed a commit that referenced this pull request Jun 20, 2024
…ted in healthcheck every time (#22916)

(cherry picked from commit ca64505)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jun 21, 2024
…ted in healthcheck every time (apache#22916)

(cherry picked from commit ca64505)
(cherry picked from commit c9de1bb)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jun 24, 2024
…ted in healthcheck every time (apache#22916)

(cherry picked from commit ca64505)
(cherry picked from commit c9de1bb)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jun 25, 2024
…ted in healthcheck every time (apache#22916)

(cherry picked from commit ca64505)
(cherry picked from commit c9de1bb)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 1, 2024
…ted in healthcheck every time (apache#22916)

(cherry picked from commit ca64505)
(cherry picked from commit c9de1bb)
@lhotari lhotari added this to the 4.0.0 milestone Oct 14, 2024
hanmz pushed a commit to hanmz/pulsar that referenced this pull request Feb 12, 2025
Development

Successfully merging this pull request may close these issues.

[Bug] [broker] broker log a full thread dump when a deadlock is detected in healthcheck every time

3 participants