[improve][broker] PIP-327: Support force topic loading for unrecoverable errors #21759
Merged
rdhabalia merged 1 commit into apache:master from … on Oct 4, 2024
Contributor: Should we consider the exceptions when initializing the …
Denovo1998 reviewed Jan 6, 2024
...ar-broker/src/main/java/org/apache/pulsar/broker/service/schema/BookkeeperSchemaStorage.java (resolved)
Member: @rdhabalia This PIP is approved. Are we still working on the PR?
vineeth1995 approved these changes on Oct 4, 2024
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## master #21759 +/- ##
============================================
+ Coverage 73.57% 74.57% +0.99%
- Complexity 32624 34561 +1937
============================================
Files 1877 1936 +59
Lines 139502 145378 +5876
Branches 15299 15893 +594
============================================
+ Hits 102638 108410 +5772
+ Misses 28908 28668 -240
- Partials 7956 8300 +344
Flags with carried forward coverage won't be shown.
Member: This PR introduced a new flaky test, #23417.
hanmz pushed a commit to hanmz/pulsar that referenced this pull request on Feb 12, 2025
It fixes #21751
PIP: #21752
Motivation
We introduced a configuration called autoSkipNonRecoverableData before open-sourcing Pulsar. We had come across various situations in which it was not possible to recover the ledgers belonging to a managed ledger or managed cursor, and the broker was unable to load the topics. In such situations, the autoSkipNonRecoverableData flag lets the broker skip non-recoverable ledger-recovery errors, such as ledger_not_found, and load the topics by skipping those ledgers during disaster recovery.

Brokers can recognize such non-recoverable errors using BookKeeper error codes. In some cases, however, it is tricky or impossible to conclude that an error is non-recoverable. For example, the broker cannot tell the difference between two cases:

- All the ensemble bookies of a ledger are temporarily unavailable.
- All the ensemble bookies have been permanently removed from the cluster without graceful recovery.

Because of that, the broker does not treat "all bookies removed" as a non-recoverable error. Yet such ledgers can never be recovered, for example after a dev cluster clean-up or a data disaster involving the loss of multiple bookies. In these situations, the system admin has to:

1. Manually identify the non-recoverable topics.
2. Update their managed-ledger and managed-cursor metadata.
3. Reload the topics.

This requires a lot of manual effort, and it may not be feasible when a large number of topics need this procedure.
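The following sketch shows why autoSkipNonRecoverableData alone does not unblock a topic whose bookies were all removed. It is a minimal, hypothetical model: LedgerOpenResult, NON_RECOVERABLE, and shouldSkipLedger are illustrative names, not Pulsar or BookKeeper APIs. A failure where all bookies are unavailable looks the same as a temporary outage, so it is never classified as skippable.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; not the actual Pulsar implementation.
class NonRecoverableCheck {

    // Hypothetical, simplified outcomes of opening a ledger while loading a topic.
    enum LedgerOpenResult {
        OK,
        LEDGER_NOT_FOUND,        // ledger metadata is gone: clearly non-recoverable
        ALL_BOOKIES_UNAVAILABLE  // ambiguous: temporary outage or permanent removal?
    }

    // Outcomes that can safely be classified as non-recoverable from the error code alone.
    static final Set<LedgerOpenResult> NON_RECOVERABLE =
            EnumSet.of(LedgerOpenResult.LEDGER_NOT_FOUND);

    /** Returns true if the ledger may be skipped when autoSkipNonRecoverableData is enabled. */
    static boolean shouldSkipLedger(LedgerOpenResult result, boolean autoSkipNonRecoverableData) {
        if (result == LedgerOpenResult.OK) {
            return false; // healthy ledger: never skip
        }
        return autoSkipNonRecoverableData && NON_RECOVERABLE.contains(result);
    }

    public static void main(String[] args) {
        System.out.println(shouldSkipLedger(LedgerOpenResult.LEDGER_NOT_FOUND, true));        // true
        System.out.println(shouldSkipLedger(LedgerOpenResult.ALL_BOOKIES_UNAVAILABLE, true)); // false
    }
}
```

Keeping the ambiguous case out of the skippable set is the conservative choice. Skipping a ledger whose bookies are only temporarily down would discard data that could still have been recovered.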
Modifications
Therefore, this PR adds a dynamic configuration called managedLedgerForceRecovery for the system admin to enable in such situations. When it is enabled, brokers forcefully load topics by skipping ledger failures, which avoids topic unavailability and automatically repairs the affected topics. This lets the admin handle disaster recovery in a controlled and automated manner and keep topics available while mitigating such failures.
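The sketch below shows how a force-recovery flag could change the topic-load decision. It is a minimal, hypothetical model: loadLedgers, LedgerStatus, and TopicLoadException are illustrative names, not the classes this PR changes. When managedLedgerForceRecovery is off, an ambiguous failure aborts the load. When it is on, every failed ledger is skipped so the topic becomes available.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual Pulsar implementation.
class ForceRecoverySketch {

    // Hypothetical, simplified per-ledger status observed while loading a topic.
    enum LedgerStatus { OK, LEDGER_NOT_FOUND, ALL_BOOKIES_UNAVAILABLE }

    // Hypothetical exception signalling that the topic cannot be loaded.
    static class TopicLoadException extends Exception {
        TopicLoadException(String message) { super(message); }
    }

    /**
     * Returns the IDs of the ledgers kept after recovery, in iteration order.
     * With force recovery on, every failed ledger is skipped. Otherwise only
     * LEDGER_NOT_FOUND is skipped (and only if autoSkip is on); any other
     * failure aborts the topic load.
     */
    static List<Long> loadLedgers(Map<Long, LedgerStatus> ledgers,
                                  boolean autoSkipNonRecoverableData,
                                  boolean managedLedgerForceRecovery) throws TopicLoadException {
        List<Long> kept = new ArrayList<>();
        for (Map.Entry<Long, LedgerStatus> entry : ledgers.entrySet()) {
            LedgerStatus status = entry.getValue();
            if (status == LedgerStatus.OK) {
                kept.add(entry.getKey());
            } else if (managedLedgerForceRecovery) {
                // Force recovery: drop the failed ledger so the topic can load.
                continue;
            } else if (autoSkipNonRecoverableData && status == LedgerStatus.LEDGER_NOT_FOUND) {
                // Known non-recoverable error: safe to skip.
                continue;
            } else {
                throw new TopicLoadException("Cannot recover ledger " + entry.getKey() + ": " + status);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws TopicLoadException {
        Map<Long, LedgerStatus> ledgers = new LinkedHashMap<>();
        ledgers.put(1L, LedgerStatus.OK);
        ledgers.put(2L, LedgerStatus.ALL_BOOKIES_UNAVAILABLE);
        ledgers.put(3L, LedgerStatus.OK);

        try {
            loadLedgers(ledgers, true, false); // ambiguous failure blocks the load
        } catch (TopicLoadException ex) {
            System.out.println("Load failed: " + ex.getMessage());
        }
        System.out.println(loadLedgers(ledgers, true, true)); // [1, 3]
    }
}
```

Because the flag is dynamic, an admin would switch it on only during an incident and off again afterwards. Skipping ledgers discards whatever data they held.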