-
Notifications
You must be signed in to change notification settings - Fork 598
HDDS-14859. Ignore transient errors while validating RocksDb on volumes. #9947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,7 @@ | |
| import jakarta.annotation.Nullable; | ||
| import java.io.File; | ||
| import java.io.IOException; | ||
| import java.time.Duration; | ||
| import java.util.Iterator; | ||
| import java.util.LinkedList; | ||
| import java.util.List; | ||
|
|
@@ -326,17 +327,31 @@ public VolumeCheckResult checkDbHealth(File dbFile) throws InterruptedException | |
| return VolumeCheckResult.HEALTHY; | ||
| } | ||
|
|
||
| // We attempt to open RocksDb twice to ignore any transient errors | ||
| // and to confirm that we actually cannot open RocksDb in readonly mode. | ||
| final boolean isVolumeTestResultHealthy = true; | ||
| try (ManagedOptions managedOptions = new ManagedOptions(); | ||
| ManagedRocksDB ignored = ManagedRocksDB.openReadOnly(managedOptions, dbFile.toString())) { | ||
| volumeTestResultQueue.add(isVolumeTestResultHealthy); | ||
| } catch (Exception e) { | ||
| if (Thread.currentThread().isInterrupted()) { | ||
| throw new InterruptedException("Check of database for volume " + this + " interrupted."); | ||
| final int maxAttempts = 2; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should it be configurable?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My thought process for not making this configurable was to avoid problematic configurations where the total attempts could exceed the disk check timeout. For example if maxAttempts is 10 and you sleep for 1 minute in each attempt, firstly this is no longer about transient errors and the total time of 10 minutes will make the method flaky and fail the disk check timeout of 10 minutes. I can change it if we think we may still want that control in production. |
||
| final Duration maxRetryGap = getDatanodeConfig().getDiskCheckRetryGap(); | ||
| for (int attempt = 0; attempt < maxAttempts; attempt++) { | ||
| try (ManagedOptions managedOptions = new ManagedOptions(); | ||
| ManagedRocksDB ignored = | ||
| ManagedRocksDB.openAsSecondary(managedOptions, dbFile.toString(), getTmpDir().getPath())) { | ||
| volumeTestResultQueue.add(isVolumeTestResultHealthy); | ||
| break; | ||
| } catch (Exception e) { | ||
| if (Thread.currentThread().isInterrupted()) { | ||
| throw new InterruptedException("Check of database for volume " + this + " interrupted."); | ||
| } | ||
|
|
||
| if (attempt == maxAttempts - 1) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Basically here it does retry for any failure, not just transient errors. Do we know which errora are transient that could be recovered by retry, and which errors cannot?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, the idea is that if you can't open the RocksDb twice, the problem should definitely be counted as an error. There is no clear list of the possible transient errors, as such this check is purely defensive. From the RocksDb docs and comments, it appears that we may not be able to open the database in a Read-Only mode when any of these happen:
|
||
| LOG.error("Could not open Volume DB located at {}", dbFile, e); | ||
| volumeTestResultQueue.add(!isVolumeTestResultHealthy); | ||
| volumeTestFailureCount.incrementAndGet(); | ||
| } else { | ||
| LOG.warn("Could not open Volume DB located at {}", dbFile, e); | ||
| Thread.sleep(maxRetryGap.toMillis()); | ||
| } | ||
| } | ||
| LOG.warn("Could not open Volume DB located at {}", dbFile, e); | ||
| volumeTestResultQueue.add(!isVolumeTestResultHealthy); | ||
| volumeTestFailureCount.incrementAndGet(); | ||
| } | ||
|
|
||
| if (volumeTestResultQueue.size() > volumeTestCount | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be at least 15m?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't want a long time between two checks as the goal here is to ignore transient errors while opening RocksDb.
If the gap between two consecutive attempts at opening RocksDb is too long, it is no longer about transient errors.
Secondly, we can't have the time period between opening the RocksDb twice be 15 minutes as it will fail the disk check timeout of 10 minutes.
In case of errors when opening the RocksDb, we ideally want the thread to sleep for a very short time, try to open the RocksDb a second time and if we still fail, mark it as an actual failure to open RocksDb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also increase 10 mins timeout then? in case of 3rd retry we'll spend 2 out of 10 mins in sleep + time for actual processing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So 2 checks should be more than sufficient to definitively declare if we are failing to open RocksDb. After two checks we should let the sliding window handle if further checks should be required on that volume.
If we allow more than 2 checks then definitely the timeouts for each disk check will also have to become dynamic and the 10 minute threshold maybe too small. On the other hand we would like to know within 10 minutes if a disk is unhealthy or not as elongating this check pushes the future checks further out.