Fix rotating secret with TTL=0 to use exponential backoff retry instead of 1 sec#2134
Fix rotating secret with TTL=0 to use exponential backoff retry instead of 1 sec#2134aswanidutt wants to merge 14 commits intomainfrom
Conversation
- When a rotating secret has rotation_period but ttl=0, it should not be treated as a rotating secret - Instead, it should fall back to using the original LeaseDuration with non-renewable lease logic (80-95% threshold) - This prevents the bug where sleep time would be 0 seconds Changes: - vault_common.go: Only set rotatingSecret=true when ttl > 0 - vault_common_test.go: Add test case for rotating secret with ttl=0 Fixes: VAULT-38829
|
Can you separate the unrelated fixes into a separate PR? Like the race condition, CWE-338, etc. |
|
need to merge #2135 first before this PR |
#2135 added for the above unrelated fixes |
tvoran
left a comment
There was a problem hiding this comment.
Just a couple questions. It's also a little strange to me that the TestFileQuery_Fetch/fires_changes test is failing on this PR, when it seems to have passed for recent commits in this repo.
dependency/vault_common.go
Outdated
| // FIX: When TTL is 0, implement exponential backoff with jitter | ||
| // Start at 5s, double each time, max is VaultDefaultLeaseDuration or 300s |
There was a problem hiding this comment.
It would be good to mention something to the effect that Vault responds with a TTL of 0 when there's been an error rotating the secret on the Vault side, so this is effectively retrying an error condition.
There was a problem hiding this comment.
added relavent comment
dependency/vault_common.go
Outdated
| backoff := 5 * (1 << uint(retryCount)) // 5s, 10s, 20s, 40s, 80s... | ||
| maxBackoff := int(VaultDefaultLeaseDuration.Seconds()) | ||
| if maxBackoff <= 0 { | ||
| maxBackoff = 300 // Default max of 300 seconds if not configured | ||
| } | ||
| if backoff > maxBackoff { | ||
| backoff = maxBackoff | ||
| } | ||
| backoffDur := time.Duration(backoff) * time.Second |
There was a problem hiding this comment.
Have you considered using the cenkalti/backoff library here? I'm not sure it'll fit this exact use case, but it's pretty widely used in Vault and it's already an indirect import in this project.
There was a problem hiding this comment.
there are multiple libraries that are being used for the same use case, I have changed this to cenkalti-v4
dependency/vault_common.go
Outdated
| } else if ttlData == 0 { | ||
| // FIX: When TTL is 0, implement exponential backoff with jitter | ||
| // Start at 5s, double each time, max is VaultDefaultLeaseDuration or 300s | ||
| backoff := 5 * (1 << uint(retryCount)) // 5s, 10s, 20s, 40s, 80s... |
There was a problem hiding this comment.
How was it decided to start at 5s for the first retry? Should it retry quicker than that?
There was a problem hiding this comment.
changed to default 2 series instead of 5 from other references in the repo
| if ttlData > 0 { | ||
| log.Printf("[DEBUG] Found rotation_period and set lease duration to %d seconds", ttlData) | ||
| // Add a second for cushion | ||
| base = int(ttlData) + 1 | ||
| rotatingSecret = true |
There was a problem hiding this comment.
Feels like the retryCount should be reset when this condition is hit? Since we're treating ttl=0 as an error to retry, and ttl > 0 as a successful fetch?
There was a problem hiding this comment.
good catch, I made that change. done
| // Add a second for cushion | ||
| base = int(ttlData) + 1 | ||
| rotatingSecret = true | ||
| } else if ttlData == 0 { |
There was a problem hiding this comment.
So in looking at consul-template a bit more it looks like there's already a notion of retries with backoff that is set in the RetryConfig:
consul-template/config/vault.go
Line 66 in dd7e8f0
I'm wondering if we've tried just returning an error from VaultReadQuery.Fetch() and VaultWriteQuery.Fetch() in the ttl=0 case. Would that trigger the existing retry mechanism?
There was a problem hiding this comment.
@tvoran thanks for that input, thats alot easier than I thought - here is the PR - #2136
but the only caveat is that there is default max 12 retries and maxbackoff of 1 min for retry.go. So, I had to use cenkaltiv4 in view.go to customize the retry backoff with max backoff of 5m and unlimited retries.
issue: When a rotating secret that has rotation_period but ttl=0, it should not be treated as a rotating secret. Instead, it should wait and retry exponentially until original LeaseDuration with non-renewable lease logic (80-95% threshold)
Changes:
Fixes: VAULT-38829-https://hashicorp.atlassian.net/browse/VAULT-38829?
before this change: checking to rotate every second
after the code change : checking every Exponential backoff: 2s, 4s, 8s, 16s, 32s, 64s, the max is VaultDefaultLeaseDuration set on consul-template