
Commit 16ef231

Implement dynamic token stale period based on token TTL (#677)
## Summary

Extends the token refresh buffer from a fixed 5 minutes to a dynamic period that adapts to token lifetime, improving reliability across different token types while maintaining backward compatibility.

The TTL used for this computation is the remaining time-to-live at the moment the token is received, not the token's original lifetime. For example, if a token has a TTL of 60 minutes but was acquired 20 minutes before the SDK uses it (e.g. through the CLI), its effective TTL is 60 − 20 = 40 minutes.

## Why

The previous 5-minute stale period barely covered the ~4.32 minutes of monthly downtime allowed by a 99.99% uptime SLA. If the auth service were down and a request arrived 4 minutes into the stale period, the SDK would have only 1 minute to obtain a new valid token before expiry. With a stale period of 20 minutes, the SDK has an extra ~15 minutes of stale but still valid tokens to use, giving the auth system time to recover.

## Changes

- Increase the maximum stale period from 5 to 20 minutes to support 99.95% availability.
- Implement dynamic stale period calculation: min(TTL × 0.5, 20 minutes).
- Compute the stale period per token at acquisition time.

### Backward compatibility

- The public Builder method setStaleDuration() is preserved. Calling it disables dynamic mode via a useDynamicStaleDuration flag, reverting to the legacy fixed-duration stale window, so any caller that already configures a custom stale period is unaffected.

## Implementation

- Add computeStaleDuration(Token), which computes min(TTL / 2, MAX_STALE_DURATION).
- Add a useDynamicStaleDuration flag to the Builder, defaulting to true; setStaleDuration() sets it to false.
- Add a volatile dynamicStaleDuration field. In dynamic mode it is computed from the pre-set token via computeStaleDuration() when one is provided; otherwise it starts at zero and is recomputed on the first successful refresh.
- Update getTokenState() to use either dynamicStaleDuration or the legacy stale duration (now stored as staticStaleDuration) based on the flag.
- Update getTokenBlocking() to recompute the stale period after a successful synchronous refresh.
- Update triggerAsyncRefresh() to recompute the stale period after a successful async refresh.

## Testing

- Update testAsyncRefreshParametrized to use TestClockSupplier for deterministic time control, adding a clockAdvanceMinutes parameter that brings tokens into the stale window without relying on wall-clock timing.
- Add a capped stale-duration scenario: a 60-minute TTL token advanced 41 minutes leaves lifeTime = 19 min ≤ 20 min cap → STALE, verifying that MAX_STALE_DURATION is applied correctly.
- Update testAsyncRefreshFailureFallback to use a 4-minute TTL token advanced by 3 minutes so that it reliably enters the stale window under the dynamic formula.
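The effective-TTL rule and the min(TTL × 0.5, 20 minutes) formula above can be checked with a small standalone sketch. It is not the SDK's code: `StaleWindowSketch` and its methods are hypothetical names, and `java.time.Clock.fixed` stands in for the SDK's `ClockSupplier`. It models the 60-minute token that reaches the SDK 20 minutes after it was issued.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical sketch of the dynamic stale-period rule:
// stale = min(effectiveTtl / 2, 20 minutes), where effectiveTtl is the time
// remaining when the token is received, not its original lifetime.
final class StaleWindowSketch {
  // Mirrors the 20-minute cap described in the commit.
  static final Duration MAX_STALE = Duration.ofMinutes(20);

  private StaleWindowSketch() {}

  // Effective TTL: time left between "now" (per the given clock) and expiry.
  static Duration effectiveTtl(Clock clock, Instant expiry) {
    return Duration.between(Instant.now(clock), expiry);
  }

  // min(ttl / 2, MAX_STALE); a non-positive TTL yields a zero stale window.
  static Duration staleDuration(Duration ttl) {
    if (ttl.isNegative() || ttl.isZero()) {
      return Duration.ZERO;
    }
    Duration half = ttl.dividedBy(2);
    return half.compareTo(MAX_STALE) > 0 ? MAX_STALE : half;
  }

  public static void main(String[] args) {
    Instant issued = Instant.parse("2024-01-01T00:00:00Z");
    // A token that expires 60 minutes after it was issued...
    Instant expiry = issued.plus(Duration.ofMinutes(60));
    // ...but is only handed to the SDK 20 minutes later (e.g. minted by a CLI).
    Clock sdkClock = Clock.fixed(issued.plus(Duration.ofMinutes(20)), ZoneOffset.UTC);

    Duration ttl = effectiveTtl(sdkClock, expiry);
    System.out.println(ttl.toMinutes()); // 40
    System.out.println(staleDuration(ttl).toMinutes()); // 20 (hits the cap)
    System.out.println(staleDuration(Duration.ofMinutes(4)).toMinutes()); // 2
  }
}
```

A 40-minute effective TTL lands exactly on the 20-minute cap, so any token with more than 40 minutes remaining gets the full 20-minute window, while short-lived tokens get proportionally less.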
1 parent 7554de9 commit 16ef231


3 files changed: 119 additions & 15 deletions


NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -11,5 +11,6 @@
 ### Documentation
 
 ### Internal Changes
+* Implement dynamic auth token stale period based on initial token lifetime. The stale period is increased up to 20 mins for standard OAuth, with proportionally shorter periods for short-lived tokens. Manually setting the stale period using the CachedTokenSource builder reverts the behaviour to the legacy fixed stale duration.
 
 ### API Changes

databricks-sdk-java/src/main/java/com/databricks/sdk/core/oauth/CachedTokenSource.java

Lines changed: 57 additions & 4 deletions
@@ -33,6 +33,11 @@ private enum TokenState {
   // Default duration before expiry to consider a token as 'stale'. This value is chosen to cover
   // the maximum monthly downtime allowed by a 99.99% uptime SLA (~4.38 minutes).
   private static final Duration DEFAULT_STALE_DURATION = Duration.ofMinutes(5);
+  // The maximum stale duration that can be achieved before expiry to consider a token as 'stale'
+  // when using the dynamic stale duration method. This value is chosen to cover the maximum
+  // monthly downtime allowed by a 99.99% uptime SLA (~4.38 minutes) while increasing the likelihood
+  // that the token is refreshed asynchronously if the auth server is down.
+  private static final Duration MAX_STALE_DURATION = Duration.ofMinutes(20);
   // Default additional buffer before expiry to consider a token as expired.
   // This is 40 seconds by default since Azure Databricks rejects tokens that are within 30 seconds
   // of expiry.
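The commit message quotes ~4.32 minutes while these comments quote ~4.38 minutes for the same 99.99% SLA. The two figures are consistent and differ only in the assumed month length. The sketch below (a hypothetical helper, not part of the SDK) computes the budget as month × (1 − availability), expressing availability in basis points so the integer arithmetic stays exact.

```java
import java.time.Duration;

// Hypothetical helper: downtime allowed by an availability SLA over one "month".
// budget = month * (1 - availability). Availability is given in basis points
// (1 bp = 0.01%), so 99.99% is 9_999 bp and no floating-point rounding occurs.
final class DowntimeBudget {
  private DowntimeBudget() {}

  static Duration monthlyBudget(int availabilityBasisPoints, Duration month) {
    long allowedBp = 10_000L - availabilityBasisPoints;
    return Duration.ofMillis(month.toMillis() * allowedBp / 10_000L);
  }

  public static void main(String[] args) {
    // A 30-day month: 43,200 minutes.
    Duration thirtyDays = Duration.ofDays(30);
    // An average Gregorian month: 365.25 / 12 = 30.4375 days = 43,830 minutes.
    Duration averageMonth = Duration.ofMinutes(43_830);

    System.out.println(monthlyBudget(9_999, thirtyDays).toMillis()); // 259200 ms = 4.32 min
    System.out.println(monthlyBudget(9_999, averageMonth).toMillis()); // 262980 ms ≈ 4.38 min
  }
}
```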
@@ -42,8 +47,12 @@ private enum TokenState {
   private final TokenSource tokenSource;
   // Whether asynchronous refresh is enabled.
   private boolean asyncDisabled = false;
-  // Duration before expiry to consider a token as 'stale'.
-  private final Duration staleDuration;
+  // The legacy duration before expiry to consider a token as 'stale'.
+  private final Duration staticStaleDuration;
+  // Whether to use the dynamic stale duration computation or defer to the legacy duration.
+  private final boolean useDynamicStaleDuration;
+  // The dynamically computed duration before expiry to consider a token as 'stale'.
+  private volatile Duration dynamicStaleDuration;
   // Additional buffer before expiry to consider a token as expired.
   private final Duration expiryBuffer;
   // Clock supplier for current time.
@@ -59,10 +68,17 @@ private enum TokenState {
   private CachedTokenSource(Builder builder) {
     this.tokenSource = builder.tokenSource;
     this.asyncDisabled = builder.asyncDisabled;
-    this.staleDuration = builder.staleDuration;
+    this.staticStaleDuration = builder.staleDuration;
+    this.useDynamicStaleDuration = builder.useDynamicStaleDuration;
     this.expiryBuffer = builder.expiryBuffer;
     this.clockSupplier = builder.clockSupplier;
     this.token = builder.token;
+
+    if (this.useDynamicStaleDuration && this.token != null) {
+      this.dynamicStaleDuration = computeStaleDuration(this.token);
+    } else {
+      this.dynamicStaleDuration = Duration.ofMinutes(0);
+    }
   }
 
   /**
@@ -75,6 +91,7 @@ public static class Builder {
     private final TokenSource tokenSource;
     private boolean asyncDisabled = false;
     private Duration staleDuration = DEFAULT_STALE_DURATION;
+    private boolean useDynamicStaleDuration = true;
     private Duration expiryBuffer = DEFAULT_EXPIRY_BUFFER;
     private ClockSupplier clockSupplier = new UtcClockSupplier();
     private Token token;
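The backward-compatibility switch amounts to a flag that defaults to true and is cleared by an explicit setStaleDuration() call. A minimal sketch of that pattern, assuming a hypothetical `StalePolicySketch` class that only models the choice of stale window (it is not the SDK's Builder and builds nothing); its 5-minute legacy default mirrors DEFAULT_STALE_DURATION.

```java
import java.time.Duration;

// Hypothetical model of the Builder's compatibility switch: dynamic mode is on
// by default, and an explicit setStaleDuration() call turns it off so the
// supplied fixed window is used unchanged.
final class StalePolicySketch {
  private static final Duration MAX_STALE = Duration.ofMinutes(20);

  private Duration staleDuration = Duration.ofMinutes(5); // legacy default
  private boolean useDynamicStaleDuration = true; // new default

  StalePolicySketch setStaleDuration(Duration staleDuration) {
    this.staleDuration = staleDuration;
    this.useDynamicStaleDuration = false; // explicit value opts out of dynamic mode
    return this;
  }

  // Stale window for a token with the given effective TTL.
  Duration staleWindowFor(Duration ttl) {
    if (!useDynamicStaleDuration) {
      return staleDuration; // legacy: same window regardless of TTL
    }
    if (ttl.isNegative() || ttl.isZero()) {
      return Duration.ZERO;
    }
    Duration half = ttl.dividedBy(2);
    return half.compareTo(MAX_STALE) > 0 ? MAX_STALE : half;
  }
}
```

For example, `new StalePolicySketch().staleWindowFor(Duration.ofMinutes(60))` yields 20 minutes, whereas `new StalePolicySketch().setStaleDuration(Duration.ofMinutes(3)).staleWindowFor(Duration.ofMinutes(60))` yields the fixed 3 minutes. Keying the opt-out on the setter call means existing callers need no code change to keep their old behavior.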
@@ -130,6 +147,7 @@ public Builder setAsyncDisabled(boolean asyncDisabled) {
      */
     public Builder setStaleDuration(Duration staleDuration) {
       this.staleDuration = staleDuration;
+      this.useDynamicStaleDuration = false;
       return this;
     }
 
@@ -188,6 +206,21 @@ public Token getToken() {
     return getTokenAsync();
   }
 
+  private Duration computeStaleDuration(Token t) {
+    if (t.getExpiry() == null) {
+      return Duration.ZERO; // Tokens with no expiry are considered permanent.
+    }
+
+    Duration ttl = Duration.between(Instant.now(clockSupplier.getClock()), t.getExpiry());
+
+    if (ttl.compareTo(Duration.ZERO) <= 0) {
+      return Duration.ZERO;
+    }
+
+    Duration halfTtl = ttl.dividedBy(2);
+    return halfTtl.compareTo(MAX_STALE_DURATION) > 0 ? MAX_STALE_DURATION : halfTtl;
+  }
+
   /**
    * Determine the state of the current token (fresh, stale, or expired).
    *
@@ -197,10 +230,15 @@ protected TokenState getTokenState(Token t) {
     if (t == null) {
       return TokenState.EXPIRED;
     }
+    if (t.getExpiry() == null) {
+      return TokenState.FRESH; // Tokens with no expiry are considered permanent.
+    }
+
     Duration lifeTime = Duration.between(Instant.now(clockSupplier.getClock()), t.getExpiry());
     if (lifeTime.compareTo(expiryBuffer) <= 0) {
       return TokenState.EXPIRED;
     }
+    Duration staleDuration = useDynamicStaleDuration ? dynamicStaleDuration : staticStaleDuration;
     if (lifeTime.compareTo(staleDuration) <= 0) {
       return TokenState.STALE;
     }
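The order of checks in getTokenState() can be exercised in isolation. The sketch below is hypothetical (`TokenStateSketch`, `State`, and `classify()` are illustrative names); it hard-codes the 40-second expiry buffer mentioned in the diff and takes the already-selected stale window (dynamic or legacy) as a parameter. Unlike getTokenState(), it does not handle a null token.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical standalone version of the checks in getTokenState(), in the same order.
final class TokenStateSketch {
  enum State { FRESH, STALE, EXPIRED }

  // Mirrors the 40-second default expiry buffer described in the diff.
  static final Duration EXPIRY_BUFFER = Duration.ofSeconds(40);

  private TokenStateSketch() {}

  // staleWindow is the window already chosen by the dynamic/legacy flag.
  static State classify(Instant now, Instant expiry, Duration staleWindow) {
    if (expiry == null) {
      return State.FRESH; // no expiry: treated as permanent
    }
    Duration lifeTime = Duration.between(now, expiry);
    if (lifeTime.compareTo(EXPIRY_BUFFER) <= 0) {
      return State.EXPIRED; // checked first, so the buffer wins over the stale window
    }
    if (lifeTime.compareTo(staleWindow) <= 0) {
      return State.STALE;
    }
    return State.FRESH;
  }
}
```

Because the expiry check comes first, a token with a 1-minute effective TTL gets a 30-second dynamic window, shorter than the 40-second buffer, and so goes straight from FRESH to EXPIRED without ever being STALE. That is consistent with the tests moving from a 1-minute stale token to a 4-minute token advanced by 3 minutes.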
@@ -228,13 +266,22 @@ protected Token getTokenBlocking() {
         return token;
       }
       lastRefreshSucceeded = false;
+      Token newToken;
       try {
-        token = tokenSource.getToken();
+        newToken = tokenSource.getToken();
       } catch (Exception e) {
         logger.error("Failed to refresh token synchronously", e);
         throw e;
       }
       lastRefreshSucceeded = true;
+
+      // Write dynamicStaleDuration before publishing the new token via the volatile write,
+      // so unsynchronized readers that see the new token are guaranteed to also see the
+      // updated dynamicStaleDuration.
+      if (useDynamicStaleDuration && newToken != null) {
+        dynamicStaleDuration = computeStaleDuration(newToken);
+      }
+      token = newToken;
       return token;
     }
   }
@@ -279,6 +326,12 @@ private synchronized void triggerAsyncRefresh() {
               // Attempt to refresh the token in the background.
               Token newToken = tokenSource.getToken();
               synchronized (this) {
+                // Write dynamicStaleDuration before publishing the new token via the volatile
+                // write, so unsynchronized readers that see the new token are guaranteed to also
+                // see the updated dynamicStaleDuration.
+                if (useDynamicStaleDuration && newToken != null) {
+                  dynamicStaleDuration = computeStaleDuration(newToken);
+                }
                 token = newToken;
                 refreshInProgress = false;
               }
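Both refresh paths use the same ordering: store the recomputed stale window, then assign the token. The standalone sketch below illustrates why that order matters, under the assumption (implied by the diff's comments, though the field declaration lies outside the shown hunks) that `token` is a volatile field. `PublishOrderSketch` and `Tok` are hypothetical names, not SDK types.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of "write derived state, then publish the token".
// Under the Java Memory Model, a volatile write happens-before every later
// volatile read that sees it, so a reader that observes the new token also
// observes the stale window written before that token was published.
final class PublishOrderSketch {
  // Minimal token: only an expiry instant (null means it never expires).
  static final class Tok {
    final Instant expiry;

    Tok(Instant expiry) {
      this.expiry = expiry;
    }
  }

  private static final Duration MAX_STALE = Duration.ofMinutes(20);

  private volatile Duration staleWindow = Duration.ZERO;
  private volatile Tok token; // assumed volatile, as the diff's comments imply

  // Writer: runs under the lock, like the refresh paths in the diff.
  synchronized void publish(Tok newToken, Instant now) {
    if (newToken != null) {
      staleWindow = windowFor(newToken, now); // 1. derived state first
    }
    token = newToken; // 2. volatile write publishes the token last
  }

  // Reader: unsynchronized; loads the token before the window that depends on it.
  Duration windowForCurrentToken() {
    Tok t = token; // volatile read
    return t == null ? null : staleWindow;
  }

  private static Duration windowFor(Tok t, Instant now) {
    if (t.expiry == null) {
      return Duration.ZERO;
    }
    Duration ttl = Duration.between(now, t.expiry);
    if (ttl.isNegative() || ttl.isZero()) {
      return Duration.ZERO;
    }
    Duration half = ttl.dividedBy(2);
    return half.compareTo(MAX_STALE) > 0 ? MAX_STALE : half;
  }
}
```

The guarantee is one-directional: a reader may briefly see the new window paired with the old token, but a reader that sees the new token cannot see a window older than the one computed for it.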

databricks-sdk-java/src/test/java/com/databricks/sdk/core/oauth/CachedTokenSourceTest.java

Lines changed: 61 additions & 11 deletions
@@ -17,38 +17,81 @@ public class CachedTokenSourceTest {
   private static final String TOKEN_TYPE = "Bearer";
   private static final String INITIAL_TOKEN = "initial-token";
   private static final String REFRESH_TOKEN = "refreshed-token";
+
   private static final long FRESH_MINUTES = 10;
-  private static final long STALE_MINUTES = 1;
+
+  // Token TTL for the stale scenario: 4 minutes.
+  // dynamicStaleDuration = min(4/2, 20) = 2 min.
+  // After advancing the clock by STALE_ADVANCE_MINUTES = 3, lifeTime = 1 min.
+  // 1 min ≤ 2 min (stale) and 1 min > 40s (not expired) → STALE.
+  private static final long STALE_MINUTES = 4;
+  private static final long STALE_ADVANCE_MINUTES = 3;
+
+  // Token TTL for the capped stale duration scenario: 60 minutes.
+  // dynamicStaleDuration = min(60/2, 20) = 20 min (MAX_STALE_DURATION cap).
+  // After advancing the clock by CAPPED_STALE_ADVANCE_MINUTES = 41, lifeTime = 19 min.
+  // 19 min ≤ 20 min (stale) and 19 min > 40s (not expired) → STALE.
+  private static final long CAPPED_STALE_MINUTES = 60;
+  private static final long CAPPED_STALE_ADVANCE_MINUTES = 41;
+
   private static final long EXPIRED_MINUTES = -1;
 
   private static Stream<Arguments> provideAsyncRefreshScenarios() {
     return Stream.of(
-        Arguments.of("Fresh token, async enabled", FRESH_MINUTES, false, false, INITIAL_TOKEN),
-        Arguments.of("Stale token, async enabled", STALE_MINUTES, false, true, INITIAL_TOKEN),
-        Arguments.of("Expired token, async enabled", EXPIRED_MINUTES, false, true, REFRESH_TOKEN),
-        Arguments.of("Fresh token, async disabled", FRESH_MINUTES, true, false, INITIAL_TOKEN),
-        Arguments.of("Stale token, async disabled", STALE_MINUTES, true, false, INITIAL_TOKEN),
-        Arguments.of("Expired token, async disabled", EXPIRED_MINUTES, true, true, REFRESH_TOKEN));
+        Arguments.of("Fresh token, async enabled", FRESH_MINUTES, 0L, false, false, INITIAL_TOKEN),
+        Arguments.of(
+            "Stale token, async enabled",
+            STALE_MINUTES,
+            STALE_ADVANCE_MINUTES,
+            false,
+            true,
+            INITIAL_TOKEN),
+        Arguments.of(
+            "Expired token, async enabled", EXPIRED_MINUTES, 0L, false, true, REFRESH_TOKEN),
+        Arguments.of("Fresh token, async disabled", FRESH_MINUTES, 0L, true, false, INITIAL_TOKEN),
+        Arguments.of(
+            "Stale token, async disabled",
+            STALE_MINUTES,
+            STALE_ADVANCE_MINUTES,
+            true,
+            false,
+            INITIAL_TOKEN),
+        Arguments.of(
+            "Stale token, capped stale duration, async enabled",
+            CAPPED_STALE_MINUTES,
+            CAPPED_STALE_ADVANCE_MINUTES,
+            false,
+            true,
+            INITIAL_TOKEN),
+        Arguments.of(
+            "Expired token, async disabled", EXPIRED_MINUTES, 0L, true, true, REFRESH_TOKEN));
   }
 
   @ParameterizedTest(name = "{0}")
   @MethodSource("provideAsyncRefreshScenarios")
   void testAsyncRefreshParametrized(
       String testName,
       long minutesUntilExpiry,
+      long clockAdvanceMinutes,
       boolean asyncDisabled,
       boolean expectRefresh,
       String expectedToken)
       throws Exception {
 
+    TestClockSupplier clockSupplier = new TestClockSupplier(Instant.now());
+
     Token initialToken =
         new Token(
             INITIAL_TOKEN,
             TOKEN_TYPE,
             null,
-            Instant.now().plus(Duration.ofMinutes(minutesUntilExpiry)));
+            Instant.now(clockSupplier.getClock()).plus(Duration.ofMinutes(minutesUntilExpiry)));
     Token refreshedToken =
-        new Token(REFRESH_TOKEN, TOKEN_TYPE, null, Instant.now().plus(Duration.ofMinutes(10)));
+        new Token(
+            REFRESH_TOKEN,
+            TOKEN_TYPE,
+            null,
+            Instant.now(clockSupplier.getClock()).plus(Duration.ofMinutes(10)));
     CountDownLatch refreshCalled = new CountDownLatch(1);
 
     TokenSource tokenSource =
@@ -69,8 +112,12 @@ public Token getToken() {
         new CachedTokenSource.Builder(tokenSource)
             .setAsyncDisabled(asyncDisabled)
             .setToken(initialToken)
+            .setClockSupplier(clockSupplier)
             .build();
 
+    // Advance the clock to put the token in the expected state before calling getToken().
+    clockSupplier.advanceTime(Duration.ofMinutes(clockAdvanceMinutes));
+
     Token token = source.getToken();
 
     boolean refreshed = refreshCalled.await(1, TimeUnit.SECONDS);
@@ -90,13 +137,13 @@ void testAsyncRefreshFailureFallback() throws Exception {
     // Create a mutable clock supplier that we can control
     TestClockSupplier clockSupplier = new TestClockSupplier(Instant.now());
 
-    // Create a token that will be stale (2 minutes until expiry)
+    // Create a token with a TTL of 4 minutes that will be stale in 3 minutes.
     Token staleToken =
         new Token(
             INITIAL_TOKEN,
             TOKEN_TYPE,
             null,
-            Instant.now(clockSupplier.getClock()).plus(Duration.ofMinutes(2)));
+            Instant.now(clockSupplier.getClock()).plus(Duration.ofMinutes(4)));
 
     class TestSource implements TokenSource {
       int refreshCallCount = 0;
@@ -132,6 +179,9 @@ public Token getToken() {
             .setClockSupplier(clockSupplier)
             .build();
 
+    // Advance clock to put the token in the stale window.
+    clockSupplier.advanceTime(Duration.ofMinutes(3));
+
     // First call triggers async refresh, which fails
     // Should return stale token immediately (async refresh)
     Token token = source.getToken();
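The tests rely on TestClockSupplier's getClock() and advanceTime() to move time deterministically; its implementation is not part of this diff. A hypothetical stand-in built on `java.time.Clock` illustrates the idea and reproduces the failure-fallback arithmetic (a 4-minute token with the clock advanced 3 minutes leaves 1 minute of lifetime). `MutableTestClock` is an illustrative name, not an SDK class.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical test clock: time only moves when advanceTime() is called, so
// "stale" and "expired" states can be reached deterministically instead of by sleeping.
final class MutableTestClock extends Clock {
  private volatile Instant now; // volatile: async refresh threads may read it
  private final ZoneId zone;

  MutableTestClock(Instant start) {
    this(start, ZoneOffset.UTC);
  }

  private MutableTestClock(Instant start, ZoneId zone) {
    this.now = start;
    this.zone = zone;
  }

  // Single-writer update; adequate for tests where only the test thread advances time.
  void advanceTime(Duration delta) {
    now = now.plus(delta);
  }

  @Override
  public Instant instant() {
    return now;
  }

  @Override
  public ZoneId getZone() {
    return zone;
  }

  // Returns an independent copy starting at the current instant; advancing
  // this clock afterwards does not move the copy.
  @Override
  public Clock withZone(ZoneId newZone) {
    return new MutableTestClock(now, newZone);
  }
}
```

Passing such a clock wherever `Instant.now(clock)` is used makes every time comparison in the code under test depend only on explicit advanceTime() calls.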
