Generalize CLI token source into progressive command list#1605
Generalize CLI token source into progressive command list#1605mihaimitrea-db wants to merge 2 commits intomainfrom
Conversation
4cb4c26 to
82d9599
Compare
Range-diff: stack/force-refresh-flag (4cb4c26 -> 82d9599)
Reproduce locally: |
82d9599 to
68d45f4
Compare
Range-diff: stack/force-refresh-flag (82d9599 -> 68d45f4)
Reproduce locally: |
68d45f4 to
4a5079f
Compare
Range-diff: stack/force-refresh-flag (68d45f4 -> 4a5079f)
Reproduce locally: |
4a5079f to
6f4fead
Compare
Range-diff: stack/force-refresh-flag (4a5079f -> 6f4fead)
Reproduce locally: |
6f4fead to
2218270
Compare
Range-diff: stack/force-refresh-flag (6f4fead -> 2218270)
Reproduce locally: |
2218270 to
76e74ca
Compare
Range-diff: stack/force-refresh-flag (2218270 -> 76e74ca)
Reproduce locally: |
config/cli_token_source.go
Outdated
| args: []string{cliPath, "auth", "token", "--profile", cfg.Profile, "--force-refresh"}, | ||
| flags: []string{"--force-refresh", "--profile"}, |
There was a problem hiding this comment.
You have duplicate args and flags, is that intended?
There was a problem hiding this comment.
Note: it required me to reach line 141 to realize that this was not a bug. We should make this clearer with a better name (e.g. usedFlags) and/or a comment in cliCommand. Something like:
type cliCommand struct {
args []string
// List of flags used by the command. This is used for
// distinguishing errors caused by unknown flags from
// other errors.
usedFlags []string
warningMessage string
}
config/cli_token_source.go
Outdated
| // buildHostCommand constructs the legacy --host based CLI command. | ||
| func buildHostCommand(cliPath string, cfg *Config) []string { | ||
| cmd := []string{cliPath, "auth", "token", "--host", cfg.Host} | ||
| func appendHostCommand(commands []cliCommand, cliPath string, cfg *Config) []cliCommand { |
There was a problem hiding this comment.
Should this be inlined in buildCliCommands? It's unclear to me why some commands have their own function while others don't.
config/cli_token_source.go
Outdated
| if err != nil && c.hostCmd != nil && isUnknownFlagError(err) { | ||
| logger.Warnf(ctx, "Databricks CLI does not support --profile flag. Falling back to --host. Please upgrade your CLI to the latest version.") | ||
| return c.execCliCommand(ctx, c.hostCmd) | ||
| for i := int(c.activeCommandIndex.Load()); i < len(c.commands); i++ { |
There was a problem hiding this comment.
It feels like we're really trying to model a concurrent "binary" behavior: either we know the index and use it, or we don't and need to figure out which one to use.
In particular, I'm not sure I understand why we would want the index to change after having identified the right command. I understand that it will not in practice but the code tells a different story.
I believe this could be better modeled using sync.Once or sync.OnceValue instead of atomic.Int though we would have to be cautious about errors.
There was a problem hiding this comment.
I''m not sure how to handle transient errors with sync.Once...
Right now activeCommandIndex is set on command success. But if right now the CLI is encountering network issues for example everything might fail so activeCommandIndex is never set (nothing succeeds).
For sync.Once you could get permanently stuck with the idea that everything is failing right?
There was a problem hiding this comment.
To make the idea that the index will not change once it's resolved we could do something like this:
type CliTokenSource struct {
commands []cliCommand
activeCommandIndex atomic.Int32 // -1 = unresolved
}
// Other code
func (c *CliTokenSource) Token(ctx context.Context) (*oauth2.Token, error) {
if idx := int(c.activeCommandIndex.Load()); idx >= 0 {
return c.execCliCommand(ctx, c.commands[idx].args)
}
return c.probeAndExec(ctx)
}
func (c *CliTokenSource) probeAndExec(ctx context.Context) (*oauth2.Token, error) {
for i := 0; i < len(c.commands); i++ {
tok, err := c.execCliCommand(ctx, c.commands[i].args)
if err == nil {
c.activeCommandIndex.Store(int32(i))
return tok, nil
}
lastCommand := i == len(c.commands)-1
if lastCommand || !isUnknownFlagError(err, c.commands[i].usedFlags) {
return nil, err
}
logger.Warnf(ctx, c.commands[i].warningMessage)
}
return nil, fmt.Errorf("cannot get access token: no CLI commands configured")
}
config/cli_token_source.go
Outdated
| } | ||
| profileCmd, hostCmd := buildCliCommands(cliPath, cfg) | ||
| return &CliTokenSource{cmd: profileCmd, hostCmd: hostCmd}, nil | ||
| return &CliTokenSource{commands: buildCliCommands(cliPath, cfg)}, nil |
There was a problem hiding this comment.
If both cfg.Profile and cfg.Host are empty, buildCliCommands returns an empty slice and Token() falls through to the sentinel error. There is no test covering this path. I think either (i) NewCliTokenSource should guard against this and return an error early, or (ii) there should be a test asserting the current behavior.
Note that it is fine to return "cannot get access token: no CLI commands configured" but the condition should be treated upstream. Having it at the end of the for loop makes the code much harder to follow and reason about.
config/cli_token_source.go
Outdated
| var exitErr *exec.ExitError | ||
| if errors.As(err, &exitErr) { | ||
| return nil, fmt.Errorf("cannot get access token: %s", strings.TrimSpace(string(exitErr.Stderr))) | ||
| return nil, fmt.Errorf("cannot get access token: %q", strings.TrimSpace(string(exitErr.Stderr))) |
There was a problem hiding this comment.
[no action required] Out of curiosity, what's the reason to change %s to %q? I'm also wondering if we should try to keep wrapping ExitError type. It's a little weird that we wrap everything else. If this is intentional, let's make it clear with a comment.
There was a problem hiding this comment.
In my mind the error was coming from an external source (the CLI) so it should be quoted. I also find that easier to understand in case of empty errors. But not that relevant in this case.
Wrapping the error wouldn't hurt. The isUnknownFlagError function could also ensure that the error which is checking is actually an exit error.
There was a problem hiding this comment.
Sorry, what I meant is that we are losing the exec.ExitError type. The new type that we have introduced does not solve that.
76e74ca to
79ac794
Compare
Range-diff: stack/force-refresh-flag (76e74ca -> 79ac794)
Reproduce locally: |
config/cli_token_source.go
Outdated
| var exitErr *exec.ExitError | ||
| if errors.As(err, &exitErr) { | ||
| return nil, fmt.Errorf("cannot get access token: %s", strings.TrimSpace(string(exitErr.Stderr))) | ||
| return nil, fmt.Errorf("cannot get access token: %q", strings.TrimSpace(string(exitErr.Stderr))) |
There was a problem hiding this comment.
Sorry, what I meant is that we are losing the exec.ExitError type. The new type that we have introduced does not solve that.
| // Token fetches an OAuth token by shelling out to the Databricks CLI. | ||
| // If the working command has already been identified, it is called directly. | ||
| // Otherwise, [probeAndExec] tries each command in order to find one that works. | ||
| func (c *CliTokenSource) Token(ctx context.Context) (*oauth2.Token, error) { |
There was a problem hiding this comment.
Separate the part that is about the public API (what users will see) and the part that is about the implementation details (what the developer will see). The latter is not useful to the users, they cannot see the methods that we are referencing.
| func (c *CliTokenSource) Token(ctx context.Context) (*oauth2.Token, error) { | |
| // Token fetches an OAuth token by shelling out to the Databricks CLI. | |
| func (c *CliTokenSource) Token(ctx context.Context) (*oauth2.Token, error) { | |
| // If the working command has already been identified, it is called directly. | |
| // Otherwise, [probeAndExec] tries each command in order to find one that works. |
config/cli_token_source.go
Outdated
| return c.execCliCommand(ctx, c.hostCmd) | ||
| // probeAndExec walks the command list from most-featured to simplest, looking | ||
| // for a CLI command that succeeds. When a command fails with "unknown flag" for | ||
| // one of its [cliCommand.usedFlags], it logs a warning and tries the next. |
There was a problem hiding this comment.
| // one of its [cliCommand.usedFlags], it logs a warning and tries the next. | |
| // one of its [cliCommand.usedFlags], it logs a warning and tries the next. The "counter" is not incremented if the command fails for any other reason. |
config/cli_token_source.go
Outdated
| var exitErr *exec.ExitError | ||
| if errors.As(err, &exitErr) { | ||
| return nil, fmt.Errorf("cannot get access token: %s", strings.TrimSpace(string(exitErr.Stderr))) | ||
| return nil, fmt.Errorf("cannot get access token: %w", |
There was a problem hiding this comment.
From offline discussion: we are loosing the exec.ExitError type. I don't have an opinion on whether we should keep it or not. Though, if we don't let's be explicit about the rationale.
Pass --force-refresh to the Databricks CLI auth token command to bypass the CLI's internal token cache. The SDK manages its own token caching, so the CLI serving stale tokens from its cache causes unnecessary refresh failures. Commands are now tried in order: forceCmd (--profile + --force-refresh), profileCmd (--profile), hostCmd (--host), falling back progressively for older CLI versions that don't support newer flags. Signed-off-by: Mihai Mitrea <mihai.mitrea@databricks.com>
Replace the three explicit command fields (forceCmd, profileCmd, hostCmd) with a []cliCommand list and an activeCommandIndex. Feature flags are defined statically in cliFeatureFlags and stripped progressively to build commands for older CLI versions. Token() now iterates from activeCommandIndex, falling back on unknown flag errors and caching the working command index for subsequent calls. Adding future flags (e.g. --scopes) requires only a one-line addition to the cliFeatureFlags slice. Signed-off-by: Mihai Mitrea <mihai.mitrea@databricks.com>
79ac794 to
a19dab0
Compare
Range-diff: stack/force-refresh-flag (79ac794 -> a19dab0)
Reproduce locally: |
|
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success. |
🥞 Stacked PR
Use this link to review incremental changes.
Summary
Generalize
CliTokenSourcefrom three explicit command fields into a[]cliCommandlist with anactiveCommandIndex, so that adding future CLI flags (e.g.--scopes) requires adding onecliCommandliteral instead of a new field, a newifblock, and a new error check.Why
The parent PR (#1604) introduced
--force-refreshsupport by adding a third command field and hand-writing each fallback block inToken(). This works, but every new flag would require adding another field, anotherifblock, another error check, and another test — the pattern doesn't scale.Rather than growing the struct and
Token()linearly with each flag, this PR extracts the repeating pattern into a loop over a command list.Why try-and-retry over version detection or
--helpparsingdatabricks version+ static version table) was rejected because it creates a maintenance burden across every SDK (Go, Python, Java) and a second source of truth that can fall out of sync.--helpflag parsing was rejected because the output format is not a stable API.Token()call, each command is tried in order; when the CLI responds with"unknown flag:", the next simpler command is tried. The working command index is cached atomically so subsequent calls skip probing.What changed
Interface changes
None.
CliTokenSourceis not part of the public API surface.Behavioral changes
None. The set of commands tried is identical to the parent PR. The only observable difference is that
activeCommandIndexcaches the working command, so subsequentToken()calls skip fallback attempts — a pure performance improvement.Internal changes
cliCommandstruct: replaces the three separate[]stringfields. Each entry holdsargs(the full CLI command),usedFlags(the flags in this command, used for error matching), andwarningMessage(logged when falling back).CliTokenSource: holdscommands []cliCommandandactiveCommandIndex atomic.Int32(initialized to -1). The atomic type makes concurrent access safe when blocking and async token refreshes overlap. Once resolved, the index never changes — older CLIs don't gain new flags.buildCliCommands: constructs all command variants inline. Whencfg.Profileis set,--force-refreshis based on the--profilecommand; when onlycfg.Hostis set,--force-refreshis based on the--hostcommand. Adding a future flag means adding one morecliCommandliteral here.NewCliTokenSource: guards against empty command lists (neither profile nor host configured) with an early error.Token(): fast path whenactiveCommandIndex >= 0(resolved); otherwise delegates toprobeAndExec.probeAndExec: walks the command list. Unknown-flag errors for the command'susedFlagstrigger fallback with a warning; any other error stops probing immediately and is returned to the caller. On success, stores the index atomically.isUnknownFlagError: simplified to plainstrings.Containsmatching onerr.Error(). The previouscliErrorwrapper type was removed — stderr is formatted with%qto keep multi-line CLI output on one line in logs.exec.ExitErroris intentionally not preserved in the error chain since only the stderr text matters for error classification; exit codes and process state are not useful to callers.How is this tested?
Unit tests in
config/cli_token_source_test.go:TestBuildCliCommands— verifies the full[]cliCommandlist (args) for each config combination: host-only (2 commands), account host (2 commands), profile+host (3 commands), profile-only (2 commands).TestNewCliTokenSource/no_profile_or_host— verifies the empty-commands guard returns an error.TestCliTokenSource_Token_Fallback— table-driven test covering: force-refresh success, force→profile fallback, force→host fallback (no profile), profile→host fallback, full chain, real error stops fallback, all commands fail with unknown flag.TestCliTokenSource_Token_ActiveCommandIndexPersists— first call falls back and cachesactiveCommandIndex = 1; second call starts at index 1 and succeeds without retrying the failed command.