Skip to content

Add API key rotation support for OpenAI-compatible providers#75

Merged
saivedant169 merged 2 commits intosaivedant169:mainfrom
rk-python5:feature/api-key-rotation
Apr 10, 2026
Merged

Add API key rotation support for OpenAI-compatible providers#75
saivedant169 merged 2 commits intosaivedant169:mainfrom
rk-python5:feature/api-key-rotation

Conversation

@rk-python5
Copy link
Copy Markdown
Contributor

Closes #20

What this does

Adds multi-key rotation for OpenAI-compatible providers so a single
rate-limited or revoked key doesn't take down the whole provider.

Changes

  • internal/provider/keyrotator.go — New KeyRotator with round-robin/random
    strategy. Keys auto-excluded on 401 (permanently) and 429 (configurable
    cooldown, default 60s, auto-recovered after cooldown).
  • internal/provider/openai.go — Replaces single apiKey string with
    *KeyRotator. ChatCompletion and ChatCompletionStream call MarkFailed
    on 401 and MarkRateLimited on 429.
  • internal/config/config.go — Adds ProviderAPIKey struct and api_keys /
    key_selection fields to ProviderConfig.
  • cmd/aegisflow/main.gobuildKeyRotator() resolves keys from the new
    list or falls back to legacy api_key_env.
  • 9 rotator unit tests + 3 new OpenAI integration tests.

Config example

providers:
  - name: openai
    type: openai
    key_selection: round-robin
    api_keys:
      - key_env: OPENAI_KEY_1
      - key_env: OPENAI_KEY_2

@rk-python5 rk-python5 force-pushed the feature/api-key-rotation branch from d07a2ac to 75338b1 Compare April 6, 2026 18:13
@rk-python5
Copy link
Copy Markdown
Contributor Author

@saivedant169 ready for review .

@saivedant169
Copy link
Copy Markdown
Owner

Thanks for this.

Pulled it down and ran everything. Build works, the full test suite passes clean with -race -count=1, and the 9 KeyRotator tests cover the cases I'd want: round-robin distribution, 401 permanent fail, 429 cooldown plus auto-recovery, all-keys-failed, all-keys-rate-limited, single key, empty rotator, and the empty-string filter.

The design is the right call. Backward compatible through the api_key_env fallback in buildKeyRotator, thread-safe with sync.Mutex plus atomic.Uint64, and markKeyFromError wires 401 to permanent fail and 429 to cooldown exactly how I'd want it. No new dependencies either, which matters to me for this repo.

One thing: your branch is behind main. We've added a bunch of new config fields lately (MCP gateway, evidence chain, credential broker, etc.) so you'll hit minor conflicts in internal/config/config.go and cmd/aegisflow/main.go. Should be mechanical though, your changes to ProviderConfig are additive.

Can you rebase on latest main and push? Once you do, I'll merge.

- Add KeyRotator with round-robin/random strategy
- Auto-exclude keys on 401 (permanent) and 429 (cooldown-based)
- Add api_keys list to ProviderConfig for multi-key YAML config
- Backward compatible with existing api_key_env field

Closes saivedant169#20

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rk-python5 rk-python5 force-pushed the feature/api-key-rotation branch from 75338b1 to 0f0c65c Compare April 9, 2026 16:50
@rk-python5
Copy link
Copy Markdown
Contributor Author

I have rebased on the latest main and pushed the code

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 9, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 71.42857% with 32 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
cmd/aegisflow/main.go 0.00% 15 Missing ⚠️
internal/provider/openai.go 65.11% 12 Missing and 3 partials ⚠️
internal/provider/keyrotator.go 96.29% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@saivedant169
Copy link
Copy Markdown
Owner

couple things before merging.

the 429 retry is fighting the rotation — when a key hits 429, doRequest just retries the same key before we mark it, so we waste the retry budget on a key we already know is dead. either skip the retry on 429, or have the loop call Pick() again between attempts. i'd go with the second.

and Pick itself has a subtle bug. r.counter.Add(1) % uint64(len(active)) keeps climbing while active changes length, so picks drift and keys get skipped occasionally. Pick is already under r.mu too, so the atomic isn't earning anything — just a plain uint64 would do.

small stuff: key_selection: random is in the config comment but only round-robin exists. 60s cooldown is hardcoded, worth a TODO.

fix those two and i'm happy.

- KeyRotator.Pick: replace atomic.Uint64 with plain uint64 under the
  existing mutex; counter % len(active) now operates on a stable
  snapshot so keys are no longer skipped when the active-set shrinks
- doRequest: mark the current key (MarkFailed/MarkRateLimited) and
  call Pick() for a fresh key before each retry attempt, so a 429-ed
  key is not immediately retried and the retry budget is not wasted
- Remove atomic import (no longer needed)
- Config comment: clarify key_selection only supports round-robin
- NewKeyRotator doc: add TODO for configurable cooldown duration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rk-python5
Copy link
Copy Markdown
Contributor Author

Fixed both issues — pushed to the branch.

Pick counter drift — replaced atomic.Uint64 with a plain uint64 under the existing r.mu. Since Pick already holds the lock, the atomic bought nothing, and the pre-increment (Add(1) before mod) meant the index drifted as the active-set shrank. Now it's idx = r.counter % len(active); r.counter++ — stable and correct.

429 retry reusing dead key — doRequest now calls markKeyFromError + Pick() before sleeping between retries, so the 429-ed key is marked and a fresh key is used on the next attempt. If no other key is available the loop still proceeds (same key) and will hit the same error, which is the right behaviour.

Also removed "random" from the key_selection config comment and added a TODO on the hardcoded 60s cooldown.

@saivedant169 saivedant169 merged commit ab330d8 into saivedant169:main Apr 10, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add API key rotation support

3 participants