Add API key rotation support for OpenAI-compatible providers#75
Conversation
d07a2ac to
75338b1
Compare
|
@saivedant169 ready for review . |
|
Thanks for this. Pulled it down and ran everything. Build works, the full test suite passes clean with -race -count=1, and the 9 KeyRotator tests cover the cases I'd want: round-robin distribution, 401 permanent fail, 429 cooldown plus auto-recovery, all-keys-failed, all-keys-rate-limited, single key, empty rotator, and the empty-string filter. The design is the right call. Backward compatible through the api_key_env fallback in buildKeyRotator, thread-safe with sync.Mutex plus atomic.Uint64, and markKeyFromError wires 401 to permanent fail and 429 to cooldown exactly how I'd want it. No new dependencies either, which matters to me for this repo. One thing: your branch is behind main. We've added a bunch of new config fields lately (MCP gateway, evidence chain, credential broker, etc.) so you'll hit minor conflicts in internal/config/config.go and cmd/aegisflow/main.go. Should be mechanical though, your changes to ProviderConfig are additive. Can you rebase on latest main and push? Once you do, I'll merge. |
- Add KeyRotator with round-robin/random strategy - Auto-exclude keys on 401 (permanent) and 429 (cooldown-based) - Add api_keys list to ProviderConfig for multi-key YAML config - Backward compatible with existing api_key_env field Closes saivedant169#20 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
75338b1 to
0f0c65c
Compare
|
I have rebased on the latest main and pushed the code |
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
couple things before merging. the 429 retry is fighting the rotation — when a key hits 429, doRequest just retries the same key before we mark it, so we waste the retry budget on a key we already know is dead. either skip the retry on 429, or have the loop call Pick() again between attempts. i'd go with the second. and Pick itself has a subtle bug. r.counter.Add(1) % uint64(len(active)) keeps climbing while active changes length, so picks drift and keys get skipped occasionally. Pick is already under r.mu too, so the atomic isn't earning anything — just a plain uint64 would do. small stuff: key_selection: random is in the config comment but only round-robin exists. 60s cooldown is hardcoded, worth a TODO. fix those two and i'm happy. |
- KeyRotator.Pick: replace atomic.Uint64 with plain uint64 under the existing mutex; counter % len(active) now operates on a stable snapshot so keys are no longer skipped when the active-set shrinks - doRequest: mark the current key (MarkFailed/MarkRateLimited) and call Pick() for a fresh key before each retry attempt, so a 429-ed key is not immediately retried and the retry budget is not wasted - Remove atomic import (no longer needed) - Config comment: clarify key_selection only supports round-robin - NewKeyRotator doc: add TODO for configurable cooldown duration Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Fixed both issues — pushed to the branch. Pick counter drift — replaced atomic.Uint64 with a plain uint64 under the existing r.mu. Since Pick already holds the lock, the atomic bought nothing, and the pre-increment (Add(1) before mod) meant the index drifted as the active-set shrank. Now it's idx = r.counter % len(active); r.counter++ — stable and correct. 429 retry reusing dead key — doRequest now calls markKeyFromError + Pick() before sleeping between retries, so the 429-ed key is marked and a fresh key is used on the next attempt. If no other key is available the loop still proceeds (same key) and will hit the same error, which is the right behaviour. Also removed "random" from the key_selection config comment and added a TODO on the hardcoded 60s cooldown. |
Closes #20
What this does
Adds multi-key rotation for OpenAI-compatible providers so a single
rate-limited or revoked key doesn't take down the whole provider.
Changes
internal/provider/keyrotator.go— NewKeyRotatorwith round-robin/randomstrategy. Keys auto-excluded on 401 (permanently) and 429 (configurable
cooldown, default 60s, auto-recovered after cooldown).
internal/provider/openai.go— Replaces singleapiKey stringwith*KeyRotator.ChatCompletionandChatCompletionStreamcallMarkFailedon 401 and
MarkRateLimitedon 429.internal/config/config.go— AddsProviderAPIKeystruct andapi_keys/key_selectionfields toProviderConfig.cmd/aegisflow/main.go—buildKeyRotator()resolves keys from the newlist or falls back to legacy
api_key_env.Config example