Merged
6 changes: 4 additions & 2 deletions README.md
@@ -26,7 +26,7 @@ Local AI API gateway for OpenAI / Gemini / Anthropic. Runs on your machine, keep
## Quick start (macOS)
1) Install: move `Token Proxy.app` to `/Applications`. If blocked: `xattr -cr /Applications/Token\ Proxy.app`.
2) Launch the app. The proxy starts automatically.
3) Open **Config File** tab, edit and save (writes `config.jsonc` in the Tauri config dir). Defaults are usable; just paste your upstream API keys.
3) Open **Config File** tab, edit and save (writes `config.jsonc` in the Tauri config dir). Defaults are usable; just paste your upstream API keys. If the proxy is already running, it auto-applies the new config, reloading or restarting as needed.
4) Call via curl (example with local auth):
```bash
curl -X POST \
@@ -100,6 +100,7 @@ Notes:
| `app_proxy_url` | `null` | Proxy for app updater & as placeholder for upstreams (`"$app_proxy_url"`). Supports `http/https/socks5/socks5h`. |
| `log_level` | `silent` | `silent|error|warn|info|debug|trace`; debug/trace log request headers (auth redacted) and small bodies (≤64KiB). Release builds force `silent`. |
| `max_request_body_bytes` | `20971520` (20 MiB) | 0 = fallback to default. Protects inbound body size. |
| `retryable_failure_cooldown_secs` | `15` | Cooldown window applied after retryable failures that warrant temporarily sidelining an upstream; `0` disables cooldown. Reloading or restarting the running proxy resets the current cooldown state. |
| `tray_token_rate.enabled` | `true` | macOS tray live rate; harmless elsewhere. |
| `tray_token_rate.format` | `split` | `combined` (`total`), `split` (`↑in ↓out`), `both` (`total | ↑in ↓out`). |
| `upstream_strategy` | `priority_fill_first` | `priority_fill_first` (default) keeps trying the highest-priority group in list order; `priority_round_robin` rotates within each priority group. |
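
The options above can be combined into a minimal `config.jsonc` along these lines (values are illustrative only, and the `upstreams` list is omitted because its schema is not shown here):

```jsonc
{
  // null = direct connection; upstream entries may reference this via "$app_proxy_url"
  "app_proxy_url": "socks5h://127.0.0.1:1080",
  "log_level": "info",                    // release builds force "silent"
  "max_request_body_bytes": 20971520,     // 20 MiB; 0 falls back to the default
  "retryable_failure_cooldown_secs": 15,  // 0 disables cooldown
  "tray_token_rate": { "enabled": true, "format": "split" },
  "upstream_strategy": "priority_fill_first"
}
```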
@@ -147,7 +148,8 @@ Notes:

## Load balancing & retries
- Priorities: higher `priority` groups first; inside a group use list order (fill-first) or round-robin (if `priority_round_robin`).
- Retryable conditions: network timeout/connect errors, or status 400/403/429/307/5xx **except** 504/524. Retries stay within the same provider's priority groups.
- Retryable conditions: network timeout/connect errors, or status 400/401/403/404/408/422/429/307/5xx (including 504/524). Retries stay within the same provider's priority groups.
- Cooldown conditions: `401/403/408/429/5xx` will temporarily move the failed upstream behind ready peers for `retryable_failure_cooldown_secs` (default `15`); `400/404/422/307` stay retryable but do not trigger cross-request cooldown.
- `/v1/messages` only: after the chosen native provider is exhausted (retryable errors), the proxy can fall back to the other native provider (`anthropic` ↔ `kiro`) if it is configured.
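
The fill-first selection with cooldown sidelining can be sketched roughly as follows (a hypothetical illustration: `Upstream` and `pick_fill_first` are made-up names, not the crate's actual API):

```rust
use std::time::{Duration, Instant};

// Illustrative upstream record: a cooldown deadline set after a retryable
// failure temporarily sidelines this upstream behind ready peers.
struct Upstream {
    name: &'static str,
    cooldown_until: Option<Instant>,
}

// Fill-first: pick the first upstream in list order whose cooldown has
// expired; if every upstream is cooling down, fall back to the list head
// so requests still go out.
fn pick_fill_first(upstreams: &[Upstream], now: Instant) -> Option<usize> {
    upstreams
        .iter()
        .position(|u| u.cooldown_until.map_or(true, |t| t <= now))
        .or(if upstreams.is_empty() { None } else { Some(0) })
}

fn main() {
    let now = Instant::now();
    let upstreams = [
        Upstream { name: "primary", cooldown_until: Some(now + Duration::from_secs(15)) },
        Upstream { name: "secondary", cooldown_until: None },
    ];
    let idx = pick_fill_first(&upstreams, now).unwrap();
    // "primary" is cooling down, so "secondary" is chosen.
    println!("{}", upstreams[idx].name);
}
```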

## Observability
6 changes: 4 additions & 2 deletions README.zh-CN.md
@@ -26,7 +26,7 @@
## Quick start (macOS)
1) Install: move `Token Proxy.app` to `/Applications`. If it is blocked, run `xattr -cr /Applications/Token\ Proxy.app`.
2) Launch the app; the proxy starts automatically.
3) Open the **Config File** tab, edit and save (writes `config.jsonc` under the Tauri config directory). The default config is usable; just fill in your upstream API keys.
3) Open the **Config File** tab, edit and save (writes `config.jsonc` under the Tauri config directory). The default config is usable; just fill in your upstream API keys. If the proxy is running, it reloads or restarts automatically as needed after saving.
4) Send a request (local-auth example):
```bash
curl -X POST \
@@ -100,6 +100,7 @@ pnpm exec tsc --noEmit
| `app_proxy_url` | `null` | Proxy reusable for app updates & upstreams; supports `http/https/socks5/socks5h`; can be referenced from an upstream's `proxy_url` via the `"$app_proxy_url"` placeholder |
| `log_level` | `silent` | `silent|error|warn|info|debug|trace`; debug/trace log request headers (auth redacted) and small bodies (≤64KiB); release builds force `silent` |
| `max_request_body_bytes` | `20971520` (20 MiB) | 0 falls back to the default; protects inbound body size |
| `retryable_failure_cooldown_secs` | `15` | Applies a cooldown window to retryable failures that warrant a brief demotion; `0` disables cooldown. Reloading or restarting the running proxy resets the current cooldown state |
| `tray_token_rate.enabled` | `true` | macOS tray live rate; harmless on other platforms |
| `tray_token_rate.format` | `split` | `combined` (total) / `split` (↑in ↓out) / `both` (total | ↑in ↓out) |
| `upstream_strategy` | `priority_fill_first` | `priority_fill_first` (default) fills higher-priority groups first; `priority_round_robin` rotates within each group |
@@ -147,7 +148,8 @@ pnpm exec tsc --noEmit

## Load balancing & retries
- Priorities: higher-priority groups are tried first; within a group, list order (fill-first) or round-robin
- Retryable conditions: network timeout/connect errors, or status codes 400/403/429/307/5xx (excluding 504/524); retries stay within the same provider's priority groups
- Retryable conditions: network timeout/connect errors, or status codes 400/401/403/404/408/422/429/307/5xx (including 504/524); retries stay within the same provider's priority groups
- Cooldown conditions: `401/403/408/429/5xx` temporarily push the failed upstream behind ready peers for `retryable_failure_cooldown_secs` (default `15`); `400/404/422/307` remain retryable but do not trigger a cross-request cooldown
- `/v1/messages` only: when the matched native provider (`anthropic`/`kiro`) is exhausted (still with retryable errors) and the other native provider is configured, the proxy falls back automatically (Anthropic ↔ Kiro)

## Observability
16 changes: 16 additions & 0 deletions crates/token_proxy_core/src/proxy/config/mod.rs
@@ -5,6 +5,7 @@ mod normalize;
mod types;

use crate::paths::TokenProxyPaths;
use std::time::{Duration, Instant};

const DEFAULT_MAX_REQUEST_BODY_BYTES: u64 = 20 * 1024 * 1024;

@@ -67,13 +68,24 @@ fn build_runtime_config(config: ProxyConfigFile) -> Result<ProxyConfig, String>
local_api_key: config.local_api_key,
log_level,
max_request_body_bytes,
retryable_failure_cooldown: resolve_retryable_failure_cooldown(
config.retryable_failure_cooldown_secs,
)?,
upstream_strategy: config.upstream_strategy,
upstreams,
kiro_preferred_endpoint: config.kiro_preferred_endpoint,
antigravity_user_agent: config.antigravity_user_agent,
})
}

fn resolve_retryable_failure_cooldown(value: u64) -> Result<Duration, String> {
let duration = Duration::from_secs(value);
if Instant::now().checked_add(duration).is_none() {
return Err("retryable_failure_cooldown_secs is too large.".to_string());
}
Ok(duration)
}

fn resolve_max_request_body_bytes(value: Option<u64>) -> usize {
let value = value.unwrap_or(DEFAULT_MAX_REQUEST_BODY_BYTES);
let value = if value == 0 {
@@ -96,3 +108,7 @@ fn normalize_app_proxy_url(value: Option<&str>) -> Result<Option<String>, String
scheme => Err(format!("app_proxy_url scheme is not supported: {scheme}.")),
}
}

#[cfg(test)]
#[path = "mod.test.rs"]
mod tests;
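
Standalone, the new overflow guard behaves like this simplified sketch (a rewrite for illustration, not the crate's exact code):

```rust
use std::time::{Duration, Instant};

// Reject a cooldown so large that adding it to `Instant::now()` would
// overflow the platform's monotonic clock representation.
fn resolve_cooldown(secs: u64) -> Result<Duration, String> {
    let duration = Duration::from_secs(secs);
    if Instant::now().checked_add(duration).is_none() {
        return Err("retryable_failure_cooldown_secs is too large.".to_string());
    }
    Ok(duration)
}

fn main() {
    // Sane values pass through unchanged; absurd ones are rejected up front.
    assert_eq!(resolve_cooldown(15), Ok(Duration::from_secs(15)));
    assert!(resolve_cooldown(u64::MAX).is_err());
}
```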
11 changes: 11 additions & 0 deletions crates/token_proxy_core/src/proxy/config/mod.test.rs
@@ -0,0 +1,11 @@
use super::*;

#[test]
fn build_runtime_config_rejects_retryable_failure_cooldown_that_overflows_instant() {
let mut config = ProxyConfigFile::default();
config.retryable_failure_cooldown_secs = u64::MAX;

let result = build_runtime_config(config);

assert!(result.is_err());
}
15 changes: 15 additions & 0 deletions crates/token_proxy_core/src/proxy/config/types.rs
@@ -30,6 +30,14 @@ fn default_log_level() -> LogLevel {
LogLevel::Silent
}

fn default_retryable_failure_cooldown_secs() -> u64 {
15
}

fn is_default_retryable_failure_cooldown_secs(value: &u64) -> bool {
*value == default_retryable_failure_cooldown_secs()
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum InboundApiFormat {
@@ -182,6 +190,11 @@ pub struct ProxyConfigFile {
pub log_level: LogLevel,
#[serde(skip_serializing_if = "Option::is_none")]
pub max_request_body_bytes: Option<u64>,
#[serde(
default = "default_retryable_failure_cooldown_secs",
skip_serializing_if = "is_default_retryable_failure_cooldown_secs"
)]
pub retryable_failure_cooldown_secs: u64,
#[serde(default)]
pub tray_token_rate: TrayTokenRateConfig,
#[serde(default)]
@@ -204,6 +217,7 @@ impl Default for ProxyConfigFile {
antigravity_user_agent: None,
log_level: LogLevel::default(),
max_request_body_bytes: None,
retryable_failure_cooldown_secs: default_retryable_failure_cooldown_secs(),
tray_token_rate: TrayTokenRateConfig::default(),
upstream_strategy: UpstreamStrategy::PriorityFillFirst,
upstreams: Vec::new(),
@@ -218,6 +232,7 @@ pub struct ProxyConfig {
pub local_api_key: Option<String>,
pub log_level: LogLevel,
pub max_request_body_bytes: usize,
pub retryable_failure_cooldown: std::time::Duration,
pub upstream_strategy: UpstreamStrategy,
pub upstreams: HashMap<String, ProviderUpstreams>,
pub kiro_preferred_endpoint: Option<KiroPreferredEndpoint>,
7 changes: 7 additions & 0 deletions crates/token_proxy_core/src/proxy/config/types.test.rs
@@ -158,3 +158,10 @@ fn test_upstream_url() {
"https://api.example.com/openai/v1/messages"
);
}

#[test]
fn proxy_config_file_defaults_retryable_failure_cooldown_to_15_seconds() {
let config = ProxyConfigFile::default();

assert_eq!(config.retryable_failure_cooldown_secs, 15);
}
2 changes: 2 additions & 0 deletions crates/token_proxy_core/src/proxy/http.test.rs
@@ -10,6 +10,7 @@ fn config_with_local(key: &str) -> ProxyConfig {
local_api_key: Some(key.to_string()),
log_level: LogLevel::Silent,
max_request_body_bytes: 1024,
retryable_failure_cooldown: std::time::Duration::from_secs(15),
upstream_strategy: crate::proxy::config::UpstreamStrategy::PriorityFillFirst,
upstreams: HashMap::new(),
kiro_preferred_endpoint: None,
@@ -24,6 +25,7 @@ fn config_without_local() -> ProxyConfig {
local_api_key: None,
log_level: LogLevel::Silent,
max_request_body_bytes: 1024,
retryable_failure_cooldown: std::time::Duration::from_secs(15),
upstream_strategy: crate::proxy::config::UpstreamStrategy::PriorityFillFirst,
upstreams: HashMap::new(),
kiro_preferred_endpoint: None,