
joy7758/token-governor


Token Governor (TianJiang)

Governance layer for runtime budget, policy, and trade-off control in AI systems.

Token Governor is the Governance Layer of the Digital Biosphere Architecture. This repository implements one layer of that broader architecture rather than a full agent stack, and it is designed to compose with external runtimes and with adjacent execution-integrity and audit-evidence layers.

It is not just a token-saving tool. Its core role is runtime governance: budget and policy control, routing and fallback management, and explicit trade-off handling across cost, quality, fallback behavior, and latency.

Part of the Agent Runtime Safety Kit, alongside ARO Audit and God Spear.


Related Projects

Architecture Context

This repository is a focused layer in the Digital Biosphere Architecture ecosystem. It does not try to be the full stack. It contributes the Governance Layer for verifiable AI systems. Its focus is runtime policy, permissions, budget-aware control, and trade-off management for agent runs, including pre-execution evaluation of intent and action requests.

What this repo is for

  • This is not a generic "save tokens" promise.
  • It focuses on runtime governance for budget windows, policy boundaries, and controlled execution trade-offs.
  • It evaluates and constrains runtime intent and action requests, not only raw prompt cost.
  • It supports routing policy, fallback control, runtime observability, and cost-quality-latency management.
  • It is suitable for agent runs, not only single-turn chat prompts.
  • It is designed to compose with audit and trust-gate tooling.

Governance Checkpoint Position

  • incoming intent and action objects can be evaluated before execution
  • policy, budget, fallback, and risk controls belong here
  • this layer sits between semantic interaction and execution integrity

For a compact example flow, see docs/governance-checkpoint.md.
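The checkpoint idea above can be sketched as a pure function that inspects an action request before anything runs. This is a minimal illustration under stated assumptions, not the repository's implementation: `ActionRequest`, `CheckpointPolicy`, `evaluate`, the `risk` score, and the three-way allow/deny/fallback decision are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    FALLBACK = "fallback"


@dataclass
class ActionRequest:
    # Hypothetical shape of an incoming intent/action object.
    tool: str
    estimated_tokens: int
    risk: float  # assumed score: 0.0 (benign) to 1.0 (high risk)


@dataclass
class CheckpointPolicy:
    # Hypothetical policy fields; not the token-governor schema.
    restricted_tools: set = field(default_factory=set)
    max_risk: float = 0.7
    remaining_budget: int = 12000


def evaluate(request: ActionRequest, policy: CheckpointPolicy) -> tuple:
    """Decide, before execution, whether an action may run."""
    if request.tool in policy.restricted_tools:
        return Decision.DENY, f"tool '{request.tool}' is restricted"
    if request.risk > policy.max_risk:
        return Decision.DENY, f"risk {request.risk:.2f} exceeds limit {policy.max_risk:.2f}"
    if request.estimated_tokens > policy.remaining_budget:
        # Too expensive as requested: route to a cheaper fallback path instead.
        return Decision.FALLBACK, "estimated tokens exceed remaining budget"
    return Decision.ALLOW, "within policy"


policy = CheckpointPolicy(restricted_tools={"write_file"}, remaining_budget=500)
print(evaluate(ActionRequest(tool="search", estimated_tokens=200, risk=0.1), policy))
```

Returning a reason alongside each decision keeps every verdict explainable, which is what adjacent audit tooling would need to record.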

Quick Demo Path

  1. Budget window demo in this repo
  2. Execution receipt demo in ARO Audit
  3. Trust gate demo in God Spear

Demo Assets

External Integrations

LangChain

Use adapters/langchain_middleware.py to wrap a LangChain-style agent with token budget control, restricted tool lists, and policy violation detection.

Quickstart:

from adapters.langchain_middleware import wrap_agent

policy = {
    "token_budget": 120,
    "restricted_tools": ["write_file"],
    "persona_id": "governed-analyst",
}

# `agent` is an existing LangChain-style agent created elsewhere.
governed_agent = wrap_agent(agent, policy)
result = governed_agent.run(
    {"prompt": "Search the policy memo and write it to disk."}
)
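To make the wrapping idea concrete without depending on LangChain, the sketch below shows one way such middleware could enforce a token budget and a restricted tool list. It is an assumption-laden illustration, not what `wrap_agent` actually does: `run_governed`, `wrap_tools`, the toy agent, and the whitespace-based token count are all hypothetical. Restricted tools are replaced by stubs that log a violation instead of executing, so the block happens before the side effect rather than after it.

```python
from typing import Callable, Dict

Tool = Callable[[str], str]


def count_tokens(text: str) -> int:
    # Rough proxy (whitespace words); a real governor would use the model's tokenizer.
    return len(text.split())


def wrap_tools(tools: Dict[str, Tool], restricted: set, log: list) -> Dict[str, Tool]:
    """Swap restricted tools for stubs that record a violation instead of executing."""
    guarded: Dict[str, Tool] = {}
    for name, fn in tools.items():
        if name in restricted:
            def blocked(arg: str, _name: str = name) -> str:
                log.append({"tool": _name, "decision": "blocked"})
                return f"[blocked by policy: {_name}]"
            guarded[name] = blocked
        else:
            guarded[name] = fn
    return guarded


def run_governed(agent: Callable[[str, Dict[str, Tool]], str],
                 tools: Dict[str, Tool], prompt: str, policy: dict) -> dict:
    """Hypothetical governed run: budget check, tool guard, usage and decision log."""
    log: list = []
    budget = policy.get("token_budget", 0)
    prompt_tokens = count_tokens(prompt)
    if prompt_tokens > budget:
        log.append({"decision": "deny", "reason": "prompt exceeds token budget"})
        return {"output": None, "decisions": log, "tokens": prompt_tokens}
    guarded = wrap_tools(tools, set(policy.get("restricted_tools", [])), log)
    output = agent(prompt, guarded)
    used = prompt_tokens + count_tokens(output)
    if used > budget:
        log.append({"decision": "flag", "reason": "run exceeded token budget"})
    return {"output": output, "decisions": log, "tokens": used}


def toy_agent(prompt: str, tools: Dict[str, Tool]) -> str:
    # Stand-in for a LangChain-style agent: search, then try to save the result.
    found = tools["search"](prompt)
    saved = tools["write_file"](found)
    return f"{found} | {saved}"


tools = {"search": lambda q: "memo: budget policy v2", "write_file": lambda s: "written"}
result = run_governed(toy_agent, tools, "Search the policy memo and write it to disk.",
                      {"token_budget": 120, "restricted_tools": ["write_file"]})
print(result["decisions"])  # [{'tool': 'write_file', 'decision': 'blocked'}]
```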

Runnable examples:

python examples/langchain_governed_agent.py
python examples/langchain_demo.py

The LangChain demo prints:

  • baseline agent output
  • governed agent output
  • policy decisions
  • token usage
  • evidence object

CrewAI

Use examples/crewai_demo.py to compare a baseline CrewAI-style workflow with the governed version.

python examples/crewai_demo.py

The CrewAI demo prints:

  • baseline agent output
  • governed agent output
  • policy decisions
  • token usage
  • evidence object

Outreach Assets


🧠 1. Introduction

TianJiang (Token Governor) provides budget-window and cost-governance controls for production agent runs, including routing, fallback, caching and compression controls, and reproducible benchmark reports.


🚀 2. Quick Start

📦 Clone & Install

git clone https://github.com/joy7758/token-governor.git
cd token-governor

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Set one of the following environment variables:

export OPENAI_API_KEY="your_openai_key"
# or
export GOOGLE_API_KEY="your_google_key"
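If you script the quick start, a small preflight check can fail early when neither key is set. This is a hypothetical helper, not part of the repository; in particular, which key `main.py` prefers when both are present is not documented here, so the OpenAI-first order is an assumption.

```python
import os
from typing import Mapping, Optional


def detect_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical preflight check: report which provider key is configured."""
    env = os.environ if env is None else env
    # Assumed order: OpenAI first, then Google.
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("GOOGLE_API_KEY"):
        return "google"
    raise RuntimeError("Set OPENAI_API_KEY or GOOGLE_API_KEY before running main.py")
```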

📊 Run Baseline

python main.py --mode baseline --limit 20 --out-file metrics/data/baseline.jsonl

🛡️ Run Governor (Policy Control)

python main.py --mode governor --drive-mode eco \
  --policy-file policy.yaml \
  --tasks-file metrics/benchmarks/benchmark_v02_60_tasks.json \
  --limit 20 \
  --out-file metrics/data/governor.jsonl

📈 Generate Comparison Report

python -m metrics.report \
  --baseline metrics/data/baseline.jsonl \
  --governor metrics/data/governor.jsonl \
  --outdir metrics/reports/compare-real \
  --interactive

🤖 CI Auto Update (GitHub Actions)

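  1. In the repository, open Settings -> Secrets and variables -> Actions and configure at least one secret: OPENAI_API_KEY or GOOGLE_API_KEY.
  2. Open Actions and run the Benchmark And Update README Metrics workflow.
  3. By default, the workflow runs the full benchmark, then automatically updates the README charts and metrics and commits the result.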

✨ 3. Features

| Feature | Description |
| --- | --- |
| Multiple drive modes | Eco / Auto / Comfort / Sport / Rocket drive modes for different cost-vs-quality trade-offs |
| Automatic strategy recommendation | Analyzes each task and recommends a strategy (adaptive strategy recommendation) |
| Multi-strategy combination | Semantic cache, prompt compression, model routing, RAG |
| Automatic report generation | Comparative reports as JSON, Markdown, and visual charts |
| CLI parameter control | Command-line interface with extensive configuration options |
| Model profile support | --model-profile biases the auto-strategy recommendation (profile-guided hints) |

🧪 4. Usage Examples

🚗 Eco mode (favors lower cost)

python main.py --mode governor --drive-mode eco --limit 20

🤖 Auto mode (automatic recommendation)

python main.py --mode governor --drive-mode auto --auto-strategy --limit 20

🚀 Rocket mode (favors output quality)

python main.py --mode governor --drive-mode rocket --enable-agentic-plan-cache --limit 20

📊 5. Metrics & Visuals

[Figure: TianJiang LLM inference cost and token savings comparison]

📊 Measured Benchmark Results

  • Token change: +77.65% (token increase)
  • Success rate: Baseline 100.00% → TianJiang (rocket) 100.00%
  • Latency: Baseline 6.63s → TianJiang (rocket) 10.26s (+54.62%)
  • Total tokens: Baseline 935 → TianJiang (rocket) 1,661 (+77.65%)
  • Method: Total Tokens = count × mean_token(input+output)
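The stated method and the percentage changes can be reproduced with a few lines of arithmetic. The record fields `input_tokens` and `output_tokens` are assumptions about the JSONL schema; note that `count × mean` is simply the sum of per-record tokens.

```python
from statistics import mean
from typing import Dict, List


def total_tokens(records: List[Dict[str, int]]) -> float:
    # Total Tokens = count × mean_token(input + output); equivalently, the sum.
    # The "input_tokens"/"output_tokens" field names are assumptions.
    per_record = [r["input_tokens"] + r["output_tokens"] for r in records]
    return len(per_record) * mean(per_record)


def pct_change(baseline: float, governor: float) -> float:
    """Relative change from baseline to governor, in percent."""
    return (governor - baseline) / baseline * 100.0


print(f"{pct_change(935, 1661):+.2f}%")  # +77.65%
```

Applied to the published totals this matches the reported +77.65%. Recomputing latency from the rounded values shown (6.63s and 10.26s) gives roughly +54.75% rather than +54.62%, presumably because the report works from unrounded means.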

Data source: metrics/reports/compare-real-check/comparison.json | Generated (UTC): 2026-03-03T13:55:47.575675+00:00 | ΔSuccess: +0.00pp

Keywords: 天将, TianJiang, Token Governor, LLM cost optimization, token savings, inference cost, AI agents, context compression, semantic cache, tool top-k, budget guard, auto strategy, model profiling, model routing


📍 6. CLI Reference

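| Parameter | Description | Default |
| --- | --- | --- |
| --mode | Run mode: baseline / governor | baseline |
| --drive-mode | Drive mode: auto / eco / comfort / sport / rocket | None |
| --opt-strategy | Manual strategy: light / balanced / knowledge / enterprise | balanced |
| --auto-strategy | Enable automatic strategy recommendation | False |
| --limit | Maximum number of tasks to run | None (all default tasks) |
| --model | Model selection (e.g. auto, openai:gpt-4o-mini) | auto |
| --max-tokens | Governor's cumulative token budget per task | 12000 |
| --max-fallback | Governor's maximum number of fallbacks | 2 |
| --out-file | Output path for JSONL results | None |
| --model-profile | Path to a model profile JSON file | None |
| --policy-file | Path to the policy YAML (v0.2 gate/fallback/risk configuration) | policy.yaml |
| --tasks-file | Task set file (.json / .jsonl) | None (uses built-in tasks) |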
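`--max-tokens` and `--max-fallback` interact: the budget is cumulative per task, so every fallback attempt spends from the same pool. A minimal sketch of that accounting, assuming each model call reports the tokens it consumed and returns `None` on failure (the `ModelCall` type and `run_with_budget` are hypothetical names, not the repository's code):

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model call: returns (answer, or None on failure; tokens consumed).
ModelCall = Callable[[str], Tuple[Optional[str], int]]


def run_with_budget(prompt: str, models: List[ModelCall],
                    max_tokens: int = 12000, max_fallback: int = 2) -> dict:
    """Try models in order under one cumulative per-task token budget."""
    used = 0
    tried = 0
    # One primary attempt plus at most `max_fallback` fallback attempts.
    for call in models[: max_fallback + 1]:
        if used >= max_tokens:
            # Budget already spent: do not start another attempt.
            return {"answer": None, "tokens": used,
                    "fallbacks": max(tried - 1, 0), "status": "budget_exhausted"}
        answer, cost = call(prompt)
        tried += 1
        used += cost
        if answer is not None:
            return {"answer": answer, "tokens": used,
                    "fallbacks": tried - 1, "status": "ok"}
    return {"answer": None, "tokens": used,
            "fallbacks": max(tried - 1, 0), "status": "all_attempts_failed"}


def flaky(prompt: str) -> Tuple[Optional[str], int]:
    return None, 4000  # simulated failure that still consumed tokens


def backup(prompt: str) -> Tuple[Optional[str], int]:
    return "done", 1000


print(run_with_budget("summarize", [flaky, backup]))
# {'answer': 'done', 'tokens': 5000, 'fallbacks': 1, 'status': 'ok'}
```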

🧪 7. Benchmark v0.2

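  • Task set (60 tasks): metrics/benchmarks/benchmark_v02_60_tasks.json
  • JSONL version: metrics/benchmarks/benchmark_v02_60_tasks.jsonl
  • Category distribution: 5 categories × 12 tasks (single-turn without tools / single sensitive tool / multi-tool chaining / long history / adversarial safety)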
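Before running the benchmark, it can be worth checking that a task file still has the documented 5 × 12 balance. A sketch, assuming each task object carries a `category` field and that the `.json` file holds a top-level list; both are assumptions about the file layout, and the helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, List


def load_tasks(path: str) -> List[dict]:
    """Load a task set from .json (assumed: a top-level list) or .jsonl (one object per line)."""
    text = Path(path).read_text(encoding="utf-8")
    if path.endswith(".jsonl"):
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.loads(text)


def unbalanced_categories(tasks: List[dict], expected: int = 12) -> Dict[str, int]:
    """Return categories whose task count differs from `expected` (empty dict = balanced)."""
    counts = Counter(task["category"] for task in tasks)  # field name is an assumption
    return {name: n for name, n in counts.items() if n != expected}
```

For example, `unbalanced_categories(load_tasks("metrics/benchmarks/benchmark_v02_60_tasks.json"))` should return an empty dict if the assumed layout holds.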

Example run:

python main.py --mode governor \
  --policy-file policy.yaml \
  --tasks-file metrics/benchmarks/benchmark_v02_60_tasks.json \
  --out-file metrics/data/governor-v02.jsonl

Automated validation example:

python -m metrics.validator \
  --tasks metrics/benchmarks/benchmark_v02_60_tasks.json \
  --records metrics/data/governor-v02.jsonl \
  --out metrics/reports/validator-v02.json

📊 8. Dashboard

Generate the visual dashboard (Pareto scatter, per-category bar charts, failure distribution, compression ratio vs. success):

python -m metrics.dashboard.benchmark_dashboard \
  --governor metrics/data/governor-v02.jsonl \
  --baseline metrics/data/baseline-v02.jsonl \
  --outdir metrics/reports/v02-dashboard

By default, the output directory contains:

  • pareto_scatter.html/png
  • category_bars.html/png
  • failure_pie.html/png
  • compression_success.html/png
  • summary_panel.png
  • category_summary.csv
  • overall_summary.csv
  • dashboard_summary.json
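The Pareto chart highlights configurations that no other configuration beats on both axes at once. Assuming the axes are total tokens (lower is better) and success rate (higher is better), which is a guess about the dashboard's plot, the non-dominated set can be computed like this:

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (label, total tokens, success rate)


def pareto_front(points: List[Point]) -> List[str]:
    """Labels of points that no other point beats on both axes.

    Assumed axes: fewer tokens is better, higher success rate is better.
    """
    front = []
    for label, tokens, success in points:
        dominated = any(
            t <= tokens and s >= success and (t < tokens or s > success)
            for _, t, s in points
        )
        if not dominated:
            front.append(label)
    return front


# Illustrative numbers only, not benchmark results.
print(pareto_front([("A", 800, 0.90), ("B", 1000, 0.95), ("C", 1200, 0.93), ("D", 1600, 0.98)]))
# ['A', 'B', 'D']
```

The pairwise check is quadratic, which is fine for the handful of configurations a dashboard compares.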

🔁 9. CI Automation

New workflow: .github/workflows/benchmark-v02-dashboard-auto.yml

  • Trigger: push to main + workflow_dispatch
  • Runs automatically: baseline benchmark → governor benchmark → validator → dashboard → comparison report → README metrics update → automatic commit

New lightweight scheduled workflow: .github/workflows/benchmark-v02-daily-light.yml

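  • Trigger: daily cron + workflow_dispatch
  • Runs only the light task set by default (limit=20)
  • Automatically runs guardrail checks (thresholds for success-rate drop, token increase, and latency increase)
  • On a guardrail failure, automatically opens an issue (deduplicated per day), can @-mention maintainers, and finally marks the job as failed
  • Automatically generates docs/trends/*.json, docs/badges/*.svg, and docs/trends/kpi_summary.md
  • Optional notification scripts: scripts/notify_slack.py, scripts/notify_dingtalk.py, scripts/notify_email.py
  • Trends page publishing workflow: .github/workflows/publish-trends-pages.yml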
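The guardrail step compares a fresh run against a baseline and fails when any regression crosses a threshold. A sketch of that comparison follows; the metric keys and the values in `Thresholds` are placeholders, since the workflow's actual limits are not given here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Thresholds:
    # Placeholder limits; the workflow's real thresholds are not documented here.
    max_success_drop_pp: float = 2.0
    max_token_increase_pct: float = 20.0
    max_latency_increase_pct: float = 30.0


def pct_increase(before: float, after: float) -> float:
    return (after - before) / before * 100.0


def guardrail_violations(baseline: Dict[str, float], current: Dict[str, float],
                         th: Optional[Thresholds] = None) -> List[str]:
    """Return one message per regression that crosses its threshold."""
    th = th or Thresholds()
    problems = []
    drop = baseline["success_rate"] - current["success_rate"]  # percentage points
    if drop > th.max_success_drop_pp:
        problems.append(f"success rate dropped {drop:.2f}pp")
    tokens_up = pct_increase(baseline["total_tokens"], current["total_tokens"])
    if tokens_up > th.max_token_increase_pct:
        problems.append(f"total tokens up {tokens_up:.2f}%")
    latency_up = pct_increase(baseline["latency_s"], current["latency_s"])
    if latency_up > th.max_latency_increase_pct:
        problems.append(f"latency up {latency_up:.2f}%")
    return problems
```

In a CI script, a non-empty list would typically lead to `sys.exit(1)` so the job is marked failed.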

Run the equivalent pipeline locally with a single command:

bash scripts/run-benchmark-v02-dashboard.sh

🧠 10. Use Cases

  • Enterprise LLM inference cost control
  • Multi-model strategy governance
  • Agent runtime optimization
  • Automated benchmarking and reporting

❓ 11. FAQ

Q1: What is Drive Mode?
A: Drive Mode sets the trade-off between cost and quality. For example, Eco favors lower cost, while Rocket favors output quality.

Q2: How do Auto and Comfort differ?
A: Auto recommends a mode dynamically based on task characteristics; Comfort is a fixed, balanced setting.

Q3: Does automatic mode override manual parameters?
A: No. CLI parameters passed explicitly take precedence.
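One common way to give explicit CLI arguments precedence is to parse only the flags the user actually typed and layer them over the recommendation. The sketch below uses `argparse.SUPPRESS` as the parser-wide default so unset flags never appear in the result; the three flags are a reduced, hypothetical subset, and this is not necessarily how `main.py` implements the rule.

```python
import argparse
from typing import Dict, List


def explicit_args(argv: List[str]) -> Dict[str, object]:
    """Parse only the flags actually passed; unset flags are left out entirely."""
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    # Reduced, hypothetical subset of the CLI flags.
    parser.add_argument("--drive-mode")
    parser.add_argument("--max-tokens", type=int)
    parser.add_argument("--max-fallback", type=int)
    return vars(parser.parse_args(argv))


def resolve_settings(recommended: Dict[str, object], argv: List[str]) -> Dict[str, object]:
    # Explicit CLI values are layered last, so they override the recommendation.
    return {**recommended, **explicit_args(argv)}


auto = {"drive_mode": "eco", "max_tokens": 8000, "max_fallback": 1}
print(resolve_settings(auto, ["--max-tokens", "5000"]))
# {'drive_mode': 'eco', 'max_tokens': 5000, 'max_fallback': 1}
```

Suppressing defaults matters because a flag with a normal default (such as `--max-tokens`, default 12000) would otherwise look "set" even when the user never typed it.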

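Q4: Does Auto mode always escalate to Rocket?
A: No. Auto selects a mode dynamically based on task characteristics and budget constraints, and may settle on Eco, Comfort, or Sport.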
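The behavior described in Q4 can be illustrated with a budget-constrained selector: estimate how strong a mode the task needs, then step down until the estimated cost fits. Everything here is hypothetical, including the complexity heuristic and the `COST_FACTOR` values, which are placeholders rather than measured costs.

```python
from typing import Dict, List

MODES: List[str] = ["eco", "comfort", "sport", "rocket"]  # cheapest to most capable
# Placeholder relative costs versus eco; not measured values.
COST_FACTOR: Dict[str, float] = {"eco": 1.0, "comfort": 1.3, "sport": 1.6, "rocket": 2.0}


def pick_mode(num_tools: int, history_turns: int,
              base_estimate: int, remaining_budget: int) -> str:
    """Pick the strongest needed mode whose estimated cost fits the remaining budget."""
    # Hypothetical heuristic: more tools and longer history call for a stronger mode.
    needed = min(num_tools + history_turns // 5, len(MODES) - 1)
    for mode in reversed(MODES[: needed + 1]):
        if base_estimate * COST_FACTOR[mode] <= remaining_budget:
            return mode
    return "eco"  # nothing fits: settle on the cheapest mode
```

The point is the shape of the rule: task complexity sets a ceiling and the budget can only lower it, which is why Auto may stop short of Rocket.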

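Q5: Is Rocket mode always cheaper?
A: Not necessarily. Rocket prioritizes output capability and does not promise the lowest token cost; refer to the measured benchmark results above.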

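Q6: How do I reproduce the experiments and update the README in one step?
A: For the multi-mode comparison, run bash scripts/run-all-and-update.sh; for the v0.2 benchmark and dashboard, run bash scripts/run-benchmark-v02-dashboard.sh; remotely, run the Benchmark v0.2 Dashboard Auto or Benchmark And Update README Metrics workflow.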


👥 12. Contributing

Issues and pull requests are welcome; see CONTRIBUTING.md for details.


📜 13. License

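This project is licensed under the TianJiang Non-Commercial License v1.0:

  • Free for non-commercial use
  • Commercial use requires purchasing a license or establishing a partnership agreement first

See LICENSE for details.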

📚 14. References