The PolyAgent platform now has a centralized pricing configuration system that manages model costs across all services (Go orchestrator, Rust agent-core, Python llm-service). All pricing data is maintained in a single source of truth: `config/models.yaml`.
The pricing configuration is defined in `config/models.yaml` under the `pricing` section:
```yaml
pricing:
  defaults:
    combined_per_1k: 0.002      # Default cost per 1K tokens when model is unknown
  models:
    <provider>:
      <model_id>:
        input_per_1k: 0.0005    # Cost per 1K input tokens
        output_per_1k: 0.0015   # Cost per 1K output tokens
        combined_per_1k: 0.002  # Optional: used when only total tokens are known
```

Location: `go/orchestrator/internal/pricing/pricing.go`
- Loads pricing configuration from `config/models.yaml` (or the `MODELS_CONFIG_PATH` env var)
- Provides functions:
  - `DefaultPerToken()`: Returns default cost per token
  - `PricePerTokenForModel(model string)`: Returns model-specific cost per token
  - `CostForTokens(model string, tokens int)`: Calculates total cost
  - `CostForSplit(model string, inputTokens, outputTokens int)`: Accurate split pricing
  - `ModifiedTime()`: Returns config file modification time
- Used in:
  - `internal/server/service.go`: Calculates workflow execution costs
  - `internal/activities/session.go`: Updates session cost tracking
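The lookup functions above reduce to a dictionary walk with a default. A minimal Python sketch of the same logic, assuming the `pricing:` section of `config/models.yaml` has already been parsed into a dict (e.g. with `yaml.safe_load`); the function name and dict shapes are illustrative, not the actual orchestrator code:

```python
# Illustrative analogue of PricePerTokenForModel (not the real Go code).
# Assumes `pricing` is the parsed `pricing:` section of config/models.yaml.

DEFAULT_COMBINED_PER_1K = 0.002  # GPT-3.5-equivalent fallback

def price_per_token(pricing: dict, provider: str, model_id: str) -> float:
    """Per-token price for a model, falling back to defaults.combined_per_1k."""
    model = pricing.get("models", {}).get(provider, {}).get(model_id)
    if model:
        if "combined_per_1k" in model:
            return model["combined_per_1k"] / 1000.0
        # No combined rate: average the input/output rates
        return (model["input_per_1k"] + model["output_per_1k"]) / 2.0 / 1000.0
    return pricing.get("defaults", {}).get(
        "combined_per_1k", DEFAULT_COMBINED_PER_1K) / 1000.0

pricing = {
    "defaults": {"combined_per_1k": 0.002},
    "models": {"openai": {"gpt-3.5-turbo": {"input_per_1k": 0.0005,
                                            "output_per_1k": 0.0015}}},
}
print(price_per_token(pricing, "openai", "gpt-3.5-turbo"))  # averaged rate
print(price_per_token(pricing, "openai", "no-such-model"))  # default rate
```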
Hot reload & validation:
- `models.yaml` is watched by the config manager; on change, pricing is reloaded
- Basic validation ensures no negative values under the `pricing` section
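A sketch of what these two behaviors amount to: a recursive non-negativity check plus a modification-time test that a reload loop could poll. Both helper names are assumptions for illustration; the real watcher is the config manager, not this code:

```python
import os

def validate_pricing(pricing: dict) -> list:
    """Collect errors for any negative numeric value under the pricing section."""
    errors = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + [str(key)])
        elif isinstance(node, (int, float)) and node < 0:
            errors.append(".".join(path) + " is negative")

    walk(pricing, ["pricing"])
    return errors

def needs_reload(path: str, last_mtime: float) -> bool:
    """Reload trigger: has models.yaml changed since it was last loaded?"""
    return os.path.getmtime(path) > last_mtime

print(validate_pricing({"defaults": {"combined_per_1k": 0.002}}))  # []
print(validate_pricing({"models": {"x": {"input_per_1k": -1}}}))   # one error
```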
Fallback metrics:
- `polyagent_pricing_fallback_total{reason="missing_model|unknown_model"}` increments whenever defaults are used (missing or unknown model name)
Location: `rust/agent-core/src/llm_client.rs`
- Function `pricing_cost_per_1k(model)` reads pricing from the same config file
- Function `calculate_cost(model, tokens)` uses centralized pricing with fallback
- Attempts to read from:
  - the `MODELS_CONFIG_PATH` environment variable
  - `/app/config/models.yaml` (Docker container path)
  - `./config/models.yaml` (local development)
- Falls back to hardcoded heuristics if the config is unavailable
Location: `python/llm-service/llm_provider/manager.py`
- `LLMManager` loads pricing overrides after initializing providers
- Function `_load_and_apply_pricing_overrides()`:
  - Reads from `config/models.yaml` (or `MODELS_CONFIG_PATH`)
  - Applies pricing to provider models, matching by model_id or key
  - Overrides provider-specific pricing with centralized values
  - Maintains compatibility with existing provider configurations
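The override pass can be sketched as follows. The `provider_models` shape and the function name are assumptions for illustration; the real logic lives in `_load_and_apply_pricing_overrides()`:

```python
def apply_pricing_overrides(provider_models: dict, pricing: dict) -> int:
    """Copy centralized per-1K prices onto matching provider model entries.

    provider_models maps provider -> {model_id -> model config dict} (an
    assumed shape). Returns how many models were overridden; unmatched
    models keep their provider-specific pricing.
    """
    overridden = 0
    for provider, models in pricing.get("models", {}).items():
        for model_id, prices in models.items():
            target = provider_models.get(provider, {}).get(model_id)
            if target is None:
                continue  # no match: existing provider config is preserved
            for key in ("input_per_1k", "output_per_1k", "combined_per_1k"):
                if key in prices:
                    target[key] = prices[key]
            overridden += 1
    return overridden

provider_models = {"openai": {"gpt-3.5-turbo": {"input_per_1k": 0.001}}}
pricing = {"models": {"openai": {"gpt-3.5-turbo": {"input_per_1k": 0.0005,
                                                   "output_per_1k": 0.0015}}}}
print(apply_pricing_overrides(provider_models, pricing))  # 1 model overridden
print(provider_models["openai"]["gpt-3.5-turbo"]["input_per_1k"])  # 0.0005
```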
When input/output tokens are known separately:
- Uses `input_per_1k` and `output_per_1k` for precise calculation

When only total tokens are known:
- Uses `combined_per_1k` if specified
- Otherwise averages: `(input_per_1k + output_per_1k) / 2`

When the model is unknown:
- Uses `defaults.combined_per_1k`
- Falls back to `0.002` per 1K tokens (GPT-3.5 equivalent)
- Increments `polyagent_pricing_fallback_total`
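Putting the three rules together, a hedged Python sketch (the `cost` function is illustrative, and the `Counter` is a stand-in for the `polyagent_pricing_fallback_total` Prometheus metric):

```python
from collections import Counter

DEFAULT_COMBINED_PER_1K = 0.002  # GPT-3.5-equivalent fallback
fallback_total = Counter()       # stand-in for polyagent_pricing_fallback_total

def cost(pricing, provider, model_id, total=None, inp=None, out=None):
    """Apply the pricing rules in order: split, combined, averaged, default."""
    model = pricing.get("models", {}).get(provider, {}).get(model_id)
    if model is None:
        # Unknown model: use defaults and record the fallback
        fallback_total["unknown_model"] += 1
        per_1k = pricing.get("defaults", {}).get("combined_per_1k",
                                                 DEFAULT_COMBINED_PER_1K)
        return (total or 0) / 1000.0 * per_1k
    if inp is not None and out is not None:
        # Precise split pricing when input/output tokens are known separately
        return (inp * model["input_per_1k"] + out * model["output_per_1k"]) / 1000.0
    # Only total tokens known: combined rate, else average of input/output
    per_1k = model.get("combined_per_1k",
                       (model["input_per_1k"] + model["output_per_1k"]) / 2.0)
    return (total or 0) / 1000.0 * per_1k

pricing = {"defaults": {"combined_per_1k": 0.002},
           "models": {"openai": {"gpt-4-turbo": {"input_per_1k": 0.01,
                                                 "output_per_1k": 0.03}}}}
print(cost(pricing, "openai", "gpt-4-turbo", inp=1000, out=500))  # split pricing
print(cost(pricing, "openai", "gpt-4-turbo", total=1000))         # averaged rate
print(cost(pricing, "openai", "mystery-model", total=1000))       # default rate
```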
- `MODELS_CONFIG_PATH`: Overrides the default config file location
  - Used by all services (Go, Rust, Python)
  - Takes precedence over default paths

Services look for configuration in this order:
1. `$MODELS_CONFIG_PATH` (if set)
2. `/app/config/models.yaml` (Docker containers)
3. `./config/models.yaml` (local development)
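The same precedence order, as a sketch. Each service implements this in its own language; the function name here is an assumption:

```python
import os

def resolve_config_path() -> str:
    """Return the first models.yaml location that exists, in precedence order."""
    candidates = [
        os.environ.get("MODELS_CONFIG_PATH"),  # explicit override, if set
        "/app/config/models.yaml",             # Docker container path
        "./config/models.yaml",                # local development
    ]
    for path in candidates:
        if path and os.path.exists(path):
            return path
    # No config found: callers fall back to their hardcoded defaults
    raise FileNotFoundError("models.yaml not found; using fallback pricing")
```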
```bash
cd go/orchestrator
go test -v ./internal/pricing/...
```

The implementation was verified to:
- Load pricing configuration correctly from `config/models.yaml`
- Calculate costs accurately for known models
- Fall back to defaults for unknown models
- Apply pricing overrides in the Python LLM service
- Go: Service uses `pricing.CostForTokens()` / `pricing.CostForSplit()` instead of inline heuristics
- Rust: `calculate_cost()` checks the centralized config before falling back
- Python: Manager applies pricing overrides after provider initialization
- Config: Added a `pricing` section to `config/models.yaml`
- All services maintain fallback logic for missing configuration
- Existing provider-specific pricing is preserved unless overridden
- No changes to activity/proto signatures required
For detailed information on which workflows have true per-model costs vs approximations, see workflow-pricing-coverage.md.
- Production Ready (true costs): Simple, DAG v2, Supervisor, React, Streaming (single & parallel)
- Using Defaults: Exploratory, Scientific patterns
- Split token tracking: Workflows now track input/output tokens separately
- Price validation: Non-negative validation implemented
- Hot-reload support: Pricing reloads on `models.yaml` changes
- Model threading: Production workflows pass model names for accurate costs
- Fallback metrics: `polyagent_pricing_fallback_total` tracks coverage
- Cost alerts: Implement threshold notifications when costs exceed limits
- Usage analytics: Track model-specific usage patterns and costs
- Pattern extensions: Add per-agent tracking to Exploratory/Scientific if usage warrants
Current pricing for common models:
```yaml
pricing:
  defaults:
    combined_per_1k: 0.002
  models:
    openai:
      gpt-3.5-turbo:
        input_per_1k: 0.0005
        output_per_1k: 0.0015
      gpt-4-turbo:
        input_per_1k: 0.0100
        output_per_1k: 0.0300
    anthropic:
      claude-3-sonnet:
        input_per_1k: 0.0030
        output_per_1k: 0.0150
      claude-3-haiku:
        input_per_1k: 0.00025
        output_per_1k: 0.00125
    deepseek:
      deepseek-chat:
        input_per_1k: 0.0001
        output_per_1k: 0.0002
```

Streaming now reports true per-model costs (including token splits where available).
- Simple: True cost (per‑model, total tokens)
- DAG v2: True cost (per‑agent, input/output split)
- Supervisor: True cost (per‑agent, input/output split)
- React: True cost (per‑agent, input/output split)
- Streaming (single/parallel): True cost (per‑agent, total tokens; split when provided)
- Exploratory (ToT/Debate/Reflect): Approximate (aggregate totals)
- Scientific (CoT/Debate/ToT/Reflect): Approximate (aggregate totals)
Note: We will upgrade Exploratory/Scientific to true per‑agent costs when these patterns surface model IDs and token splits from their internal agent calls.