Centralized Pricing Configuration

Overview

The PolyAgent platform now has a centralized pricing configuration system that manages model costs across all services (Go orchestrator, Rust agent-core, Python llm-service). All pricing data is maintained in a single source of truth: config/models.yaml.

Configuration Structure

The pricing configuration is defined in config/models.yaml under the pricing section:

pricing:
  defaults:
    combined_per_1k: 0.002  # Default cost per 1K tokens when model is unknown
  models:
    <provider>:
      <model_id>:
        input_per_1k: 0.0005   # Cost per 1K input tokens
        output_per_1k: 0.0015  # Cost per 1K output tokens
        combined_per_1k: 0.002 # Optional: Used when only total tokens are known
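
To make the nesting concrete, here is a minimal Python sketch that flattens the pricing.models tree into a lookup keyed by model ID. ModelPrice and index_models are hypothetical names used only for illustration, and the PRICING dict stands in for what a YAML parser would return for the structure above. It is not the code any of the services actually run.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical container for one model's prices; any field may be absent."""
    input_per_1k: Optional[float] = None
    output_per_1k: Optional[float] = None
    combined_per_1k: Optional[float] = None


def index_models(pricing: dict) -> Dict[str, ModelPrice]:
    """Flatten pricing.models.<provider>.<model_id> into model_id -> ModelPrice."""
    index: Dict[str, ModelPrice] = {}
    for _provider, models in (pricing.get("models") or {}).items():
        for model_id, fields in (models or {}).items():
            index[model_id] = ModelPrice(
                input_per_1k=fields.get("input_per_1k"),
                output_per_1k=fields.get("output_per_1k"),
                combined_per_1k=fields.get("combined_per_1k"),
            )
    return index


# Stand-in for the parsed YAML; values are taken from the structure above.
PRICING = {
    "defaults": {"combined_per_1k": 0.002},
    "models": {
        "openai": {
            "gpt-3.5-turbo": {"input_per_1k": 0.0005, "output_per_1k": 0.0015},
        },
    },
}

INDEX = index_models(PRICING)
```

Keying by model ID alone assumes that IDs are unique across providers; if two providers ever shared an ID, a (provider, model_id) key would be the safer choice.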

Implementation Details

Go Orchestrator

Location: go/orchestrator/internal/pricing/pricing.go

- Loads pricing configuration from config/models.yaml (or the path given in the MODELS_CONFIG_PATH env var)
- Provides functions:
  - DefaultPerToken(): Returns the default cost per token
  - PricePerTokenForModel(model string): Returns the model-specific cost per token
  - CostForTokens(model string, tokens int): Calculates the total cost from a total token count
  - CostForSplit(model string, inputTokens, outputTokens int): Calculates the cost from separate input and output prices
  - ModifiedTime(): Returns the config file's modification time
- Used in:
  - internal/server/service.go: Calculates workflow execution costs
  - internal/activities/session.go: Updates session cost tracking

Hot reload & validation:

- models.yaml is watched by the config manager; pricing is reloaded whenever the file changes
- Basic validation ensures there are no negative values under the pricing section
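
The validation step can be sketched in Python as a recursive walk that reports every negative number under the pricing section. The names find_negative_prices and reload_pricing are hypothetical, and the policy of keeping the previous pricing when a reloaded file fails validation is an assumption; the document does not say what the Go config manager does with an invalid file.

```python
from typing import List, Optional


def find_negative_prices(node, path: str = "pricing") -> List[str]:
    """Return the dotted path of every negative numeric value under node."""
    problems: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            problems.extend(find_negative_prices(value, f"{path}.{key}"))
    elif isinstance(node, (int, float)) and not isinstance(node, bool) and node < 0:
        problems.append(path)
    return problems


def reload_pricing(current: Optional[dict], candidate: dict) -> Optional[dict]:
    """Swap in candidate if it validates; otherwise keep current (assumed policy)."""
    if find_negative_prices(candidate):
        return current
    return candidate
```

Returning the offending paths rather than a boolean makes it easy to log exactly which entry was rejected.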

Fallback metrics:

polyagent_pricing_fallback_total{reason="missing_model|unknown_model"} increments whenever default pricing is used, with the reason label recording whether the model name was missing or unknown
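
As a rough illustration of how the reason label might be chosen, the Python sketch below uses a collections.Counter as a stand-in for the real metric. It assumes that missing_model means no model name was supplied and unknown_model means the name is absent from the config; fallback_reason and record_lookup are hypothetical helpers, not part of the orchestrator.

```python
from collections import Counter
from typing import Collection, Optional

# Stand-in for polyagent_pricing_fallback_total, keyed by the reason label.
FALLBACK_TOTAL: Counter = Counter()


def fallback_reason(model: Optional[str], known_models: Collection[str]) -> Optional[str]:
    """Return the reason label for a fallback, or None when the model is priced."""
    if not model:
        return "missing_model"
    if model not in known_models:
        return "unknown_model"
    return None


def record_lookup(model: Optional[str], known_models: Collection[str]) -> Optional[str]:
    """Increment the stand-in counter when default pricing would be used."""
    reason = fallback_reason(model, known_models)
    if reason is not None:
        FALLBACK_TOTAL[reason] += 1
    return reason
```

A steadily growing unknown_model count is a hint that a model is in use but has no entry in config/models.yaml.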

Rust Agent-Core

Location: rust/agent-core/src/llm_client.rs

- Function pricing_cost_per_1k(model) reads pricing from the same config file
- Function calculate_cost(model, tokens) uses centralized pricing with fallback
- Attempts to read from:
  1. MODELS_CONFIG_PATH environment variable
  2. /app/config/models.yaml (Docker container path)
  3. ./config/models.yaml (local development)
- Falls back to hardcoded heuristics if the config is unavailable

Python LLM Service

Location: python/llm-service/llm_provider/manager.py

- LLMManager loads pricing overrides after initializing providers
- Function _load_and_apply_pricing_overrides():
  - Reads from config/models.yaml (or MODELS_CONFIG_PATH)
  - Applies pricing to provider models, matching by model_id or key
- Overrides provider-specific pricing with centralized values
- Maintains compatibility with existing provider configurations
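
The override pass described above might look roughly like the following sketch. ModelConfig, its field names, and apply_pricing_overrides are hypothetical stand-ins; the real LLMManager and provider model classes are not shown in this document, so treat this only as an illustration of matching by model_id or key.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModelConfig:
    """Hypothetical provider-side model entry with its own built-in prices."""
    model_id: str
    input_price_per_1k: float
    output_price_per_1k: float


def apply_pricing_overrides(
    provider_models: Dict[str, Dict[str, ModelConfig]],
    pricing_models: dict,
) -> int:
    """Overwrite provider prices with centralized ones; return how many models changed."""
    updated = 0
    for provider, models in provider_models.items():
        central = pricing_models.get(provider) or {}
        for key, cfg in models.items():
            # Match on the model's ID first, then on the provider's own key.
            override = central.get(cfg.model_id) or central.get(key)
            if not override:
                continue  # no centralized entry: keep the provider's pricing
            if "input_per_1k" in override:
                cfg.input_price_per_1k = float(override["input_per_1k"])
            if "output_per_1k" in override:
                cfg.output_price_per_1k = float(override["output_per_1k"])
            updated += 1
    return updated
```

Models without a centralized entry are left untouched, which is how existing provider-specific pricing stays in effect unless it is overridden.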

Cost Calculation Logic

When input/output tokens are known separately:
- Uses input_per_1k and output_per_1k for precise calculation
When only total tokens are known:
- Uses combined_per_1k if specified
- Otherwise averages: (input_per_1k + output_per_1k) / 2
When the model is unknown:
- Uses defaults.combined_per_1k
- Falls back to a hardcoded 0.002 per 1K tokens (GPT-3.5 equivalent) if no default is configured
- Increments polyagent_pricing_fallback_total
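
The three cases above can be sketched in Python as follows. The function names loosely mirror the Go CostForTokens and CostForSplit helpers but are illustrative only, the pricing argument is assumed to be the parsed pricing section as a plain dict, and falling back to total-token pricing when a model lacks split prices is an assumption. Metric increments are omitted to keep the sketch short.

```python
from typing import Optional

HARDCODED_DEFAULT_PER_1K = 0.002  # last-resort value named in the rules above


def _find_entry(pricing: dict, model: str) -> Optional[dict]:
    """Look the model up under every provider; None means unknown model."""
    for models in (pricing.get("models") or {}).values():
        entry = (models or {}).get(model)
        if entry is not None:
            return entry
    return None


def price_per_1k_total(pricing: dict, model: str) -> float:
    """Per-1K price to use when only the total token count is known."""
    entry = _find_entry(pricing, model)
    if entry is not None:
        if entry.get("combined_per_1k") is not None:
            return entry["combined_per_1k"]
        if "input_per_1k" in entry and "output_per_1k" in entry:
            return (entry["input_per_1k"] + entry["output_per_1k"]) / 2
    defaults = pricing.get("defaults") or {}
    return defaults.get("combined_per_1k", HARDCODED_DEFAULT_PER_1K)


def cost_for_tokens(pricing: dict, model: str, tokens: int) -> float:
    """Cost when only the total token count is known."""
    return tokens / 1000 * price_per_1k_total(pricing, model)


def cost_for_split(pricing: dict, model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost when input and output token counts are known separately."""
    entry = _find_entry(pricing, model)
    if entry is not None and "input_per_1k" in entry and "output_per_1k" in entry:
        return (input_tokens / 1000 * entry["input_per_1k"]
                + output_tokens / 1000 * entry["output_per_1k"])
    # Assumption: without split prices, price the combined total instead.
    return cost_for_tokens(pricing, model, input_tokens + output_tokens)
```

Note that the averaging rule weights input and output equally, so for workloads dominated by one side the total-token cost is only an approximation of the split cost.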

Environment Variables

MODELS_CONFIG_PATH: Override default config file location
- Used by all services (Go, Rust, Python)
- Takes precedence over default paths

Default Paths

Services look for configuration in this order:

1. $MODELS_CONFIG_PATH (if set)
2. /app/config/models.yaml (Docker containers)
3. ./config/models.yaml (local development)
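
A minimal Python sketch of this lookup order is shown below. resolve_config_path is a hypothetical helper, and the sketch assumes a candidate is skipped when the file does not exist, which matches the "attempts to read from" wording used for the Rust client but is not confirmed for every service. The env and exists parameters are injectable only so the function can be exercised without touching the real environment or filesystem.

```python
import os
from typing import Callable, Mapping, Optional

DEFAULT_PATHS = ("/app/config/models.yaml", "./config/models.yaml")


def resolve_config_path(
    env: Mapping[str, str] = os.environ,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first existing candidate, trying MODELS_CONFIG_PATH first."""
    candidates = []
    override = env.get("MODELS_CONFIG_PATH")
    if override:
        candidates.append(override)
    candidates.extend(DEFAULT_PATHS)
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None  # caller falls back to built-in defaults
```

Returning None rather than raising leaves the caller free to apply the fallback pricing described under Cost Calculation Logic.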

Testing

Go Testing

cd go/orchestrator
go test -v ./internal/pricing/...

Manual Verification

The implementation was verified to:

- Load pricing configuration correctly from config/models.yaml
- Calculate costs accurately for known models
- Fall back to defaults for unknown models
- Apply pricing overrides in the Python LLM service

Migration Notes

What Changed

- Go: Service uses pricing.CostForTokens() / CostForSplit() instead of inline heuristics
- Rust: calculate_cost() checks the centralized config before falling back
- Python: Manager applies pricing overrides after provider initialization
- Config: Added a pricing section to config/models.yaml

Backward Compatibility

- All services maintain fallback logic for missing configuration
- Existing provider-specific pricing is preserved unless overridden
- No changes to activity/proto signatures are required

Workflow Coverage

For detailed information on which workflows have true per-model costs vs approximations, see workflow-pricing-coverage.md.

Summary

- Production Ready (true costs): Simple, DAG v2, Supervisor, React, Streaming (single & parallel)
- Using Defaults: Exploratory, Scientific patterns

Future Enhancements

Completed ✅

- Split token tracking: Workflows now track input/output tokens separately
- Price validation: Non-negative validation implemented
- Hot-reload support: Pricing reloads on models.yaml changes
- Model threading: Production workflows pass model names for accurate costs
- Fallback metrics: polyagent_pricing_fallback_total tracks coverage

Potential Improvements

- Cost alerts: Implement threshold notifications when costs exceed limits
- Usage analytics: Track model-specific usage patterns and costs
- Pattern extensions: Add per-agent tracking to Exploratory/Scientific if usage warrants

Configuration Example

Current pricing for common models:

pricing:
  defaults:
    combined_per_1k: 0.002
  models:
    openai:
      gpt-3.5-turbo:
        input_per_1k: 0.0005
        output_per_1k: 0.0015
      gpt-4-turbo:
        input_per_1k: 0.0100
        output_per_1k: 0.0300
    anthropic:
      claude-3-sonnet:
        input_per_1k: 0.0030
        output_per_1k: 0.0150
      claude-3-haiku:
        input_per_1k: 0.00025
        output_per_1k: 0.00125
    deepseek:
      deepseek-chat:
        input_per_1k: 0.0001
        output_per_1k: 0.0002

Workflow Coverage (True Cost vs. Approximate)

Streaming now reports true per‑model costs (including token splits where available).

- Simple: True cost (per‑model, total tokens)
- DAG v2: True cost (per‑agent, input/output split)
- Supervisor: True cost (per‑agent, input/output split)
- React: True cost (per‑agent, input/output split)
- Streaming (single/parallel): True cost (per‑agent, total tokens; split when provided)
- Exploratory (ToT/Debate/Reflect): Approximate (aggregate totals)
- Scientific (CoT/Debate/ToT/Reflect): Approximate (aggregate totals)

Note: We will upgrade Exploratory/Scientific to true per‑agent costs when these patterns surface model IDs and token splits from their internal agent calls.
