Add support for audio token billing (Whisper, TTS, voice models) #856

@nickna

Problem

As voice and audio models become more prevalent, the system needs to support audio-specific token types and billing. The current system tracks only text tokens, so it misses audio input/output tokens, which are priced differently.

Current State

  • No audio_tokens field in Usage model
  • No extraction for audio-specific usage data
  • No cost fields for audio token rates
  • Audio transcription and speech requests are tracked as regular tokens (if at all)

Future Audio Models to Support

OpenAI Audio

  • Whisper (transcription): Charges per minute of audio
  • TTS (text-to-speech): Charges per character generated
  • GPT-4o Audio (future): Native audio in/out with specific token rates

Other Providers

  • ElevenLabs: Per character or per minute pricing
  • Anthropic Claude Audio (future): Expected audio token support
  • Google Gemini Audio: Already supports audio with different rates

Technical Requirements

1. Update Usage Model

// ConduitLLM.Core/Models/Usage.cs
/// <summary>
/// Number of audio input tokens (for models processing audio).
/// </summary>
[JsonPropertyName("audio_input_tokens")]
[JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
public int? AudioInputTokens { get; set; }

/// <summary>
/// Number of audio output tokens (for models generating audio).
/// </summary>
[JsonPropertyName("audio_output_tokens")]
[JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
public int? AudioOutputTokens { get; set; }

/// <summary>
/// Duration of audio processed/generated in seconds.
/// </summary>
[JsonPropertyName("audio_duration_seconds")]
[JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
public double? AudioDurationSeconds { get; set; }

/// <summary>
/// Number of characters for TTS generation.
/// </summary>
[JsonPropertyName("tts_characters")]
[JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
public int? TtsCharacters { get; set; }

2. Update UsageExtractor

// Handle OpenAI Whisper/TTS format
if (usageElement.TryGetProperty("audio_seconds", out var audioSeconds))
    usage.AudioDurationSeconds = audioSeconds.GetDouble();

// Handle audio token formats
if (usageElement.TryGetProperty("audio_input_tokens", out var audioInput))
    usage.AudioInputTokens = audioInput.GetInt32();

if (usageElement.TryGetProperty("audio_output_tokens", out var audioOutput))
    usage.AudioOutputTokens = audioOutput.GetInt32();
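
A self-contained sketch of the extraction above, useful as a starting point for tests against mock responses. `AudioUsage` and `AudioUsageExtractorSketch` are hypothetical stand-ins, not existing ConduitLLM types, and the payload shape is assumed from the property names in this issue. Unlike the snippet above, it leaves a field null when the property is missing or not a number, because `GetDouble()` and `GetInt32()` throw on a JSON `null`.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for the audio fields proposed for Usage.cs.
public sealed class AudioUsage
{
    public int? AudioInputTokens { get; set; }
    public int? AudioOutputTokens { get; set; }
    public double? AudioDurationSeconds { get; set; }
}

public static class AudioUsageExtractorSketch
{
    // Reads the audio fields from a provider "usage" JSON object.
    // Missing or non-numeric properties stay null instead of throwing.
    public static AudioUsage Extract(JsonElement usageElement)
    {
        var usage = new AudioUsage();

        // TryGetProperty throws on non-object elements, so bail out early.
        if (usageElement.ValueKind != JsonValueKind.Object)
            return usage;

        if (usageElement.TryGetProperty("audio_seconds", out var seconds)
            && seconds.ValueKind == JsonValueKind.Number)
        {
            usage.AudioDurationSeconds = seconds.GetDouble();
        }

        usage.AudioInputTokens = ReadInt(usageElement, "audio_input_tokens");
        usage.AudioOutputTokens = ReadInt(usageElement, "audio_output_tokens");
        return usage;
    }

    // Returns the property as an int, or null if absent, non-numeric, or out of range.
    private static int? ReadInt(JsonElement parent, string name)
    {
        if (parent.TryGetProperty(name, out var value)
            && value.ValueKind == JsonValueKind.Number
            && value.TryGetInt32(out var parsed))
        {
            return parsed;
        }
        return null;
    }
}
```

A mock payload can be parsed with `JsonDocument.Parse(...)` and its `RootElement` passed to `Extract`.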

3. Update ModelCost Entity

/// <summary>
/// Cost per million audio input tokens.
/// </summary>
[Column(TypeName = "decimal(18, 10)")]
public decimal? AudioInputCostPerMillionTokens { get; set; }

/// <summary>
/// Cost per million audio output tokens.
/// </summary>
[Column(TypeName = "decimal(18, 10)")]
public decimal? AudioOutputCostPerMillionTokens { get; set; }

/// <summary>
/// Cost per minute of audio (Whisper-style pricing).
/// </summary>
[Column(TypeName = "decimal(18, 10)")]
public decimal? AudioCostPerMinute { get; set; }

/// <summary>
/// Cost per 1000 characters (TTS-style pricing).
/// </summary>
[Column(TypeName = "decimal(18, 10)")]
public decimal? TtsCostPerThousandCharacters { get; set; }

4. Update Cost Calculation

Handle different audio pricing models:

  • Per token (GPT-4o audio)
  • Per minute (Whisper)
  • Per character (TTS)
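
One way the three pricing models could be dispatched is sketched below. This is an illustration under assumptions, not the existing cost calculator: `AudioPricingModel`, `AudioUsageSnapshot`, `AudioRates`, and `AudioCostSketch` are hypothetical names that mirror the fields proposed in this issue. It bills duration proportionally by the second; whether a provider rounds up to whole seconds or minutes is not specified here and would need checking.

```csharp
using System;

// Mirrors the audio members proposed for the PricingModel enum.
public enum AudioPricingModel
{
    AudioPerMinute = 10,
    AudioPerCharacter = 11,
    AudioTokenBased = 12
}

// Hypothetical, trimmed-down versions of the proposed Usage and ModelCost fields.
public sealed record AudioUsageSnapshot(
    int? AudioInputTokens = null,
    int? AudioOutputTokens = null,
    double? AudioDurationSeconds = null,
    int? TtsCharacters = null);

public sealed record AudioRates(
    decimal? AudioInputCostPerMillionTokens = null,
    decimal? AudioOutputCostPerMillionTokens = null,
    decimal? AudioCostPerMinute = null,
    decimal? TtsCostPerThousandCharacters = null);

public static class AudioCostSketch
{
    // Returns the audio portion of a request's cost.
    // Missing usage values or rates are treated as zero.
    public static decimal Calculate(AudioPricingModel model, AudioUsageSnapshot usage, AudioRates rates)
    {
        switch (model)
        {
            case AudioPricingModel.AudioPerMinute:
                // seconds -> minutes, then multiply by the per-minute rate
                return (decimal)(usage.AudioDurationSeconds ?? 0d) / 60m
                     * (rates.AudioCostPerMinute ?? 0m);

            case AudioPricingModel.AudioPerCharacter:
                return (usage.TtsCharacters ?? 0) / 1000m
                     * (rates.TtsCostPerThousandCharacters ?? 0m);

            case AudioPricingModel.AudioTokenBased:
                return (usage.AudioInputTokens ?? 0) / 1_000_000m
                         * (rates.AudioInputCostPerMillionTokens ?? 0m)
                     + (usage.AudioOutputTokens ?? 0) / 1_000_000m
                         * (rates.AudioOutputCostPerMillionTokens ?? 0m);

            default:
                throw new ArgumentOutOfRangeException(nameof(model), model, "Not an audio pricing model.");
        }
    }
}
```

For example, 120 seconds of audio at a per-minute rate of 0.006 comes to 2 × 0.006 = 0.012, and 2,000 TTS characters at 0.015 per 1,000 characters come to 0.030.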

5. Update PricingModel Enum

public enum PricingModel
{
    Standard = 1,
    // ... existing ...
    AudioPerMinute = 10,
    AudioPerCharacter = 11,
    AudioTokenBased = 12
}

Example Pricing

OpenAI Whisper

  • $0.006 per minute of audio

OpenAI TTS

  • TTS: $15 per 1M characters ($0.015 per 1,000 characters)
  • TTS HD: $30 per 1M characters ($0.030 per 1,000 characters)

Future GPT-4o Audio (speculative)

  • Audio input: Different rate than text input
  • Audio output: Different rate than text output

Impact

  • Future Revenue: Audio usage will go unbilled once these models are added
  • Affected Models: Whisper, TTS, future multimodal models with audio
  • Severity: Low (future need, not current)

Testing Requirements

  • Unit tests for audio token extraction
  • Cost calculation tests for different audio pricing models
  • Integration tests with mock audio API responses
  • Refund calculation tests including audio tokens

Priority

Low - Future-proofing for when audio models are added to the system. This is not causing revenue loss today.
