Caching

Reduce costs and latency with intelligent HTTP-level response caching. PHP LLM caches LLM responses automatically, making repeated requests nearly instant and free.

Overview

All PHP LLM clients support caching at the HTTP request level. When enabled:

Identical requests return cached responses instantly
No API calls are made for cached requests
Original response time is preserved in metadata
Costs are eliminated for cached responses
Streaming and non-streaming requests share the same cache entries

File Cache

The built-in FileCache stores responses on the filesystem:

<?php
use Soukicz\Llm\Cache\FileCache;
use Soukicz\Llm\Client\Anthropic\AnthropicClient;

$cache = new FileCache(sys_get_temp_dir());
$client = new AnthropicClient('sk-xxxxx', $cache);

Characteristics:

✅ Simple to set up
✅ No additional dependencies
✅ Works across requests
⚠️ Limited to single server
⚠️ Manual cleanup required

Custom Cache Directory

<?php
$cache = new FileCache('/var/cache/llm');
$client = new AnthropicClient('sk-xxxxx', $cache);

DynamoDB Cache

For distributed systems, use the DynamoDB cache extension:

composer require soukicz/llm-cache-dynamodb

<?php
use Soukicz\Llm\Cache\DynamoDB\DynamoDBCache;
use Aws\DynamoDb\DynamoDbClient;

$dynamodb = new DynamoDbClient([
    'region' => 'us-east-1',
    'version' => 'latest',
]);

$cache = new DynamoDBCache($dynamodb, 'llm-cache-table');
$client = new AnthropicClient('sk-xxxxx', $cache);

Characteristics:

✅ Distributed across servers
✅ Automatic TTL expiration
✅ Scalable
❌ Requires AWS setup
❌ Additional costs

Custom Cache Implementation

Implement the CacheInterface for custom caching:

<?php
use Soukicz\Llm\Cache\CacheInterface;

class RedisCache implements CacheInterface {
    public function __construct(
        private Redis $redis,
        private int $ttl = 3600
    ) {}

    public function get(string $key): ?string {
        $value = $this->redis->get($key);
        return $value !== false ? $value : null;
    }

    public function set(string $key, string $value): void {
        $this->redis->setex($key, $this->ttl, $value);
    }

    public function has(string $key): bool {
        return $this->redis->exists($key) > 0;
    }

    public function delete(string $key): void {
        $this->redis->del($key);
    }
}

<?php
$cache = new RedisCache($redisClient);
$client = new AnthropicClient('sk-xxxxx', $cache);

Cache Keys

Cache keys are generated from:

API endpoint
Model name and version
Request parameters (temperature, maxTokens, etc.)
Conversation messages
Tool definitions

Important: Always use exact model versions to prevent stale cached responses.

Best Practices

Use Exact Model Versions

❌ Bad - Generic naming

<?php
// Vague version could cache responses from old models
$model = new AnthropicClaude45Sonnet('latest');

✅ Good - Explicit version

<?php
// Specific version ensures cache correctness
$model = new AnthropicClaude45Sonnet(AnthropicClaude45Sonnet::VERSION_20250929);

Development vs Production

Development:

<?php
// Aggressive caching to save costs during development
$cache = new FileCache('/tmp/llm-cache');
$client = new AnthropicClient('sk-xxxxx', $cache);

Production:

<?php
// Distributed cache for multi-server setup
$cache = new DynamoDBCache($dynamodb, 'prod-llm-cache');
$client = new AnthropicClient('sk-xxxxx', $cache);

Cache Warming

Pre-cache common requests:

<?php
// Warm cache with common queries
$commonQueries = [
    'What is PHP?',
    'How do I install composer?',
    'What are PHP traits?',
];

foreach ($commonQueries as $query) {
    $response = $agentClient->run(
        client: $client,
        request: new LLMRequest(
            model: $model,
            conversation: new LLMConversation([
                LLMMessage::createFromUserString($query)
            ])
        )
    );
}

Disabling Cache

To bypass cache for specific requests, create a client without cache:

<?php
// No cache
$client = new AnthropicClient('sk-xxxxx', null);

Cache Behavior

What Gets Cached

✅ Successful responses ✅ Complete conversations ✅ Tool call results ✅ Multimodal requests

What Doesn't Get Cached

❌ Failed requests (errors) ❌ Incomplete responses ❌ Async requests in progress

Monitoring Cache Performance

Track cache hit rates:

<?php
class CacheMonitor implements CacheInterface {
    private int $hits = 0;
    private int $misses = 0;

    public function __construct(
        private CacheInterface $cache
    ) {}

    public function get(string $key): ?string {
        $value = $this->cache->get($key);
        if ($value !== null) {
            $this->hits++;
        } else {
            $this->misses++;
        }
        return $value;
    }

    public function set(string $key, string $value): void {
        $this->cache->set($key, $value);
    }

    public function getHitRate(): float {
        $total = $this->hits + $this->misses;
        return $total > 0 ? $this->hits / $total : 0;
    }

    // Implement other interface methods...
}

<?php
$cache = new CacheMonitor(new FileCache('/tmp/cache'));
$client = new AnthropicClient('sk-xxxxx', $cache);

// After some requests...
echo "Cache hit rate: " . ($cache->getHitRate() * 100) . "%\n";

Cache Expiration

Manual Cleanup

<?php
// Clear specific cache entry
$cache->delete($cacheKey);

// Clear all cache (FileCache example)
array_map('unlink', glob('/tmp/llm-cache/*'));

Automatic Expiration

Implement TTL in custom cache:

<?php
class TTLFileCache implements CacheInterface {
    private int $ttl;

    public function __construct(string $directory, int $ttlSeconds = 3600) {
        $this->directory = $directory;
        $this->ttl = $ttlSeconds;
    }

    public function get(string $key): ?string {
        $file = $this->getFilePath($key);

        if (!file_exists($file)) {
            return null;
        }

        // Check if expired
        if (time() - filemtime($file) > $this->ttl) {
            unlink($file);
            return null;
        }

        return file_get_contents($file);
    }

    // Implement other methods...
}

Cost Savings

Example cost calculation:

<?php
$request = new LLMRequest(/*...*/);

// First request - hits API ($0.015)
$response1 = $agentClient->run($client, $request);
echo "Cost: $" . $response1->getTokenUsage()->getTotalCost() . "\n";

// Cached request - no cost ($0.00)
$response2 = $agentClient->run($client, $request);
echo "Cost: $" . $response2->getTokenUsage()->getTotalCost() . "\n";

// 100% savings on repeated requests!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Caching

Overview

File Cache

Custom Cache Directory

DynamoDB Cache

Custom Cache Implementation

Cache Keys

Best Practices

Use Exact Model Versions

Development vs Production

Cache Warming

Disabling Cache

Cache Behavior

What Gets Cached

What Doesn't Get Cached

Monitoring Cache Performance

Cache Expiration

Manual Cleanup

Automatic Expiration

Cost Savings

See Also

FilesExpand file tree

caching.md

Latest commit

History

caching.md

File metadata and controls

Caching

Overview

File Cache

Custom Cache Directory

DynamoDB Cache

Custom Cache Implementation

Cache Keys

Best Practices

Use Exact Model Versions

Development vs Production

Cache Warming

Disabling Cache

Cache Behavior

What Gets Cached

What Doesn't Get Cached

Monitoring Cache Performance

Cache Expiration

Manual Cleanup

Automatic Expiration

Cost Savings

See Also