
Explainer: Memory Cache Metrics API with Eviction Tracking #32

@rjmurillo


> [!NOTE]
> Polyfill Status: The .NET runtime is adding native cache metrics in .NET 11 via dotnet/runtime#124140 (API-approved, milestone 11.0.0). This library serves as a polyfill for .NET 8/9/10 applications until .NET 11 reaches GA in November 2026. The API surface intentionally mirrors the approved runtime design (meter name, instrument names, tag schema) to ensure a smooth migration path.

Items below marked strikethrough were not approved by the .NET API review board (bartonjs review). Items marked with ✅ are implemented in the current codebase.

Introduction/Overview

The Memory Cache Metrics API with Eviction Tracking addresses critical performance issues in containerized environments where memory pressure can cause cache thrashing, leading to degraded application performance. This feature modernizes cache usage patterns by providing visibility into cache behavior and enabling proactive memory pressure handling.

The primary goal is to prevent performance degradation by tracking cache evictions, providing comprehensive metrics, and supporting both global default caches and component-specific caches in a backward-compatible manner.

Goals

  1. Provide eviction visibility: Track and report cache eviction counts with an acceptable overhead of ~100ns per operation
  2. Enable proactive monitoring: Support comprehensive cache metrics (hits, misses, eviction reasons, memory usage)
  3. Support modern deployment patterns: Design for container-friendly memory management with dynamic sizing capabilities
  4. Maintain backward compatibility: Ensure existing applications using MemoryCacheStatistics continue to work without modification
  5. Enable external integration: Support export to monitoring systems (Prometheus, Application Insights, OpenTelemetry)
  6. Provide flexible registration: Support both automatic DI-based registration and explicit component-specific cache registration

Non-Goals (Out of Scope)

  • Automatic cache size adjustment based on memory pressure (future enhancement)
  • Custom eviction policy implementation
  • Cache data persistence or recovery mechanisms
  • Real-time alerting or notification systems
  • Performance optimization beyond the ~100ns overhead target
  • Migration tools for existing custom monitoring solutions
  • Distributed, nested, or hierarchical caching (use HybridCache instead)
  • Recommendations for cache size limits based on container memory constraints

User Stories

  1. As a service developer, I want to track eviction counts in my application's default memory cache so that I can identify when memory pressure is causing performance issues.
  2. As a library author, I want to register my component's cache with a metrics system so that service owners can monitor my library's cache behavior alongside their application caches.
  3. As a DevOps engineer, I want to export cache metrics to Prometheus so that I can create dashboards and alerts for cache performance in containerized environments.
  4. As a performance engineer, I want to distinguish between different eviction reasons (memory pressure vs expiration) so that I can identify true performance problems versus normal cache operation.
  5. ~~As an application architect, I want to configure cache metrics collection with different sampling rates so that I can balance monitoring granularity with performance overhead.~~ — Not approved. Observable instruments have zero hot-path overhead, making sampling unnecessary.

Functional Requirements

  1. ✅ The system must extend MemoryCacheStatistics to include a TotalEvictedEntries property without breaking existing applications. — Implemented via a custom CacheStatistics class; the BCL MemoryCacheStatistics.TotalEvictions property is approved for .NET 11 in dotnet/runtime#124140.
  2. ~~The system must provide a MemoryCacheMetrics service that can register and track multiple named caches through dependency injection.~~ — Not approved. The .NET API review rejected the IMemoryCacheMetrics registry pattern as "non-intuitive" in favor of native metrics on MemoryCache via IMeterFactory.
  3. ✅ The system must support eviction tracking with configurable overhead, allowing sampling rates from real-time to periodic (5-30 seconds). — Implemented via observable instruments (polled, zero hot-path overhead).
  4. ✅ The system must distinguish between eviction reasons, including memory pressure, expiration, and manual removal. — Implemented: EvictionReason.Removed and EvictionReason.Replaced are excluded from eviction counts.
  5. ~~The system must prevent duplicate cache registration by maintaining weak references to registered cache instances.~~ — Not approved. Part of the rejected IMemoryCacheMetrics registry pattern.
  6. ~~The system must handle naming conflicts by either throwing exceptions for duplicates or using automatic resolution strategies.~~ — Not approved. Part of the rejected IMemoryCacheMetrics registry pattern.
  7. ✅ The system must integrate with OpenTelemetry/IMeterFactory to enable export to external monitoring systems. — Implemented with 4 observable instruments: cache.requests, cache.evictions, cache.entries, cache.estimated_size.
  8. ✅ The system must provide extension methods for easy service registration in ASP.NET Core applications. — Implemented: AddNamedMeteredMemoryCache and DecorateMemoryCacheWithMetrics.
  9. ~~The system must implement circuit breaker functionality to reduce metrics collection if overhead exceeds configurable thresholds.~~ — Not approved. Observable instruments are polled (not pushed), so hot-path overhead is already near-zero; circuit breakers are unnecessary with this architecture.
  10. ✅ The system must support opt-in statistics tracking. — Automatic discovery of DI-registered caches is not approved (part of the rejected registry pattern); opt-in tracking is implemented via MeteredMemoryCacheOptions and the DI extension methods.
  11. ✅ The system must provide comprehensive metrics including hit/miss ratios, cache size, item lifecycle data, and operation latency. — Implemented: hits, misses, evictions, entry count, estimated size, and a calculated hit ratio via CacheStatistics.
  12. ~~The system must use weak references to prevent memory leaks when clients forget to unregister caches.~~ — Not approved. Part of the rejected IMemoryCacheMetrics registry pattern.
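The extension methods named in requirement 8 would be wired up roughly as follows. This is a hedged sketch: the method names come from this document, but the exact parameters (cache name, options delegate) are assumptions about the library's signatures.

```csharp
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddMetrics(); // provides IMeterFactory for the observable instruments

// Register a named, metered cache (parameters are illustrative;
// consult the library for the actual signature).
services.AddNamedMeteredMemoryCache("catalog");

// Or wrap an already-registered IMemoryCache with metrics.
services.AddMemoryCache();
services.DecorateMemoryCacheWithMetrics();
```

Both paths resolve to an `IMemoryCache`, so existing consumers need no code changes beyond registration.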

Design Considerations

  • API Surface: Custom CacheStatistics class with TotalEvictions (polyfills the approved BCL MemoryCacheStatistics.TotalEvictions property)
  • Registration Pattern: ~~Use explicit registration for component-specific caches with potential future automatic discovery~~ — Rejected by .NET API review
  • Configuration Tiers: Provide no-config defaults, simple predefined profiles, and advanced fine-grained control
  • Memory Safety: ~~Implement weak reference patterns to prevent memory leaks from unregistered caches~~ — Part of the rejected registry pattern
  • Performance: Design for minimal overhead with observable instruments (zero hot-path allocation)
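The CacheStatistics polyfill described above could take roughly this shape. Only TotalEvictions is named by this document; the remaining members are assumptions modeled on the existing BCL MemoryCacheStatistics type.

```csharp
// Sketch of the polyfill statistics type; member names other than
// TotalEvictions are assumed, mirroring the BCL MemoryCacheStatistics.
public sealed class CacheStatistics
{
    public long TotalHits { get; init; }
    public long TotalMisses { get; init; }
    public long TotalEvictions { get; init; } // polyfills MemoryCacheStatistics.TotalEvictions (.NET 11)
    public long CurrentEntryCount { get; init; }
    public long? CurrentEstimatedSize { get; init; }

    // Calculated hit ratio, one of the "comprehensive metrics" listed above.
    public double HitRatio =>
        TotalHits + TotalMisses == 0
            ? 0.0
            : (double)TotalHits / (TotalHits + TotalMisses);
}
```

Keeping the shape parallel to the BCL type is what makes the eventual migration to native .NET 11 statistics mechanical.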

Technical Considerations

  • Integration with existing DI container: Leverage IMeterFactory for OpenTelemetry compatibility
  • Thread safety: Ensure metrics collection is thread-safe for high-concurrency scenarios (implemented via Interlocked atomics)
  • Weak reference management: ~~Implement proper cleanup of disposed cache references~~ — Part of the rejected registry pattern
  • Sampling strategies: ~~Support configurable sampling rates to balance accuracy with performance~~ — Not approved; unnecessary with observable instruments
  • Export mechanisms: Design pluggable exporters for different monitoring backends (implemented via standard OpenTelemetry pipeline)
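The thread-safety and observable-instrument points above can be sketched together: an Interlocked counter bumped from a post-eviction callback (skipping Removed/Replaced, per the eviction-reason mapping) and exposed via an observable counter that the OTel SDK polls. The meter and class names here are illustrative; only the cache.evictions instrument name comes from this document.

```csharp
using System.Diagnostics.Metrics;
using System.Threading;
using Microsoft.Extensions.Caching.Memory;

// Sketch: thread-safe eviction counting surfaced as a polled instrument.
public sealed class EvictionCounter
{
    private long _evictions;

    public EvictionCounter(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("MeteredMemoryCache"); // meter name assumed
        // Observable counter: read on collection, no hot-path instrument cost.
        meter.CreateObservableCounter("cache.evictions",
            () => Interlocked.Read(ref _evictions));
    }

    // Matches PostEvictionDelegate; register via
    // ICacheEntry.RegisterPostEvictionCallback.
    public void OnEviction(object key, object? value, EvictionReason reason, object? state)
    {
        // Removed/Replaced are deliberate caller actions, not evictions.
        if (reason is EvictionReason.Removed or EvictionReason.Replaced)
            return;
        Interlocked.Increment(ref _evictions);
    }
}
```

Interlocked keeps the per-eviction cost to a single atomic increment, which is how the ~100ns budget is honored under high concurrency.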

Success Metrics

  1. Performance overhead: Maintain <100ns per cache operation when metrics are enabled — Validated via BenchmarkDotNet suites (CacheBenchmarks, MetricsOverheadBenchmarks, ContentionBenchmarks)
  2. Adoption rate: Achieve integration in existing applications without requiring code changes (for basic scenarios)
  3. Diagnostic value: Enable identification of cache thrashing patterns that were previously invisible
  4. Container efficiency: Reduce memory-related performance issues in containerized deployments by 20% — Not yet measured
  5. Monitoring integration: Support export to at least 3 major monitoring platforms (Prometheus, Application Insights, Datadog) — Supported via the standard OpenTelemetry exporter pipeline

Open Questions

  1. Automatic cache sizing documentation: (future enhancement) Provide recommendations for cache configuration based on container memory constraints
  2. Metric retention: How long should in-memory metrics be retained before aggregation/export, and should this be configurable? — Resolved: metrics use observable instruments polled by the OTel SDK; retention is controlled by the configured exporter, not by this library.
  3. Performance testing scope: What specific performance benchmarks should be established to validate the <100ns overhead target across different cache usage patterns? — Resolved: three benchmark suites implemented — CacheBenchmarks (operation overhead), MetricsOverheadBenchmarks (instrumentation cost), ContentionBenchmarks (concurrent contention).
  4. Migration documentation: What level of detail is needed in migration guides for applications migrating from this polyfill to native .NET 11 MemoryCache metrics?
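A minimal shape for the overhead benchmarks resolved in question 3, assuming BenchmarkDotNet; the actual CacheBenchmarks/MetricsOverheadBenchmarks/ContentionBenchmarks suites may be structured differently.

```csharp
using BenchmarkDotNet.Attributes;
using Microsoft.Extensions.Caching.Memory;

// Illustrative benchmark comparing Set() with and without statistics
// tracking; the delta between the two is the per-operation overhead.
[MemoryDiagnoser]
public class SetOverheadBenchmarks
{
    private MemoryCache _plain = null!;
    private MemoryCache _tracked = null!;

    [GlobalSetup]
    public void Setup()
    {
        _plain = new MemoryCache(new MemoryCacheOptions());
        _tracked = new MemoryCache(new MemoryCacheOptions { TrackStatistics = true });
    }

    [Benchmark(Baseline = true)]
    public void Set_Plain() => _plain.Set("key", 1);

    [Benchmark]
    public void Set_Tracked() => _tracked.Set("key", 1);
}
```

Running the tracked case as a non-baseline benchmark makes the <100ns overhead target directly readable from the ratio column.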

Parent Tasks for Memory Cache Metrics API with Eviction Tracking

1. ✅ Extend MemoryCacheStatistics with Eviction Tracking

  • Enhance Microsoft's MemoryCacheStatistics to include a TotalEvictedEntries property while maintaining backward compatibility. — Implemented via a custom CacheStatistics class in MeteredMemoryCache.

Sub-tasks:

  • Create CacheStatistics class with TotalEvictions property (polyfills BCL MemoryCacheStatistics.TotalEvictions)
  • Implement backward-compatible statistics via MeteredMemoryCache.GetCurrentStatistics()
  • Add thread-safe eviction counting mechanism (via Interlocked atomics)
  • Create mapping between PostEvictionReason and eviction categories (excludes Removed/Replaced)
  • Add validation and error handling for statistics collection
  • Write comprehensive unit tests for statistics extensions
  • Add integration tests with existing cache implementations
  • Create performance benchmarks to validate <100ns overhead target
  • Update API documentation and usage examples
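For the backward-compatible statistics sub-task, the underlying BCL hook looks like this today; on .NET 11 the returned MemoryCacheStatistics will additionally carry TotalEvictions natively, which is what the polyfill layers on in the meantime.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

// Statistics are opt-in: without TrackStatistics the BCL returns
// null from GetCurrentStatistics().
var cache = new MemoryCache(new MemoryCacheOptions { TrackStatistics = true });

cache.Set("answer", 42);
_ = cache.Get("answer");   // hit
_ = cache.Get("missing");  // miss

MemoryCacheStatistics? stats = cache.GetCurrentStatistics();
Console.WriteLine(
    $"hits={stats?.TotalHits} misses={stats?.TotalMisses} entries={stats?.CurrentEntryCount}");
```

MeteredMemoryCache.GetCurrentStatistics() follows the same opt-in, nullable contract so existing call sites keep working unchanged.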

2. Create MemoryCacheMetrics Service Infrastructure

  • ~~Develop a centralized service for registering and tracking multiple named caches with weak reference management and conflict resolution~~ — Not approved by .NET API review. The registry pattern (IMemoryCacheMetrics) was rejected as "non-intuitive." The approved approach uses native IMeterFactory on MemoryCache directly.

Sub-tasks:

  • Design IMemoryCacheMetrics interface with registration and tracking methods
  • Implement MemoryCacheMetrics service with weak reference management
  • Create cache registration system with naming conflict resolution
  • Implement automatic cleanup of disposed cache references
  • Add thread-safe concurrent access patterns for multi-cache scenarios
  • Create cache discovery mechanism for DI-registered caches
  • Implement metrics aggregation across multiple named caches
  • Add cache lifecycle management (registration, tracking, cleanup)
  • Write comprehensive unit tests for service functionality
  • Create integration tests for multi-cache scenarios
  • Add service registration extensions for dependency injection
  • Document service usage patterns and best practices

3. Implement Circuit Breaker and Sampling Mechanisms

  • ~~Add configurable overhead protection with circuit breaker functionality and sampling rates to maintain the ~100ns performance target~~ — Not approved. Observable instruments are polled (not pushed), so hot-path overhead is already near-zero. Circuit breakers and sampling are unnecessary with this architecture.

Sub-tasks:

  • Design circuit breaker threshold configuration
  • Implement sampling rate controls
  • Add overhead monitoring and adaptive throttling
  • Write tests for circuit breaker behavior
  • Document sampling configuration options

4. Add Advanced Cache Management Features

  • ~~Implement component-specific cache registration, automatic discovery of DI-registered caches, and enhanced export mechanisms~~ — Not approved. Component-specific registration and automatic discovery are part of the rejected IMemoryCacheMetrics registry pattern.

Sub-tasks:

  • Implement component-specific cache registration
  • Add automatic discovery of DI-registered caches
  • Create enhanced export mechanisms for cache metrics
  • Write integration tests for advanced scenarios
  • Document advanced cache management patterns

5. Integrate OpenTelemetry and External Monitoring

  • Enhance existing OpenTelemetry integration with support for multiple monitoring backends (Prometheus, Application Insights) and comprehensive metrics export

Sub-tasks:

  • Enhance existing OpenTelemetry integration with new metrics
  • Create Prometheus metrics exporter with proper label handling
  • Implement Application Insights integration with custom metrics
  • Add support for Grafana dashboard configuration
  • Create pluggable exporter architecture for extensibility (via standard OTel pipeline)
  • Create configuration system for multiple export destinations
  • Write integration tests for all supported monitoring backends
  • Create example configurations for popular monitoring setups
  • Document monitoring setup and troubleshooting guides
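A typical ASP.NET Core wiring for the Prometheus path, assuming the standard OpenTelemetry packages (OpenTelemetry.Extensions.Hosting and OpenTelemetry.Exporter.Prometheus.AspNetCore); the meter name passed to AddMeter is an assumption and should match whatever meter the library actually creates.

```csharp
using OpenTelemetry.Metrics;

var builder = WebApplication.CreateBuilder(args);

// Collect the library's instruments and expose a Prometheus scrape endpoint.
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("MeteredMemoryCache") // assumed meter name
        .AddPrometheusExporter());

var app = builder.Build();
app.MapPrometheusScrapingEndpoint(); // serves /metrics for scraping
app.Run();
```

Swapping AddPrometheusExporter for the Azure Monitor or OTLP exporter covers the Application Insights and Datadog targets without touching the cache code.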

Metadata


Assignees

Labels

enhancement (New feature or request), help wanted (Extra attention is needed), question (Further information is requested)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
