PLAT-02: Distributed caching strategy and cache-invalidation semantics #805

Chris0Jeky merged 22 commits into main from feature/distributed-cache-semantics
Conversation
Document the decision to use cache-aside with Redis for production and IMemoryCache for local dev. Covers key strategy, TTL policy, invalidation triggers, safe degradation, and observability requirements. Refs #85
Define generic cache-aside abstraction with GetAsync, SetAsync, RemoveAsync, and RemoveByPrefixAsync. All operations degrade safely without throwing exceptions to callers. Refs #85
- CacheSettings: configurable provider (Redis/InMemory/None), TTLs, key prefix
- InMemoryCacheService: thread-safe ConcurrentDictionary with periodic sweep
- RedisCacheService: StackExchange.Redis with lazy connection and safe degradation
- NoOpCacheService: passthrough for explicitly disabled caching
- DI registration: auto-selects provider based on Cache:Provider config

All implementations degrade safely — cache failures never propagate to callers. Refs #85
- GetBoardDetailAsync: cache board detail with configurable TTL (120s default)
- ListBoardsAsync: cache unfiltered board list per user (60s default TTL)
- Invalidate board detail cache on update/delete/archive
- Invalidate user board list cache on create/update/delete
- CacheKeys: centralized key definitions for consistent invalidation
- DI: wire ICacheService and CacheSettings into BoardService

Cache is optional — service operates correctly without it. Refs #85
- InMemoryCacheServiceTests (12 tests): get/set, expiry, remove, prefix removal, overwrite, idempotent remove, metrics logging, key prefix isolation, entry count, concurrent access safety
- NoOpCacheServiceTests (5 tests): all operations are no-op and safe
- BoardServiceCacheTests (15 tests): cache hit returns cached value, cache miss queries DB and populates cache, filtered queries bypass cache, not-found results are not cached, invalidation on create/update/delete, fallback to DB when no cache service

Refs #85
Document distributed caching layer in current implementation snapshot and add delivery entry in masterplan with follow-up items. Refs #85
Code Review
This pull request implements a distributed caching strategy for the BoardService using a cache-aside pattern. It introduces the ICacheService interface with Redis, in-memory, and no-op implementations, along with centralized cache key management and configuration settings. The caching layer is integrated into board detail and listing retrievals, with invalidation logic added to creation, update, and deletion flows. Review feedback highlights several critical areas for improvement: adding exception handling to the background timer in the in-memory service to prevent process termination, avoiding synchronous O(N) scans during cache eviction to prevent latency spikes, addressing stale cache issues for shared boards, and replacing blocking Redis key enumeration with a batched approach to ensure production stability.
```csharp
_keyPrefix = keyPrefix;

// Sweep expired entries every 60 seconds
_sweepTimer = new Timer(_ => SweepExpiredEntries(), null, TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(60));
```
The Timer callback SweepExpiredEntries lacks exception handling. In .NET, an unhandled exception in a background timer thread can terminate the process. It is safer to wrap the callback in a try-catch block to ensure application stability.
```csharp
_sweepTimer = new Timer(_ =>
{
    try { SweepExpiredEntries(); }
    catch (Exception ex) { _logger.LogError(ex, "Cache sweep failed"); }
}, null, TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(60));
```

```csharp
if (_cache.Count >= MaxEntries)
{
    SweepExpiredEntries();
}
```
Executing SweepExpiredEntries() synchronously when the cache reaches MaxEntries can cause significant latency spikes on the request thread, as it performs an O(N) scan of the dictionary. Consider triggering the sweep asynchronously to avoid impacting the caller's performance.
```csharp
if (_cache.Count >= MaxEntries)
{
    _ = Task.Run(SweepExpiredEntries);
}
```

```csharp
if (board.OwnerId.HasValue)
    await InvalidateBoardListCacheAsync(board.OwnerId.Value, cancellationToken);
```
The invalidation logic only clears the board list cache for the owner. If a board is shared with other users, their cached lists will remain stale until the TTL expires. This discrepancy with ADR-0023 (which specifies invalidating all accessible users' caches) could lead to data inconsistency in multi-user scenarios. Consider using RemoveByPrefixAsync to invalidate all user board lists or updating the design document to reflect this limitation.
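A minimal sketch of the broader invalidation the comment suggests, assuming the `ICacheService` contract from this PR (`RemoveByPrefixAsync`) and the `boards:user:` key prefix from `CacheKeys`; it trades precision for correctness, dropping every user's cached board list so shared users never see a stale entry:

```csharp
// Sketch only — assumes the ICacheService shape from this PR (RemoveByPrefixAsync)
// and the "boards:user:" key prefix from CacheKeys.
private async Task InvalidateAllBoardListCachesAsync(CancellationToken cancellationToken)
{
    if (_cacheService is null)
        return; // cache is optional — the service operates correctly without it

    // Coarse but correct: invalidates every user's cached board list,
    // including users the board is shared with.
    await _cacheService.RemoveByPrefixAsync("boards:user:", cancellationToken);
}
```

The alternative is to enumerate the board's members and invalidate each user's list key individually, which is more precise but requires the membership list at invalidation time.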
```csharp
var keys = server.Keys(pattern: $"{fullPrefix}*").ToArray();
if (keys.Length > 0)
{
    var db = connection.GetDatabase();
    await db.KeyDeleteAsync(keys);
    _logger.LogDebug("Cache removed {Count} keys with prefix {CacheKeyPrefix}", keys.Length, fullPrefix);
```
Calling server.Keys(...).ToArray() is risky in production as it can block the Redis server and cause memory pressure in the API if the number of keys is large. It is recommended to iterate over the keys and process them in batches to ensure safe degradation and performance stability.
```csharp
var keys = server.Keys(pattern: $"{fullPrefix}*");
var db = connection.GetDatabase();
var count = 0;
foreach (var batch in keys.Chunk(500))
{
    await db.KeyDeleteAsync(batch);
    count += batch.Length;
}
_logger.LogDebug("Cache removed {Count} keys with prefix {CacheKeyPrefix}", count, fullPrefix);
```

- Change CacheMetric logging from Information to Debug level in both InMemoryCacheService and RedisCacheService to avoid log noise in high-traffic production scenarios
- Clear ConcurrentDictionary on InMemoryCacheService.Dispose() for complete resource cleanup
- Remove unused _logger field from BoardService (was declared but never referenced)
- Update test assertion to match Debug log level

Refs #85
Adversarial Self-Review Findings

Fixed in follow-up commit (9473f35)
Acknowledged but acceptable
Thread safety verification
Security check
Pull request overview
Introduces a cache-aside caching layer for hot board read paths, with in-memory / Redis / no-op implementations, DI wiring, and supporting documentation/tests.
Changes:
- Added `ICacheService` abstraction + `CacheKeys`/`CacheSettings`, and applied caching to `BoardService` list/detail reads with TTLs + targeted invalidation.
- Implemented `InMemoryCacheService`, `RedisCacheService`, and `NoOpCacheService`, plus DI registration based on config.
- Added ADR-0023 and updated status/masterplan docs; added unit tests for cache services and BoardService caching behavior.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/STATUS.md | Documents the new distributed caching layer and semantics. |
| docs/IMPLEMENTATION_MASTERPLAN.md | Adds the PLAT-02 delivery entry to the master plan. |
| docs/decisions/INDEX.md | Registers ADR-0023 in the ADR index. |
| docs/decisions/ADR-0023-distributed-caching-cache-aside.md | Defines the intended cache-aside strategy, TTLs, key strategy, and invalidation semantics. |
| backend/tests/Taskdeck.Application.Tests/Services/BoardServiceCacheTests.cs | Tests cache-aside behavior and invalidation expectations for BoardService. |
| backend/tests/Taskdeck.Api.Tests/NoOpCacheServiceTests.cs | Verifies no-op behavior and singleton semantics expectations. |
| backend/tests/Taskdeck.Api.Tests/InMemoryCacheServiceTests.cs | Tests in-memory cache operations, expiry, prefix removal, logging, and concurrency. |
| backend/src/Taskdeck.Infrastructure/Taskdeck.Infrastructure.csproj | Adds dependencies for caching backends (Memory + Redis). |
| backend/src/Taskdeck.Infrastructure/Services/RedisCacheService.cs | Redis cache implementation with lazy connection + safe degradation. |
| backend/src/Taskdeck.Infrastructure/Services/NoOpCacheService.cs | No-op cache implementation for disabled caching mode. |
| backend/src/Taskdeck.Infrastructure/Services/InMemoryCacheService.cs | In-memory cache implementation with sweep timer + JSON serialization. |
| backend/src/Taskdeck.Infrastructure/DependencyInjection.cs | Registers ICacheService + CacheSettings from configuration. |
| backend/src/Taskdeck.Application/Services/CacheSettings.cs | Adds configurable cache settings (provider, TTLs, prefix, Redis connstr). |
| backend/src/Taskdeck.Application/Services/CacheKeys.cs | Centralizes cache key formats for board list/detail. |
| backend/src/Taskdeck.Application/Services/BoardService.cs | Applies caching to list/detail endpoints and adds invalidation on board mutations. |
| backend/src/Taskdeck.Application/Interfaces/ICacheService.cs | Defines cache abstraction contract (get/set/remove/remove-by-prefix). |
| backend/src/Taskdeck.Api/Extensions/ApplicationServiceRegistration.cs | Updates DI to inject cache service + cache settings into BoardService. |
Comments suppressed due to low confidence (2)
backend/src/Taskdeck.Application/Services/BoardService.cs:22
`_logger` is injected and stored but never used in this service. If logging is intended at the BoardService level, add concrete log statements (e.g., for cache bypass conditions); otherwise remove the field/parameter to avoid dead dependencies and potential analyzer warnings.
```csharp
public BoardService(IUnitOfWork unitOfWork)
    : this(unitOfWork, authorizationService: null, realtimeNotifier: null, historyService: null)
{
}
```
backend/src/Taskdeck.Application/Services/BoardService.cs:268
Board list invalidation on update only clears the owner's list cache. Because non-owners with write access (Admin/Editor) can rename/archive boards, their own `boards:user:{actingUserId}` cache can remain stale after a successful mutation. Consider invalidating the acting user's list cache as well (you may need to thread `actingUserId` into the internal update path or handle invalidation in the overload that has `actingUserId`).
```csharp
    return Result.Success(MapToDto(board));
}
catch (DomainException ex)
{
    return Result.Failure<BoardDto>(ex.ErrorCode, ex.Message);
```
```csharp
}

var result = JsonSerializer.Deserialize<T>(value!);
_logger.LogDebug("Cache hit for key {CacheKey}", fullKey);
LogCacheMetric("hit", key);
return result;
```
JsonSerializer.Deserialize<T>(value!) is passing a RedisValue into System.Text.Json, which won’t deserialize as intended (and may not compile depending on implicit conversions). Convert the Redis value to a string (or byte[]) explicitly before deserializing, and handle deserialization failures as a cache miss.
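A sketch of the suggested fix, assuming StackExchange.Redis and System.Text.Json as used in this PR (`fullKey`, `key`, and `LogCacheMetric` are the names from the surrounding snippet):

```csharp
// Sketch only — convert the RedisValue to a string before deserializing,
// and treat any deserialization failure as a cache miss.
var value = await db.StringGetAsync(fullKey);
if (!value.HasValue)
{
    LogCacheMetric("miss", key);
    return default;
}

try
{
    var result = JsonSerializer.Deserialize<T>(value.ToString());
    LogCacheMetric("hit", key);
    return result;
}
catch (JsonException)
{
    // A corrupt or schema-incompatible payload is treated as a cache miss.
    LogCacheMetric("miss", key);
    return default;
}
```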
```csharp
// Use SCAN to find keys by prefix — safe for production (non-blocking).
// Note: This requires access to a server endpoint.
var endpoints = connection.GetEndPoints();
foreach (var endpoint in endpoints)
{
    var server = connection.GetServer(endpoint);
    var keys = server.Keys(pattern: $"{fullPrefix}*").ToArray();
    if (keys.Length > 0)
    {
        var db = connection.GetDatabase();
        await db.KeyDeleteAsync(keys);
        _logger.LogDebug("Cache removed {Count} keys with prefix {CacheKeyPrefix}", keys.Length, fullPrefix);
```
RemoveByPrefixAsync enumerates all matching keys with server.Keys(...).ToArray(). Even though it’s SCAN-based, fully materializing keys can be expensive on large keyspaces and is often unsupported/restricted on managed/clustered Redis setups. Prefer a batched/streaming scan with incremental deletes (and consider avoiding prefix invalidation in Redis entirely by using versioned keys).
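The versioned-key alternative mentioned above could look roughly like this — the helpers are hypothetical, not part of this PR. Bumping a per-user version number makes old entries unreachable, so no SCAN/KEYS enumeration is ever needed and superseded entries simply age out via TTL:

```csharp
// Sketch only — hypothetical versioned-key helpers, assuming the
// boards:user:{userId} key family from this PR.
private static async Task<string> GetVersionedListKeyAsync(IDatabase db, Guid userId)
{
    // The current version is stored alongside the data keys.
    var version = await db.StringGetAsync($"boards:user:{userId}:v");
    return $"boards:user:{userId}:{(version.HasValue ? version.ToString() : "0")}";
}

private static Task InvalidateUserListAsync(IDatabase db, Guid userId) =>
    // Incrementing the version "invalidates" without deleting any keys.
    db.StringIncrementAsync($"boards:user:{userId}:v");
```

This costs one extra round trip per read (fetching the version) but works on clustered and managed Redis where server-side key enumeration is restricted.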
```csharp
// Lazy initialization: connection is established on first cache access,
// not at startup. This prevents startup failures when Redis is unavailable.
_lazyConnection = new Lazy<ConnectionMultiplexer?>(() =>
{
    try
    {
        var options = ConfigurationOptions.Parse(connectionString);
        options.AbortOnConnectFail = false; // Allow startup without Redis
        options.ConnectTimeout = 3000;      // 3 second connect timeout
        options.SyncTimeout = 1000;         // 1 second sync timeout
        options.AsyncTimeout = 1000;        // 1 second async timeout
        var connection = ConnectionMultiplexer.Connect(options);
        _logger.LogInformation("Redis cache connected successfully");
        return connection;
    }
    catch (Exception ex)
    {
        _logger.LogWarning(ex, "Redis cache connection failed — operating in degraded (no-cache) mode");
        return null;
    }
});
```
The lazy connection is only attempted once; if the initial Connect fails and returns null, the service will stay permanently in no-cache mode until process restart. Consider retrying connection establishment on subsequent operations (or keeping a Lazy<Task<ConnectionMultiplexer?>> that can be recreated after failures) so Redis can recover without a restart.
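One way to sketch this — replacing `Lazy<T>` with a reconnectable field guarded by a lock; `_connectionString` and `_logger` are the fields already present in this PR, while `_reconnectAt` is a new throttle field introduced here for illustration:

```csharp
// Sketch only — lets Redis recover without a process restart.
private ConnectionMultiplexer? _connection;
private DateTime _reconnectAt = DateTime.MinValue;
private readonly object _connectLock = new();

private ConnectionMultiplexer? GetConnection()
{
    if (_connection is { IsConnected: true })
        return _connection;

    lock (_connectLock)
    {
        if (_connection is { IsConnected: true } || DateTime.UtcNow < _reconnectAt)
            return _connection;

        try
        {
            var options = ConfigurationOptions.Parse(_connectionString);
            options.AbortOnConnectFail = false;
            _connection = ConnectionMultiplexer.Connect(options);
        }
        catch (Exception ex)
        {
            // Throttle reconnect attempts so hot paths don't pay the
            // connect timeout on every cache call while Redis is down.
            _logger.LogWarning(ex, "Redis reconnect failed — retrying later");
            _reconnectAt = DateTime.UtcNow.AddSeconds(15);
        }

        return _connection;
    }
}
```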
```csharp
try
{
    if (_cache.Count >= MaxEntries)
    {
        SweepExpiredEntries();
    }

    var serialized = JsonSerializer.Serialize(value);
    var entry = new CacheEntry(serialized, DateTime.UtcNow.Add(ttl));
    _cache[fullKey] = entry;
    _logger.LogDebug("Cache set for key {CacheKey} with TTL {Ttl}s", fullKey, ttl.TotalSeconds);
```
MaxEntries is described as a maximum, but the current logic only sweeps expired entries when the threshold is reached and does not cap growth if entries are still valid. This can still lead to unbounded memory growth under sustained unique-key traffic. Consider enforcing a hard cap (evict oldest/random) or refusing to cache when above the limit.
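A sketch of a hard-cap eviction pass, assuming the `_cache` ConcurrentDictionary and a `CacheEntry` with an `ExpiresAt` property as described in this PR; it would run when a sweep frees too little space:

```csharp
// Sketch only — enforces MaxEntries as a hard cap by evicting the entries
// closest to expiry until the cache is back under the target size.
private void EvictOldestEntries(int targetCount)
{
    var victims = _cache
        .OrderBy(kvp => kvp.Value.ExpiresAt)
        .Take(Math.Max(0, _cache.Count - targetCount))
        .Select(kvp => kvp.Key)
        .ToList();

    foreach (var key in victims)
        _cache.TryRemove(key, out _);
}
```

The snapshot-and-remove approach is O(N log N) but runs only at the cap, and `TryRemove` keeps it safe against concurrent writers.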
```csharp
private void LogCacheMetric(string outcome, string keyPrefix)
{
    // Extract the resource type from the key for tagging (e.g., "boards" from "boards:user:...")
    var resource = keyPrefix.Split(':').FirstOrDefault() ?? "unknown";
    _logger.LogDebug("CacheMetric outcome={Outcome} resource={Resource}", outcome, resource);
```
LogCacheMetric emits LogInformation on every cache hit/miss. For hot read paths this can create very high log volume and cost in production. Consider lowering to Debug/Trace or using OpenTelemetry Meter counters for hit/miss/error while keeping warnings for failures.
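The counter-based alternative might be sketched like this, using `System.Diagnostics.Metrics`; the meter and instrument names are assumptions aligned with the ADR's `cache.hit`/`cache.miss`/`cache.error` naming:

```csharp
// Sketch only — hit/miss/error as metrics counters instead of per-request
// log lines. Requires the System.Diagnostics.Metrics namespace.
private static readonly Meter CacheMeter = new("Taskdeck.Cache");
private static readonly Counter<long> CacheHits = CacheMeter.CreateCounter<long>("cache.hit");
private static readonly Counter<long> CacheMisses = CacheMeter.CreateCounter<long>("cache.miss");
private static readonly Counter<long> CacheErrors = CacheMeter.CreateCounter<long>("cache.error");

private static void RecordCacheMetric(string outcome, string keyPrefix)
{
    var tag = new KeyValuePair<string, object?>("cache_key_prefix", keyPrefix.Split(':')[0]);
    switch (outcome)
    {
        case "hit": CacheHits.Add(1, tag); break;
        case "miss": CacheMisses.Add(1, tag); break;
        default: CacheErrors.Add(1, tag); break;
    }
}
```

Counters aggregate in-process, so hot read paths add no log volume, and an OpenTelemetry exporter can ship them to dashboards directly.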
```markdown
### Metrics

- Cache hit/miss/error counters emitted via `ILogger` structured logging
- Metric names: `cache.hit`, `cache.miss`, `cache.error`
- Tagged with `cache_key_prefix` for per-resource analysis
```
The ADR specifies metric names cache.hit/cache.miss/cache.error tagged with cache_key_prefix, but the implementation currently logs a generic CacheMetric message with {Outcome} and {Resource}. Please align the ADR and implementation so downstream log parsing/metrics dashboards don’t break (either update the ADR or change the log event schema).
```xml
  <PrivateAssets>all</PrivateAssets>
  <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>
<PackageReference Include="Microsoft.Extensions.Caching.Memory" Version="8.0.1" />
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.5" />
<PackageReference Include="StackExchange.Redis" Version="2.8.24" />
<PackageReference Include="Microsoft.IdentityModel.Tokens" Version="8.17.0" />
```
Microsoft.Extensions.Caching.Memory is added as a dependency but doesn’t appear to be used anywhere in Taskdeck.Infrastructure (the in-memory cache is implemented via ConcurrentDictionary). Remove this package reference to avoid unnecessary dependencies, or refactor InMemoryCacheService to actually use IMemoryCache.
```csharp
public sealed class NoOpCacheService : ICacheService
{
    public static readonly NoOpCacheService Instance = new();
```
NoOpCacheService is described/tested as a singleton (Instance), but the class has an implicit public constructor so callers can still create additional instances. If singleton semantics are intended, make the constructor private to enforce it.
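A sketch of the suggested fix; the member signatures below are assumptions based on the `GetAsync`/`SetAsync`/`RemoveAsync`/`RemoveByPrefixAsync` contract described in this PR:

```csharp
// Sketch only — a private constructor enforces the singleton.
public sealed class NoOpCacheService : ICacheService
{
    public static readonly NoOpCacheService Instance = new();

    private NoOpCacheService() { } // callers must go through Instance

    public Task<T?> GetAsync<T>(string key, CancellationToken ct = default) =>
        Task.FromResult<T?>(default); // always a miss

    public Task SetAsync<T>(string key, T value, TimeSpan ttl, CancellationToken ct = default) =>
        Task.CompletedTask; // never stores

    public Task RemoveAsync(string key, CancellationToken ct = default) =>
        Task.CompletedTask;

    public Task RemoveByPrefixAsync(string prefix, CancellationToken ct = default) =>
        Task.CompletedTask;
}
```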
```csharp
[Fact]
public async Task GetBoardDetail_FallsBackToDatabase_WhenCacheThrows()
{
    var boardId = Guid.NewGuid();
    var board = new Board("Fallback Board", "desc", _userId);

    // Cache throws on get — should fallback to DB
    _cacheMock.Setup(c => c.GetAsync<BoardDetailDto>(
        It.IsAny<string>(), It.IsAny<CancellationToken>()))
        .ThrowsAsync(new InvalidOperationException("Redis down"));

    _boardRepoMock.Setup(r => r.GetByIdWithDetailsAsync(boardId, It.IsAny<CancellationToken>()))
        .ReturnsAsync(board);

    // Use the InMemoryCacheService that wraps exceptions, or just test with no-cache fallback
    // This tests the pattern: when ICacheService itself is implemented correctly
    // it should never throw. But the BoardService constructor accepts null cache
    // and the service still works.
    var serviceWithoutCache = new BoardService(_unitOfWorkMock.Object);
    var result = await serviceWithoutCache.GetBoardDetailAsync(boardId);

    result.IsSuccess.Should().BeTrue();
    result.Value.Name.Should().Be("Fallback Board");
}
```
This test sets up _cacheMock to throw, but then creates serviceWithoutCache (not using _cacheMock) and never exercises the cache-throws path. As written it only verifies the no-cache constructor works. Either update the test to actually use a cache implementation that swallows exceptions, or rename/remove the unused setup so the intent matches the assertions.
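If the intent is to cover the cache-throws path, a reworked test might look like this. Note the hedges: it assumes a hypothetical `BoardService` constructor overload that accepts the cache mock, and it only passes if `BoardService` guards cache access in try/catch — without that guard the exception surfaces and the test (correctly) fails:

```csharp
// Sketch only — actually exercises the throwing cache instead of the
// null-cache constructor. Constructor overload is an assumption.
[Fact]
public async Task GetBoardDetail_FallsBackToDatabase_WhenCacheGetThrows()
{
    var boardId = Guid.NewGuid();
    var board = new Board("Fallback Board", "desc", _userId);

    _cacheMock.Setup(c => c.GetAsync<BoardDetailDto>(
            It.IsAny<string>(), It.IsAny<CancellationToken>()))
        .ThrowsAsync(new InvalidOperationException("Redis down"));
    _boardRepoMock.Setup(r => r.GetByIdWithDetailsAsync(boardId, It.IsAny<CancellationToken>()))
        .ReturnsAsync(board);

    var service = new BoardService(_unitOfWorkMock.Object, _cacheMock.Object);

    var result = await service.GetBoardDetailAsync(boardId);

    result.IsSuccess.Should().BeTrue();
    result.Value.Name.Should().Be("Fallback Board");
}
```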
```csharp
private void LogCacheMetric(string outcome, string keyPrefix)
{
    var resource = keyPrefix.Split(':').FirstOrDefault() ?? "unknown";
    _logger.LogDebug("CacheMetric outcome={Outcome} resource={Resource}", outcome, resource);
}
```
LogCacheMetric uses LogInformation for every cache outcome. Like the in-memory implementation, this can generate very high log volume on hot paths. Consider lowering the level or switching to real metrics (OpenTelemetry counters), keeping warnings for failures.
Adversarial Review: PR #805 -- Distributed Caching Strategy (PLAT-02)

Summary

This PR introduces a cache-aside pattern for BoardService reads.

CRITICAL Issues

C1: Stale cache from ColumnService/CardService mutations -- board detail cache is never invalidated by sibling services

Files:
This is not just a "data freshness" concern -- it is a correctness bug. A user adds a column, then immediately views the board, and the column is missing. They move a card, the card count is wrong. Fix: Either:
C2: `Lazy<T>` caches a failed Redis connection permanently -- caching stays disabled until process restart
| Severity | Count | Action Required |
|---|---|---|
| CRITICAL | 2 | Must fix before merge |
| HIGH | 4 | Should fix before merge |
| MEDIUM | 5 | Fix or document as follow-up |
| LOW | 4 | Informational |
…e mutations BoardDetailDto includes columns with card counts that are mutated by ColumnService and CardService without cache awareness. Caching board detail would serve stale column/card information for up to 120 seconds after any column or card mutation. Board list caching is safe because BoardDto excludes columns and card counts. Addresses review finding C1 (CRITICAL).
Lazy<T> caches its factory result permanently. If Redis is temporarily down during first access, the cache stays disabled forever until restart. Replace with a volatile field + double-checked locking + 15s reconnect throttle so transient Redis outages do not permanently disable caching. Also check _disposed before cache operations to avoid ObjectDisposedException spam during graceful shutdown. Addresses review findings C2 (CRITICAL) and H2 (HIGH).
…CacheService When all 10,000 entries have long TTLs, SweepExpiredEntries removes nothing and the cache grows unboundedly. Add EvictOldestEntries to enforce the hard limit by evicting entries closest to expiry when sweep is insufficient. Wrap sweep timer callback in try/catch to prevent unhandled exceptions from permanently terminating the timer. Addresses review findings H1 (HIGH) and M4 (MEDIUM).
The InMemoryCacheService uses ConcurrentDictionary directly, not IMemoryCache. This package was added but never referenced in code. Addresses review finding H3 (HIGH).
Typos like 'Rediss' or 'inmem' silently fall back to InMemory with no indication. Add explicit warning log so operators notice misconfigurations. Addresses review finding M2 (MEDIUM).
…x misleading test Remove board detail cache tests (detail is no longer cached). Update invalidation tests to verify only board list cache. Replace misleading FallsBackToDatabase_WhenCacheThrows test that tested null-cache path instead of exception path with honest test names. Addresses review finding M1 (MEDIUM).
Correct ADR-0023: InMemory uses ConcurrentDictionary not IMemoryCache, board detail is intentionally not cached, update key strategy and TTL sections. Update STATUS.md and IMPLEMENTATION_MASTERPLAN.md to reflect reconnection-capable Redis, capacity-capped InMemory, and board-detail exclusion. Addresses review finding M5 (MEDIUM).
…805) Pre-resolve expected merge conflicts across all 10 platform expansion PRs: - STATUS.md: consolidated wave summary, new CI workflow entries - IMPLEMENTATION_MASTERPLAN.md: delivery entry #130, next-steps item #20 - TESTING_GUIDE.md: cross-browser, visual regression, mutation, Testcontainers - decisions/INDEX.md: canonical ADR numbering (0023-0027) resolving the 5-way ADR-0023 collision across PRs
…ache-semantics # Conflicts: # docs/decisions/INDEX.md
- RedisCacheService: Convert RedisValue to string before deserializing - RedisCacheService: Batch SCAN/delete operations to avoid materializing all keys - RedisCacheService: Add retry logic with exponential backoff for connection - InMemoryCacheService: Enforce hard cap on cache size with proper locking
Resolved conflicts by accepting main's versions: - AutomationProposalService.cs: whitespace changes - UnitOfWork.cs: error message update - ConcurrencyRaceConditionStressTests.cs: relaxed assertions for SQLite CI - AutomationProposalServiceEdgeCaseTests.cs: aligned test names and error messages
…s0Jeky/Taskdeck into feature/distributed-cache-semantics
Take main's versions for files unrelated to the cache feature (AuthController, auth tests, authApi, decisions index). Add MfaCredentials to UnitOfWork to match interface from auto-merged IUnitOfWork. Docs already include cache feature documentation.
Update SetRecoveryCodes test to match domain method: empty/null now clears codes instead of throwing (supports exhausted recovery codes). Remove AuthenticationRegistrationTests (belongs to PR #813, not cache PR). Take main's versions for non-cache files.
Take main's versions for all non-cache files now that #813 provides OIDC/MFA infrastructure. Keep MfaCredentialTests fix with both empty-string and null clearing test cases.
Update STATUS.md with post-merge housekeeping entry, recertified test counts (4279 backend + 2245 frontend = ~6500+), and delivered status for distributed caching, SSO/OIDC/MFA, and staged rollout. Update TESTING_GUIDE.md with current test counts and new test categories (resilience, MFA/OIDC, telemetry, cache). Update IMPLEMENTATION_MASTERPLAN.md marking all expansion wave items as delivered. Extend AUTHENTICATION.md with OIDC/SSO login flow, MFA setup/verify/ recovery, API key management, and account linking endpoints. Update MANUAL_TEST_CHECKLIST.md: mark all PRs as merged, add testing tasks for error tracking (#811), MCP HTTP transport (#819), distributed caching (#805), and resilience tests (#820).
Summary

- `ICacheService` abstraction in Application layer with cache-aside pattern for hot read paths
- `InMemoryCacheService` (thread-safe ConcurrentDictionary with periodic sweep), `RedisCacheService` (StackExchange.Redis with lazy connection and safe degradation), and `NoOpCacheService`
- `BoardService.ListBoardsAsync` (60s TTL) and `GetBoardDetailAsync` (120s TTL) with invalidation on create/update/delete/archive
- Configuration via `appsettings.json` (`Cache:Provider`, `Cache:RedisConnectionString`, TTLs)

Test plan
Closes #85