[ENG-2759] DSR integration with new cache manager#7708

Open
johnewart wants to merge 60 commits into main from
johnewart/redis-dsr-cache-integration
Conversation

@johnewart johnewart commented Mar 19, 2026

Ticket ENG-2759

Description Of Changes

Adopts new Redis cache manager with secondary indexer for DSR usage:

  • Use new clearing logic for DSR caches
  • Custom fields
  • DRP Request body
  • Identity caching

Code Changes

  • Changes the DSR process to use the new Redis cache and index for deletions. Existing in-flight DSRs are handled by lazy migration: the key mapping logic centrally generates old keys for reading (if they exist) and new keys for writing, so no work is lost.
  • Adds a substantial number of tests; coverage should be relatively thorough
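The read-legacy/write-new strategy described above can be sketched as follows. This is an illustrative sketch only — the dict stand-in for Redis, the function names, and the key layouts are assumptions drawn from the formats mentioned later in this PR, not the actual fides `DSRCacheStore` API:

```python
# Sketch of lazy migration: prefer the new-format key, fall back to the
# legacy key, and migrate legacy data forward on first read so in-flight
# DSRs lose no work across a deployment boundary.

cache = {}  # dict stand-in for Redis


def legacy_key(dsr_id: str, attr: str) -> str:
    return f"id-{dsr_id}-identity-{attr}"


def new_key(dsr_id: str, attr: str) -> str:
    return f"dsr:{dsr_id}:identity:{attr}"


def get_with_legacy(dsr_id: str, attr: str):
    # 1. Prefer the new-format key.
    value = cache.get(new_key(dsr_id, attr))
    if value is not None:
        return value
    # 2. Fall back to the legacy key; if found, migrate it to the new
    #    format so subsequent reads skip the fallback entirely.
    value = cache.pop(legacy_key(dsr_id, attr), None)
    if value is not None:
        cache[new_key(dsr_id, attr)] = value
    return value
```

After the first read, the legacy key is gone and only the new-format key remains, which is what makes a mid-flight upgrade safe.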

Steps to Confirm

  1. Ideally, simulate some volume of in-flight DSR processing and upgrade in the middle of it, to verify that the lazy migration path preserves work.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

johnewart and others added 30 commits February 23, 2026 14:59
Deserialize keys to strings when in `get_keys_by_index` - preferring deliberate conversion over `decode_responses`

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
This uses the `clear` method on the DSR cache store, which under the
hood uses `scan` (if needed; it will first try the set-based key
index if that particular ID is being tracked).

Adds some tests and removes the dependency on
get_all_cache_keys_for_privacy_request
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
johnewart and others added 6 commits March 19, 2026 17:31
…e deploying this in the middle of DSRs being processed
…written Redis mocks with autospec

Bug 1 (get_all_keys): Index no longer masks legacy keys. Uses a
__migrated:{dsr_id} flag (24h TTL) to skip SCAN after confirming no
legacy stragglers remain. Filters internal keys from SCAN results.

Bug 2 (get_with_legacy): Re-checks new key after double-miss to handle
the race where another reader migrates between the two GETs.

Bug 3 (clear): Second SCAN pass catches keys written by concurrent
migrations. Invalidates migration flag so future reads re-scan.

Removes ad-hoc scan_iter workarounds from get_cached_identity_data and
has_cached_identity_data that were papering over Bug 1.

Replaces hand-written MockRedis/MockPipeline/InMemoryRedis with
create_mock_redis() factory using create_autospec(redis.Redis) — method
signatures are now validated against the real Redis client, preventing
the missing-method class of bugs (exists/setex were missing before).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
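The `__migrated:{dsr_id}` flag described in the commit message above can be sketched like this. The helper names and the dict stand-in are illustrative, not the actual store internals; in real Redis the flag would be written with `SETEX` so it expires after 24 hours:

```python
# Sketch: once a listing pass confirms no legacy stragglers remain for a
# DSR, set a short-lived flag so later listings can skip the SCAN.

MIGRATED_FLAG_TTL_SECONDS = 24 * 60 * 60  # 24h, per the commit message


def migration_flag_key(dsr_id: str) -> str:
    return f"__migrated:{dsr_id}"


def should_scan(redis_like: dict, dsr_id: str) -> bool:
    # Only skip the expensive SCAN when a prior pass flagged this DSR
    # as fully migrated.
    return migration_flag_key(dsr_id) not in redis_like


def mark_migrated(redis_like: dict, dsr_id: str) -> None:
    # Real implementation: SETEX with MIGRATED_FLAG_TTL_SECONDS.
    redis_like[migration_flag_key(dsr_id)] = "1"


def invalidate_migration_flag(redis_like: dict, dsr_id: str) -> None:
    # Called from clear() so future reads re-scan (Bug 3 above).
    redis_like.pop(migration_flag_key(dsr_id), None)
```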
…emony, improve tests

- Extract _get_cached_by_type/_has_cached_by_type helpers in DSRCacheStore to
  eliminate triplicated scan-filter-parse pattern across custom fields, identity,
  and DRP methods
- Replace no-op context manager get_dsr_cache_store() with plain function,
  removing unnecessary indentation from 11 production and 14 test callsites
- Extract shared test fixtures (mock_redis, manager, dsr_store, pr_id, dsr_id)
  to tests/common/cache/conftest.py, removing duplication across 6 test files
- Remove dead code in clear_cache_identities test helper (legacy scan after
  migration already deleted all legacy keys)
- Replace 8 individual legacy compatibility tests with 2 comprehensive lifecycle
  tests that simulate full in-flight DSR processing across a deployment boundary
- Move all local imports to module top level

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JadeCara commented Mar 20, 2026

Integration Test: In-Flight DSR Migration

Setup: 20 access requests created on fidesplus pointed at fides main (legacy cache format: id-{id}-identity-{attr}). Fidesplus then restarted on the feature branch.

Results:

  • All 20 requests successfully moved from pending → approved → in_processing using the new DSRCacheStore code
  • Legacy cache keys migrated to new format for all 5 sampled requests: 0 legacy keys remaining, 2 new-format keys each, all indexed
  • 12 still processing, 8 paused at requires_input (manual tasks on connectors — expected), 2 errors (unrelated to cache)
  • Zero cache-related failures. The lazy migration worked as designed: new code read legacy keys, migrated them to dsr:{id}:identity:{attr} format, and continued processing.

The PR's "Steps to Confirm" scenario is validated.

@JadeCara JadeCara marked this pull request as ready for review March 20, 2026 22:31
@JadeCara JadeCara requested a review from a team as a code owner March 20, 2026 22:31

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR migrates DSR (privacy request) caching to a new structured DSRCacheStore backed by a RedisCacheManager with secondary index support, replacing direct KEYS/SCAN key-space lookups with an O(index-size) set-based approach. The migration is designed for zero downtime: new writes use the dsr:{id}:{part} key format, while reads lazily fall back to legacy id-{id}-… keys and optionally migrate them on first access. The change is well-structured with a centralized KeyMapper, thorough unit and integration test coverage, and a clean factory-function pattern (get_dsr_cache_store()).

Key observations:

  • Legacy field name extraction bug in _get_cached_by_type: key.split("-")[-1] only returns the last hyphen-separated segment, which silently drops data for field names containing hyphens during legacy key enumeration.
  • Index TTL gap: __idx:dsr:{id} sets have no expiration by default; if clear() is never called after data keys expire, stale index entries accumulate.
  • repr() serialization for complex DRP body values is not reversible; callers reading back the cached DRP body receive raw Python repr strings for any nested objects.
  • Short variable names (k) appear in dsr_store.py and manager.py.
  • This PR touches 24 files, exceeding the project's 15-file PR limit.
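The hyphen bug called out above is easy to demonstrate. The key layout follows the legacy `id-{id}-identity-{attr}` format from this PR; the fix shown (splitting once on a known infix) mirrors the approach the later fix-up commit describes:

```python
# Splitting a legacy key on "-" and taking the last segment loses
# hyphenated field names; splitting once on the infix preserves them.

def extract_field_buggy(key: str) -> str:
    return key.split("-")[-1]


def extract_field_fixed(key: str, infix: str = "-identity-") -> str:
    # Split only once, on the full infix, so hyphens inside the field
    # name itself survive extraction.
    return key.split(infix, 1)[1]


key = "id-pr123-identity-mailing-address"
```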

Confidence Score: 3/5

  • The PR is safe to merge for most DSRs but has a latent data-loss path for hyphenated legacy field names and an index TTL gap that warrants fixing before widespread production use.
  • The overall architecture is sound and the test coverage is thorough for the happy path. However, the split("-")[-1] field extraction in _get_cached_by_type is fragile and could silently drop custom field data during migration if any legacy field key contains a hyphen. The index TTL gap is a long-term operational concern. These two issues together reduce confidence.
  • src/fides/common/cache/dsr_store.py — the _get_cached_by_type method (legacy field extraction) and the clear/get_all_keys index TTL handling need attention.

Important Files Changed

Filename Overview
src/fides/common/cache/dsr_store.py Core DSR cache store with new key format, index-backed listing, and lazy legacy key migration. Has a logic bug in _get_cached_by_type where split("-")[-1] fails to extract field names containing hyphens from legacy keys. Index sets also have no TTL, risking stale entries after data keys expire.
src/fides/common/cache/manager.py Solid Redis cache manager with pipeline-based set/delete and index operations. Minor style issue with single-character loop variable k.
src/fides/api/models/privacy_request/privacy_request.py Updated to use DSRCacheStore for identity, custom fields, DRP body, and encryption caching. cache_drp_request_body uses non-reversible repr() for complex object serialization — may be pre-existing behavior but lacks a comment explaining the intent.
src/fides/api/util/cache.py Adds get_redis_cache_manager() and get_dsr_cache_store() factory functions; legacy key helper functions retained for backward compatibility. Changes are clean and well-scoped.
src/fides/common/cache/key_mapping.py Centralized legacy-to-new key mapping class. Well-structured with static methods for every field type. Clean and readable.
src/fides/api/tasks/encryption_utils.py Migrated to use DSRCacheStore.get_encryption() for encryption key lookup. Clean migration with proper bytes/str handling.
src/fides/api/models/privacy_request/consent.py Added TODO comment identifying get_cached_identity_data as dead code. The dead code itself remains; the comment accurately documents the issue.
tests/common/cache/test_dsr_store.py Thorough unit tests covering set/get, index registration, legacy migration, and convenience methods using an in-memory Redis mock.
tests/common/cache/test_dsr_store_migration.py Tests for legacy key readability, migration-on-read, new writes, and cross-DSR isolation. Good coverage, though no tests exercise hyphenated field names in legacy keys.
tests/common/cache/mock_redis.py Well-built autospec'd Redis mock with in-memory backing for sets, strings, TTLs, and pipelines. Covers all methods used by the DSR store.

Comments Outside Diff (1)

  1. src/fides/common/cache/manager.py, line 78-82 (link)

    P2 Single-character variable name k

    The loop variable k is a single-character name. Use a more descriptive name such as cache_key.

    Rule Used: Use full names for variables, not 1 to 2 character... (source)

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Last reviewed commit: "Review cleanup: dedu..."

… code, add test marker

- Fix _get_cached_by_type legacy field extraction to split on infix instead of "-",
  preserving hyphens in field names (pre-existing bug, caught by Greptile)
- Remove dead ConsentRequest.get_cached_identity_data (zero callers in fides/fidesplus)
- Add missing pytest.mark.unit to DRP integration tests
- Add clarifying comment on falsy-value guard in _get_cached_by_type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JadeCara JadeCara requested a review from erosselli March 23, 2026 20:12

@JadeCara JadeCara left a comment

Made some changes and tested locally. Because I made some updates, @erosselli has agreed to do a second pass for me.

DSR Cache Migration Test

  1. Created 20 DSRs on main (legacy format)
     • Approved all 20 → wrote cache data as id-{id}-identity-email, id-{id}-async-execution
     • Redis: 1,429 legacy keys, 0 new-format keys
  2. Switched to johnewart/redis-dsr-cache-integration
     • All 20 DSRs visible, legacy Redis keys intact
  3. Created 20 more DSRs on the new branch
     • Approved all 20 → identity written in new format (dsr:{id}:identity:email), async execution still legacy (worker hadn't restarted)
     • Redis: 20 new-format keys coexisting with legacy keys
  4. Restarted worker, created 5 fresh DSRs
     • All keys fully new format: dsr:{id}:identity:email, dsr:{id}:async_execution, __idx:dsr:{id}
     • Zero legacy keys written for fresh DSRs
     • Old DSRs' legacy keys still readable via get_with_legacy() fallback
  5. Final state
     • 61 new-format keys coexisting with 1,728 legacy keys
     • No errors from mixed key state
     • Lazy migration working: legacy keys migrated to new format on read

@JadeCara

/code-review


@claude claude bot left a comment

Review: DSR structured cache migration

This is a solid, well-scoped migration from flat Redis keys to an indexed DSRCacheStore with lazy backward-compatible migration. The approach of writing new keys on first read, combined with a SCAN fallback for legacy data, correctly handles in-flight requests during deployment. The test coverage is thorough, especially the production compatibility lifecycle tests.

No blocking issues found. All inline comments are suggestions or edge-case notes.

Summary of findings

Suggestions (worth fixing):

  • if value: in _get_cached_by_type (line 150): silently drops falsy-but-valid JSON-encoded values like "0" or "false". if value is not None: is the safer guard now that values are always JSON-encoded.
  • mock_redis._data.clear() in two tests: bypasses index consistency — _sets retains stale index entries, so the has_cached_* assertions that follow may pass for the wrong reason. dsr_store.clear(pr_id) is the correct reset.
  • get_cached_custom_privacy_request_fields missing json.JSONDecodeError fallback: inconsistent with the identity method's backward-compat handling; low risk but worth aligning.
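The falsy-guard suggestion above is worth seeing concretely. Once values are JSON-encoded, `json.loads` can legitimately return `0`, `False`, `""`, or `[]`, and a truthiness check discards all of them. The helper names here are illustrative, not the store's actual methods:

```python
import json

# A truthiness guard drops falsy-but-valid JSON values; an identity
# check against None keeps them while still treating "missing" as None.

def keep_truthy(raw: str):
    value = json.loads(raw)
    return value if value else None  # drops 0, False, "", []


def keep_not_none(raw: str):
    value = json.loads(raw)
    return value if value is not None else None  # keeps falsy values
```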

Nice to have:

  • Hardcoded 86400 TTL for the migration flag — extract as a named constant.
  • legacy_keys variable name in get_all_keys is misleading (it includes new-format keys too) — scanned_keys would be more accurate.
  • decode_responses=True assumption in startswith comparisons in get_all_keys — worth a brief doc note given the Any type on RedisCacheManager.__init__.
  • very_short_redis_cache_expiration sleeping exactly 1s with a 1s TTL is a CI flakiness risk — 2s TTL would give a safety margin.
  • _get_cached_by_type uses split(":")[-1] to extract field names, which would fail for multi-colon parts. Not a current bug (only single-colon types use this helper), but worth a comment.

Jade Wibbels and others added 3 commits March 23, 2026 17:36
cache_task_tracking_key() and all get_cached_task_id() callers now
route through DSRCacheStore, which writes new-format keys
(dsr:{id}:async_execution) and reads from both new and legacy formats
via get_with_legacy(). This was the last DSR cache path still using
the old id-{id}-async-execution key format directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename legacy_keys/legacy_set → scanned_keys/scanned_set in get_all_keys
  (variable included both new and legacy keys, name was misleading)
- Add JSONDecodeError fallback in get_cached_custom_privacy_request_fields
  for consistency with get_cached_identity_data
- Replace mock_redis._data.clear() with dsr_store.clear() in two tests
  to properly reset both data and index
- Bump very_short_redis_cache_expiration TTL from 1s to 2s to reduce
  CI flakiness

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_cache_tracking_key_has_ttl: check new-format key (dsr:{id}:async_execution)
  with fallback to legacy key
- test_customer_data_removed: bump sleep from 1s to 3s to exceed 2s TTL fixture
- test_get_cached_task_id_cache_exception: mock get_dsr_cache_store instead of
  get_cache since get_cached_task_id now uses the store

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JadeCara

/code-review


@claude claude bot left a comment


Code Review: DSR integration with new cache manager

Good overall approach — the migration to DSRCacheStore is well-structured, the backward-compatibility strategy (lazy migrate-on-read + SCAN fallback) is sound, and the test coverage is thorough with a proper mock_redis autospec replacing the old hand-rolled in-memory implementations. The production compatibility suite is a nice addition.

Critical (must fix)

Missing TTL on legacy key migration (dsr_store.py:103)
When get_with_legacy reads a value from a legacy key and migrates it to the new format, self.set(dsr_id, part, val) is called without expire_seconds. The migrated key is written with no TTL and will persist indefinitely. For requests that fail mid-flight or are abandoned after deployment, this leaks Redis memory and violates data-retention expectations. The fix is to read self._redis.ttl(legacy_key) before deleting the legacy key and propagate it to the new key (only when > 0).
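The TTL-propagation logic described in the fix reduces to one decision, sketched below. Per the Redis `TTL` command semantics, the return value is `-2` for a missing key and `-1` for a key with no expiry; only a positive remainder should be carried over. The function name is illustrative:

```python
# Carry the legacy key's remaining TTL forward to the migrated key;
# if the legacy key had no expiry (already a leak), fall back to a
# default so the new key can never persist indefinitely.

def ttl_for_migrated_key(legacy_ttl: int, default_ttl_seconds: int) -> int:
    # legacy_ttl: result of redis.ttl(legacy_key)
    #   > 0  -> remaining seconds
    #   -1   -> key exists with no expiry
    #   -2   -> key does not exist
    return legacy_ttl if legacy_ttl > 0 else default_ttl_seconds
```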

Suggestions

Encoding inconsistency (request_task.py:251, request_service.py:338)
Both RequestTask.get_cached_task_id and the free function get_cached_task_id in request_service.py hardcode "utf-8", while PrivacyRequest.get_cached_task_id (line 698) and get_cached_encryption_key (line 690) use CONFIG.security.encoding. All three should use the same encoding. In practice utf-8 and the default encoding are almost certainly the same, but the inconsistency is a maintenance hazard.

scan_iter may yield bytes (dsr_store.py, get_all_keys)
self._redis is a FidesopsRedis that delegates to the underlying Redis client. If decode_responses=False (the default), scan_iter yields bytes, and k.startswith("__migrated:") raises TypeError. RedisCacheManager.get_keys_by_index already handles this with an explicit .decode("utf-8"). The scanned_keys list comprehension in get_all_keys and clear should apply the same normalisation.
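The normalisation suggested above is a one-liner worth applying before any `str` comparison, since `scan_iter` yields `bytes` when `decode_responses=False`. The helper names are illustrative, and the internal-key prefixes are the ones this PR uses:

```python
# Decode bytes keys from scan_iter to str before filtering, so
# startswith never raises TypeError on a bytes key.

def normalize_key(key) -> str:
    return key.decode("utf-8") if isinstance(key, bytes) else key


def filter_internal(keys):
    # Safe to call startswith only after normalising to str.
    return [
        k
        for k in (normalize_key(key) for key in keys)
        if not k.startswith(("__migrated:", "__idx:"))
    ]
```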

Nice to have

split(":")[-1] is fragile for field names containing colons (dsr_store.py:144)
In _get_cached_by_type, extracting the field name from a new-format key via key.split(":")[-1] silently discards all but the last colon-separated segment. This is fine as long as field names never contain colons (which is likely true for identity attributes and custom field names), but worth a short inline comment to make the assumption explicit.

__migrated / __idx filter dependency (dsr_store.py:412)
The correctness of not (scanned_set - indexed) (used to set the migration flag) depends on __migrated: and __idx: keys being excluded from scanned_keys before that computation. This is currently true, but the two steps are separated in the code. A comment linking them would guard against future refactors breaking the invariant.


Overall the architecture is clean and the test split into focused integration files is much easier to navigate than a single large test module. The main concern to resolve before merge is the missing TTL on migration.

Comment on lines +468 to +469
store = get_dsr_cache_store()
store.clear(self.id)

I know this DSR Cache Store was added in another PR, but all of its methods have a dsr_id parameter. To me this reads like it should actually be a constructor argument, so that each DSR store instance relates to a single DSR. E.g.

store = get_dsr_cache_store(self.id) 
store.clear()


Good call — every caller is already in this PR's diff, so no reason to defer. Refactored DSRCacheStore to take dsr_id as a constructor arg. get_dsr_cache_store(dsr_id) now scopes each instance to a single DSR.


I think set should have expire_seconds be required (or at the very least set a non-zero default for it) so we never accidentally save keys without an expiration date (memory leak)


Agreed — expire_seconds is now required on set() and write() (and all convenience write methods). For the migration path in get_with_legacy, the remaining TTL is read from the legacy key via redis.ttl() and propagated to the new key. If the legacy key has no TTL (already a leak), it falls back to a configurable default_ttl_seconds passed at construction via get_dsr_cache_store().
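The two review outcomes above — scoping `dsr_id` at construction and requiring `expire_seconds` on every write — can be sketched together. The class shape, names, and the dict stand-in for Redis are assumptions for illustration, not the actual fides implementation:

```python
# Sketch: a DSR-scoped store where the id is fixed at construction and
# every write must carry an explicit TTL (no accidental no-TTL keys).

class ScopedDSRCacheStore:
    def __init__(self, cache: dict, dsr_id: str, default_ttl_seconds: int = 86400):
        self._cache = cache  # dict stand-in for Redis
        self._dsr_id = dsr_id
        self._default_ttl = default_ttl_seconds

    def _key(self, part: str) -> str:
        return f"dsr:{self._dsr_id}:{part}"

    def set(self, part: str, value, expire_seconds: int) -> None:
        # expire_seconds is a required positional-style argument;
        # real Redis would use SETEX here.
        self._cache[self._key(part)] = (value, expire_seconds)

    def get(self, part: str):
        entry = self._cache.get(self._key(part))
        return entry[0] if entry is not None else None

    def clear(self) -> None:
        prefix = f"dsr:{self._dsr_id}:"
        for k in [k for k in self._cache if k.startswith(prefix)]:
            del self._cache[k]
```

A caller then writes `store = ScopedDSRCacheStore(cache, privacy_request.id)` once and never repeats the id per call, matching the `get_dsr_cache_store(dsr_id)` shape adopted in the refactor.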

Jade Wibbels and others added 2 commits March 24, 2026 16:38
- Scope DSRCacheStore to a single DSR: dsr_id is now a constructor arg,
  removing it from all method signatures (per @erosselli)
- Make expire_seconds required on set()/write() and all convenience write
  methods to prevent accidental no-TTL writes (per @erosselli)
- Fix migrated key TTL: get_with_legacy now reads the legacy key's
  remaining TTL and propagates it to the new key, with a configurable
  default_ttl_seconds fallback (fixes critical from code review)
- Unify encoding: request_task.py and request_service.py now use
  CONFIG.security.encoding instead of hardcoded "utf-8"
- Update all callers and tests for the new API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jade Wibbels and others added 2 commits March 25, 2026 08:46
- Fix _TTL misplaced inside json.dumps() in custom fields and identity
  integration tests
- Update encryption_utils test assertions: get_encryption() no longer
  takes request_id (it's a constructor arg now)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>