[ENG-2759] DSR integration with new cache manager#7708
Conversation
Deserialize keys to strings in `get_keys_by_index` - preferring deliberate conversion over `decode_responses` Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
This uses the `clear` method on the DSR cache store, which under the hood uses `scan` (if needed) and will try to use the set-based key index if that particular ID is being tracked. Adds some tests and removes the dependency on `get_all_cache_keys_for_privacy_request`.
…he new functionality
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…e deploying this in the middle of DSRs being processed
…written Redis mocks with autospec
Bug 1 (get_all_keys): Index no longer masks legacy keys. Uses a
__migrated:{dsr_id} flag (24h TTL) to skip SCAN after confirming no
legacy stragglers remain. Filters internal keys from SCAN results.
Bug 2 (get_with_legacy): Re-checks new key after double-miss to handle
the race where another reader migrates between the two GETs.
Bug 3 (clear): Second SCAN pass catches keys written by concurrent
migrations. Invalidates migration flag so future reads re-scan.
Removes ad-hoc scan_iter workarounds from get_cached_identity_data and
has_cached_identity_data that were papering over Bug 1.
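The interplay of the key index, the SCAN fallback, and the migration flag in the Bug 1 fix can be sketched over plain sets. This is a simplified model, not the actual DSRCacheStore internals: `indexed` stands in for the set-based key index, `scanned_raw` for raw SCAN results, and `migrated` for the `__migrated:{dsr_id}` flag.

```python
def merge_keys(indexed, scanned_raw, migrated):
    """Return (all keys, whether the migration flag may now be set)."""
    if migrated:
        # Flag set: no legacy stragglers remain, so skip the SCAN entirely.
        return set(indexed), False
    # Filter internal bookkeeping keys out of the SCAN results.
    scanned = {
        k for k in scanned_raw
        if not k.startswith(("__migrated:", "__idx:"))
    }
    # Only set the flag once SCAN finds nothing the index doesn't know.
    set_flag = not (scanned - set(indexed))
    return set(indexed) | scanned, set_flag
```

The flag is only set after a SCAN confirms the index is complete, so a legacy key written afterwards by a concurrent writer is the case the Bug 3 second pass has to cover.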
Replaces hand-written MockRedis/MockPipeline/InMemoryRedis with
create_mock_redis() factory using create_autospec(redis.Redis) — method
signatures are now validated against the real Redis client, preventing
the missing-method class of bugs (exists/setex were missing before).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
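The factory pattern can be illustrated with a stand-in spec class; the real suite specs against `redis.Redis` itself, while `RedisLike` here just keeps the example dependency-free.

```python
from unittest.mock import create_autospec

class RedisLike:
    """Stand-in for redis.Redis with only the methods used here."""
    def get(self, name): ...
    def setex(self, name, time, value): ...
    def exists(self, *names): ...

def create_mock_redis():
    # create_autospec validates attribute access and call signatures
    # against the spec class, so a missing method (the old exists/setex
    # gap) raises instead of silently returning a fresh Mock.
    mock = create_autospec(RedisLike, instance=True)
    mock.exists.return_value = 0
    return mock
```

A call with the wrong arity, or to a method the spec doesn't define, now fails the test rather than passing for the wrong reason.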
…emony, improve tests
- Extract _get_cached_by_type/_has_cached_by_type helpers in DSRCacheStore to eliminate triplicated scan-filter-parse pattern across custom fields, identity, and DRP methods
- Replace no-op context manager get_dsr_cache_store() with plain function, removing unnecessary indentation from 11 production and 14 test callsites
- Extract shared test fixtures (mock_redis, manager, dsr_store, pr_id, dsr_id) to tests/common/cache/conftest.py, removing duplication across 6 test files
- Remove dead code in clear_cache_identities test helper (legacy scan after migration already deleted all legacy keys)
- Replace 8 individual legacy compatibility tests with 2 comprehensive lifecycle tests that simulate full in-flight DSR processing across a deployment boundary
- Move all local imports to module top level

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Integration Test: In-Flight DSR Migration
Setup: 20 access requests created on fidesplus pointed at fides main (legacy cache format).
Results: the PR's "Steps to Confirm" scenario is validated.
Greptile Summary
This PR migrates DSR (privacy request) caching to a new structured cache store.
Key observations:
Confidence Score: 3/5
Important Files Changed
… code, add test marker
- Fix _get_cached_by_type legacy field extraction to split on infix instead of "-", preserving hyphens in field names (pre-existing bug, caught by Greptile)
- Remove dead ConsentRequest.get_cached_identity_data (zero callers in fides/fidesplus)
- Add missing pytest.mark.unit to DRP integration tests
- Add clarifying comment on falsy-value guard in _get_cached_by_type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Made some changes and tested locally. Because I made some updates @erosselli has agreed to do a second pass for me.
DSR Cache Migration Test
- Created 20 DSRs on main (legacy format)
- Approved all 20 → wrote cache data as id-{id}-identity-email, id-{id}-async-execution
- Redis: 1,429 legacy keys, 0 new-format keys
- Switched to johnewart/redis-dsr-cache-integration
- All 20 DSRs visible, legacy Redis keys intact
- Created 20 more DSRs on the new branch
- Approved all 20 → identity written in new format (dsr:{id}:identity:email), async execution still legacy (worker hadn't restarted)
- Redis: 20 new-format keys coexisting with legacy keys
- Restarted worker, created 5 fresh DSRs
- All keys fully new format: dsr:{id}:identity:email, dsr:{id}:async_execution, __idx:dsr:{id}
- Zero legacy keys written for fresh DSRs
- Old DSRs' legacy keys still readable via get_with_legacy() fallback
- Final state
- 61 new-format keys coexisting with 1,728 legacy keys
- No errors from mixed key state
- Lazy migration working: legacy keys migrated to new format on read
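The migrate-on-read behaviour observed above can be sketched with a dict standing in for Redis. The key formats follow the test log; the function shape is an assumption for illustration, not the actual `get_with_legacy` signature.

```python
def get_with_legacy(data, dsr_id, part, legacy_key):
    new_key = f"dsr:{dsr_id}:{part}"
    val = data.get(new_key)
    if val is not None:
        return val
    legacy_val = data.get(legacy_key)
    if legacy_val is None:
        # Double miss: another reader may have migrated the value
        # between the two gets, so re-check the new key (the Bug 2 guard).
        return data.get(new_key)
    # Lazy migration: rewrite in the new format, drop the legacy key.
    data[new_key] = legacy_val
    del data[legacy_key]
    return legacy_val
```

After the first read the legacy key is gone and subsequent reads hit the new-format key directly, which is exactly the "legacy keys migrated to new format on read" state in the log above.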
/code-review
Review: DSR structured cache migration
This is a solid, well-scoped migration from flat Redis keys to an indexed DSRCacheStore with lazy backward-compatible migration. The approach of writing new keys on first read, combined with a SCAN fallback for legacy data, correctly handles in-flight requests during deployment. The test coverage is thorough, especially the production compatibility lifecycle tests.
No blocking issues found. All inline comments are suggestions or edge-case notes.
Summary of findings
Suggestions (worth fixing):
- `if value:` in `_get_cached_by_type` (line 150): silently drops falsy-but-valid JSON-encoded values like `"0"` or `"false"`. `if value is not None:` is the safer guard now that values are always JSON-encoded.
- `mock_redis._data.clear()` in two tests: bypasses index consistency — `_sets` retains stale index entries, so the `has_cached_*` assertions that follow may pass for the wrong reason. `dsr_store.clear(pr_id)` is the correct reset.
- `get_cached_custom_privacy_request_fields` missing `json.JSONDecodeError` fallback: inconsistent with the identity method's backward-compat handling; low risk but worth aligning.
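A minimal sketch of the safer guard, assuming values are always JSON-encoded on write (the helper name is illustrative):

```python
import json

# `if value:` would drop JSON-encoded falsy payloads like "0" or "false";
# `is not None` keeps them while still treating a cache miss as a miss.
def decode_cached(value):
    if value is not None:
        return json.loads(value)
    return None
```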
Nice to have:
- Hardcoded `86400` TTL for the migration flag — extract as a named constant.
- `legacy_keys` variable name in `get_all_keys` is misleading (it includes new-format keys too) — `scanned_keys` would be more accurate.
- `decode_responses=True` assumption in `startswith` comparisons in `get_all_keys` — worth a brief doc note given the `Any` type on `RedisCacheManager.__init__`.
- `very_short_redis_cache_expiration` sleeping exactly 1s with a 1s TTL is a CI flakiness risk — a 2s TTL would give a safety margin.
- `_get_cached_by_type` uses `split(":")[-1]` to extract field names, which would fail for multi-colon parts. Not a current bug (only single-colon types use this helper), but worth a comment.
cache_task_tracking_key() and all get_cached_task_id() callers now
route through DSRCacheStore, which writes new-format keys
(dsr:{id}:async_execution) and reads from both new and legacy formats
via get_with_legacy(). This was the last DSR cache path still using
the old id-{id}-async-execution key format directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename legacy_keys/legacy_set → scanned_keys/scanned_set in get_all_keys (variable included both new and legacy keys, name was misleading)
- Add JSONDecodeError fallback in get_cached_custom_privacy_request_fields for consistency with get_cached_identity_data
- Replace mock_redis._data.clear() with dsr_store.clear() in two tests to properly reset both data and index
- Bump very_short_redis_cache_expiration TTL from 1s to 2s to reduce CI flakiness

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_cache_tracking_key_has_ttl: check new-format key (dsr:{id}:async_execution)
with fallback to legacy key
- test_customer_data_removed: bump sleep from 1s to 3s to exceed 2s TTL fixture
- test_get_cached_task_id_cache_exception: mock get_dsr_cache_store instead of
get_cache since get_cached_task_id now uses the store
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/code-review
Code Review: DSR integration with new cache manager
Good overall approach — the migration to DSRCacheStore is well-structured, the backward-compatibility strategy (lazy migrate-on-read + SCAN fallback) is sound, and the test coverage is thorough with a proper mock_redis autospec replacing the old hand-rolled in-memory implementations. The production compatibility suite is a nice addition.
Critical (must fix)
Missing TTL on legacy key migration (dsr_store.py:103)
When get_with_legacy reads a value from a legacy key and migrates it to the new format, self.set(dsr_id, part, val) is called without expire_seconds. The migrated key is written with no TTL and will persist indefinitely. For requests that fail mid-flight or are abandoned after deployment, this leaks Redis memory and violates data-retention expectations. The fix is to read self._redis.ttl(legacy_key) before deleting the legacy key and propagate it to the new key (only when > 0).
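The suggested fix reduces to a small decision over the legacy key's remaining TTL. This is a hypothetical helper, not code from the PR; it relies on Redis `TTL` semantics, where the command returns -1 for "key exists, no expiry" and -2 for "key missing":

```python
# Only a positive remainder is worth propagating; default_ttl is a
# fallback for legacy keys that were already leaking (no expiry set).
def migrated_ttl(legacy_ttl: int, default_ttl: int = 86400) -> int:
    return legacy_ttl if legacy_ttl > 0 else default_ttl
```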
Suggestions
Encoding inconsistency (request_task.py:251, request_service.py:338)
Both RequestTask.get_cached_task_id and the free function get_cached_task_id in request_service.py hardcode "utf-8", while PrivacyRequest.get_cached_task_id (line 698) and get_cached_encryption_key (line 690) use CONFIG.security.encoding. All three should use the same encoding. In practice utf-8 and the default encoding are almost certainly the same, but the inconsistency is a maintenance hazard.
scan_iter may yield bytes (dsr_store.py, get_all_keys)
self._redis is a FidesopsRedis that delegates to the underlying Redis client. If decode_responses=False (the default), scan_iter yields bytes, and k.startswith("__migrated:") raises TypeError. RedisCacheManager.get_keys_by_index already handles this with an explicit .decode("utf-8"). The scanned_keys list comprehension in get_all_keys and clear should apply the same normalisation.
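The normalisation suggested here is a one-liner; with `decode_responses=False`, `scan_iter` yields `bytes`, and `bytes.startswith(str)` raises `TypeError`:

```python
# Coerce raw SCAN results to str before any startswith comparison.
def normalise_key(key):
    return key.decode("utf-8") if isinstance(key, bytes) else key
```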
Nice to have
split(":")[-1] is fragile for field names containing colons (dsr_store.py:144)
In _get_cached_by_type, extracting the field name from a new-format key via key.split(":")[-1] silently discards all but the last colon-separated segment. This is fine as long as field names never contain colons (which is likely true for identity attributes and custom field names), but worth a short inline comment to make the assumption explicit.
__migrated / __idx filter dependency (dsr_store.py:412)
The correctness of not (scanned_set - indexed) (used to set the migration flag) depends on __migrated: and __idx: keys being excluded from scanned_keys before that computation. This is currently true, but the two steps are separated in the code. A comment linking them would guard against future refactors breaking the invariant.
Overall the architecture is clean and the test split into focused integration files is much easier to navigate than a single large test module. The main concern to resolve before merge is the missing TTL on migration.
store = get_dsr_cache_store()
store.clear(self.id)
I know this DSR Cache Store was added in another PR, but all of its methods have a dsr_id parameter. To me this reads like it should actually be a class argument, so that each DSR store instance only relates to a single DSR. E.g.:
store = get_dsr_cache_store(self.id)
store.clear()
Good call — every caller is already in this PR's diff, so no reason to defer. Refactored DSRCacheStore to take dsr_id as a constructor arg. get_dsr_cache_store(dsr_id) now scopes each instance to a single DSR.
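A rough shape of the agreed refactor; the method bodies and default TTL are illustrative, only the constructor scoping and the required `expire_seconds` mirror the discussion:

```python
class DSRCacheStore:
    def __init__(self, redis_client, dsr_id, default_ttl_seconds=86400):
        self._redis = redis_client
        self.dsr_id = dsr_id  # instance is scoped to a single DSR
        self.default_ttl_seconds = default_ttl_seconds

    def _key(self, part):
        return f"dsr:{self.dsr_id}:{part}"

    def set(self, part, value, expire_seconds):
        # expire_seconds is required, so no accidental no-TTL writes
        self._redis.setex(self._key(part), expire_seconds, value)
```

Omitting `expire_seconds` now fails at call time with a `TypeError` rather than silently writing a key with no expiry.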
src/fides/common/cache/dsr_store.py
I think set should have expire_seconds be required (or at the very least set a non-zero default for it) so we never accidentally save keys without an expiration date (memory leak)
Agreed — expire_seconds is now required on set() and write() (and all convenience write methods). For the migration path in get_with_legacy, the remaining TTL is read from the legacy key via redis.ttl() and propagated to the new key. If the legacy key has no TTL (already a leak), it falls back to a configurable default_ttl_seconds passed at construction via get_dsr_cache_store().
- Scope DSRCacheStore to a single DSR: dsr_id is now a constructor arg, removing it from all method signatures (per @erosselli)
- Make expire_seconds required on set()/write() and all convenience write methods to prevent accidental no-TTL writes (per @erosselli)
- Fix migrated key TTL: get_with_legacy now reads the legacy key's remaining TTL and propagates it to the new key, with a configurable default_ttl_seconds fallback (fixes critical from code review)
- Unify encoding: request_task.py and request_service.py now use CONFIG.security.encoding instead of hardcoded "utf-8"
- Update all callers and tests for the new API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix _TTL misplaced inside json.dumps() in custom fields and identity integration tests
- Update encryption_utils test assertions: get_encryption() no longer takes request_id (it's a constructor arg now)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ticket ENG-2759
Description Of Changes
Adopts new Redis cache manager with secondary indexer for DSR usage:
Code Changes
Steps to Confirm
Pre-Merge Checklist
- CHANGELOG.md updated
- Up to date with main
- downgrade() migration is correct and works