Worker performance: MVO update conversion + advanced caching strategies #436
Summary
Follow-on performance work from #338 (Worker database performance optimisation, now closed). All original phases (1–5) plus Phase 6 MVO creates are complete. This issue tracks the remaining items: MVO update conversion to raw SQL and the three Future Considerations (A, B, C) from the plan document.
MVO Update Conversion to Raw SQL
Plan document: `docs/plans/doing/MVO_COPY_BINARY_PERSISTENCE.md`
Context: `CreateMetaverseObjectsAsync` was converted to COPY binary in #338 Phase 6. `UpdateMetaverseObjectsAsync` remains on EF Core with per-entity `Entry().State` management.
Why updates are fundamentally harder than creates
Creates are a single operation (INSERT rows). Updates involve four distinct operations in one flush:
| Operation | Raw SQL approach | Complexity |
|---|---|---|
| UPDATE parent MVO rows (scalar changes) | `UPDATE ... FROM (VALUES ...)` batch | Low |
| INSERT new attribute values | COPY binary (proven pattern) | Low |
| UPDATE existing attribute values | `UPDATE ... FROM (VALUES ...)` — 13 nullable columns | Moderate |
| DELETE removed attribute values | `DELETE ... WHERE "Id" IN (...)` | Low |
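The batch patterns in the table can be sketched in raw PostgreSQL. This is an illustration only — table and column names (`"MetaverseObjects"`, `"MetaverseObjectAttributeValues"`, `"Status"`, `"LastUpdated"`) are assumptions, not the actual schema:

```sql
-- Sketch: batch-update parent MVO rows from a VALUES list (one round trip).
UPDATE "MetaverseObjects" AS mvo
SET    "Status"      = v."Status",
       "LastUpdated" = v."LastUpdated"
FROM (VALUES
    ('00000000-0000-0000-0000-000000000001'::uuid, 1, now()),
    ('00000000-0000-0000-0000-000000000002'::uuid, 2, now())
) AS v("Id", "Status", "LastUpdated")
WHERE mvo."Id" = v."Id";

-- Sketch: batch-delete removed attribute values by primary key.
DELETE FROM "MetaverseObjectAttributeValues"
WHERE "Id" IN ('00000000-0000-0000-0000-000000000003'::uuid,
               '00000000-0000-0000-0000-000000000004'::uuid);
```

The AV UPDATE case follows the same `UPDATE ... FROM (VALUES ...)` shape but must enumerate all 13 nullable value columns with explicit type casts, which is where the "Moderate" complexity rating comes from.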
The crux is knowing which attribute values (AVs) are new vs modified vs deleted without EF's change tracker. Currently the caller applies `PendingAttributeValueAdditions`/`Removals` to `mvo.AttributeValues` before calling update, then EF uses `IsKeySet` to classify. A raw SQL conversion would need either the caller to preserve that classification or the repository to diff against the database — both options are invasive.
Additionally, MVOs arrive in mixed EF states (tracked vs detached, `AutoDetectChangesEnabled` on vs off) depending on whether it's a per-page flush or cross-page reference resolution. The current `UpdateDetachedSafe` + per-entity state management was the result of 11 debugging attempts (see `docs/notes/done/CROSS_PAGE_REFERENCE_IDENTITY_CONFLICT.md`).
Recommendation: Wait for profiling data from real-world delta syncs. Creates are the high-volume operation during initial sync; updates are smaller per-page and EF overhead is proportionally less significant.
Future Consideration A: Upgrade CSO Lookup Index to Full Entity Cache
Upgrade the Phase 1 `IMemoryCache` from storing a lookup index (external ID → CSO GUID) to storing full CSO entity graphs (`AsNoTracking`). Eliminates all database queries on cache hit — lookups become pure in-memory O(1).
Trade-offs: ~100x more memory per entry (~10 KB vs ~100 bytes), EF entity lifecycle complexity (attach/detach for modification), stricter cache invalidation requirements.
When to consider: If per-object PK loads on cache hit (~1–2ms each) cause unacceptable import times at scale (50k+ objects per connected system).
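A minimal sketch of what the Consideration A upgrade could look like. `IMemoryCache` is the real `Microsoft.Extensions.Caching.Memory` API; the class, repository, and method names (`CsoEntityCache`, `ICsoRepository`, `GetByIdNoTrackingAsync`) are illustrative assumptions, not the actual codebase:

```csharp
using Microsoft.Extensions.Caching.Memory;

// Sketch only: full-entity cache in front of the per-object PK load.
public sealed class CsoEntityCache
{
    private readonly IMemoryCache _cache;
    private readonly ICsoRepository _repository; // hypothetical repository abstraction

    public CsoEntityCache(IMemoryCache cache, ICsoRepository repository)
    {
        _cache = cache;
        _repository = repository;
    }

    public Task<ConnectedSystemObject?> HydrateAsync(Guid csoId) =>
        _cache.GetOrCreateAsync(csoId, async entry =>
        {
            entry.SetSize(1).SetSlidingExpiration(TimeSpan.FromMinutes(30));
            // AsNoTracking load: a cached graph must never be attached to a
            // DbContext directly — clone or re-attach explicitly on modification.
            return await _repository.GetByIdNoTrackingAsync(csoId);
        });

    // Strict invalidation: remove after any write that changes the entity graph.
    public void Invalidate(Guid csoId) => _cache.Remove(csoId);
}
```

The `SetSize` call matters at the stated ~10 KB/entry cost: a full-entity cache should run with a bounded `MemoryCacheOptions.SizeLimit` so 50k+ objects cannot exhaust the host.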
Future Consideration B: MVO Join-Attribute Lookup Cache
Extend `IMemoryCache` to cache MVO join-attribute values — the attributes referenced in object matching rules. Sync matching (`FindMetaverseObjectUsingMatchingRuleAsync`) becomes a pure in-memory lookup + PK load, eliminating N+1 matching queries entirely.
Trade-offs: MVO attribute values can change during sync (unlike CSO external IDs), requiring invalidation on updates. Cache warming must resolve which attributes are join-rule attributes from sync rule configuration.
When to consider: If Phase 2's lightweight query optimisation doesn't sufficiently reduce sync matching times at scale (50k+ MVOs).
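The shape of Consideration B can be sketched as a keyed index. The key structure and invalidation points are assumptions for illustration; only the constraint (MVO attribute values change during sync) comes from the text above:

```csharp
using System.Collections.Concurrent;

// Sketch: join-attribute lookup cache. Keyed on (attribute, value); the
// value is the owning MVO's id, which the caller then PK-loads.
public readonly record struct JoinKey(Guid AttributeId, string Value);

public sealed class MvoJoinAttributeCache
{
    private readonly ConcurrentDictionary<JoinKey, Guid> _index = new();

    // Warm-up: only attributes referenced by matching rules, resolved
    // from sync rule configuration.
    public void Add(Guid attributeId, string value, Guid mvoId) =>
        _index[new JoinKey(attributeId, value)] = mvoId;

    // Matching becomes a pure in-memory lookup + PK load.
    public bool TryMatch(Guid attributeId, string value, out Guid mvoId) =>
        _index.TryGetValue(new JoinKey(attributeId, value), out mvoId);

    // MVO attribute values can change during sync: invalidate on update.
    public void Invalidate(Guid attributeId, string oldValue) =>
        _index.TryRemove(new JoinKey(attributeId, oldValue), out _);
}
```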
Future Consideration C: Persistent In-Memory Model for the Worker
Architectural shift: maintain a persistent in-memory cache of everything needed for run profile execution (CSOs, MVOs, attribute values, sync rules, object types). The database becomes a persistence layer rather than the primary data source during execution.
Architectural options to explore:
- Service-lifetime cache with write-through updates
- Per-connected-system lazy-loaded segments
- Event-driven cache invalidation
- Hybrid: lightweight indexes always in memory, full entities on demand
Operator mode switch: Consider exposing as an admin-configurable processing mode:
- Direct mode (default) — query database on demand, lower memory, suitable for modest hosts
- In-memory mode — persistent cache; requires adequate memory but offers substantially higher throughput
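If the mode switch is exposed through configuration, it might look like the fragment below. Every key name here is a hypothetical suggestion for the design phase, not an existing setting:

```json
{
  "Worker": {
    "ProcessingMode": "Direct",
    "InMemoryModel": {
      "MaxCacheSizeMb": 4096,
      "LazyLoadPerConnectedSystem": true
    }
  }
}
```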
When to consider: If surgical optimisations don't bring sync cycle times to acceptable levels, or if we decide to push further. Requires a dedicated design phase.
Architectural notes: Lookup vs Hydration separation (#440 → here)
Added 2026-04-01 during #440 analysis. These notes capture design decisions that affect how #440's incremental work transitions into the caching strategies above.
The seam that matters
#440 introduces a pre-fetch dictionary for CSO external ID lookups during import. The key architectural insight is that lookup and hydration are fundamentally different operations:
| Operation | Data size | Lifetime | Cache-friendly? |
|---|---|---|---|
| Lookup (external ID → CSO ID) | ~200 bytes/entry | Stable across import | Excellent — small, rarely changes |
| Hydration (CSO ID → full entity graph) | ~20-40 KB/entry | Needed briefly for diffing | Poor for persistent cache at scale |
#440 should implement these as separate method boundaries (not interfaces — just clean method signatures):
```csharp
Guid? LookupCsoByExternalId(string externalId);   // Today: dictionary. Future: cache.
ConnectedSystemObject HydrateCso(Guid csoId);     // Today: batch DB query. Future: cache + DB fallback.
```
This means Future Consideration A (full entity cache) slots in by swapping the HydrateCso implementation, not rewriting the import loop. Future Consideration C (persistent in-memory model) swaps both.
What #440's dictionary becomes in each future
| Future | Dictionary role | Hydration role |
|---|---|---|
| Current (#440) | In-memory dictionary, loaded once per import | Batch DB query per page of ~500 |
| A: Full entity cache | Cache lookup (dictionary as warm-up/fallback) | Cache-first, DB fallback for misses |
| B: MVO join cache | Unchanged — CSO lookup is orthogonal to MVO matching | Unchanged |
| C: Persistent model | Subsumed by the persistent model's index layer | Subsumed by the persistent model's entity layer |
Cache invalidation constraint to track
The pre-fetch dictionary must be updated when new CSOs are created mid-import (same-batch creates that are then referenced by later objects). This is trivial for an in-memory dictionary (just add the new entry). However, persistent caches (A, C) face a harder problem:
- Cross-process visibility: If multiple workers share a cache (Redis), a CSO created by worker A must be visible to worker B's lookup
- Write-through timing: The CSO must be persisted to DB and cache before another object can match against it
- The existing `seenExternalIds` pattern (SyncImportTaskProcessor.cs line 978) handles same-batch duplicates and provides a model for how mid-import cache updates should work
This is not a blocker for #440 (in-memory dictionary, single worker) but must be addressed in the design phase for Future Consideration C.
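The write-through ordering constraint can be sketched as follows. The method and field names (`CreateCsoAsync`, `_repository`, `_sharedCache`, `_externalIdIndex`) and the cache key format are illustrative assumptions; only the ordering requirement comes from the text above:

```csharp
// Sketch: mid-import write-through for a shared cache (A/C futures).
// Persist first, then publish, so any worker that sees the cache entry
// can always load the corresponding row from the database.
public async Task CreateCsoAsync(ConnectedSystemObject cso)
{
    // 1. Persist to the database (COPY binary / INSERT).
    await _repository.CreateAsync(cso);

    // 2. Only then make the mapping visible to other workers' lookups.
    await _sharedCache.SetAsync(
        $"cso:{cso.ConnectedSystemId}:{cso.ExternalId}",
        cso.Id);

    // For #440's single-worker in-memory dictionary, step 2 collapses to:
    // _externalIdIndex[cso.ExternalId] = cso.Id;
}
```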
Summary: incremental steps are additive
The work in #440 is not throwaway:
- The dictionary becomes the cache warm-up / fallback path
- The batch hydration pattern becomes the cache-miss handler
- The lookup/hydration separation provides the seam for future cache implementations
- Index improvements (#440's Option C, distinct from Future Consideration C) benefit all paths regardless of caching strategy
Prioritisation
| Item | Type | Effort | Impact | Dependency |
|---|---|---|---|---|
| MVO update conversion | Surgical | High | Moderate — dominates delta sync flush | Profiling data from real-world delta syncs |
| A: Full entity cache | Incremental | Moderate | Medium — eliminates PK loads on cache hit | Profiling data |
| B: MVO join-attribute cache | Incremental | Moderate | Medium-High — eliminates sync matching queries | Profiling data |
| C: In-memory Worker model | Architectural | High | Very High — eliminates virtually all read latency | Design phase |
References
- Parent plan: `docs/plans/done/WORKER_DATABASE_PERFORMANCE_OPTIMISATION.md`
- MVO creates plan: `docs/plans/doing/MVO_COPY_BINARY_PERSISTENCE.md`
- Completed: Worker database performance optimisation - surgical raw SQL for hot paths #338 (Phases 1–6 creates, closed)
- Related: Worker redesign: Extract pure domain engine + optimised data access (Option A) #394 (intra-phase parallelism, closed — introduced ParallelBatchWriter/COPY binary patterns)
- Related: perf: batch pre-fetch CSO external ID mappings during full import #440 (FindMatchingCso batch pre-fetch — provides the lookup/hydration seam described above)