perf: eliminate post-hoc cross-batch reference fixup by fixing container processing order #397
Description
Background
During a Full Import, JIM processes LDAP containers in database ID order. Because `OU=Entitlements` (groups) has a lower DB ID than `OU=Users`, groups are inserted in an earlier batch than the users they reference. The per-batch reference resolver runs after each batch save, but at that point the user CSOs have not yet been persisted (`Id == Guid.Empty`), so the `ReferenceValueId` FK on the group attribute values that reference them cannot be set.
To work around this, a post-hoc SQL UPDATE (`FixupCrossBatchReferenceIdsAsync`) runs after all create batches complete, joining `UnresolvedReferenceValue` against secondary external ID attribute values to resolve any remaining FK NULLs.
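The ordering effect described above can be reproduced with a toy model. This is a minimal Python sketch, not JIM's C# code: the `Cso` shape, `member_ids`, and `import_batches` are hypothetical names chosen for illustration, and "persisting" is simulated with an in-memory set.

```python
from dataclasses import dataclass, field


@dataclass
class Cso:
    """Toy stand-in for a connected system object (hypothetical shape)."""
    external_id: str
    member_ids: list[str] = field(default_factory=list)


def import_batches(batches: list[list[Cso]]) -> tuple[list, list]:
    """Save batches in order, resolving references after each save.

    Returns (resolved, unresolved) lists of (holder, target) pairs.
    """
    persisted: set[str] = set()
    resolved, unresolved = [], []
    for batch in batches:
        persisted.update(cso.external_id for cso in batch)  # "save" the batch
        for cso in batch:  # per-batch resolver runs after the save
            for target in cso.member_ids:
                bucket = resolved if target in persisted else unresolved
                bucket.append((cso.external_id, target))
    return resolved, unresolved


users = [Cso("alice"), Cso("bob")]
groups = [Cso("admins", ["alice", "bob"])]

# Groups first (the current order): every member reference is left dangling,
# which is the gap the post-hoc fixup has to close.
_, dangling = import_batches([groups, users])
print(dangling)  # [('admins', 'alice'), ('admins', 'bob')]

# Users first: the per-batch resolver handles everything inline.
_, dangling = import_batches([users, groups])
print(dangling)  # []
```

In this model the only difference between the two runs is batch order, which is why a reordering can remove the need for a separate fixup pass.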
Problem
The post-hoc UPDATE is a large multi-table JOIN across the entire `ConnectedSystemObjectAttributeValues` table. On large imports (MediumLarge: ~5,273 CSOs, ~448k attribute values) it timed out against a cold, fresh database under the default 30s command timeout. The timeout was extended to 300s, but this is a blunt instrument: it does not scale well and gives administrators no feedback when host resources are insufficient.
Two partial indexes were added to mitigate query performance, but the fundamental issue remains: the fixup should not be necessary at all.
Preferred Fix
Sort containers by DB ID descending (or use a smarter ordering) before batching so that referenced objects (users) are inserted before the objects that reference them (groups). This would allow the existing per-batch resolver to handle all references inline, eliminating the need for `FixupCrossBatchReferenceIdsAsync` entirely.
Alternatively, process containers in two passes (first all non-reference-holding object types, then all reference-holding types), or detect reference attributes during schema import and order containers accordingly.
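One way to derive such an ordering from schema information is a topological sort over object types. The following is a minimal Python sketch under stated assumptions, not a proposal for JIM's actual C# implementation: the `references` mapping (object type to the types its reference attributes can point at) and the function names are hypothetical, and it assumes the import can be restricted to one object type per pass over a container.

```python
from graphlib import TopologicalSorter


def order_object_types(references: dict[str, set[str]]) -> list[str]:
    """Order object types so referenced types come before referencing ones.

    Self-references (e.g. nested groups) are dropped first, because ordering
    by type cannot help with them. A cycle between different types raises
    graphlib.CycleError.
    """
    graph = {t: targets - {t} for t, targets in references.items()}
    return list(TopologicalSorter(graph).static_order())


def plan_passes(containers: list[str], references: dict[str, set[str]]):
    """Yield (container, object_type) work items: every container is visited
    once per object type, with object types in dependency order."""
    for object_type in order_object_types(references):
        for container in containers:
            yield container, object_type


# Hypothetical schema summary: groups reference users (and other groups).
references = {"group": {"user", "group"}, "user": set()}

print(order_object_types(references))  # ['user', 'group']
for item in plan_passes(["OU=Entitlements", "OU=Users"], references):
    print(item)
# ('OU=Entitlements', 'user')
# ('OU=Users', 'user')
# ('OU=Entitlements', 'group')
# ('OU=Users', 'group')
```

Ordering by object type rather than by container ID does not depend on which container happens to have the lower DB ID, and it still behaves sensibly when several containers hold the same object type. References between objects of the same type, and cycles between types, would still need some other resolution path.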
Additional Considerations
For very large deployments (XLarge: 100k+ users), deployment documentation should note DB I/O requirements — similar to the existing note that XLarge requires 20+ GB host RAM.
Cleanup When This Is Implemented
Once the root cause is fixed, remove the band-aid:
- Remove the 300s command timeout from `FixupCrossBatchReferenceIdsAsync` in `ConnectedSystemRepository`
- Remove the `FixupCrossBatchReferenceIdsAsync` call in `SyncImportTaskProcessor`
- Remove the `FixupCrossBatchReferenceIdsAsync` method from the repository, application server, and interface
Indexes: The two partial indexes added as mitigation should be retained regardless — they benefit delta imports and any other reference resolution path, not just the post-hoc fixup. Do not remove them as part of this issue.
Acceptance Criteria
- Full Import of Scenario 8 MediumLarge completes without the post-hoc fixup query
- `FixupCrossBatchReferenceIdsAsync` is removed (call site, method, and interface declaration)
- The 300s command timeout override in `ConnectedSystemRepository` is removed
- No regression on scenarios with multiple containers containing the same object type
- Integration tests (Scenario 1, Scenario 2, Scenario 8) all pass