perf: eliminate post-hoc cross-batch reference fixup by fixing container processing order #397

@JayVDZ

Background

During a Full Import, JIM processes LDAP containers in database ID order. Because `OU=Entitlements` (groups) has a lower DB ID than `OU=Users`, groups are inserted in an earlier batch than the users they reference. The per-batch reference resolver runs after each batch save, but at that point the referenced user CSOs are not yet persisted (`Id == Guid.Empty`), so the `ReferenceValueId` FK pointing at them cannot be set and is left NULL.
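The ordering problem can be reproduced with a small in-memory model. This is a minimal sketch, not JIM's code: the `Cso` class, `import_in_batches`, and the sample external IDs are hypothetical, and `db_id is None` stands in for `Id == Guid.Empty`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cso:
    """Hypothetical stand-in for a connected system object."""
    external_id: str
    references: list = field(default_factory=list)  # external IDs this object points at
    db_id: Optional[int] = None  # None until persisted (stands in for Guid.Empty)


def import_in_batches(batches):
    """Persist each batch, then run a per-batch resolver.

    Returns the (holder, target) external ID pairs the resolver could not
    resolve because the target had not been persisted yet.
    """
    persisted = {}
    unresolved = []
    next_id = 1
    for batch in batches:
        # "Save" the batch: every object in it receives a database ID.
        for cso in batch:
            cso.db_id = next_id
            next_id += 1
            persisted[cso.external_id] = cso
        # Per-batch resolver: only sees objects persisted so far.
        for cso in batch:
            for target in cso.references:
                if target not in persisted:
                    unresolved.append((cso.external_id, target))
    return unresolved


def make_groups():
    return [Cso("cn=admins", ["cn=alice", "cn=bob"])]


def make_users():
    return [Cso("cn=alice"), Cso("cn=bob")]


# Groups first (the current DB ID order): both member references stay unresolved.
groups_first = import_in_batches([make_groups(), make_users()])
# Users first: the per-batch resolver handles everything inline.
users_first = import_in_batches([make_users(), make_groups()])
```

With groups first, both member references are left unresolved; with users first, nothing is left over for a later fixup.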

To work around this, a post-hoc SQL UPDATE (`FixupCrossBatchReferenceIdsAsync`) runs after all create batches complete, joining `UnresolvedReferenceValue` against secondary external ID attribute values to resolve any remaining FK NULLs.
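Conceptually, the fixup is a lookup of every still-unresolved reference value against the external IDs of objects persisted by then. The sketch below shows that logic over plain dictionaries rather than SQL. Only the `UnresolvedReferenceValue` and `ReferenceValueId` names come from this issue (snake-cased here); the function name, row shape, and sample data are assumptions.

```python
def fixup_cross_batch_references(attribute_values, id_by_external_id):
    """Fill in reference FKs that the per-batch resolver left empty.

    attribute_values: list of dicts with 'unresolved_reference_value' and
        'reference_value_id' keys (hypothetical row shape).
    id_by_external_id: maps an external ID to the persisted object's ID.
    Returns the number of rows that were fixed.
    """
    fixed = 0
    for row in attribute_values:
        if row["reference_value_id"] is not None:
            continue  # already resolved inline by the per-batch resolver
        target = id_by_external_id.get(row["unresolved_reference_value"])
        if target is not None:
            row["reference_value_id"] = target
            fixed += 1
    return fixed


rows = [
    {"unresolved_reference_value": "cn=alice", "reference_value_id": None},
    {"unresolved_reference_value": "cn=bob", "reference_value_id": 7},
    {"unresolved_reference_value": "cn=ghost", "reference_value_id": None},
]
fixed_count = fixup_cross_batch_references(rows, {"cn=alice": 1, "cn=bob": 7})
```

In this toy version the work is one dictionary lookup per row; in the database the same lookup becomes a join over the whole attribute value table.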

Problem

The post-hoc UPDATE is a large multi-table JOIN across the entire `ConnectedSystemObjectAttributeValues` table. On large imports (MediumLarge: ~5,273 CSOs, ~448k attribute values) it timed out against a cold, fresh database with the default 30s command timeout. The fix was to extend the timeout to 300s, but this is a blunt instrument: it does not scale well and gives administrators no feedback when host resources are insufficient.

Two partial indexes were added to mitigate query performance, but the fundamental issue remains: the fixup should not be necessary at all.

Preferred Fix

Sort containers by DB ID descending (or use a smarter ordering) before batching so that referenced objects (users) are inserted before the objects that reference them (groups). This would allow the existing per-batch resolver to handle all references inline, eliminating the need for `FixupCrossBatchReferenceIdsAsync` entirely.

Alternatively, process containers in two passes (first all non-reference-holding object types, then all reference-holding types), or detect reference attributes during schema import and order containers accordingly.
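One way to make the ordering explicit instead of relying on DB IDs is a topological sort over container-level reference dependencies. The following is a sketch under assumptions: `order_containers` and the `references` map (which container holds references into which other containers, for example derived from the schema) are hypothetical and not part of JIM.

```python
from graphlib import CycleError, TopologicalSorter


def order_containers(containers, references):
    """Order containers so that referenced containers are processed first.

    containers: container names in their current (DB ID) order.
    references: maps a container to the set of containers its objects
        reference (hypothetical input).
    """
    known = set(containers)
    # Each node maps to its predecessors, i.e. the containers that must be
    # processed before it. Self-references and unknown containers are ignored.
    graph = {
        c: {d for d in references.get(c, ()) if d != c and d in known}
        for c in containers
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        # Mutual references cannot be ordered away; keep the original order.
        return list(containers)


ordered = order_containers(
    ["OU=Entitlements", "OU=Users"],
    {"OU=Entitlements": {"OU=Users"}},
)
```

Two cases are not solved by ordering alone and would need separate handling: containers that reference each other, and references between objects in the same container that land in different batches (for example nested groups).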

Additional Considerations

For very large deployments (XLarge: 100k+ users), deployment documentation should note DB I/O requirements — similar to the existing note that XLarge requires 20+ GB host RAM.

Cleanup When This Is Implemented

Once the root cause is fixed, remove the band-aid:

  • Remove the 300s command timeout from `FixupCrossBatchReferenceIdsAsync` in `ConnectedSystemRepository`
  • Remove the `FixupCrossBatchReferenceIdsAsync` call in `SyncImportTaskProcessor`
  • Remove the `FixupCrossBatchReferenceIdsAsync` method from the repository, application server, and interface

Indexes: The two partial indexes added as mitigation should be retained regardless — they benefit delta imports and any other reference resolution path, not just the post-hoc fixup. Do not remove them as part of this issue.

Acceptance Criteria

  • Full Import of Scenario 8 MediumLarge completes without the post-hoc fixup query
  • `FixupCrossBatchReferenceIdsAsync` is removed (call site, method, and interface declaration)
  • The 300s command timeout override in `ConnectedSystemRepository` is removed
  • No regression on scenarios with multiple containers containing the same object type
  • Integration tests (Scenario 1, Scenario 2, Scenario 8) all pass

Metadata

Labels: enhancement
