
Add webhook caching for Kanidm objects #491

Open
pando85 wants to merge 1 commit into master from claude/webhook-kanidm-cache-01WcaxkuMhXPNnfBxAjWvBKX

Conversation


pando85 commented Nov 23, 2025

Implement a caching system for the webhook to track Kanidm objects and sync internal objects when external replication is enabled.

Changes:

  • Add Kanidm CRD watcher to track cluster configurations
  • Create Kanidm client cache infrastructure for authenticated connections
  • Implement background task to sync internal entities (groups, persons, OAuth2 clients, service accounts) from Kanidm clusters with external replication enabled
  • Add entity tracking that maps entity names to their KanidmRef (cluster name and namespace)
  • Update validation logic to check both CRD-based objects and internally synced objects from replicated clusters
  • Add InternalEntityCache to track entities synced from external Kanidm clusters

The webhook now prevents duplicate entity creation by checking:

  1. Kubernetes CRD resources (existing behavior)
  2. Internal Kanidm entities synced from clusters with external replication (new behavior)

This ensures that entities replicated from external clusters cannot be accidentally recreated via CRDs, preventing conflicts in multi-cluster Kanidm deployments.

pando85 force-pushed the claude/webhook-kanidm-cache-01WcaxkuMhXPNnfBxAjWvBKX branch from 44bde81 to ad988d6 on December 26, 2025 at 17:02

pando85 commented Feb 16, 2026

@forkline could you review this PR? Don't modify it

forkline-bot commented Feb 16, 2026

Code Review: Add webhook caching for Kanidm objects

Overview

This PR implements a caching system for the webhook to track Kanidm objects and sync internal entities when external replication is enabled. The implementation prevents duplicate entity creation by checking both CRD-based objects and entities synced from external Kanidm clusters.

Architecture Assessment

Strengths:

  1. Clear separation of concerns - New modules (state.rs, sync.rs, kanidm_client.rs, validator.rs) are well-organized
  2. Proper async patterns - Uses RwLock for concurrent read access and Mutex for client creation locking
  3. Comprehensive entity coverage - Handles all entity types (groups, persons, OAuth2 clients, service accounts)
  4. Graceful cleanup - Properly clears cache when Kanidm clusters are deleted or lose external replication config

Issues & Suggestions

1. Potential Resource Leak in Sync Loop (sync.rs:140-178)

The periodic sync task is spawned once and runs indefinitely, with no cancellation mechanism for when the webhook shuts down:

loop {
    interval.tick().await;
    // ... no shutdown check
}

Suggestion: Add a cancellation token or use tokio::select! with a shutdown signal.
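
For example (a minimal sketch: CancellationToken comes from the tokio-util crate, and SyncState / sync_all_kanidms are placeholders for the PR's actual names):

use tokio::time::{self, Duration};
use tokio_util::sync::CancellationToken;

// Placeholders: `SyncState` and `sync_all_kanidms` stand in for the PR's
// shared state and sync function.
async fn run_periodic_sync(state: SyncState, shutdown: CancellationToken) {
    let mut interval = time::interval(Duration::from_secs(300));
    loop {
        tokio::select! {
            _ = interval.tick() => {
                sync_all_kanidms(&state).await;
            }
            _ = shutdown.cancelled() => {
                tracing::info!("periodic sync loop shutting down");
                return;
            }
        }
    }
}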

2. No Error Propagation for Sync Failures (sync.rs:117-137)

The initial sync retries forever on failure but never surfaces critical errors (e.g., auth failures):

loop {
    match sync_kanidm_entities_once(&state, &kanidm).await {
        Ok(_) => { break; }
        Err(e) => {
            warn!(...); // Only logged, no metrics or alerts
            time::sleep(Duration::from_secs(30)).await;
        }
    }
}

Suggestion: Consider adding metrics for sync failures and a max retry limit with exponential backoff.
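
A sketch of what bounded retries with exponential backoff could look like, reusing the PR's sync_kanidm_entities_once call and tracing macros (the limits are illustrative, not prescriptive):

use tokio::time::{sleep, Duration};
use tracing::{error, warn};

// Illustrative limits; `sync_kanidm_entities_once`, `state`, and `kanidm`
// are the PR's own names.
let max_attempts = 10;
let mut backoff = Duration::from_secs(5);
for attempt in 1..=max_attempts {
    match sync_kanidm_entities_once(&state, &kanidm).await {
        Ok(_) => break,
        Err(e) => {
            warn!(attempt, error = %e, "initial sync failed");
            // A failure counter metric could be incremented here.
            if attempt == max_attempts {
                error!("giving up on initial sync after {max_attempts} attempts");
                break;
            }
            sleep(backoff).await;
            backoff = (backoff * 2).min(Duration::from_secs(300));
        }
    }
}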

3. Hardcoded Sync Intervals (sync.rs:116, 140)

  • Initial delay: 5 seconds
  • Resync interval: 300 seconds (5 minutes)

These should be configurable via environment variables or CRD annotations.
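
For example (hypothetical env var names, with defaults matching the current hardcoded values):

use std::time::Duration;

// Hypothetical variable names; defaults match the current hardcoded values.
fn sync_intervals() -> (Duration, Duration) {
    let secs = |var: &str, default: u64| {
        std::env::var(var)
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(default)
    };
    let initial_delay = Duration::from_secs(secs("WEBHOOK_SYNC_INITIAL_DELAY_SECS", 5));
    let resync = Duration::from_secs(secs("WEBHOOK_SYNC_RESYNC_SECS", 300));
    (initial_delay, resync)
}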

4. Double-Check Pattern Implementation (kanidm_client.rs:99-104)

The double-check pattern after acquiring the mutex is correct, but the client_creation_locks map grows unbounded. Old locks are only cleaned up on explicit remove_client() calls.
Suggestion: Consider using DashMap or periodic cleanup of unused locks.
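
One possible shape with DashMap, dropping the per-key lock entry once the client is cached (KanidmClient and create_client are placeholders for the PR's actual client type and construction logic):

use std::sync::Arc;
use dashmap::DashMap;
use tokio::sync::Mutex;

// Placeholders: `KanidmClient` and `create_client` stand in for the PR's
// client type and construction; keys are (name, namespace).
struct ClientCache {
    clients: DashMap<(String, String), Arc<KanidmClient>>,
    creation_locks: DashMap<(String, String), Arc<Mutex<()>>>,
}

impl ClientCache {
    async fn get_or_create(&self, key: (String, String)) -> Arc<KanidmClient> {
        if let Some(c) = self.clients.get(&key) {
            return c.clone();
        }
        let lock = self
            .creation_locks
            .entry(key.clone())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone();
        let _guard = lock.lock().await;
        // Double-check after acquiring the per-key lock.
        if let Some(c) = self.clients.get(&key) {
            return c.clone();
        }
        let client = Arc::new(create_client(&key).await);
        self.clients.insert(key.clone(), client.clone());
        // Drop the per-key lock entry so the map cannot grow unbounded;
        // a concurrent re-insert is harmless thanks to the double-check.
        self.creation_locks.remove(&key);
        client
    }
}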

5. Cache Consistency on Entity Deletion (state.rs:143-167)

clear_kanidm() iterates all entity maps and filters entries. For large caches, this could be slow. Consider maintaining a reverse index from (kanidm_name, kanidm_namespace) to entity names.
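
A sketch of such a reverse index, collapsed to a single entity map for brevity (the real cache keeps one map per entity type, so the index would also need to record the type):

use std::collections::{HashMap, HashSet};

// Sketch only: the PR's cache stores more than shown here.
#[derive(Default)]
struct InternalEntityCache {
    // entity name -> (kanidm name, kanidm namespace)
    groups: HashMap<String, (String, String)>,
    // (kanidm name, kanidm namespace) -> entity names in that cluster
    by_kanidm: HashMap<(String, String), HashSet<String>>,
}

impl InternalEntityCache {
    fn insert_group(&mut self, entity: String, kanidm: (String, String)) {
        self.by_kanidm
            .entry(kanidm.clone())
            .or_default()
            .insert(entity.clone());
        self.groups.insert(entity, kanidm);
    }

    // O(entities in that cluster) instead of a full scan of every map.
    fn clear_kanidm(&mut self, kanidm: &(String, String)) {
        if let Some(names) = self.by_kanidm.remove(kanidm) {
            for name in names {
                self.groups.remove(&name);
            }
        }
    }
}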

6. Missing Unit Tests

The new modules lack comprehensive unit tests for:

  • InternalEntityCache operations
  • KanidmClientCache client lifecycle
  • sync_kanidm_entities_once entity extraction logic
  • check_duplicate_with_internal edge cases

Only validator.rs has basic tests for normalize_kanidm_ref.
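
To illustrate the kind of coverage meant here, a test against the reverse-index sketch from the previous item could look like this (the method names come from that sketch, not from the PR's actual API):

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn clear_kanidm_removes_only_that_cluster() {
        let mut cache = InternalEntityCache::default();
        let a = ("idm-a".to_string(), "ns-a".to_string());
        let b = ("idm-b".to_string(), "ns-b".to_string());
        cache.insert_group("admins".into(), a.clone());
        cache.insert_group("devs".into(), b.clone());

        cache.clear_kanidm(&a);

        assert!(!cache.groups.contains_key("admins"));
        assert!(cache.groups.contains_key("devs"));
    }
}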

7. Unsafe TLS Configuration (kanidm_client.rs:144)

.danger_accept_invalid_certs(true)

This is acceptable for internal cluster communication but should be documented or made configurable for production deployments that may use proper certificates.
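
A sketch of one way to gate it behind configuration, shown here with reqwest's builder (the PR's actual client builder and the env var name may differ):

use reqwest::Client;

// Sketch: the env flag name is illustrative, not from the PR.
fn build_http_client() -> reqwest::Result<Client> {
    let accept_invalid = std::env::var("KANIDM_ACCEPT_INVALID_CERTS")
        .map(|v| v == "true" || v == "1")
        .unwrap_or(false);
    Client::builder()
        .danger_accept_invalid_certs(accept_invalid)
        .build()
}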

8. Error Message Clarity (validator.rs:97-100)

return Err(format!(
    "{} '{}' already exists in Kanidm cluster {}/{} (synced from external replication)",
    object_name, entity_name, ref_namespace, ref_name
));

Double-check that the cluster reference is rendered in the conventional Kubernetes {namespace}/{name} order; inlining the arguments into the format string ("{ref_namespace}/{ref_name}") would make the intended order explicit and harder to swap.

Minor Nits

  1. sync.rs:215-223: The repeated pattern for extracting an entity name could be moved into a helper function (see the sketch after this list):
if let Some(name_attr) = group.attrs.get("name") {
    if let Some(group_name) = name_attr.first() {
        // ...
    }
}
  2. state.rs:17: The type alias EntityRefs could be a struct for better self-documentation.

  3. Consider adding metrics for:

    • Cache hit/miss rates
    • Sync duration and success/failure rates
    • Number of cached entities per Kanidm cluster
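
A sketch of the helper suggested in the first nit, assuming entities expose attrs as a BTreeMap<String, Vec<String>> as the quoted snippet implies:

use std::collections::BTreeMap;

// Assumes `attrs: BTreeMap<String, Vec<String>>` on the entity type.
fn first_attr<'a>(attrs: &'a BTreeMap<String, Vec<String>>, name: &str) -> Option<&'a str> {
    attrs.get(name).and_then(|values| values.first()).map(String::as_str)
}

// The call site then becomes:
// if let Some(group_name) = first_attr(&group.attrs, "name") { ... }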

Test Changes Note

The PR also includes significant refactoring of test files from jiff to chrono and removal of some test cases (group_account_policy, oauth2_secret_rotation, etc.). These changes should ideally be in a separate PR for clearer review history.

Verdict

The implementation is solid overall and follows project conventions. Addressing the sync loop lifecycle and adding unit tests would improve robustness. The feature is valuable for multi-cluster Kanidm deployments.

Recommendation: Request changes for items #1, #2, and #6 before merge.

forkline-bot commented Feb 16, 2026

I've reviewed PR #491 and posted a detailed code review.

Summary:

The PR adds webhook caching for Kanidm objects to prevent duplicate entity creation in multi-cluster deployments with external replication. The architecture is sound with proper async patterns and separation of concerns.

Key issues identified:

  1. Sync loop has no shutdown mechanism (potential resource leak)
  2. No error propagation for sync failures
  3. Missing unit tests for the new modules
  4. Hardcoded sync intervals should be configurable
  5. Client creation locks map grows unbounded

Recommendation: Request changes for the sync loop lifecycle, error handling, and unit tests before merge.

Review posted: #491 (comment)
