Skip to content

Conversation

@SkArchon
Copy link
Contributor

@SkArchon SkArchon commented Jan 9, 2026

Summary by CodeRabbit

  • New Features

    • CDN-based cache warmup source added
    • In-memory switchover fallback for plan cache and persistence across restarts
  • Configuration

    • New in_memory_fallback option for cache warmup
    • New CDN source enable/disable configuration
  • Tests

    • Expanded test coverage for cache warmup, switchover fallback, config reloads, and restart scenarios

✏️ Tip: You can customize this high-level summary in your review settings.

Description

This PR introduces a new mechanism to warm the cache on restarts due to schema changes (or whenever the router execution config changes). Config to enable this

cache_warmup:
  enabled: true
  in_memory_fallback: true

If in_memory_switchover is not present or false, we default to the current cdn cache warmer (with cosmo), this preserves backwards compatibility.

This is achieved by first adding a new switchoverConfig which can keep references across restarts, we use this to store a reference to the planner cache. When the router restarts due to an execution config change, we then loop through the elements in the ristretto planner cache.

Right now ristretto does not support iterating, so we have opened a PR here dgraph-io/ristretto#475. For this PR (which we need to discuss before merging) we have pointed ristretto to the fork from where the PR was opened.

We create a new provider for the cache warmer, for in_memory_switcher which extracts the queries from the planner cache upon startup. Note that with this method there is no cache warming upon startup, only on subsequent restarts due to execution config changes.

We also support persisting across restarts on hot config reloading. However we ran into a problem of shutdown occuring first before the new graph mux was created (which is the opposite for schema change updates), which means that the current planner cache gets emptied before we can extract the queries from it. As a solution we preemptively extract the queries from the planner cache before its closed whenever a hot config reloading occurs.

This PR also changes how cache warmer is enabled when used with the cosmo CDN, previously it requested the manifest from the CDN if the cache warmer was enabled in the router config.yaml, this meant that ex-cache warmer customers could still access old manifest files fron the cdn (as long as it was present), however now its only enabled if the feature is enabled in cosmo (upon startup we get if cache warmer is enabled from cosmo).

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

@github-actions github-actions bot added the router label Jan 9, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 9, 2026

Walkthrough

Adds an in-memory switchover/cache-warmup subsystem, propagates SwitchoverConfig through router initialization and graph construction, extends operation planner to optionally store operation content, adds CDN and in-memory-fallback cache-warmup sources, and adds extensive tests for cache warmup and switchover behavior across restarts.

Changes

Cohort / File(s) Summary
Switchover config & runtime
router/core/restart_switchover_config.go, router/core/router.go, router/core/supervisor.go, router/core/supervisor_instance.go
Add SwitchoverConfig and InMemoryPlanCacheFallback; constructors and methods (NewSwitchoverConfig, UpdateSwitchoverConfig, OnRouterConfigReload, CleanupFeatureFlags); wire SwitchoverConfig into Router, RouterResources, supervisor, and options pipeline; add WithSwitchoverConfig option.
Cache warmup sources & fallback
router/core/cache_warmup.go, router/core/cache_warmup_plans.go
Introduce PlanSource (pre-supplied operations) and add fallbackSource to cache warmup: attempt fallback when primary returns zero items or errors; add wiring and logging for fallback usage.
Graph server & planner integration
router/core/graph_server.go, router/core/operation_planner.go
Propagate SwitchoverConfig through BuildGraphMuxOptions and buildMultiGraphHandler; add switchover-gated in-memory plan cache fallback usage; extend OperationPlanner to optionally store operation content via operationPlannerOpts and add IterValues to cache interface.
Router cmd/init wiring
router/cmd/main.go
Initialize and pass SwitchoverConfig into RouterSupervisorOpts via core.NewSwitchoverConfig.
Configuration schema & defaults
router/pkg/config/config.go, router/pkg/config/config.schema.json, router/pkg/config/testdata/config_defaults.json, router/pkg/config/testdata/config_full.json, router/core/router_config.go
Add CacheWarmupCDNSource with enabled flag and new in_memory_fallback boolean; update schema and testdata JSON; expose usage field for in_memory_fallback.
Tests
router-tests/cache_warmup_test.go, router/core/restart_switchover_config_test.go
Add extensive cache-warmup and switchover tests: in-memory switchover caching scenarios, CDN/fallback edge cases, config reload and restart processing tests for in-memory plan cache.
Dependency bumps
router/go.mod, router-tests/go.mod
Bump github.com/dgraph-io/ristretto/v2 from v2.1.0 to v2.4.0.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • feat: circuit breaker implementation #1929: Modifies BuildGraphMuxOptions and buildMultiGraphHandler call paths—overlaps with this PR's changes that propagate additional config (SwitchoverConfig) through graph mux construction.
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main feature: implementing cache rewarming from the planner cache when a schema change triggers a restart.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Jan 9, 2026

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-220c08138dfe7af22c90a6783dcaeb75ff604ac5

@codecov
Copy link

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 87.91946% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.12%. Comparing base (7fed361) to head (4ef640c).
⚠️ Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
router/core/cache_warmup.go 76.92% 2 Missing and 1 partial ⚠️
router/core/restart_switchover_config.go 96.10% 3 Missing ⚠️
router/core/router.go 57.14% 3 Missing ⚠️
router/core/supervisor_instance.go 0.00% 3 Missing ⚠️
router/cmd/main.go 0.00% 2 Missing ⚠️
router/core/cache_warmup_plans.go 75.00% 1 Missing and 1 partial ⚠️
router/core/graph_server.go 91.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2445      +/-   ##
==========================================
- Coverage   61.41%   61.12%   -0.29%     
==========================================
  Files         229      231       +2     
  Lines       23855    23955     +100     
==========================================
- Hits        14650    14642       -8     
- Misses       7971     8068      +97     
- Partials     1234     1245      +11     
Files with missing lines Coverage Δ
router/core/operation_planner.go 68.00% <100.00%> (+1.33%) ⬆️
router/core/router_config.go 93.71% <100.00%> (+0.03%) ⬆️
router/core/supervisor.go 60.97% <100.00%> (+1.48%) ⬆️
router/pkg/config/config.go 80.51% <ø> (ø)
router/cmd/main.go 0.00% <0.00%> (ø)
router/core/cache_warmup_plans.go 75.00% <75.00%> (ø)
router/core/graph_server.go 80.50% <91.66%> (-1.71%) ⬇️
router/core/cache_warmup.go 88.88% <76.92%> (+0.22%) ⬆️
router/core/restart_switchover_config.go 96.10% <96.10%> (ø)
router/core/router.go 71.51% <57.14%> (+1.07%) ⬆️
... and 1 more

... and 16 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@SkArchon SkArchon marked this pull request as ready for review January 14, 2026 16:15
@SkArchon SkArchon marked this pull request as draft January 14, 2026 16:15
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
router/go.mod (2)

175-199: Fix inconsistent ristretto/v2 fork across multi-module repo

The router module pins a fork (github.com/wundergraph/ristretto/v2 v2.0.0-20260113151003-f7231509ad4d), but router-tests, demo, and graphqlmetrics modules all require the upstream version (v2.1.0). This creates mixed implementations at build time. The replace directive in router/go.mod only affects that module's dependencies, not the workspace. Either apply the same replace directive to other go.mod files that depend on ristretto, or unify the version across all modules.


59-88: Add explanatory comment about the Ristretto fork in go.mod

The fork github.com/wundergraph/ristretto/v2 at commit f7231509ad4d (branch milinda/iterating_elements) adds iterator support for cache elements, which is not available in the upstream v2.0.0 release. Add a comment in the replace directive explaining this purpose. Since no other modules in the repo depend on ristretto v2.1.x features (the require block specifies only v2.0.0), there are no compatibility concerns with using the fork.

🤖 Fix all issues with AI agents
In `@router-tests/cache_warmup_test.go`:
- Around line 1100-1106: The test uses require.EventuallyWithT with an
assert.CollectT callback but calls require.Equal inside the retry closure (in
the block passed to require.EventuallyWithT) which will abort immediately;
replace the failing assertions inside that callback with assert.Equal so the
checks are retried: locate the closure passed to require.EventuallyWithT (where
res = xEnv.MakeGraphQLRequestOK(...)) and change require.Equal(t, "updated",
res.Response.Header.Get("X-Router-Config-Version")) and require.Equal(t, "HIT",
res.Response.Header.Get("x-wg-execution-plan-cache")) to assert.Equal(t,
"updated", ...) and assert.Equal(t, "HIT", ...).

In `@router/core/http_server.go`:
- Line 39: Remove the dead field from the httpServerOptions struct:
`switchoverConfig` is never set when creating `httpServerOptions` and is not
used by `newServer()`, since `SwitchoverConfig` is handled at the `Router` level
and passed into `newGraphServer`; delete the `switchoverConfig
*SwitchoverConfig` field from the `httpServerOptions` definition and update any
references (none expected) to rely on the Router/newGraphServer flow instead.

In `@router/core/restart_switchover_config.go`:
- Around line 24-38: UpdateSwitchoverConfig writes
s.inMemorySwitchOverCache.enabled without synchronization which races with reads
in getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags; fix by
acquiring the SwitchoverConfig write lock (e.g., s.mu or the existing cache
lock) around the block that sets s.inMemorySwitchOverCache.enabled and mutates
queriesForFeatureFlag/testValue, and change the reader methods
getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags to check
enabled while holding the same lock (or alternatively convert enabled to an
atomic.Bool and use atomic load/store for reads/writes) so readers and writers
are consistent.

In `@router/core/router.go`:
- Around line 1177-1181: Fix the typo in the inline comment near the switchover
config initialization: change "sueprvisor" to "supervisor" in the comment that
precedes the r.switchoverConfig = NewSwitchoverConfig() block so the comment
reads correctly while leaving the surrounding logic (r.switchoverConfig,
NewSwitchoverConfig(), UpdateSwitchoverConfig(&r.Config)) unchanged.
🧹 Nitpick comments (4)
router/pkg/config/config.go (1)

974-986: Consider making CacheWarmupSource.InMemorySwitchover a pointer to preserve “unset vs disabled”
At Line 974-977, InMemorySwitchover is a non-pointer struct with omitempty. Depending on the marshaler, this may still serialize as present (and it definitely can’t represent “not provided” distinctly from “provided but disabled”), which tends to leak into golden testdata and example configs (as seen in config_defaults.json).

If you want cache_warmup.source to be truly optional/empty unless explicitly configured, consider changing it to a pointer like *CacheWarmupInMemorySwitchover.

Proposed change
 type CacheWarmupSource struct {
 	Filesystem         *CacheWarmupFileSystemSource  `yaml:"filesystem,omitempty"`
-	InMemorySwitchover CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"`
+	InMemorySwitchover *CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"`
 }
router/pkg/config/config.schema.json (1)

2387-2423: Schema polish: add default: false + tighten wording for in_memory_switchover.enabled
For the new block at Line 2404-2413:

  • enabled has no default in the schema (even though the Go config uses envDefault:"false"). Consider adding "default": false for consistency with other booleans.
  • Minor: description “Is in memory switchover enabled” could be normalized (and maybe mention it reuses prior process’ cached plans, if that’s the intent).
router/core/cache_warmup_plans.go (1)

16-30: Consider preserving additional operation metadata for complete cache warmup.

The current implementation only extracts the Query content from cached plans. For accurate cache warmup, operations may also need variables, operation name, and headers that were part of the original request. If planWithMetaData stores additional context, consider extracting it here.

Also, based on the past discussion about limiting cache warmer operations to ~100, this implementation iterates unboundedly. Consider adding a limit if the plan cache can grow large.

router/core/restart_switchover_config.go (1)

44-49: Consider documenting or removing the testValue field.

The testValue field appears to be for testing purposes but is not documented. If it's only used in tests, consider adding a comment explaining its purpose, or moving test-specific logic elsewhere.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08bcfb5 and d576e03.

⛔ Files ignored due to path filters (1)
  • router/go.sum is excluded by !**/*.sum
📒 Files selected for processing (16)
  • router-tests/cache_warmup_test.go
  • router/cmd/main.go
  • router/core/cache_warmup_plans.go
  • router/core/graph_server.go
  • router/core/http_server.go
  • router/core/operation_planner.go
  • router/core/restart_switchover_config.go
  • router/core/router.go
  • router/core/router_config.go
  • router/core/supervisor.go
  • router/core/supervisor_instance.go
  • router/go.mod
  • router/pkg/config/config.go
  • router/pkg/config/config.schema.json
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/config/testdata/config_full.json
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/supervisor.go
  • router/cmd/main.go
  • router/core/router.go
  • router-tests/cache_warmup_test.go
  • router/core/restart_switchover_config.go
  • router/core/supervisor_instance.go
  • router/core/graph_server.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.

Applied to files:

  • router/core/supervisor.go
  • router/core/http_server.go
  • router/core/router_config.go
  • router/core/cache_warmup_plans.go
  • router/pkg/config/config.go
  • router/cmd/main.go
  • router/core/router.go
  • router-tests/cache_warmup_test.go
  • router/core/restart_switchover_config.go
  • router/core/supervisor_instance.go
  • router/core/operation_planner.go
  • router/core/graph_server.go
📚 Learning: 2025-08-12T13:50:45.964Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2132
File: router-plugin/plugin.go:139-146
Timestamp: 2025-08-12T13:50:45.964Z
Learning: In the Cosmo router plugin system, the plugin framework creates its own logger independently. When creating a logger in NewRouterPlugin, it's only needed for the gRPC server setup (passed via setup.GrpcServerInitOpts.Logger) and doesn't need to be assigned back to serveConfig.Logger.

Applied to files:

  • router/cmd/main.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/cache_warmup_test.go
  • router/go.mod
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router/core/graph_server.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2327
File: graphqlmetrics/core/metrics_service.go:435-442
Timestamp: 2025-11-12T18:38:46.619Z
Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration.

Applied to files:

  • router/core/graph_server.go
🧬 Code graph analysis (6)
router/core/supervisor.go (2)
router/core/restart_switchover_config.go (1)
  • SwitchoverConfig (14-16)
router/core/router.go (1)
  • Router (84-94)
router/core/http_server.go (1)
router/core/restart_switchover_config.go (1)
  • SwitchoverConfig (14-16)
router/cmd/main.go (1)
router/core/restart_switchover_config.go (2)
  • SwitchoverConfig (14-16)
  • NewSwitchoverConfig (18-22)
router/core/router.go (1)
router/core/restart_switchover_config.go (2)
  • SwitchoverConfig (14-16)
  • NewSwitchoverConfig (18-22)
router/core/supervisor_instance.go (3)
router/core/restart_switchover_config.go (1)
  • SwitchoverConfig (14-16)
router/core/router_config.go (1)
  • Config (49-144)
router/core/router.go (2)
  • Option (174-174)
  • WithSwitchoverConfig (2057-2061)
router/core/graph_server.go (4)
router/core/restart_switchover_config.go (1)
  • SwitchoverConfig (14-16)
router/core/router.go (1)
  • Router (84-94)
router/core/operation_planner.go (1)
  • NewOperationPlanner (50-57)
router/core/cache_warmup_plans.go (1)
  • NewPlanSource (16-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (36)
router/cmd/main.go (1)

152-172: SwitchoverConfig wiring looks correct; ensure lifecycle matches intended cache persistence scope
Line 152-155 passes a single core.NewSwitchoverConfig() into the supervisor—so caches/state will live for the process lifetime across reloads (likely desired for “reprocess existing plans”). Just ensure this is the intended scope (vs “per reload”) and that the underlying cache is concurrency-safe.

router/pkg/config/testdata/config_defaults.json (1)

311-322: config_defaults.json is a golden test output file, not a configuration subject to schema validation—no changes needed

The concern about schema mismatch is based on a misunderstanding of the file's purpose. config_defaults.json is purely a golden test artifact in TestDefaults() used to verify the JSON serialization of the Config struct; it is never loaded from disk or validated against config.schema.json.

The schema validation in LoadConfig() applies only to YAML input files. When the Config struct is marshaled to JSON for the golden test, Go's default JSON marshaling uses PascalCase field names (Filesystem, InMemorySwitchover) and includes null values where structs have omitempty YAML tags. This serialization format is independent of the schema, which constrains YAML input with snake_case keys. The presence of "Filesystem": null in the output is expected and correct.

router/pkg/config/testdata/config_full.json (1)

660-671: LGTM!

The InMemorySwitchover configuration block is correctly added under CacheWarmup.Source, following the existing pattern with Filesystem. The default Enabled: false is appropriate for this comprehensive config test file.

router/core/supervisor.go (3)

32-36: LGTM!

The SwitchoverConfig field is correctly added to RouterResources, enabling the switchover configuration to be passed through to downstream components like the Router.


38-44: LGTM!

The SwitchoverConfig field in RouterSupervisorOpts provides the input path for the switchover configuration.


52-55: LGTM!

The initialization correctly propagates SwitchoverConfig from options to resources, completing the wiring through the supervisor layer.

router-tests/cache_warmup_test.go (5)

5-6: LGTM!

The new imports are appropriate for the added test functionality: assert for EventuallyWithT callbacks and configpoller for the mock configuration poller.


919-971: LGTM - well-structured test for in-memory switchover caching.

The test correctly validates that:

  1. Initial request results in a cache MISS
  2. After config update (simulating switchover), the same query results in a cache HIT

This confirms the in-memory plan cache is preserved across config reloads.


973-1014: LGTM!

Correctly tests that cache is not preserved when cache warmup is entirely disabled, resulting in MISS on both requests.


1016-1062: LGTM!

Correctly tests that the default cache warmer (with InMemorySwitchover.Enabled = false) does not preserve plans across config reloads.


1064-1108: Both writeTestConfig (defined in router-tests/config_hot_reload_test.go:579) and ConfigPollerMock (defined in router-tests/config_hot_reload_test.go:32) are properly defined in the test package and are accessible to this test file.

router/core/router_config.go (1)

314-321: LGTM - source classification logic is correct and type-safe.

The addition correctly classifies the cache warmup source. The control flow properly prioritizes Filesystem over InMemorySwitchover. Since InMemorySwitchover is a value type (not a pointer), accessing its Enabled field is safe—no nil check is needed.

router/core/cache_warmup_plans.go (1)

32-34: LGTM!

Simple and efficient - queries are extracted once during construction, allowing the source cache to be garbage collected.

router/core/supervisor_instance.go (3)

53-53: LGTM!

Correctly threads SwitchoverConfig through the router option construction.


184-184: LGTM!

Function signature updated to accept switchoverConfig parameter for propagation.


275-275: LGTM!

WithSwitchoverConfig option correctly added to the options list.

router/core/router.go (4)

93-93: LGTM!

Field addition to hold switchover configuration state.


600-600: LGTM!

Correctly passes switchoverConfig to newGraphServer for plan cache management.


608-610: LGTM!

Feature flag cleanup after server swap ensures unused caches are pruned.


2057-2061: LGTM!

Standard option pattern for setting switchover configuration.

router/core/restart_switchover_config.go (2)

71-78: Clarify behavior when FeatureFlagConfigs is nil.

When routerCfg.FeatureFlagConfigs is nil, the method returns early without cleaning up any cached feature flags. This means stale feature flag caches could persist. Is this intentional?

If feature flags can be completely removed from a config, consider clearing all non-base caches when FeatureFlagConfigs is nil.


83-96: LGTM!

The cleanup logic correctly preserves the base configuration (empty string key) while removing caches for feature flags that no longer exist in the router config.

router/core/operation_planner.go (8)

24-24: LGTM!

New content field enables storing the original query for cache warmup scenarios.


32-37: LGTM!

The storeContent flag and operationPlannerOpts struct provide clean control over content storage behavior.


50-56: LGTM!

Constructor correctly accepts and stores the storeContent flag.


59-59: LGTM!

Signature update to accept operationPlannerOpts for controlling plan preparation behavior.


93-95: LGTM!

Content is conditionally stored based on the storeContent option, avoiding unnecessary memory usage when content storage is not needed.


122-122: LGTM!

When skipping cache (for tracing or query plan inclusion), content storage is disabled since these plans are not cached.


150-150: LGTM!

Normal cache path correctly uses the planner's storeContent setting.


44-46: The Iter method is already implemented by ristretto.Cache, which is the only ExecutionPlanCache implementation used in the codebase. It's already being called in router/core/cache_warmup_plans.go:21 without issues. Adding Iter to the interface is not a breaking change.

Likely an incorrect or invalid review comment.

router/core/graph_server.go (6)

108-116: LGTM!

The addition of SwitchoverConfig to BuildGraphMuxOptions follows the existing pattern for optional configuration parameters.


122-123: LGTM!

The signature extension to accept switchoverConfig is correctly propagated through the initialization path.


434-458: LGTM!

The switchoverConfig is correctly threaded through to feature flag mux construction, ensuring consistent behavior across base and feature flag graphs.


526-532: LGTM!

Field alignment changes are cosmetic and improve readability.


1303-1304: No nil pointer dereference risk. SwitchoverConfig and inMemorySwitchOverCache are guaranteed non-nil by initialization.

The opts.SwitchoverConfig parameter passed to newGraphServer is guaranteed to be non-nil with a valid inMemorySwitchOverCache because:

  1. Before newServer() calls newGraphServer() at line 600, the Router's Start() method initializes r.switchoverConfig with a nil check at lines 1177-1179.
  2. NewSwitchoverConfig() always returns a SwitchoverConfig with an initialized inMemorySwitchOverCache field (never nil).

Accessing opts.SwitchoverConfig.inMemorySwitchOverCache.enabled is safe; no defensive nil checks are needed.

Likely an incorrect or invalid review comment.


1358-1362: Thread safety is already properly implemented; remove this concern.

The getPlanCacheForFF and setPlanCacheForFF methods properly use sync.RWMutex (RLock/RUnlock for reads, Lock/Unlock for writes with defers) to synchronize access to the shared cache map. The nil safety concern about opts.SwitchoverConfig is not an issue—line 1304 in the same function already accesses opts.SwitchoverConfig.inMemorySwitchOverCache.enabled without guards, meaning the parameter is guaranteed to be non-nil by design.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@SkArchon SkArchon marked this pull request as ready for review January 14, 2026 16:49
@SkArchon SkArchon marked this pull request as draft January 14, 2026 16:49
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@router-tests/go.mod`:
- Line 207: The go.mod contains a replace directive pointing to the non-public
fork "github.com/wundergraph/ristretto/v2" which breaks module resolution;
remove that replace and switch to the upstream module
"github.com/dgraph-io/ristretto/v2" (or make the fork public if you must keep
it). Concretely, delete the replace entry that maps
"github.com/wundergraph/ristretto/v2 => github.com/wundergraph/ristretto/v2
v2.0.0-..." and ensure the project requires "github.com/dgraph-io/ristretto/v2"
(update to v2.1.0 if needed), then run "go mod tidy" to refresh the dependency
graph and verify builds/CI pass.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d576e03 and 78fc67d.

⛔ Files ignored due to path filters (1)
  • router-tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • router-tests/cache_warmup_test.go
  • router-tests/go.mod
  • router/core/restart_switchover_config.go
  • router/core/restart_switchover_config_test.go
  • router/core/router.go
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/go.mod
  • router-tests/cache_warmup_test.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/restart_switchover_config_test.go
  • router/core/restart_switchover_config.go
  • router-tests/cache_warmup_test.go
  • router/core/router.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/core/restart_switchover_config_test.go
  • router-tests/cache_warmup_test.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.

Applied to files:

  • router/core/restart_switchover_config_test.go
  • router/core/restart_switchover_config.go
  • router-tests/cache_warmup_test.go
  • router/core/router.go
📚 Learning: 2025-07-29T08:19:55.720Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2088
File: demo/pkg/subgraphs/projects/src/main_test.go:0-0
Timestamp: 2025-07-29T08:19:55.720Z
Learning: In Go testing, t.Fatal, t.FailNow, t.Skip* and similar methods cannot be called from goroutines other than the main test goroutine, as they will cause a runtime panic. Use channels, t.Error*, or other synchronization mechanisms to communicate errors from goroutines back to the main test thread.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-09-29T10:28:07.122Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2192
File: router/pkg/config/config.go:1028-1029
Timestamp: 2025-09-29T10:28:07.122Z
Learning: The deprecation strategy for IntrospectionEnabled field in router/pkg/config/config.go is to first mark it as deprecated, then migrate all call sites to use IntrospectionConfig.Enabled, and finally remove the deprecated field. The envDefault tag should be kept during the deprecation period for backward compatibility.

Applied to files:

  • router/core/router.go
🧬 Code graph analysis (3)
router/core/restart_switchover_config_test.go (2)
router/core/restart_switchover_config.go (2)
  • InMemorySwitchOverCache (32-36)
  • NewSwitchoverConfig (18-22)
router/pkg/config/config.go (2)
  • CacheWarmupConfiguration (989-995)
  • CacheWarmupInMemorySwitchover (983-985)
router-tests/cache_warmup_test.go (5)
router-tests/testenv/testenv.go (4)
  • Run (105-122)
  • Config (292-349)
  • RouterConfig (246-249)
  • Environment (1735-1771)
router/pkg/config/config.go (5)
  • Config (1037-1112)
  • CacheWarmupConfiguration (989-995)
  • CacheWarmupSource (974-977)
  • CacheWarmupInMemorySwitchover (983-985)
  • ExecutionConfig (865-869)
router/core/router.go (3)
  • Option (174-174)
  • WithCacheWarmupConfig (2114-2118)
  • ExecutionConfig (159-163)
router/core/cache_warmup.go (1)
  • CacheWarmupSource (26-28)
router/pkg/controlplane/configpoller/config_poller.go (1)
  • ConfigPoller (20-29)
router/core/router.go (1)
router/core/restart_switchover_config.go (2)
  • SwitchoverConfig (14-16)
  • NewSwitchoverConfig (18-22)
🔇 Additional comments (9)
router/core/restart_switchover_config_test.go (1)

1-337: Test coverage looks comprehensive.

The test file provides thorough coverage of the SwitchoverConfig and InMemorySwitchOverCache functionality, including:

  • Enable/disable state transitions
  • Edge cases (nil config, already enabled/disabled)
  • Feature flag cache operations (get/set)
  • Cleanup of unused feature flags
  • Delegation from SwitchoverConfig to InMemorySwitchOverCache
router/core/restart_switchover_config.go (1)

11-22: LGTM for structure and initialization.

The type definitions and constructor are well-structured. The planCache type alias improves readability, and NewSwitchoverConfig correctly initializes with a disabled cache state.

router/core/router.go (4)

93-93: LGTM for field addition.

The switchoverConfig field is properly added to the Router struct to maintain state across config changes.


599-611: Clean integration of switchover config in server creation.

The switchover config is correctly passed to newGraphServer and CleanupFeatureFlags is appropriately called after the server swap to prune stale feature flag caches.


1177-1181: LGTM for initialization logic.

The lazy initialization of switchoverConfig for test scenarios is appropriate, and the comment explaining this is helpful (and the typo from the previous review has been fixed).


2057-2061: LGTM for new option function.

The WithSwitchoverConfig option follows the established pattern for router configuration options.

router-tests/cache_warmup_test.go (3)

1100-1106: Correct usage of assert inside EventuallyWithT callback.

Using assert.Equal instead of require.Equal inside the EventuallyWithT callback is correct—this allows the retry mechanism to work properly by collecting failures rather than immediately aborting.


1064-1108: No action needed. The writeTestConfig helper function is defined in router-tests/config_hot_reload_test.go at line 579 and is accessible from the cache_warmup_test.go file since both are in the same package. The function is properly defined and available for use.


919-971: Code is correct as-is—no undefined types.

ConfigPollerMock and writeTestConfig are defined in config_hot_reload_test.go in the same package (integration), making them accessible throughout cache_warmup_test.go. The test implementation is valid.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@router-tests/cache_warmup_test.go`:
- Around line 1118-1125: The test YAML uses the wrong key name
"in_memory_switchover_fallback" so the Config struct field (check the struct in
router/pkg/config/config.go—likely CacheWarmup or similar with tag
`in_memory_fallback`) never gets populated; update the test snippet in
router-tests/cache_warmup_test.go to use "in_memory_fallback: true" instead of
"in_memory_switchover_fallback: true" so unmarshaling populates the struct field
and the test exercises the intended behavior.

In `@router/core/cache_warmup_cdn.go`:
- Around line 79-80: The cache-warmup URL is incorrectly building operationsPath
with "eee.json"; update the literal filename to "operations.json" so
operationsPath := fmt.Sprintf("/%s/%s/cache_warmup/operations.json",
c.organizationID, c.federatedGraphID) — change the string in the operationsPath
assignment that uses c.organizationID and c.federatedGraphID to reference
"operations.json" instead of "eee.json".
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)

129-134: Preserve already-converted operation slices across reloads.
If extractQueriesAndOverridePlanCache runs twice, any entries already stored as []*nodev1.Operation are dropped, clearing cache warmup data for those feature flags. Use a type switch to keep converted entries intact.

🔧 Proposed fix
	switchoverMap := make(map[string]any)
	for k, v := range c.queriesForFeatureFlag {
-		if cache, ok := v.(planCache); ok {
-			switchoverMap[k] = convertToNodeOperation(cache)
-		}
+		switch cache := v.(type) {
+		case planCache:
+			switchoverMap[k] = convertToNodeOperation(cache)
+		case []*nodev1.Operation:
+			// Already converted, preserve as-is
+			switchoverMap[k] = cache
+		}
	}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@router-tests/cache_warmup_test.go`:
- Around line 28-32: The test's fakeSelfRegister.Register returns a nil
*nodev1.RegistrationInfo and tests create typed-nil receivers, causing a nil
dereference when router reads
r.registrationInfo.AccountLimits.TraceSamplingRate; modify
fakeSelfRegister.Register to return a concrete &nodev1.RegistrationInfo with
sensible default AccountLimits (so callers never get a nil pointer) and update
test assignments to use a real value (e.g., impl := &fakeSelfRegister{} or
declare the interface with a non-nil implementation) instead of var impl
*fakeSelfRegister = nil to avoid typed-nil interface values.

@SkArchon SkArchon requested a review from StarpTech January 26, 2026 16:58
planCache ExecutionPlanCache[uint64, *planWithMetaData]
executor *Executor
trackUsageInfo bool
storeContent bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more specific e.g operationContent ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 120-136: The extractQueriesAndOverridePlanCache function currently
only handles values of type planCache and drops entries already converted to
[]*nodev1.Operation; update the loop in extractQueriesAndOverridePlanCache to
also detect and preserve already-converted entries (e.g., if v is of type
[]*nodev1.Operation or [](*nodev1.Operation) or the concrete slice type used) by
copying them directly into switchoverMap, and only call
convertToNodeOperation(cache) when v asserts to planCache; keep the existing
mutex logic and ensure the map type remains map[string]any to store both
converted slices and newly converted results.
🧹 Nitpick comments (1)
router/core/restart_switchover_config.go (1)

85-107: Potential performance concern: Long-running operation while holding read lock.

getPlanCacheForFF calls convertToNodeOperation(cache) at line 96 while holding the RLock. This function calls data.Wait() (blocking until buffered writes are flushed) and data.IterValues(...) (iterating over potentially many cache entries).

If the plan cache has many entries or pending writes, this could hold the read lock for an extended period, blocking writers like updateStateFromConfig or extractQueriesAndOverridePlanCache.

Consider extracting the planCache reference under the lock and performing the conversion outside:

Proposed improvement
 func (c *InMemorySwitchoverCache) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation {
 	c.mu.RLock()
-	defer c.mu.RUnlock()
-
 	if c.queriesForFeatureFlag == nil {
+		c.mu.RUnlock()
 		return nil
 	}
 
-	switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) {
+	data := c.queriesForFeatureFlag[featureFlagKey]
+	c.mu.RUnlock()
+
+	switch cache := data.(type) {
 	case planCache:
 		return convertToNodeOperation(cache)
 	case []*nodev1.Operation:
 		return cache
 	case nil:
 		return nil
 	default:
 		c.logger.Error("unexpected type")
 		return nil
 	}
 }

}
}

func WithSwitchoverConfig(cfg *SwitchoverConfig) Option {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is not clear. A switchover can mean anything. How about WithMemoryPlanCacheFallback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think switchover should stay the same, since its goal is to just a struct which contains references to things that should persist across restarts.

I instead changed

type SwitchoverConfig struct {
	inMemorySwitchOverCache *InMemorySwitchoverCache
}

to

type SwitchoverConfig struct {
	inMemoryPlanCacheFallback *InMemoryPlanCacheFallback
}

},
"in_memory_fallback": {
"type": "boolean",
"description": "Enable in-memory fallback. When enabled, the router will cache query plans in memory and reuse them on schema changes and hot config reloads. This is useful when deploying new router configurations without downtime. The default value is true.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is useful when deploying new router configurations without downtime

This makes no sense. The cache will help with cold starts for operation planning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@SkArchon SkArchon requested a review from StarpTech January 31, 2026 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants