## Description
SPIKE Bootstrap does not validate that the number of configured keepers matches `SPIKE_NEXUS_SHAMIR_SHARES`. When fewer keepers are configured than shares, bootstrap silently succeeds while leaving some shards undistributed, potentially putting the system in a state where recovery is impossible.
## Current Behavior
### Case 1: keepers > shares - handled correctly
```bash
SPIKE_NEXUS_SHAMIR_SHARES=3
SPIKE_NEXUS_KEEPER_PEERS="https://k1:8443,https://k2:8443,https://k3:8443,https://k4:8443"
```
- `RootShares()` generates 3 shares
- `KeeperShare()` fails for keeper 4: `no share found for keeper ID: '4'`
- Bootstrap crashes with a clear error (illustrated below)
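For illustration, here is a minimal sketch of why the lookup for keeper 4 fails; the map-based `keeperShare` below is an assumption for demonstration, not SPIKE's actual internals:

```go
package main

import "fmt"

// Sketch only: shares are assumed to be keyed by keeper ID ("1", "2", "3").
// With 3 shares and 4 keepers, the lookup for keeper "4" has nothing to return.
func keeperShare(shares map[string][]byte, keeperID string) ([]byte, error) {
    share, ok := shares[keeperID]
    if !ok {
        return nil, fmt.Errorf("no share found for keeper ID: '%s'", keeperID)
    }
    return share, nil
}

func main() {
    shares := map[string][]byte{"1": {0x01}, "2": {0x02}, "3": {0x03}}
    _, err := keeperShare(shares, "4")
    fmt.Println(err) // no share found for keeper ID: '4'
}
```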
### Case 2: keepers < shares - NOT handled
```bash
SPIKE_NEXUS_SHAMIR_SHARES=3
SPIKE_NEXUS_KEEPER_PEERS="https://k1:8443,https://k2:8443" # Only 2 keepers!
```
- `RootShares()` generates 3 shares (IDs: 1, 2, 3)
- `BroadcastKeepers` iterates over only the 2 configured keepers
- Shares 1 and 2 are distributed successfully
- Share 3 is never distributed
- Bootstrap "succeeds" silently (sketched below)
## Why This Is a Problem
With `threshold=2` and `shares=3`, losing one keeper should be survivable. But if only 2 shards were ever distributed:
- Keeper 1 down → only 1 shard available → cannot recover the root key
- The operator believes they have fault tolerance, but they don't

This is a silent misconfiguration that only manifests during a disaster recovery scenario.
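SPIKE's own Shamir implementation isn't shown here, but the failure mode is generic to any Shamir scheme. A sketch using HashiCorp's `github.com/hashicorp/vault/shamir` package (an assumption; SPIKE may use a different library) demonstrates that a single remaining shard is useless at threshold 2:

```go
package main

import (
    "fmt"

    "github.com/hashicorp/vault/shamir"
)

func main() {
    rootKey := []byte("example root key material")

    // 3 shares, threshold 2: any 2 shares should recover the key.
    shares, err := shamir.Split(rootKey, 3, 2)
    if err != nil {
        panic(err)
    }

    // If only shares[0] and shares[1] were ever distributed and keeper 1
    // goes down, a single share remains, which is below the threshold.
    if _, err := shamir.Combine(shares[1:2]); err != nil {
        fmt.Println("recovery failed:", err) // cannot recover the root key
    }

    // Had share 3 been distributed, shares 2 and 3 would still suffice.
    recovered, _ := shamir.Combine(shares[1:3])
    fmt.Println(string(recovered) == string(rootKey)) // true
}
```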
## Expected Behavior
Bootstrap should validate the configuration at startup and fail fast:
```console
$ spike-bootstrap
FATAL: Keeper count mismatch. SPIKE_NEXUS_SHAMIR_SHARES=3 but only 2 keepers
configured in SPIKE_NEXUS_KEEPER_PEERS. These values must match.
```
## Suggested Implementation
Add validation in `BroadcastKeepers` before distributing shards:
```go
func BroadcastKeepers(ctx context.Context, api *spike.API) {
    const fName = "BroadcastKeepers"
    validation.CheckContext(ctx, fName)

    keepers := env.KeepersVal()
    expectedShares := env.ShamirSharesVal()

    // Fail fast if the keeper count does not match the configured Shamir
    // share count; otherwise some shares would never be distributed.
    if len(keepers) != expectedShares {
        failErr := sdkErrors.ErrConfigMismatch.Clone()
        failErr.Msg = fmt.Sprintf(
            "keeper count mismatch: SPIKE_NEXUS_SHAMIR_SHARES=%d "+
                "but %d keepers configured in SPIKE_NEXUS_KEEPER_PEERS; "+
                "these values must match",
            expectedShares, len(keepers),
        )
        log.FatalErr(fName, *failErr)
        return
    }

    rs := state.RootShares()
    // ... rest of function
}
```
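If the check is extracted into a small pure helper, it also becomes trivially unit-testable without touching environment variables; `validateKeeperCount` below is a hypothetical name, not an existing SPIKE function:

```go
package net_test

import (
    "fmt"
    "testing"
)

// validateKeeperCount is a hypothetical pure helper extracted from the
// check above so it can be tested in isolation.
func validateKeeperCount(keeperCount, expectedShares int) error {
    if keeperCount != expectedShares {
        return fmt.Errorf(
            "keeper count mismatch: expected %d keepers, got %d",
            expectedShares, keeperCount,
        )
    }
    return nil
}

func TestValidateKeeperCount(t *testing.T) {
    if err := validateKeeperCount(2, 3); err == nil {
        t.Error("want error when 2 keepers are configured for 3 shares")
    }
    if err := validateKeeperCount(4, 3); err == nil {
        t.Error("want error when 4 keepers are configured for 3 shares")
    }
    if err := validateKeeperCount(3, 3); err != nil {
        t.Errorf("want no error for matching counts, got: %v", err)
    }
}
```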
## Files to Modify
- `app/bootstrap/internal/net/broadcast.go` - Add validation before shard distribution
- Possibly add an `ErrConfigMismatch` sentinel to the SDK if it doesn't exist (sketched below)
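The exact shape of the SDK's error sentinels isn't shown in this issue; judging from the `Clone()`/`Msg` usage in the snippets above, the new sentinel would presumably follow a pattern along these lines (type and field names are guesses, not the real spike-sdk-go API):

```go
// Sketch only: the real SDK error type and its fields may differ.
// This mirrors the Clone()/Msg usage visible in the snippets above.
type APIError struct {
    Msg string
}

// Clone returns a copy so callers can set Msg without mutating the sentinel.
func (e *APIError) Clone() *APIError {
    c := *e
    return &c
}

var ErrConfigMismatch = &APIError{Msg: "configuration mismatch"}
```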
## Related
This validation should also exist in SPIKE Nexus's `SendShardsPeriodically` function, which already has a partial check:
```go
// app/nexus/internal/initialization/recovery/recovery.go:234
if len(keepers) < env.ShamirSharesVal() {
    failErr := *sdkErrors.ErrShamirNotEnoughShards.Clone()
    failErr.Msg = "not enough keepers configured"
    log.FatalErr(fName, failErr)
}
```
Note: Nexus uses `<` (less than) rather than `!=` (not equal), which allows more keepers than shares. This may be intentional for Nexus (extra keepers as hot spares), but for Bootstrap an exact match seems more appropriate to avoid confusion.
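If both components end up validating, one option is a single shared helper that makes the two policies explicit; `requireKeeperCount` is a hypothetical sketch, not existing SPIKE code:

```go
package validation

import "fmt"

// requireKeeperCount enforces either an exact match (Bootstrap) or a
// minimum (Nexus, tolerating extra keepers as hot spares).
func requireKeeperCount(keeperCount, shares int, exact bool) error {
    if exact && keeperCount != shares {
        return fmt.Errorf("need exactly %d keepers, have %d", shares, keeperCount)
    }
    if keeperCount < shares {
        return fmt.Errorf("need at least %d keepers, have %d", shares, keeperCount)
    }
    return nil
}
```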