BUG: When fewer keepers are configured, bootstrap silently succeeds. #255

@v0lkan

Description

SPIKE Bootstrap does not validate that the number of configured keepers matches SPIKE_NEXUS_SHAMIR_SHARES. When fewer keepers are configured than shares, bootstrap silently succeeds but leaves some shards undistributed, potentially leaving the system in a state where recovery is impossible.

Current Behavior

Case 1: keepers > shares - Handled correctly
SPIKE_NEXUS_SHAMIR_SHARES=3
SPIKE_NEXUS_KEEPER_PEERS="https://k1:8443,https://k2:8443,https://k3:8443,https://k4:8443"

  • RootShares() generates 3 shares
  • KeeperShare() fails for keeper 4: "no share found for keeper ID: '4'"
  • Bootstrap crashes with clear error

Case 2: keepers < shares - NOT handled
SPIKE_NEXUS_SHAMIR_SHARES=3
SPIKE_NEXUS_KEEPER_PEERS="https://k1:8443,https://k2:8443" # Only 2 keepers!

  • RootShares() generates 3 shares (IDs: 1, 2, 3)
  • BroadcastKeepers only iterates over 2 keepers
  • Shares 1 and 2 are distributed successfully
  • Share 3 is never distributed
  • Bootstrap "succeeds" silently
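
The failure mode above can be sketched as follows. This is a minimal illustration, not the actual SPIKE code: the loop shape mirrors the described behavior of BroadcastKeepers, with share generation and network sends elided.

```go
package main

import "fmt"

// distribute mimics the broadcast loop: it iterates over the
// configured keepers, not over the generated shares, so any share
// without a matching keeper is silently never sent.
func distribute(keeperIDs []int) map[int]bool {
	sent := map[int]bool{}
	for _, id := range keeperIDs {
		// KeeperShare(id) lookup and the network send are elided.
		sent[id] = true
	}
	return sent
}

func main() {
	// RootShares() generates shares 1..3, but only 2 keepers exist.
	sent := distribute([]int{1, 2})
	for id := 1; id <= 3; id++ {
		fmt.Printf("share %d distributed: %v\n", id, sent[id])
	}
	// No error is raised for the undistributed share 3.
}
```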

Why This Is a Problem

With threshold=2 and shares=3, losing one keeper should be survivable. But if only 2 shards were ever distributed:

  • Keeper 1 down → only 1 shard available → cannot recover root key
  • The operator believes they have fault tolerance, but they don't

This is a silent misconfiguration that only manifests during a disaster recovery scenario.
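
The fault-tolerance arithmetic works out as follows (a sketch; canRecover is an illustrative helper, not a SPIKE function):

```go
package main

import "fmt"

// canRecover reports whether the root key can be reconstructed when
// `down` keepers are unavailable: only `distributed` shards ever
// reached a keeper, and recovery needs at least `threshold` of them.
func canRecover(distributed, threshold, down int) bool {
	return distributed-down >= threshold
}

func main() {
	const threshold = 2

	// Intended setup: all 3 shards distributed, one keeper down.
	fmt.Println(canRecover(3, threshold, 1)) // 2 shards remain: recoverable

	// Buggy setup: only 2 shards ever distributed, one keeper down.
	fmt.Println(canRecover(2, threshold, 1)) // 1 shard remains: unrecoverable
}
```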

Expected Behavior

Bootstrap should validate at startup:

$ spike-bootstrap
FATAL: Keeper count mismatch. SPIKE_NEXUS_SHAMIR_SHARES=3 but only 2 keepers
       configured in SPIKE_NEXUS_KEEPER_PEERS. These values must match.

Suggested Implementation

Add validation in BroadcastKeepers before distributing shards:

func BroadcastKeepers(ctx context.Context, api *spike.API) {
    const fName = "BroadcastKeepers"

    validation.CheckContext(ctx, fName)

    keepers := env.KeepersVal()
    expectedShares := env.ShamirSharesVal()

    if len(keepers) != expectedShares {
        failErr := sdkErrors.ErrConfigMismatch.Clone()
        failErr.Msg = fmt.Sprintf(
            "keeper count mismatch: SPIKE_NEXUS_SHAMIR_SHARES=%d "+
            "but %d keepers configured in SPIKE_NEXUS_KEEPER_PEERS; "+
            "these values must match",
            expectedShares, len(keepers),
        )
        log.FatalErr(fName, *failErr)
        return
    }

    rs := state.RootShares()
    // ... rest of function
}
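
To make the check testable, the condition could be extracted into a function that returns an error instead of exiting. A sketch (validateKeeperCount is a hypothetical helper, not an existing SPIKE function):

```go
package main

import "fmt"

// validateKeeperCount is a hypothetical extraction of the mismatch
// check from BroadcastKeepers, returning an error instead of
// calling log.FatalErr.
func validateKeeperCount(keepers []string, expectedShares int) error {
	if len(keepers) != expectedShares {
		return fmt.Errorf(
			"keeper count mismatch: SPIKE_NEXUS_SHAMIR_SHARES=%d "+
				"but %d keepers configured in SPIKE_NEXUS_KEEPER_PEERS; "+
				"these values must match",
			expectedShares, len(keepers),
		)
	}
	return nil
}

func main() {
	// Exact match: valid.
	fmt.Println(validateKeeperCount([]string{"k1", "k2", "k3"}, 3) == nil)
	// Fewer keepers than shares: the silent-failure case in this issue.
	fmt.Println(validateKeeperCount([]string{"k1", "k2"}, 3) == nil)
	// More keepers than shares: also rejected by an exact-match check.
	fmt.Println(validateKeeperCount([]string{"k1", "k2", "k3", "k4"}, 3) == nil)
}
```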

Files to Modify

  • app/bootstrap/internal/net/broadcast.go - Add validation before shard distribution
  • Possibly add an ErrConfigMismatch sentinel error to the SDK if it doesn't exist

Related

This validation should also exist in SPIKE Nexus's SendShardsPeriodically function, which already has a partial check:

// app/nexus/internal/initialization/recovery/recovery.go:234
if len(keepers) < env.ShamirSharesVal() {
    failErr := *sdkErrors.ErrShamirNotEnoughShards.Clone()
    failErr.Msg = "not enough keepers configured"
    log.FatalErr(fName, failErr)
}

Note: Nexus uses < (less than) rather than != (not equal), which allows more keepers than shares. This may be intentional for Nexus (extra keepers as hot spares) but for bootstrap, an exact match seems more appropriate to avoid confusion.
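
The difference between the two rules can be sketched as follows (illustrative helpers, not the actual SPIKE code):

```go
package main

import "fmt"

// nexusAccepts mirrors the existing recovery.go check: it only
// rejects configurations with fewer keepers than shares.
func nexusAccepts(keepers, shares int) bool { return keepers >= shares }

// bootstrapAccepts is the exact-match rule proposed in this issue.
func bootstrapAccepts(keepers, shares int) bool { return keepers == shares }

func main() {
	const shares = 3
	for _, keepers := range []int{2, 3, 4} {
		fmt.Printf("keepers=%d nexus=%v bootstrap=%v\n",
			keepers,
			nexusAccepts(keepers, shares),
			bootstrapAccepts(keepers, shares))
	}
	// Only the keepers=4 case differs: Nexus tolerates extra
	// keepers as hot spares; bootstrap would reject them.
}
```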

Metadata

Labels: bug (Something isn't working), good first issue (Good for newcomers)
