EPIC: Scoped SPIKE Pilot Instances for Multi-Tenancy #281

@v0lkan

Context

SPIKE currently operates with a single-tenant model where SPIKE Pilot has
unrestricted access to all secrets and policies. The Pilot's SPIFFE ID
(spiffe://<trustRoot>/spike/pilot/role/superuser) bypasses all policy checks,
giving the operator full visibility across the entire secret store.

This model works well for single-tenant deployments but creates challenges for
multi-tenant scenarios where different organizational units (tenants) should
have isolated administrative domains:

Current Limitations:

  1. No Administrative Isolation: A single operator sees all tenants' secrets
  2. No Delegated Administration: Cannot give tenant admins control over their
    scope without granting global access
  3. Blast Radius: Pilot credential compromise exposes all tenants
  4. Compliance: Some regulations require tenant data isolation at the
    administrative level

Current Multi-Tenancy Options:

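| Option                     | Isolation Level | Operational Overhead | Limitations            |
|----------------------------|-----------------|----------------------|------------------------|
| Separate SPIKE deployments | Strong          | High (N deployments) | No central view        |
| Path-based conventions     | Weak            | Low                  | Pilot sees everything  |
| Trust domain per tenant    | Strong          | Very High            | Complex SPIRE topology |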

This ADR proposes Scoped Pilot Instances as a fourth option that provides
administrative isolation within a single SPIKE deployment.

Proposal

Introduce Scoped Pilots: Pilot instances whose administrative access is
restricted to a specific path prefix (scope). Scoped Pilots can only view and
manage secrets and policies within their designated scope.

SPIFFE ID Structure

Scoped Pilots use an extended SPIFFE ID format that encodes the scope:

# Current superuser (unchanged, backward compatible)
spiffe://<trustRoot>/spike/pilot/role/superuser

# Scoped Pilot for tenant "pepsi"
spiffe://<trustRoot>/spike/pilot/scope/tenants/pepsi

# Scoped Pilot for tenant "coca"
spiffe://<trustRoot>/spike/pilot/scope/tenants/coca

# Scoped Pilot for environment "prod/us-west"
spiffe://<trustRoot>/spike/pilot/scope/environments/prod/us-west

The scope is extracted from the SPIFFE ID path after /spike/pilot/scope/.

Scope Enforcement Points

Scope restrictions are enforced at the Nexus API layer:

| Operation      | Superuser Pilot | Scoped Pilot                            |
|----------------|-----------------|-----------------------------------------|
| Secret Get     | All paths       | Only paths starting with scope          |
| Secret Put     | All paths       | Only paths starting with scope          |
| Secret List    | All paths       | Only paths starting with scope          |
| Secret Delete  | All paths       | Only paths starting with scope          |
| Policy Create  | All patterns    | PathPattern must match scope prefix     |
| Policy List    | All policies    | Only policies with matching PathPattern |
| Policy Get     | All policies    | Only policies with matching PathPattern |
| Policy Delete  | All policies    | Only policies with matching PathPattern |
| Cipher Encrypt | Yes             | Yes (not path-scoped)                   |
| Cipher Decrypt | Yes             | Yes (not path-scoped)                   |
| Recovery       | Yes             | No (superuser only)                     |
| Restore        | Yes             | No (superuser only)                     |
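The table can be read as three kinds of rule for a scoped Pilot: path-scoped, unscoped, and superuser-only. The sketch below encodes that reading as a deny-by-default lookup. The operation names, the enforcement type, and scopedPilotMay are hypothetical and only illustrate the table; they are not part of the Nexus API.

```go
package main

import "fmt"

// enforcement describes how an operation is gated for a scoped Pilot.
type enforcement int

const (
	pathScoped    enforcement = iota // allowed only inside the Pilot's scope
	unscoped                         // allowed regardless of path
	superuserOnly                    // never allowed for a scoped Pilot
)

// scopedPilotRules mirrors the rows of the enforcement table.
// The string keys are illustrative names, not real route identifiers.
var scopedPilotRules = map[string]enforcement{
	"secret.get":     pathScoped,
	"secret.put":     pathScoped,
	"secret.list":    pathScoped,
	"secret.delete":  pathScoped,
	"policy.create":  pathScoped,
	"policy.list":    pathScoped,
	"policy.get":     pathScoped,
	"policy.delete":  pathScoped,
	"cipher.encrypt": unscoped,
	"cipher.decrypt": unscoped,
	"recovery":       superuserOnly,
	"restore":        superuserOnly,
}

// scopedPilotMay reports whether a scoped Pilot may perform op.
// inScope is the result of the path (or PathPattern) scope check.
// Unknown operations are denied.
func scopedPilotMay(op string, inScope bool) bool {
	rule, known := scopedPilotRules[op]
	if !known {
		return false
	}
	switch rule {
	case unscoped:
		return true
	case pathScoped:
		return inScope
	default:
		return false
	}
}

func main() {
	fmt.Println(scopedPilotMay("secret.get", true))
	fmt.Println(scopedPilotMay("secret.get", false))
	fmt.Println(scopedPilotMay("recovery", true))
}
```

Denying unknown operations means a newly added route stays closed to scoped Pilots until someone classifies it.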

Scope Matching Rules

A scoped Pilot with scope S can access a resource with path P if and only if:

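P == S || strings.HasPrefix(P, S+"/")

The comparison is made on whole path segments, so scope "tenants/pepsi" covers
"tenants/pepsi" and "tenants/pepsi/db" but not "tenants/pepsi-evil". A bare
strings.HasPrefix(P, S) would admit the latter (and already implies P == S).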
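A minimal sketch of the matching rule that compares on a path-segment boundary. The helper name inScope is hypothetical, and treating an empty scope as "no access" is an assumption of this sketch, not something the proposal specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// inScope reports whether path lies within scope, comparing whole path
// segments so that a sibling such as "tenants/pepsi-evil" is excluded.
func inScope(path, scope string) bool {
	if scope == "" {
		return false // assumption: an empty scope grants nothing
	}
	return path == scope || strings.HasPrefix(path, scope+"/")
}

func main() {
	scope := "tenants/pepsi"
	for _, p := range []string{
		"tenants/pepsi",
		"tenants/pepsi/db/creds",
		"tenants/pepsi-evil/db",
		"tenants/coca/db",
	} {
		fmt.Printf("%q in scope %q: %v\n", p, scope, inScope(p, scope))
	}
}
```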

For policies, the PathPattern must be "contained within" the scope:

# Scope: "tenants/pepsi"

# ALLOWED policy PathPatterns:
"^tenants/pepsi$"              # Exact match
"^tenants/pepsi/.*$"           # Subpaths
"^tenants/pepsi/db/.*$"        # Deeper subpaths

# DENIED policy PathPatterns:
"^tenants/.*$"                 # Too broad (includes other tenants)
"^tenants/coca/.*$"            # Different tenant
"^.*$"                         # Global pattern

Implementation Components

1. SPIFFE ID Parsing (spike-sdk-go/spiffeid/)

// ScopedPilotPrefix is the SPIFFE ID path prefix for scoped Pilots.
const ScopedPilotPrefix = "/spike/pilot/scope/"

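// IsScopedPilot checks if a SPIFFE ID represents a scoped Pilot.
func IsScopedPilot(spiffeID string) bool {
    // An empty scope never counts, so it cannot be mistaken for superuser.
    return GetPilotScope(spiffeID) != ""
}

// GetPilotScope extracts the scope from a scoped Pilot's SPIFFE ID.
// Returns empty string for superuser Pilots.
func GetPilotScope(spiffeID string) string {
    // Skip the scheme and trust domain, then extract the path after
    // /spike/pilot/scope/
    rest, ok := strings.CutPrefix(spiffeID, "spiffe://")
    _, path, found := strings.Cut(rest, "/")
    scope, scoped := strings.CutPrefix("/"+path, ScopedPilotPrefix)
    if !ok || !found || !scoped {
        return ""
    }
    return scope
}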

// IsPilotWithScope checks if the SPIFFE ID is a Pilot (scoped or superuser)
// and returns its scope (empty for superuser).
func IsPilotWithScope(spiffeID string) (isPilot bool, scope string) {
    if IsPilotOperator(spiffeID) {
        return true, "" // Superuser, no scope restriction
    }
    if IsScopedPilot(spiffeID) {
        return true, GetPilotScope(spiffeID)
    }
    return false, ""
}

2. Scope Enforcement (app/nexus/internal/state/base/)

Update CheckAccess to handle scoped Pilots:

func CheckAccess(
    peerSPIFFEID string, path string, wants []data.PolicyPermission,
) bool {
    // Superuser Pilot: unrestricted access (existing behavior)
    if spiffeid.IsPilotOperator(peerSPIFFEID) {
        return true
    }

    // Scoped Pilot: check scope before granting access
    if isPilot, scope := spiffeid.IsPilotWithScope(peerSPIFFEID); isPilot {
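        // Match on a path-segment boundary so that scope "tenants/pepsi"
        // does not also cover "tenants/pepsi-evil".
        if scope != "" && path != scope &&
            !strings.HasPrefix(path, scope+"/") {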
            return false // Path outside scope
        }
        return true // Within scope, access granted
    }

    // Regular workload: evaluate policies (existing behavior)
    // ...
}

3. Policy Scope Validation (app/nexus/internal/route/acl/policy/)

Add scope validation to policy creation:

func validatePolicyForScope(policy data.Policy, pilotScope string) error {
    if pilotScope == "" {
        return nil // Superuser, no restrictions
    }

    // Validate that PathPattern is contained within scope
    if !isPatternContainedInScope(policy.PathPattern, pilotScope) {
        return fmt.Errorf(
            "policy PathPattern %q exceeds scope %q",
            policy.PathPattern, pilotScope,
        )
    }
    return nil
}

// isPatternContainedInScope checks if a regex pattern only matches
// paths within the given scope prefix.
func isPatternContainedInScope(pattern, scope string) bool {
    // Pattern must start with literal scope prefix
    // e.g., for scope "tenants/pepsi", pattern must start with
    // "^tenants/pepsi" (with proper escaping)
}
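One conservative way the stub above might be filled in is to parse the pattern with Go's regexp/syntax package and accept only two shapes: an anchored literal that starts with the scope plus "/", or an anchored literal exactly equal to the scope followed by an end anchor. Everything else is rejected, including patterns that are in fact safe but harder to analyze. This is a sketch under those assumptions, not a vetted implementation; the security-critical version would need review and a much larger test corpus.

```go
package main

import (
	"fmt"
	"regexp/syntax"
	"strings"
)

// isPatternContainedInScope reports whether pattern can only match paths
// inside scope. It errs on the side of rejecting anything ambiguous.
func isPatternContainedInScope(pattern, scope string) bool {
	if scope == "" {
		return false
	}
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil || re.Op != syntax.OpConcat || len(re.Sub) < 2 {
		return false // unparsable, top-level alternation, groups, etc.
	}
	begin, lit := re.Sub[0], re.Sub[1]
	// The pattern must start with "^" followed by a case-sensitive literal.
	if begin.Op != syntax.OpBeginText || lit.Op != syntax.OpLiteral ||
		lit.Flags&syntax.FoldCase != 0 {
		return false
	}
	prefix := string(lit.Rune)
	// Literal already extends past the scope boundary: any match is a subpath.
	if strings.HasPrefix(prefix, scope+"/") {
		return true
	}
	// Otherwise only the exact form "^<scope>$" is accepted.
	return prefix == scope && len(re.Sub) == 3 &&
		re.Sub[2].Op == syntax.OpEndText
}

func main() {
	scope := "tenants/pepsi"
	for _, p := range []string{
		"^tenants/pepsi$",
		"^tenants/pepsi/.*$",
		"^tenants/pepsi/db/.*$",
		"^tenants/.*$",
		"^tenants/coca/.*$",
		"^.*$",
		"tenants/pepsi.*",
		"^tenants/pepsi/?.*$",
	} {
		fmt.Printf("%q contained: %v\n", p, isPatternContainedInScope(p, scope))
	}
}
```

Working on the parsed syntax tree rather than the raw pattern string means escaping (for example a scope containing ".") is handled by the parser, and an optional separator such as "/?" cannot slip through a string-prefix comparison.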

4. List Filtering (app/nexus/internal/route/secret/, .../acl/policy/)

Filter list results by scope:

func filterSecretsForScope(paths []string, pilotScope string) []string {
    if pilotScope == "" {
        return paths // Superuser sees all
    }

    var filtered []string
    for _, p := range paths {
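        // Compare on a segment boundary; a bare prefix test would also
        // match siblings such as "tenants/pepsi-evil".
        if p == pilotScope || strings.HasPrefix(p, pilotScope+"/") {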
            filtered = append(filtered, p)
        }
    }
    return filtered
}

func filterPoliciesForScope(
    policies []data.Policy, pilotScope string,
) []data.Policy {
    if pilotScope == "" {
        return policies // Superuser sees all
    }

    var filtered []data.Policy
    for _, p := range policies {
        if isPatternContainedInScope(p.PathPattern, pilotScope) {
            filtered = append(filtered, p)
        }
    }
    return filtered
}

SPIRE Registration

Scoped Pilots are registered in SPIRE with their scope encoded in the SPIFFE ID:

# Register a scoped Pilot for the "pepsi" tenant
spire-server entry create \
    -spiffeID spiffe://example.org/spike/pilot/scope/tenants/pepsi \
    -parentID spiffe://example.org/spire/agent/... \
    -selector k8s:ns:pepsi-admin \
    -selector k8s:sa:spike-pilot

# Register a scoped Pilot for the "coca" tenant
spire-server entry create \
    -spiffeID spiffe://example.org/spike/pilot/scope/tenants/coca \
    -parentID spiffe://example.org/spire/agent/... \
    -selector k8s:ns:coca-admin \
    -selector k8s:sa:spike-pilot

Backward Compatibility

  • Existing superuser Pilots (/spike/pilot/role/superuser) continue to work
    unchanged
  • The IsPilotOperator() function remains unchanged
  • Deployments without scoped Pilots require no changes
  • Scoped Pilots are opt-in via SPIRE registration

Rationale

Why Encode Scope in SPIFFE ID?

Alternative 1: Scope Configuration in Nexus

Store scope-to-SPIFFE-ID mappings in Nexus configuration or database.

Rejected because:

  • Adds configuration complexity
  • Creates chicken-and-egg problem during bootstrap
  • Scope could be changed post-registration, creating confusion
  • Harder to audit (scope not visible in SPIFFE ID)

Alternative 2: Scope as SPIRE Selector

Use SPIRE selectors to encode scope metadata.

Rejected because:

  • Selectors are not part of the SPIFFE ID itself
  • Would require Nexus to query SPIRE for scope information
  • Breaks the principle of self-describing identity

Chosen Approach Benefits:

  • Scope is cryptographically bound to identity (in the SVID)
  • Self-describing: scope is visible by inspecting the SPIFFE ID
  • No additional configuration or database required
  • Immutable: scope cannot be changed without re-registration
  • Auditable: scope appears in all logs containing the SPIFFE ID

Why Not Use Policies for Pilot Scoping?

One might suggest using the existing policy system to restrict Pilot access.

Rejected because:

  • Policies control workload access, not administrative access
  • Scoped Pilots need to create policies, creating circular dependency
  • Administrative scoping is a different concern than workload authorization
  • Would complicate the security model (Pilot both bypasses and is subject to
    policies)

Why Keep Recovery/Restore Global?

Recovery and restore operations remain restricted to superuser Pilots only.

Rationale:

  • Recovery affects the entire system, not individual tenants
  • Shamir shards are system-wide, not per-tenant
  • Disaster recovery is an operational concern, not a tenant concern
  • Per ADR-0029, these operations require the highest privilege level

Scope Granularity

Scopes are path prefixes, not arbitrary patterns.

Rationale:

  • Simple to understand and implement
  • Hierarchical: tenants/pepsi/db is a sub-scope of tenants/pepsi
  • Predictable: easy to reason about what a scope includes
  • Pattern-based scopes would be complex and error-prone

Consequences

Positive

  • Administrative Isolation: Tenant admins cannot see other tenants' data
  • Delegated Administration: Can give tenant-specific administrative access
  • Reduced Blast Radius: Compromised scoped Pilot only affects one tenant
  • Single Deployment: Multi-tenancy without deployment duplication
  • Backward Compatible: Existing deployments unaffected
  • Auditable: Scope visible in SPIFFE ID for all audit trails
  • Immutable Scopes: Scope bound to SVID, cannot be escalated

Negative

  • No Cross-Scope Operations: Scoped Pilot cannot operate across tenants
  • Pattern Validation Complexity: Validating "pattern contained in scope"
    requires careful regex analysis
  • SPIRE Registration Overhead: Each scoped Pilot needs separate SPIRE entry
  • No Scope Delegation: A Pilot scoped to tenants/pepsi cannot itself grant a
    narrower scope such as tenants/pepsi/db; sub-scopes still require a
    separate SPIRE registration

Neutral

  • Operational Model Change: Organizations must decide scope boundaries
  • Documentation Updates: New deployment patterns to document
  • Testing Surface: New code paths to test

Implementation Plan

Phase 1: Core Infrastructure

  1. Add IsScopedPilot() and GetPilotScope() to spike-sdk-go
  2. Update CheckAccess() to handle scoped Pilots
  3. Add scope validation to secret routes
  4. Add unit tests for scope enforcement

Phase 2: Policy Scoping

  1. Implement isPatternContainedInScope() validation
  2. Add scope validation to policy creation route
  3. Add scope filtering to policy list route
  4. Add integration tests for policy scoping

Phase 3: Documentation and Examples

  1. Document scoped Pilot SPIFFE ID format
  2. Add multi-tenancy deployment guide
  3. Add example SPIRE registration scripts
  4. Update security model documentation

Phase 4: Operational Tooling

  1. Add spike pilot scope command to show current scope
  2. Add scope information to audit logs
  3. Add metrics for per-scope operations

Security Considerations

Scope Escalation Prevention

  • Scope is encoded in SPIFFE ID, signed by SPIRE CA
  • Cannot be modified without new SVID issuance
  • SPIRE registration controls who gets which scope

Policy Pattern Validation

The isPatternContainedInScope() function is security-critical:

  • Must reject patterns that could match outside the scope
  • Conservative approach: reject ambiguous patterns
  • Consider using a regex analysis library for correctness

Example dangerous patterns to reject:

".*"                    # Matches everything
"tenants/(pepsi|coca)"  # Alternation escapes scope
"tenants/pepsi.*"       # Missing anchor, matches "tenants/pepsi-evil"
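The leakage can be demonstrated directly with Go's regexp package. The sketch below assumes policy patterns are evaluated with an unanchored match (regexp's MatchString), which is an assumption about the policy engine; matchesOutside is a hypothetical helper used only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesOutside returns the out-of-scope paths that pattern matches
// when evaluated as an unanchored regular expression.
func matchesOutside(pattern string, outOfScope []string) []string {
	re := regexp.MustCompile(pattern)
	var hits []string
	for _, p := range outOfScope {
		if re.MatchString(p) {
			hits = append(hits, p)
		}
	}
	return hits
}

func main() {
	// Paths that a Pilot scoped to "tenants/pepsi" must never reach.
	outOfScope := []string{"tenants/coca/db", "tenants/pepsi-evil/key"}
	patterns := []string{
		".*",
		"tenants/(pepsi|coca)",
		"tenants/pepsi.*",
		"^tenants/pepsi/.*$", // anchored and segment-bounded
	}
	for _, pattern := range patterns {
		fmt.Printf("%q leaks to %q\n", pattern, matchesOutside(pattern, outOfScope))
	}
}
```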

Cross-Tenant Information Leakage

  • List operations must filter results server-side
  • Error messages must not reveal existence of out-of-scope resources
  • Timing attacks: consider constant-time responses for denied requests
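A sketch of the error-message point: out-of-scope and nonexistent paths return the same error value, so a scoped Pilot cannot probe for another tenant's secrets by comparing responses. The in-memory map, ErrNotFound, and getSecret are hypothetical stand-ins for the real store and routes, and the sketch does not address the timing concern.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNotFound is returned both for missing secrets and for secrets that
// exist but lie outside the caller's scope.
var ErrNotFound = errors.New("secret not found")

// getSecret looks up path on behalf of a Pilot restricted to scope.
func getSecret(store map[string]string, scope, path string) (string, error) {
	if path != scope && !strings.HasPrefix(path, scope+"/") {
		return "", ErrNotFound // do not reveal that the path is out of scope
	}
	value, ok := store[path]
	if !ok {
		return "", ErrNotFound
	}
	return value, nil
}

func main() {
	store := map[string]string{
		"tenants/pepsi/db": "p-secret",
		"tenants/coca/db":  "c-secret",
	}
	for _, path := range []string{
		"tenants/pepsi/db",
		"tenants/coca/db",
		"tenants/pepsi/missing",
	} {
		_, err := getSecret(store, "tenants/pepsi", path)
		fmt.Printf("%q: err=%v\n", path, err)
	}
}
```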

Open Questions

  1. Scope Hierarchy: Should tenants/pepsi Pilot be able to create a
    sub-scoped Pilot for tenants/pepsi/db? (ideally: No)

  2. Cipher Scoping: Should cipher operations be scoped? (possibly:
    No, encryption is not path-specific)

  3. Audit Log Access: Should scoped Pilots see audit logs for their scope?
    (Unclear; future consideration)

  4. Scope Wildcards: Should scopes support patterns like tenants/*?
    (Ideally: No. Keep scopes as literal prefixes; pattern-based scopes
    would open a different security can of worms.)
