EPIC: Scoped SPIKE Pilot Instances for Multi-Tenancy

## Context

SPIKE currently operates with a single-tenant model where SPIKE Pilot has
unrestricted access to all secrets and policies. The Pilot's SPIFFE ID
(`spiffe://<trustRoot>/spike/pilot/role/superuser`) bypasses all policy checks,
giving the operator full visibility across the entire secret store.

This model works well for single-tenant deployments but creates challenges for
multi-tenant scenarios where different organizational units (tenants) should
have isolated administrative domains:

**Current Limitations:**

1. **No Administrative Isolation**: A single operator sees all tenants' secrets
2. **No Delegated Administration**: Cannot give tenant admins control over their
   scope without granting global access
3. **Blast Radius**: Pilot credential compromise exposes all tenants
4. **Compliance**: Some regulations require tenant data isolation at the
   administrative level

**Current Multi-Tenancy Options:**

| Option                     | Isolation Level | Operational Overhead | Limitations            |
|----------------------------|-----------------|----------------------|------------------------|
| Separate SPIKE deployments | Strong          | High (N deployments) | No central view        |
| Path-based conventions     | Weak            | Low                  | Pilot sees everything  |
| Trust domain per tenant    | Strong          | Very High            | Complex SPIRE topology |

This ADR proposes **Scoped Pilot Instances** as a fourth option that provides
administrative isolation within a single SPIKE deployment.

## Proposal

Introduce **Scoped Pilots**: Pilot instances whose administrative access is
restricted to a specific path prefix (scope). Scoped Pilots can only view and
manage secrets and policies within their designated scope.

### SPIFFE ID Structure

Scoped Pilots use an extended SPIFFE ID format that encodes the scope:

```
# Current superuser (unchanged, backward compatible)
spiffe://<trustRoot>/spike/pilot/role/superuser

# Scoped Pilot for tenant "pepsi"
spiffe://<trustRoot>/spike/pilot/scope/tenants/pepsi

# Scoped Pilot for tenant "coca"
spiffe://<trustRoot>/spike/pilot/scope/tenants/coca

# Scoped Pilot for environment "prod/us-west"
spiffe://<trustRoot>/spike/pilot/scope/environments/prod/us-west
```

The scope is extracted from the SPIFFE ID path after `/spike/pilot/scope/`.

### Scope Enforcement Points

Scope restrictions are enforced at the Nexus API layer:

| Operation      | Superuser Pilot | Scoped Pilot                            |
|----------------|-----------------|-----------------------------------------|
| Secret Get     | All paths       | Only paths starting with scope          |
| Secret Put     | All paths       | Only paths starting with scope          |
| Secret List    | All paths       | Only paths starting with scope          |
| Secret Delete  | All paths       | Only paths starting with scope          |
| Policy Create  | All patterns    | PathPattern must match scope prefix     |
| Policy List    | All policies    | Only policies with matching PathPattern |
| Policy Get     | All policies    | Only policies with matching PathPattern |
| Policy Delete  | All policies    | Only policies with matching PathPattern |
| Cipher Encrypt | Yes             | Yes (not path-scoped)                   |
| Cipher Decrypt | Yes             | Yes (not path-scoped)                   |
| Recovery       | Yes             | **No** (superuser only)                 |
| Restore        | Yes             | **No** (superuser only)                 |

### Scope Matching Rules

A scoped Pilot with scope `S` can access a resource with path `P` if and only if:

```
P == S || strings.HasPrefix(P, S+"/")
```
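Below is a minimal Go sketch of this rule. The prefix must end on a `/` segment boundary, because a bare `strings.HasPrefix(P, S)` would also admit sibling paths such as `tenants/pepsi-evil`. `inScope` is a hypothetical helper name. Paths are assumed to be `/`-separated with no leading slash, as in the PathPattern examples in this section.

```go
package main

import (
    "fmt"
    "strings"
)

// inScope reports whether path is the scope itself or lies below it.
// Requiring "/" after the scope keeps "tenants/pepsi" from admitting
// "tenants/pepsi-evil" (hypothetical helper, not an existing SPIKE API).
func inScope(path, scope string) bool {
    return path == scope || strings.HasPrefix(path, scope+"/")
}

func main() {
    scope := "tenants/pepsi"
    for _, p := range []string{
        "tenants/pepsi",             // the scope itself
        "tenants/pepsi/db/password", // a subpath
        "tenants/pepsi-evil/db",     // sibling sharing a string prefix
        "tenants/coca/db",           // another tenant
    } {
        fmt.Printf("%-28s %v\n", p, inScope(p, scope))
    }
}
```

With this form, the hierarchy described under Scope Granularity falls out directly: `tenants/pepsi/db` is in scope for `tenants/pepsi`, and `tenants/pepsi-evil` is not.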

For policies, the PathPattern must be "contained within" the scope:

```
# Scope: "tenants/pepsi"

# ALLOWED policy PathPatterns:
"^tenants/pepsi$"              # Exact match
"^tenants/pepsi/.*$"           # Subpaths
"^tenants/pepsi/db/.*$"        # Deeper subpaths

# DENIED policy PathPatterns:
"^tenants/.*$"                 # Too broad (includes other tenants)
"^tenants/coca/.*$"            # Different tenant
"^.*$"                         # Global pattern
```
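One way to show why the DENIED patterns are unsafe is to compile each pattern and probe it with paths that a Pilot scoped to `tenants/pepsi` must never reach. This sketch assumes PathPatterns are evaluated with Go's `regexp` package via `MatchString`. `leaks` is a hypothetical helper.

```go
package main

import (
    "fmt"
    "regexp"
)

// leaks returns the probe paths that pattern matches (hypothetical helper).
func leaks(pattern string, probes []string) []string {
    re := regexp.MustCompile(pattern)
    var hit []string
    for _, p := range probes {
        if re.MatchString(p) {
            hit = append(hit, p)
        }
    }
    return hit
}

func main() {
    // Paths a Pilot scoped to "tenants/pepsi" must never reach.
    outOfScope := []string{"tenants/coca/db", "tenants/pepsi-evil", "other/app"}
    for _, pat := range []string{
        `^tenants/pepsi$`, `^tenants/pepsi/.*$`, `^tenants/pepsi/db/.*$`, // allowed
        `^tenants/.*$`, `^tenants/coca/.*$`, `^.*$`, // denied
    } {
        fmt.Printf("%-24s leaks: %v\n", pat, leaks(pat, outOfScope))
    }
}
```

Probing can only demonstrate a leak. It cannot prove a pattern safe, because a finite set of probes can miss the one path that escapes. For that reason, validation has to reason about the structure of the pattern rather than sample it.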

### Implementation Components

#### 1. SPIFFE ID Parsing (spike-sdk-go/spiffeid/)

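```go
import (
    "net/url"
    "strings"
)

// ScopedPilotPrefix is the SPIFFE ID path prefix for scoped Pilots.
const ScopedPilotPrefix = "/spike/pilot/scope/"

// IsScopedPilot checks if a SPIFFE ID represents a scoped Pilot.
func IsScopedPilot(spiffeID string) bool {
    return GetPilotScope(spiffeID) != ""
}

// GetPilotScope extracts the scope from a scoped Pilot's SPIFFE ID, e.g.
// "tenants/pepsi" from "spiffe://<trustRoot>/spike/pilot/scope/tenants/pepsi".
// Returns empty string for superuser Pilots, non-Pilots, and malformed
// scopes. Trust root checks should mirror IsPilotOperator (elided here).
func GetPilotScope(spiffeID string) string {
    u, err := url.Parse(spiffeID)
    if err != nil || u.Scheme != "spiffe" {
        return ""
    }
    scope, found := strings.CutPrefix(u.Path, ScopedPilotPrefix)
    if !found {
        return ""
    }
    // Reject empty, "." and ".." segments so a scope cannot be widened
    // through path tricks.
    for _, seg := range strings.Split(scope, "/") {
        if seg == "" || seg == "." || seg == ".." {
            return ""
        }
    }
    return scope
}

// IsPilotWithScope checks if the SPIFFE ID is a Pilot (scoped or superuser)
// and returns its scope (empty for superuser).
func IsPilotWithScope(spiffeID string) (isPilot bool, scope string) {
    if IsPilotOperator(spiffeID) { // existing SDK function
        return true, "" // Superuser, no scope restriction
    }
    if IsScopedPilot(spiffeID) {
        return true, GetPilotScope(spiffeID)
    }
    return false, ""
}
```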

#### 2. Scope Enforcement (app/nexus/internal/state/base/)

Update `CheckAccess` to handle scoped Pilots:

```go
import "strings" // alongside the existing data and spiffeid imports

func CheckAccess(
    peerSPIFFEID string, path string, wants []data.PolicyPermission,
) bool {
    // Superuser Pilot: unrestricted access (existing behavior)
    if spiffeid.IsPilotOperator(peerSPIFFEID) {
        return true
    }

    // Scoped Pilot: full access at or below its scope, nothing outside it.
    // The "/" boundary keeps scope "tenants/pepsi" from also matching
    // "tenants/pepsi-evil". Superusers returned above, so scope != "" here.
    if isPilot, scope := spiffeid.IsPilotWithScope(peerSPIFFEID); isPilot {
        return path == scope || strings.HasPrefix(path, scope+"/")
    }

    // Regular workload: evaluate policies (existing behavior)
    // ...
}
```

#### 3. Policy Scope Validation (app/nexus/internal/route/acl/policy/)

Add scope validation to policy creation:

```go
func validatePolicyForScope(policy data.Policy, pilotScope string) error {
    if pilotScope == "" {
        return nil // Superuser, no restrictions
    }

    // Validate that PathPattern is contained within scope
    if !isPatternContainedInScope(policy.PathPattern, pilotScope) {
        return fmt.Errorf(
            "policy PathPattern %q exceeds scope %q",
            policy.PathPattern, pilotScope,
        )
    }
    return nil
}

// isPatternContainedInScope checks if a regex pattern only matches
// paths within the given scope prefix.
func isPatternContainedInScope(pattern, scope string) bool {
    // Pattern must start with literal scope prefix
    // e.g., for scope "tenants/pepsi", pattern must start with
    // "^tenants/pepsi" (with proper escaping)
}
```

#### 4. List Filtering (app/nexus/internal/route/secret/, .../acl/policy/)

Filter list results by scope:

```go
func filterSecretsForScope(paths []string, pilotScope string) []string {
    if pilotScope == "" {
        return paths // Superuser sees all
    }

    var filtered []string
    for _, p := range paths {
        if p == pilotScope || strings.HasPrefix(p, pilotScope+"/") {
            filtered = append(filtered, p)
        }
    }
    return filtered
}

func filterPoliciesForScope(
    policies []data.Policy, pilotScope string,
) []data.Policy {
    if pilotScope == "" {
        return policies // Superuser sees all
    }

    var filtered []data.Policy
    for _, p := range policies {
        if isPatternContainedInScope(p.PathPattern, pilotScope) {
            filtered = append(filtered, p)
        }
    }
    return filtered
}
```

### SPIRE Registration

Scoped Pilots are registered in SPIRE with their scope encoded in the SPIFFE ID:

```bash
# Register a scoped Pilot for the "pepsi" tenant
spire-server entry create \
    -spiffeID spiffe://example.org/spike/pilot/scope/tenants/pepsi \
    -parentID spiffe://example.org/spire/agent/... \
    -selector k8s:ns:pepsi-admin \
    -selector k8s:sa:spike-pilot

# Register a scoped Pilot for the "coca" tenant
spire-server entry create \
    -spiffeID spiffe://example.org/spike/pilot/scope/tenants/coca \
    -parentID spiffe://example.org/spire/agent/... \
    -selector k8s:ns:coca-admin \
    -selector k8s:sa:spike-pilot
```

### Backward Compatibility

- Existing superuser Pilots (`/spike/pilot/role/superuser`) continue to work
  unchanged
- The `IsPilotOperator()` function remains unchanged
- Deployments without scoped Pilots require no changes
- Scoped Pilots are opt-in via SPIRE registration

## Rationale

### Why Encode Scope in SPIFFE ID?

**Alternative 1: Scope Configuration in Nexus**

Store scope-to-SPIFFE-ID mappings in Nexus configuration or database.

*Rejected because:*
- Adds configuration complexity
- Creates a chicken-and-egg problem during bootstrap
- Scope could be changed post-registration, creating confusion
- Harder to audit (scope not visible in SPIFFE ID)

**Alternative 2: Scope as a SPIRE Selector**

Use SPIRE selectors to encode scope metadata.

*Rejected because:*
- Selectors are not part of the SPIFFE ID itself
- Would require Nexus to query SPIRE for scope information
- Breaks the principle of self-describing identity

**Chosen Approach Benefits:**
- Scope is cryptographically bound to identity (in the SVID)
- Self-describing: scope is visible by inspecting the SPIFFE ID
- No additional configuration or database required
- Immutable: scope cannot be changed without re-registration
- Auditable: scope appears in all logs containing the SPIFFE ID

### Why Not Use Policies for Pilot Scoping?

One might suggest using the existing policy system to restrict Pilot access.

*Rejected because:*
- Policies control workload access, not administrative access
- Scoped Pilots need to *create* policies, creating a circular dependency
- Administrative scoping is a different concern than workload authorization
- Would complicate the security model (Pilot both bypasses and is subject to
  policies)

### Why Keep Recovery/Restore Global?

Recovery and restore operations remain restricted to superuser Pilots only.

*Rationale:*
- Recovery affects the entire system, not individual tenants
- Shamir shards are system-wide, not per-tenant
- Disaster recovery is an operational concern, not a tenant concern
- Per ADR-0029, these operations require the highest privilege level

### Scope Granularity

Scopes are path prefixes, not arbitrary patterns.

*Rationale:*
- Simple to understand and implement
- Hierarchical: `tenants/pepsi/db` is a sub-scope of `tenants/pepsi`
- Predictable: easy to reason about what a scope includes
- Pattern-based scopes would be complex and error-prone

## Consequences

### Positive

- **Administrative Isolation**: Tenant admins cannot see other tenants' data
- **Delegated Administration**: Can give tenant-specific administrative access
- **Reduced Blast Radius**: Compromised scoped Pilot only affects one tenant
- **Single Deployment**: Multi-tenancy without deployment duplication
- **Backward Compatible**: Existing deployments unaffected
- **Auditable**: Scope visible in SPIFFE ID for all audit trails
- **Immutable Scopes**: Scope bound to SVID, cannot be escalated

### Negative

- **No Cross-Scope Operations**: Scoped Pilot cannot operate across tenants
- **Pattern Validation Complexity**: Validating "pattern contained in scope"
  requires careful regex analysis
- **SPIRE Registration Overhead**: Each scoped Pilot needs separate SPIRE entry
- **No Scope Hierarchy**: A Pilot scoped to `tenants/pepsi` cannot delegate to
  `tenants/pepsi/db`

### Neutral

- **Operational Model Change**: Organizations must decide scope boundaries
- **Documentation Updates**: New deployment patterns to document
- **Testing Surface**: New code paths to test

## Implementation Plan

### Phase 1: Core Infrastructure

1. Add `IsScopedPilot()` and `GetPilotScope()` to spike-sdk-go
2. Update `CheckAccess()` to handle scoped Pilots
3. Add scope validation to secret routes
4. Add unit tests for scope enforcement

### Phase 2: Policy Scoping

1. Implement `isPatternContainedInScope()` validation
2. Add scope validation to policy creation route
3. Add scope filtering to policy list route
4. Add integration tests for policy scoping

### Phase 3: Documentation and Examples

1. Document scoped Pilot SPIFFE ID format
2. Add multi-tenancy deployment guide
3. Add example SPIRE registration scripts
4. Update security model documentation

### Phase 4: Operational Tooling

1. Add `spike pilot scope` command to show current scope
2. Add scope information to audit logs
3. Add metrics for per-scope operations

## Security Considerations

### Scope Escalation Prevention

- Scope is encoded in SPIFFE ID, signed by SPIRE CA
- Cannot be modified without new SVID issuance
- SPIRE registration controls who gets which scope

### Policy Pattern Validation

The `isPatternContainedInScope()` function is security-critical:

- Must reject patterns that could match outside the scope
- Conservative approach: reject ambiguous patterns
- Consider using a regex analysis library for correctness

Example dangerous patterns to reject:
```
".*"                    # Matches everything
"tenants/(pepsi|coca)"  # Alternation escapes scope
"tenants/pepsi.*"       # No "/" after scope, matches "tenants/pepsi-evil"
```

### Cross-Tenant Information Leakage

- List operations must filter results server-side
- Error messages must not reveal existence of out-of-scope resources
- Timing attacks: consider constant-time responses for denied requests

## Open Questions

1. **Scope Hierarchy**: Should `tenants/pepsi` Pilot be able to create a
   sub-scoped Pilot for `tenants/pepsi/db`? (ideally: No)

2. **Cipher Scoping**: Should cipher operations be scoped? (possibly:
   No, encryption is not path-specific)

3. **Audit Log Access**: Should scoped Pilots see audit logs for their scope?
   (Unclear; future consideration.)

4. **Scope Wildcards**: Should scopes support patterns like `tenants/*`?
   (Ideally: No. Keep scopes as literal prefixes; wildcard scopes would open a
   different class of security problems.)

GitHub issue: #281