This document provides solutions for common issues and debugging techniques.
Symptom: API endpoint takes 30+ seconds to respond
Cause: The API route executes K8s operations directly instead of leaving them to reconciliation
Solution: The API should only update the database and return immediately; the reconciliation job performs the K8s operations
// ❌ BAD (blocking)
export async function POST(req: Request) {
  await k8sService.createSandbox() // Blocks for 30s
  return NextResponse.json({ success: true })
}
// ✅ GOOD (non-blocking)
export async function POST(req: Request) {
  await prisma.sandbox.create({
    data: { status: 'CREATING', /* ... */ }
  })
  // Reconciliation will handle K8s operations
  return NextResponse.json({ success: true })
}

Symptom: "User does not have KUBECONFIG configured"
Cause: Code uses the global K8s service instead of a user-specific one
Solution: Always load the user's kubeconfig from the UserConfig table
// ❌ BAD (old pattern)
const k8sService = new KubernetesService()
// ✅ GOOD (v0.4.0+)
const k8sService = await getK8sServiceForUser(userId)

Symptom: Reconciliation job skips some records
Cause: Multiple instances or rapid reconciliation cycles try to process the same records
Solution: No action needed. This is expected behavior: optimistic locking ensures a single writer per record
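To see why records are skipped, the `lockedUntil` rule can be modeled in memory. This is only an illustrative sketch: `SandboxRow`, `acquireAndLockInMemory`, and `LEASE_MS` are hypothetical names, and the real repository layer is assumed to perform the equivalent check-and-set atomically in the database rather than in application code.

```typescript
// In-memory model of the lockedUntil rule (hypothetical names, not the repository code).
interface SandboxRow {
  id: string
  lockedUntil: Date | null
}

const LEASE_MS = 30000 // 30-second lock window, as described in this section

// Returns up to `limit` rows whose lock is absent or expired, and locks them.
function acquireAndLockInMemory(rows: SandboxRow[], limit: number, now: Date): SandboxRow[] {
  const available = rows
    .filter((row) => row.lockedUntil === null || row.lockedUntil.getTime() < now.getTime())
    .slice(0, limit)
  for (const row of available) {
    row.lockedUntil = new Date(now.getTime() + LEASE_MS)
  }
  return available
}

const rows: SandboxRow[] = [
  { id: 'a', lockedUntil: null },
  { id: 'b', lockedUntil: null },
]
const t0 = new Date('2024-01-01T00:00:00Z')

const first = acquireAndLockInMemory(rows, 10, t0) // both rows are free
const second = acquireAndLockInMemory(rows, 10, new Date(t0.getTime() + 1000)) // still locked
const third = acquireAndLockInMemory(rows, 10, new Date(t0.getTime() + 31000)) // locks expired

console.log(first.length, second.length, third.length) // 2 0 2
```

The second cycle returns nothing because both rows are still inside their lock window; they become eligible again once the window has passed.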
// Repository layer automatically handles locking
const lockedSandboxes = await acquireAndLockSandboxes(10)
// Only returns sandboxes where lockedUntil IS NULL OR < NOW()
// Sets lockedUntil = NOW() + 30 seconds atomically

Symptom: Project shows PARTIAL status unexpectedly
Cause: Child resources in inconsistent states
Solution: Understand aggregation priority rules
Priority order:
- ERROR - At least one resource has ERROR
- CREATING - At least one resource has CREATING
- UPDATING - At least one resource has UPDATING
- Pure states - All same status → use that status
- Transition states:
  - STARTING: All ∈ {RUNNING, STARTING}
  - STOPPING: All ∈ {STOPPED, STOPPING}
  - TERMINATING: All ∈ {TERMINATED, TERMINATING}
- PARTIAL - Inconsistent mixed states
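The priority order above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the project's actual aggregation code: the type names and `aggregateStatus` are hypothetical, and rejecting an empty list is an assumption, since the rules above do not cover that case.

```typescript
// Hypothetical types; the status names are the ones listed in the priority rules.
type ResourceStatus =
  | 'CREATING' | 'UPDATING' | 'STARTING' | 'RUNNING'
  | 'STOPPING' | 'STOPPED' | 'TERMINATING' | 'TERMINATED' | 'ERROR'
type ProjectStatus = ResourceStatus | 'PARTIAL'

function aggregateStatus(statuses: ResourceStatus[]): ProjectStatus {
  const first = statuses[0]
  if (first === undefined) {
    // Assumption: a project with no child resources is not covered by the rules.
    throw new Error('aggregateStatus requires at least one child status')
  }

  // 1-3. ERROR, CREATING, UPDATING win if any child resource has them.
  const urgent: ResourceStatus[] = ['ERROR', 'CREATING', 'UPDATING']
  for (const status of urgent) {
    if (statuses.some((s) => s === status)) return status
  }

  // 4. Pure state: every child shares the same status.
  if (statuses.every((s) => s === first)) return first

  // 5. Transition states: all children are within one {settled, transitioning} pair.
  const allIn = (a: ResourceStatus, b: ResourceStatus): boolean =>
    statuses.every((s) => s === a || s === b)
  if (allIn('RUNNING', 'STARTING')) return 'STARTING'
  if (allIn('STOPPED', 'STOPPING')) return 'STOPPING'
  if (allIn('TERMINATED', 'TERMINATING')) return 'TERMINATING'

  // 6. Anything else is an inconsistent mix.
  return 'PARTIAL'
}

console.log(aggregateStatus(['RUNNING', 'STOPPED'])) // PARTIAL
console.log(aggregateStatus(['RUNNING', 'STARTING'])) // STARTING
```

Under these rules a project with one RUNNING and one STOPPED resource reports PARTIAL, which is the most common reason the status appears unexpectedly.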
Symptom: Terminal shows "Authentication failed"
Cause: Missing or incorrect TTYD_ACCESS_TOKEN
Solution: Check environment variable
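If the variable is set correctly, the URL itself may be malformed. The terminal URL is expected to carry the credentials as a base64-encoded `user:token` pair in the `authorization` query parameter. The sketch below shows one way to build and inspect such a URL; `buildTerminalUrl` and `readTerminalCredentials` are hypothetical helpers, and the domain and credentials are placeholder values.

```typescript
// Hypothetical helper: builds a URL in the format
// https://{domain}?authorization={base64(user:token)}
function buildTerminalUrl(domain: string, user: string, token: string): string {
  // btoa accepts Latin-1 input only; credentials are assumed to be ASCII.
  const credentials = btoa(`${user}:${token}`)
  return `https://${domain}?authorization=${credentials}`
}

// Hypothetical helper: recovers the "user:token" pair from a URL,
// or returns null when the authorization parameter is missing.
function readTerminalCredentials(url: string): string | null {
  const marker = '?authorization='
  const index = url.indexOf(marker)
  if (index === -1) return null
  // The value is read from the raw string so that "+" in base64 is preserved.
  return atob(url.slice(index + marker.length))
}

const url = buildTerminalUrl('sandbox.example.com', 'user', 'secret')
console.log(readTerminalCredentials(url)) // user:secret
```

Decoding the parameter from a failing URL and comparing the token half against `TTYD_ACCESS_TOKEN` is a quick way to tell a wrong token from a malformed URL.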
# In sandbox pod
echo $TTYD_ACCESS_TOKEN
# Check URL format
# Should be: https://{domain}?authorization={base64(user:token)}

Symptom: FileBrowser shows "Invalid credentials"
Cause: Missing or incorrect FILE_BROWSER_USERNAME/PASSWORD
Solution: Check environment variables
# In sandbox pod
echo $FILE_BROWSER_USERNAME
echo $FILE_BROWSER_PASSWORD

Symptom: Sandbox can't connect to PostgreSQL
Cause: Database not ready or wrong connection string
Solution: Check database status and connection URL
# Check database status
kubectl get cluster -n {namespace}
# Check connection URL
echo $DATABASE_URL
# Test connection
psql $DATABASE_URL

# Set kubeconfig
export KUBECONFIG=/path/to/kubeconfig
# Check StatefulSets
kubectl get statefulsets -n {namespace} | grep {project-name}
# Check pods
kubectl get pods -n {namespace} -l app={statefulset-name}
# Pod logs
kubectl logs -n {namespace} {pod-name}
# Pod logs (follow)
kubectl logs -f -n {namespace} {pod-name}
# Check KubeBlocks database cluster
kubectl get cluster -n {namespace} | grep {project-name}
# Get database credentials
kubectl get secret -n {namespace} {cluster-name}-conn-credential -o yaml
# Check ingresses
kubectl get ingress -n {namespace} | grep {project-name}
# Describe resource for events
kubectl describe statefulset -n {namespace} {statefulset-name}

# Open Prisma Studio
npx prisma studio
# Direct PostgreSQL queries
psql $DATABASE_URL
# Check locked resources
psql $DATABASE_URL -c "SELECT id, status, \"lockedUntil\" FROM \"Sandbox\" WHERE \"lockedUntil\" IS NOT NULL;"

# Main application logs
kubectl logs -n {namespace} {pod-name}
# Filter by module
kubectl logs -n {namespace} {pod-name} | grep "lib/events/sandbox"
# Filter by level
kubectl logs -n {namespace} {pod-name} | grep "ERROR"

Cause: User hasn't uploaded kubeconfig
Solution:
- Check UserConfig table for KUBECONFIG key
- User needs to configure kubeconfig via UI or API
Cause: Project doesn't exist or user doesn't have access
Solution:
- Check project ID
- Check user ID matches project owner
- Check namespace matches user's kubeconfig
Cause: Project not in correct state for start
Solution:
- Check current project status
- Only STOPPED projects can be started
- Wait for current operation to complete
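The state rule above can be expressed as a small guard. This is a sketch under assumptions: `checkStartAllowed` and `StartCheck` are hypothetical names, and treating CREATING, UPDATING, STARTING, STOPPING, and TERMINATING as "operation in progress" is inferred from the status names rather than confirmed by the codebase.

```typescript
type ProjectStatus =
  | 'CREATING' | 'UPDATING' | 'STARTING' | 'RUNNING' | 'STOPPING'
  | 'STOPPED' | 'TERMINATING' | 'TERMINATED' | 'ERROR' | 'PARTIAL'

interface StartCheck {
  allowed: boolean
  reason: string
}

// Hypothetical guard: only STOPPED projects can be started.
function checkStartAllowed(status: ProjectStatus): StartCheck {
  if (status === 'STOPPED') {
    return { allowed: true, reason: 'Project is stopped and can be started' }
  }
  // Assumption: these statuses mean another operation is still running.
  const inProgress: ProjectStatus[] = ['CREATING', 'UPDATING', 'STARTING', 'STOPPING', 'TERMINATING']
  if (inProgress.some((s) => s === status)) {
    return { allowed: false, reason: `Operation in progress (${status}); wait for it to complete` }
  }
  return { allowed: false, reason: `Project is ${status}; only STOPPED projects can be started` }
}

console.log(checkStartAllowed('STOPPING').reason)
```

Returning a reason alongside the boolean makes it easier to tell "try again shortly" apart from "this project cannot be started in its current state".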
Cause: Project not in RUNNING state
Solution:
- Check project status
- Start project first
- Wait for RUNNING status
Cause: Various K8s errors
Solution:
- Check K8s events: kubectl describe statefulset
- Check resource quotas
- Check image availability
- Check namespace exists
Possible Causes:
- Database query performance
- Missing indexes
- N+1 query problem
Solutions:
// Use include for relations
await prisma.project.findMany({
  include: { sandboxes: true, databases: true }
})
// Check query performance
const prisma = new PrismaClient({
  log: ['query', 'info', 'warn', 'error'],
})

## Incident References
- GitHub import delay postmortem:
  - [project-import-delay-postmortem.md](/Users/che/Documents/GitHub/fulling/docs/project-import-delay-postmortem.md)

Possible Causes:
- Too many resources to process
- K8s API throttling
- Lock contention
Solutions:
- Reduce batch size in reconciliation job
- Increase reconciliation interval
- Check K8s API server load
Possible Causes:
- Memory leaks in event listeners
- Large objects in memory
- Unclosed connections
Solutions:
- Check for memory leaks
- Use streaming for large data
- Close connections properly
Possible Causes:
- Large runtime image
- Slow PVC provisioning
- Resource constraints
Solutions:
- Use smaller base image
- Check storage class performance
- Increase resource limits
- Architecture - Reconciliation pattern and event system
- Development Guide - Local development and code patterns
- Operations Manual - Deployment and K8s operations