Service Development
This guide documents best practices for creating NestJS services that depend on external resources like databases and storage systems. Following these patterns prevents race conditions and ensures reliable service initialization in Kubernetes environments.
- Database-Dependent Services
- Storage Services (S3/MinIO)
- Anti-Patterns to Avoid
- Testing Your Service
- Kubernetes Considerations
When your service depends on a database, you must handle the case where the database isn't ready when your service starts. This is common in Kubernetes where pods start in parallel.
Race condition timeline:
1. PostgreSQL pod starts, port 5432 becomes available
2. Init container detects port availability → SUCCESS
3. App pod starts
4. onModuleInit() runs immediately
5. PostgreSQL is still initializing (not accepting connections)
6. Database operations fail
7. App starts with broken database state
Init containers handle primary readiness. Application retry is only for the tiny timing window between init container exit and app startup:
/**
* Wait for database to be ready with short retry.
*
* Enterprise approach: Init containers handle primary readiness check.
* This short retry (3 attempts, ~3s) handles only the tiny timing window
* between init container exit and app startup. If still failing after
* this, we fail fast and let Kubernetes restart us with its own backoff.
*/
private async waitForDatabase(
maxRetries = 3,
delayMs = 1000,
): Promise<void> {
if (!this.pool) return;
let lastError: Error | undefined;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const client = await this.pool.connect();
try {
await client.query('SELECT 1');
} finally {
// Always return the client to the pool, even if the query fails
client.release();
}
if (attempt > 1) {
this.logger.info({ attempt }, 'Database connection established');
}
return;
} catch (error) {
lastError = error as Error;
if (attempt < maxRetries) {
this.logger.warn(
{ attempt, maxRetries, error: lastError.message },
'Database not ready, retrying...',
);
await new Promise((resolve) => setTimeout(resolve, delayMs));
}
}
}
// Fail fast - let Kubernetes handle restart with its own backoff
throw new Error(
`Database not available after ${maxRetries} attempts: ${lastError?.message}`,
);
}
💡 Why short retry + fail fast:
- Init containers are the primary defense (infrastructure layer)
- App retry handles only edge cases (~1% of situations)
- Long retries mask problems and delay failure detection
- Kubernetes restart backoff is designed for this - don't reinvent it
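The short-retry-then-fail-fast policy described above can be factored into a small reusable helper. This is a minimal sketch rather than code from the service: `retryThenFail` and `RetryOptions` are hypothetical names, and the helper uses a fixed delay in the same way `waitForDatabase()` does.

```typescript
/** Options for the hypothetical retryThenFail helper (names are illustrative). */
interface RetryOptions {
  maxRetries: number; // total attempts, including the first one
  delayMs: number; // fixed pause between attempts
  onRetry?: (attempt: number, error: Error) => void;
}

/**
 * Run `check` up to `maxRetries` times with a fixed delay between attempts.
 * Throws after the last failed attempt so the process can exit and the
 * orchestrator can apply its own restart backoff.
 */
async function retryThenFail<T>(
  check: () => Promise<T>,
  { maxRetries, delayMs, onRetry }: RetryOptions,
): Promise<T> {
  let lastError: Error | undefined;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await check();
    } catch (error) {
      lastError = error instanceof Error ? error : new Error(String(error));
      if (attempt < maxRetries) {
        onRetry?.(attempt, lastError);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw new Error(
    `Not available after ${maxRetries} attempts: ${lastError?.message}`,
  );
}
```

A caller would pass the readiness probe as `check`, for example a function that runs `SELECT 1`, and let the thrown error propagate out of `onModuleInit()`.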
In onModuleInit(), follow this exact order:
async onModuleInit() {
if (!this.enabled) {
this.logger.info('DATABASE_URL not set, database features disabled');
return;
}
try {
// Step 1: Create connection pool (fast, no network I/O)
this.pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 5,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
});
// Step 2: Initialize ORM (fast, no network I/O)
this._db = drizzle(this.pool, { schema });
// Step 3: Wait for database readiness (with retry)
await this.waitForDatabase();
// Step 4: Run migrations (only after confirmed ready)
await this.runMigrations();
// Step 5: Seed data (only after migrations complete)
await this.runSeeding();
// Step 6: Set connected flag (only after all steps pass)
this.isConnected = true;
this.logger.info('Database initialization complete');
} catch (error) {
this.logger.error({ error }, 'Failed to initialize database');
this.isConnected = false;
// Re-throw to prevent app from starting with broken database
throw error;
}
}
Database connections should be a singleton across the app:
import { Global, Module } from '@nestjs/common';
import { DatabaseService } from './database.service';
@Global()
@Module({
providers: [DatabaseService],
exports: [DatabaseService],
})
export class DatabaseModule {}
This ensures:
- ✅ Only one connection pool is created
- ✅ All services share the same pool
- ✅ Initialization happens once
If initialization fails, let the app crash. Kubernetes will restart it. A running app with a broken database connection is worse than a crash loop.
// GOOD: Re-throw critical errors
try {
await this.waitForDatabase();
await this.runMigrations();
} catch (error) {
this.logger.error({ error }, 'Failed to initialize database');
throw error; // App will crash and restart
}
// BAD: Swallowing errors
try {
await this.waitForDatabase();
} catch (error) {
this.logger.error({ error }, 'Database init failed');
// App continues with broken state!
}
📝 Exception: Optional features like seeding can fail gracefully:
private async runSeeding(): Promise<void> {
try {
await seedDatabase(this.pool);
} catch (error) {
this.logger.error({ error }, 'Seeding failed');
// Don't throw - seeding failure shouldn't prevent app startup
}
}
S3-compatible storage has different failure modes than databases:
- Temporary unavailability during uploads
- Network timeouts on large files
- Bucket not existing on first access
- Presigned URL expiration
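Some of these failures are transient and others are not, so it can help to decide which errors are worth retrying at all. The sketch below is one possible classification, not part of the original service. `isRetryableStorageError` and `StorageErrorLike` are hypothetical names, and the shape assumes errors that carry either a Node.js socket `code` or an AWS SDK v3 style `$metadata.httpStatusCode`.

```typescript
/** Minimal shape of the errors this sketch inspects (an assumption, not an SDK type). */
interface StorageErrorLike {
  name?: string;
  code?: string; // Node.js socket errors, e.g. "ECONNRESET"
  $metadata?: { httpStatusCode?: number };
}

const RETRYABLE_NETWORK_CODES = new Set([
  "ECONNRESET",
  "ECONNREFUSED",
  "ETIMEDOUT",
  "EPIPE",
]);

function isRetryableStorageError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const e = error as StorageErrorLike;

  // Connection-level failures are usually transient.
  if (e.code !== undefined && RETRYABLE_NETWORK_CODES.has(e.code)) return true;

  const status = e.$metadata?.httpStatusCode;
  if (status === undefined) return false;

  // 429 and 5xx responses may succeed later; other 4xx responses
  // (missing bucket, access denied) will not be fixed by retrying.
  return status === 429 || status >= 500;
}
```

A retry loop could call this in its `catch` branch and rethrow immediately when it returns `false`, so that permanent errors surface without waiting through the backoff delays.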
async uploadWithRetry(
bucket: string,
key: string,
data: Buffer,
maxRetries = 3,
): Promise<void> {
let lastError: Error | undefined;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
await this.s3Client.send(
new PutObjectCommand({ Bucket: bucket, Key: key, Body: data }),
);
return;
} catch (error) {
lastError = error as Error;
if (attempt === maxRetries) throw lastError;
const delay = Math.min(1000 * Math.pow(2, attempt - 1), 10000);
this.logger.warn(
{ attempt, maxRetries, delayMs: delay, error: lastError.message },
'Upload failed, retrying...',
);
await new Promise((r) => setTimeout(r, delay));
}
}
}
async onModuleInit() {
await this.waitForStorage();
await this.ensureBucketExists();
}
private async ensureBucketExists(): Promise<void> {
try {
await this.s3Client.send(
new HeadBucketCommand({ Bucket: this.bucket }),
);
this.logger.info({ bucket: this.bucket }, 'Bucket exists');
} catch (error) {
if ((error as Error).name === 'NotFound') {
await this.s3Client.send(
new CreateBucketCommand({ Bucket: this.bucket }),
);
this.logger.info({ bucket: this.bucket }, 'Bucket created');
} else {
throw error;
}
}
}
async getPresignedUrl(key: string, expiresIn = 3600): Promise<string> {
// Always verify object exists before generating URL
try {
await this.s3Client.send(
new HeadObjectCommand({ Bucket: this.bucket, Key: key }),
);
} catch (error) {
if ((error as Error).name === 'NotFound') {
throw new NotFoundException(`Object not found: ${key}`);
}
throw error;
}
return getSignedUrl(
this.s3Client,
new GetObjectCommand({ Bucket: this.bucket, Key: key }),
{ expiresIn },
);
}
| Anti-Pattern | Why It's Bad | Correct Approach |
|---|---|---|
| Assume port availability = service readiness | Port can be open while service initializes | Use native client health checks |
| Run migrations before testing connection | Migrations fail with confusing errors | Test connection first with retry |
| Set isConnected = true before verifying | Other code assumes database is ready | Set flag only after all init steps |
| Use boolean flags without retry logic | Single failure = permanent broken state | Short retry, then fail fast so Kubernetes restarts the pod |
| Assume bucket exists | First request fails | Check/create bucket on startup |
| Upload without retry logic | Transient failures break uploads | Retry with backoff |
| Use infinite presigned URL expiration | Security risk | Use reasonable expiration (1-24h) |
| Ignore multipart upload cleanup | Failed uploads waste storage | Implement lifecycle policies |
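The presigned URL row can be enforced in code rather than left to each caller. The following is a minimal sketch under stated assumptions: `resolvePresignExpiry` is a hypothetical helper, the 1 hour default mirrors the `expiresIn = 3600` default of `getPresignedUrl()`, and the 24 hour cap is taken from the range in the table.

```typescript
const DEFAULT_EXPIRY_SECONDS = 60 * 60; // 1 hour, matching getPresignedUrl's default
const MAX_EXPIRY_SECONDS = 24 * 60 * 60; // 24 hours, the upper end of the range above

function resolvePresignExpiry(requestedSeconds?: number): number {
  if (requestedSeconds === undefined) return DEFAULT_EXPIRY_SECONDS;
  if (Number.isNaN(requestedSeconds) || requestedSeconds <= 0) {
    throw new RangeError(`Invalid presigned URL expiry: ${requestedSeconds}`);
  }
  // Cap oversized (or infinite) requests instead of signing a long-lived URL.
  return Math.min(MAX_EXPIRY_SECONDS, Math.ceil(requestedSeconds));
}

console.log(resolvePresignExpiry()); // 3600
console.log(resolvePresignExpiry(7200)); // 7200
console.log(resolvePresignExpiry(Infinity)); // 86400
```

The result could then be passed as the `expiresIn` option when signing, so no code path can request an unbounded lifetime.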
1. Start your service WITHOUT the database running:
# Don't start PostgreSQL
npm run start:dev
2. Verify it retries and fails fast:
[Nest] Database not ready, retrying... (attempt 1/3)
[Nest] Database not ready, retrying... (attempt 2/3)
[Nest] Database not available after 3 attempts
💡 With the enterprise approach (3 attempts, ~3s), failures are detected quickly and Kubernetes restarts handle recovery with proper backoff.
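The "~3s" figure holds when failed attempts return quickly, for example when the connection is refused outright. If connection attempts hang instead, each one can take up to the pool's connectionTimeoutMillis. A rough estimate using the values from this guide follows. It is a sketch: `totalDelayMs` and `worstCaseMs` are hypothetical helpers, and the bound assumes each attempt is limited by the connection timeout and ignores query time.

```typescript
interface RetryTiming {
  maxRetries: number;
  delayMs: number;
  connectionTimeoutMillis: number;
}

/** Time spent sleeping between attempts (there is no sleep after the last one). */
function totalDelayMs({ maxRetries, delayMs }: RetryTiming): number {
  return Math.max(0, maxRetries - 1) * delayMs;
}

/** Upper bound if every attempt hangs until the connection timeout fires. */
function worstCaseMs(t: RetryTiming): number {
  return t.maxRetries * t.connectionTimeoutMillis + totalDelayMs(t);
}

const timing: RetryTiming = {
  maxRetries: 3,
  delayMs: 1000,
  connectionTimeoutMillis: 5000,
};

console.log(totalDelayMs(timing)); // 2000
console.log(worstCaseMs(timing)); // 17000
```

When testing locally, a failure that takes noticeably longer than the sleep time alone suggests the attempts are timing out rather than being refused.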
3. Test successful connection after retry:
Start the database during the retry window to verify recovery:
docker start postgres
4. Verify it connects successfully:
[Nest] Database not ready, retrying... (attempt 1/3)
[Nest] Database connection established (attempt 2)
[Nest] Migrations completed successfully
[Nest] Database initialization complete
describe('DatabaseService', () => {
it('should retry connection on failure', async () => {
// Mock pool.connect to fail first 2 times, then succeed
const connectMock = vi.fn()
.mockRejectedValueOnce(new Error('Connection refused'))
.mockRejectedValueOnce(new Error('Connection refused'))
.mockResolvedValueOnce({
query: vi.fn().mockResolvedValue({}),
release: vi.fn(),
});
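// Inject the mock pool (service is assumed to be a DatabaseService
// created in a beforeEach hook) and skip the real 1s delay
(service as any).pool = { connect: connectMock };
await (service as any).waitForDatabase(3, 0);
// Verify it connected after two failures and one success
expect(connectMock).toHaveBeenCalledTimes(3);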
});
});
Init containers provide first-line defense but are not sufficient alone:
| Layer | What It Checks | Limitation |
|---|---|---|
| Init container | Network reachability | Port open != service ready |
| Application retry | Actual query execution | Handles init container gap |
Always implement both layers:
- Init container: Uses native client tools (pg_isready, redis-cli ping)
- Application: Short retry in onModuleInit(), then fail fast
Use native client tools instead of netcat port checks:
# PostgreSQL - uses pg_isready
- name: wait-for-postgresql
image: postgres:16-alpine
command: ["sh", "-c", "until pg_isready -h $HOST -p 5432 -U app; do sleep 2; done"]
# MongoDB - uses mongosh ping
- name: wait-for-mongodb
image: mongo:7-jammy
command: ["sh", "-c", "until mongosh --host $HOST --eval 'db.adminCommand({ ping: 1 })'; do sleep 2; done"]
# Redis - uses redis-cli ping
- name: wait-for-redis
image: redis:7-alpine
command: ["sh", "-c", "until redis-cli -h $HOST ping | grep PONG; do sleep 2; done"]
# MinIO - uses health endpoint
- name: wait-for-minio
image: curlimages/curl:8.5.0
command: ["sh", "-c", "until curl -sf http://$HOST:9000/minio/health/live; do sleep 2; done"]
Different init containers have different memory requirements based on their base images:
| Init Container | Image | Memory Limit | Notes |
|---|---|---|---|
| wait-for-postgresql | postgres:16-alpine | 64Mi | Lightweight |
| wait-for-mongodb | mongo:7-jammy | 256Mi | mongosh requires more memory |
| wait-for-redis | redis:7-alpine | 64Mi | Lightweight |
| wait-for-minio | curlimages/curl | 32Mi | Minimal |
⚠️ Important: The MongoDB init container will OOMKill at 128Mi. Always allocate at least 256Mi for mongosh-based readiness checks.
The MongoDB Community Operator requires specific RBAC permissions for its agent to function properly. The agent runs as a sidecar container and needs to:
- Read secrets - Verify automation config
- Read/patch pods - Update agent version annotations
Without these permissions, the agent's readiness probe fails with:
Warning Unhealthy pod/app-mongodb-0 Readiness probe failed: Error verifying agent is ready
The k8s-ee platform automatically creates these RBAC resources via the MongoDB chart:
# charts/mongodb/templates/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: mongodb-database
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: mongodb-database
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: mongodb-database
subjects:
- kind: ServiceAccount
name: mongodb-database
📝 Note: The ARC runner ClusterRole includes roles and rolebindings permissions to deploy these RBAC resources.
Even with proper init containers, you still need application-level retry because:
- 🕐 Timing gaps: Small window between init container success and app startup
- 🔗 Connection pool exhaustion: Database might reject new connections temporarily
- ☁️ Network transients: Brief network issues during startup
- 📄 Graceful degradation: App can log meaningful errors while retrying
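To make the retry logs meaningful, the error code can be translated into a short explanation of which of the situations above is likely happening. This is a minimal sketch: `describeStartupError` is a hypothetical helper. It assumes the error exposes a `code` property, as node-postgres does for PostgreSQL SQLSTATE codes and Node.js does for socket errors.

```typescript
/** Subset of fields found on node-postgres and Node.js socket errors. */
interface DbErrorLike {
  code?: string;
  message?: string;
}

function describeStartupError(error: unknown): string {
  const e = (
    typeof error === "object" && error !== null ? error : {}
  ) as DbErrorLike;

  switch (e.code) {
    case "ECONNREFUSED":
      return "connection refused: the database port is not open yet";
    case "ETIMEDOUT":
      return "connection timed out: possible network issue during startup";
    case "57P03": // PostgreSQL cannot_connect_now
      return "database is starting up and not accepting connections yet";
    case "53300": // PostgreSQL too_many_connections
      return "database rejected the connection: too many connections";
    default:
      return e.message ?? String(error);
  }
}
```

The returned string could be added to the structured fields of the existing warn log inside the retry loop.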
| # | Best Practice |
|---|---|
| 1 | Always implement retry logic: a short retry at startup, exponential backoff for uploads |
| 2 | Order initialization correctly: pool -> wait -> migrate -> seed -> flag |
| 3 | Use Global modules for shared resources |
| 4 | Let critical failures crash the app (Kubernetes will restart) |
| 5 | Use native client tools in init containers |
| 6 | Test failure scenarios locally before deploying |
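Practices 2 and 4 can be combined in one small step runner that executes initialization in order, rethrows critical failures, and tolerates optional ones such as seeding. The sketch below is illustrative only: `InitStep` and `runInitSteps` are hypothetical names, not part of the service described in this guide.

```typescript
interface InitStep {
  name: string;
  critical: boolean; // critical failures abort startup, optional ones are logged
  run: () => Promise<void>;
}

/**
 * Run steps strictly in order. Returns the names of the steps that completed.
 * A failing critical step is rethrown so the process exits and gets restarted.
 */
async function runInitSteps(
  steps: InitStep[],
  log: (message: string) => void = console.log,
): Promise<string[]> {
  const completed: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step.name);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      if (step.critical) {
        log(`critical step "${step.name}" failed: ${reason}`);
        throw error;
      }
      log(`optional step "${step.name}" failed, continuing: ${reason}`);
    }
  }
  return completed;
}
```

With this shape, the connected flag would be set only after `runInitSteps` resolves, which keeps the "flag last" rule from practice 2 in one place.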
- Database Setup - Enable and configure databases
- Database Migrations - Schema versioning with Drizzle ORM
- Database Seeding - Populate databases with test data
- Security and Access Control - Security architecture and network isolation
- Troubleshooting - Common issues and solutions