This document provides a comprehensive architecture overview of the AppPlatform backend — a cloud-native, event-driven Node.js/TypeScript platform built on Express.js and PostgreSQL. It serves as the central hub for health data ingestion, multi-device synchronization, asynchronous projection pipelines, AI-powered analytics, and real-time communication.
The backend is engineered for the real world, not the happy path. It handles unreliable mobile networks, extended offline periods, bursty health data ingestion, and concurrent multi-device modifications. The architecture prioritizes data integrity, idempotency, and explicit consistency tracking across all operating conditions.
Scope: This document covers the backend repository's internal architecture, external interfaces, and deployment topology. Client-side and firmware architectures are excluded unless they directly impact backend design.
A clear separation into API, Services, Repositories, and Persistence layers. Each layer has a single responsibility and communicates only with adjacent layers through well-defined interfaces.
Goal: Modularity, independent testability, and predictable change propagation.
bootstrap.ts is the sole composition root. Every service receives its dependencies through constructor injection. No service locators, no ambient singletons, no hidden coupling. The entire dependency graph is visible in one file.
Goal: Maximum testability, minimal coupling, and complete transparency of system wiring.
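As a minimal illustration of this pattern (class names are hypothetical, not from the codebase), the composition root is the only place where concrete instances are constructed; every class receives its collaborators through its constructor:

```typescript
// Hypothetical sketch of the composition-root pattern: dependencies flow in
// through constructors, and the entire graph is wired in a single function.
interface Logger {
  info(msg: string): void;
}

class ConsoleLogger implements Logger {
  info(msg: string): void {
    console.log(msg);
  }
}

class UserRepository {
  constructor(private readonly logger: Logger) {}
  findById(id: string): { id: string } {
    this.logger.info(`lookup user ${id}`);
    return { id };
  }
}

class UserService {
  constructor(
    private readonly repo: UserRepository,
    private readonly logger: Logger,
  ) {}
  getUser(id: string): { id: string } {
    this.logger.info("UserService.getUser");
    return this.repo.findById(id);
  }
}

// The composition root: the only place where `new` is called for services.
// In tests, each class can be constructed directly with fakes instead.
function bootstrap(): { userService: UserService } {
  const logger = new ConsoleLogger();
  const userRepository = new UserRepository(logger);
  const userService = new UserService(userRepository, logger);
  return { userService };
}
```

Because nothing reaches for ambient singletons, a unit test can pass a stub `Logger` and an in-memory `UserRepository` without any framework support.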
Domain events (DomainEventService) drive analytics, achievements, projections, and notifications via decoupled subscribers. The OutboxService writes events to a dedicated table within the same database transaction as the primary data change, guaranteeing at-least-once delivery.
Goal: No data change silently drops its downstream side effects.
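The outbox pattern can be sketched with an in-memory stand-in for the database transaction (shapes and names here are illustrative, not the actual implementation); the point is that the domain write and the outbox write share a single commit:

```typescript
// Minimal in-memory sketch of the transactional outbox pattern: the domain row
// and its outbox event are committed together or not at all.
interface OutboxEvent { type: string; payload: unknown; }
interface Store { samples: unknown[]; outbox: OutboxEvent[]; }

// Simulates a database transaction: mutations apply to a staged copy and are
// published atomically only if the callback succeeds.
function withTransaction(store: Store, fn: (tx: Store) => void): void {
  const tx: Store = { samples: [...store.samples], outbox: [...store.outbox] };
  fn(tx); // if this throws, nothing below runs => no partial write
  store.samples = tx.samples;
  store.outbox = tx.outbox;
}

function ingestSample(store: Store, sample: { userId: string; value: number }): void {
  withTransaction(store, (tx) => {
    tx.samples.push(sample);
    tx.outbox.push({ type: "health.samples.changed", payload: { userId: sample.userId } });
  });
}
```

A separate processor then drains the outbox table and delivers events at-least-once; subscribers must therefore be idempotent.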
Write operations (raw HealthSample ingestion via HealthSampleService) are separated from read operations (pre-computed projections via HealthProjectionReadService). The write path is high-throughput and append-only. The read path serves fast, pre-computed results with explicit staleness via watermarks.
Goal: API reads are fast, writes are durable, and staleness is always explicit.
Request-level idempotency uses requestId + payloadHash in a persistent tracking table. Sample-level deduplication relies on composite unique constraints. Sync operations use clientSyncOperationId with cached results. Retries are always safe.
Goal: Network unreliability, client bugs, and background job retries never produce duplicate data.
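A sketch of how such a key could be derived — the canonicalization and key format below are illustrative assumptions, not the actual implementation:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of request-level idempotency keys: a canonical JSON
// serialization makes the hash stable regardless of object key order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function payloadHash(payload: unknown): string {
  return createHash("sha256").update(canonicalize(payload)).digest("hex");
}

// The tracking table can key on (requestId, payloadHash): a retry with the same
// requestId and identical payload is a safe no-op, while the same requestId
// with a different payload signals a client bug.
function idempotencyKey(requestId: string, payload: unknown): string {
  return `${requestId}:${payloadHash(payload)}`;
}
```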
Immediate error detection via explicit checks for invalid states, malformed data, and missing configurations. Errors are wrapped in AppError and logged with rich context via LoggerService. No silent failures, no swallowed exceptions.
Goal: Every failure is observable, traceable, and actionable.
Critical definitions — health metric types, sync entity types, conflict resolution policies, API contracts — are centralized in packages/shared and consumed by both backend and mobile. No ad-hoc logic duplication.
Goal: Prevent semantic drift and ensure consistent behavior across the entire stack.
| Decision | Rationale |
|---|---|
| PostgreSQL as canonical data store | All data (including time-series via JSONB) consolidated into one store, simplifying persistence |
| Separate Web and Worker processes | Independent scaling, resource isolation, distinct entry points (src/index.ts / src/worker.ts) |
| Three-lane health ingestion | HOT (recent), COLD (backfill), CHANGE (deletions) — each with distinct QoS and freshness guarantees |
| Dedicated Redis for BullMQ | noeviction policy for job durability, isolated from API cache's volatile-ttl policy |
| Cursor-based sync | O(log n) keyset pagination, stable under concurrent writes, enabling incremental pulls |
| Config-driven conflict resolution | Per-entity, per-field merge policies declared in shared contracts (conflict-configs.ts) — no ad-hoc merge logic |
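The keyset idea behind cursor-based sync can be sketched in a few lines (the cursor format and field names below are illustrative assumptions, not the shared `CompositeCursor` contract):

```typescript
// Sketch of keyset pagination: the cursor encodes the last-seen
// (updatedAt, id) pair, and each page filters strictly past it.
interface Row { id: string; updatedAt: number; }

function encodeCursor(row: Row): string {
  return Buffer.from(JSON.stringify({ u: row.updatedAt, i: row.id })).toString("base64url");
}

function decodeCursor(cursor: string): { u: number; i: string } {
  return JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));
}

// Stable under concurrent writes: unlike OFFSET pagination, rows committed
// before the cursor position are never skipped or duplicated on later pages.
function nextPage(rows: Row[], cursor: string | null, limit: number): { page: Row[]; next: string | null } {
  const sorted = [...rows].sort((a, b) => a.updatedAt - b.updatedAt || (a.id < b.id ? -1 : 1));
  const after = cursor ? decodeCursor(cursor) : null;
  const filtered = after
    ? sorted.filter((r) => r.updatedAt > after.u || (r.updatedAt === after.u && r.id > after.i))
    : sorted;
  const page = filtered.slice(0, limit);
  const next = page.length === limit ? encodeCursor(page[page.length - 1]) : null;
  return { page, next };
}
```

In SQL, the equivalent predicate is `WHERE (updated_at, id) > (:cursorUpdatedAt, :cursorId) ORDER BY updated_at, id LIMIT :n`, which an index satisfies in O(log n).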
The AppPlatform backend operates within a broader ecosystem of clients, cloud infrastructure, and third-party services.
| Category | Systems | Integration |
|---|---|---|
| Mobile Clients | iOS, Android | HTTP API, WebSocket, HealthKit/Health Connect data push |
| Web Clients | Browser | HTTP API, WebSocket |
| AWS Infrastructure | Cognito, Secrets Manager, S3, CloudWatch, RDS/Neon, ElastiCache | Auth, secrets, storage, logging, persistence, caching |
| Third-Party APIs | Google OAuth, Anthropic AI | Federated auth, LLM analysis and recommendations |
| Health SDKs | Apple HealthKit, Google Health Connect | Client-side collection, then batch upload — no direct backend integration |
The backend deploys as two distinct process types:
- Web Service (`src/index.ts`) — Handles synchronous HTTP API requests and WebSocket connections. Stateless, horizontally scalable behind a load balancer.
- Worker Service (`src/worker.ts`) — Processes asynchronous BullMQ jobs (projections, analytics, reconciliation). Independent scaling and resource isolation from the web service.
Multiple stateless instances run behind a load balancer. socket.service.ts uses @socket.io/redis-adapter for horizontal WebSocket scaling — events emitted from any instance reach all connected clients via Redis Pub/Sub.
Multiple instances consume jobs from shared BullMQ queues via job-manager.service.ts. Redis ensures job durability across worker crashes. Per-queue concurrency controls resource utilization.
| Concern | Strategy |
|---|---|
| API & WebSockets | Stateless web instances + Redis adapter for Socket.IO |
| Background Jobs | Multiple worker instances + BullMQ load distribution |
| Resource Isolation | Web and worker processes are separate — heavy projections never block API latency |
| Store | Role | Configuration |
|---|---|---|
| PostgreSQL | Canonical data store | Managed (RDS/Neon), strong consistency, JSONB support. database.service.ts manages connections with serverless retry logic |
| Redis (API Cache) | Response caching, rate limiting, distributed locks | volatile-ttl eviction policy via cache.service.ts |
| Redis (BullMQ/WS) | Job queues, Socket.IO adapter | noeviction policy — strict separation prevents eviction conflicts |
| AWS S3 | Blob storage | Journal photos, analytics reports, database backups via s3.service.ts |
Key insight: The architecture strictly separates Redis instances for API caching (`volatile-ttl`) and durable job queues (`noeviction`) to prevent eviction policy conflicts and ensure data integrity for background jobs.
The backend follows a four-layer architecture with cross-cutting concerns spanning multiple layers.
```mermaid
graph TD
    Client["Client (Mobile/Web)"]
    subgraph BM["Backend Monolith"]
        A[1. API Layer] --> B[2. Services Layer]
        B --> C[3. Repositories Layer]
        C --> D[4. Persistence Layer]
    end
    Client -- HTTP/WS --> A
    subgraph CC["Cross-Cutting Concerns"]
        Security(Security)
        Observability(Observability)
        Config(Configuration)
        Async(Async Processing)
        Realtime(Realtime Comm.)
    end
    A -- "Cross-Cutting" --> Security
    A -- "Cross-Cutting" --> Observability
    B -- "Cross-Cutting" --> Async
    B -- "Cross-Cutting" --> Realtime
    C -- "Cross-Cutting" --> Observability
    D -- "Data Storage" --> PostgreSQL[(PostgreSQL)]
    D -- "Cache/Queues" --> Redis[(Redis)]
    D -- "Blob Storage" --> S3[(S3)]
```
Located in code-snippets/api/v1/, this layer handles incoming requests, applies cross-cutting concerns, and routes to business logic.
| Component | Location | Responsibility |
|---|---|---|
| Controllers | `code-snippets/api/v1/controllers/` | Thin request handlers — extract params, call services, format responses, delegate errors |
| Middleware | `code-snippets/api/v1/middleware/` | Cross-cutting concerns via MiddlewareFactory for consistent DI |
| Routes | `code-snippets/api/v1/routes/` | Endpoint definitions with ordered middleware chains, registered via RouteRegistry in `code-snippets/core/route-registry.ts` |
| Schemas | `code-snippets/api/v1/schemas/` + `packages/shared/src/contracts/` | Zod schemas for request/response validation — canonical SSOT for API contracts |
Middleware Stack
| Middleware | Purpose |
|---|---|
| `correlationContext` | Generates and propagates X-Correlation-ID for end-to-end tracing |
| `serverTime` | Injects Server-Time header for client clock offset calculation |
| `apiGateway` | Circuit breakers and request/response transformations for external service calls |
| `logging` | Request-specific logging contexts and request/response details |
| `securityLogging` | Security context initialization and suspicious pattern detection |
| `jsonBodyParser` | JSON parsing with size limits enforced on inflated (decompressed) payloads |
| `requestValidation` | Zod schema validation for params, query, and body |
| `auth` | JWT validation via Cognito, populates req.user with authenticated context |
| `userContext` | Extracts and validates user context from req.user |
| `rateLimitQueue` | API rate limiting with request queuing |
| `rate-limit-monitoring` | Rate limit metrics collection and response headers |
| `apiCache` | Redis-based response caching with ETags, conditional requests, and race condition prevention |
| `sessionSecurity` | Device fingerprinting and X-Session-ID validation for session integrity |
| `health-error` | Contract-compliant error formatting for health batch endpoints |
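The first link in the chain can be sketched framework-agnostically (request/response shapes are simplified stand-ins for Express types): reuse an incoming X-Correlation-ID if present, otherwise mint one, and echo it on the response so clients and logs can join traces end-to-end.

```typescript
import { randomUUID } from "node:crypto";

// Simplified sketch of the correlationContext idea; shapes are assumptions,
// not the actual middleware signature.
interface Req { headers: Record<string, string | undefined>; correlationId?: string; }
interface Res { headers: Record<string, string>; }

function correlationContext(req: Req, res: Res, next: () => void): void {
  const incoming = req.headers["x-correlation-id"];
  req.correlationId = incoming ?? randomUUID(); // reuse or mint
  res.headers["X-Correlation-ID"] = req.correlationId; // echo for the client
  next();
}
```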
Located in code-snippets/services/, this layer encapsulates all core business logic. Services coordinate repositories, other services, and external integrations.
Service Catalog
Core Business Services
| Service | Domain |
|---|---|
| `user.service.ts` | User management |
| `consumption.service.ts` | Consumption tracking |
| `session.service.ts` | Session management |
| `purchase.service.ts` | Purchase management |
| `journal.service.ts` | Journaling |
| `product.service.ts` | Product catalog |
| `inventory.service.ts` | Inventory management |
| `goals.service.ts` / `achievements.service.ts` | Goals and achievements |
| `safety.service.ts` | Safety and anomaly detection |
| `analytics.service.ts` | Analytics and reporting |
Health Data Pipeline
| Service | Responsibility |
|---|---|
| `healthSample.service.ts` | API ingestion with dual-layer idempotency, triggers outbox events |
| `healthIngestQueue.service.ts` | Large batch queuing for async worker processing |
| `health-aggregation.service.ts` | ValueKind-aware aggregation logic for raw samples |
| `health-projection-coordinator.service.ts` | Fan-out of health.samples.changed events to projection handlers |
| `health-projection-read.service.ts` | Queries derived read models with watermark freshness overrides |
| `health-insight-engine.service.ts` | Deterministic, evidence-backed insights at read time via pure rules |
AI Integration
| Service | Responsibility |
|---|---|
| `ai.service.ts` | Core orchestration for external AI providers (Anthropic) |
| `ai-context-aggregation.service.ts` | Rich AI prompt construction from aggregated user data |
| `ai-phi-redaction.service.ts` | PHI redaction before AI model submission |
| `ai-product-recommendation.service.ts` | Personalized product recommendations |
| `ai-journal-analysis.service.ts` | Journal entry analysis |
| `ai-weekly-report.service.ts` | Weekly health reports |
| `aiCostTracking.service.ts` | AI API usage and cost monitoring |
User Profiling & ML
| Service | Responsibility |
|---|---|
| `user-consumption-profile.service.ts` | EMA learning from purchase cycles, consumption prediction |
| `temporal-pattern.service.ts` | Dirichlet-multinomial temporal histogram engine for routine detection |
| `user-routine.service.ts` | User routine profile management |
| `inventory-prediction.service.ts` | Multi-factor depletion prediction and purchase recommendations |
Cross-System Concerns
| Service | Responsibility |
|---|---|
| `auth.service.ts` / `cognito.service.ts` | Authentication orchestration, Cognito integration |
| `cache.service.ts` | Redis-based caching and distributed advisory locks |
| `logger.service.ts` | Structured, context-rich logging |
| `securityLogger.service.ts` | Security event recording for audit and alerting |
| `performanceMonitoring.service.ts` | Application performance metrics collection |
| `outbox.service.ts` / `outbox-processor.service.ts` | Transactional outbox pattern and async event processing |
| `sync.service.ts` / `syncLease.service.ts` | Bidirectional sync, conflict resolution, admission control |
| `deviceTelemetry.service.ts` | Device-specific sensor data collection and retrieval |
| `sessionTelemetryQueue.service.ts` | Session telemetry computation job queuing |
| `job-manager.service.ts` | BullMQ job queue orchestration |
| `secrets.service.ts` | Secure access to application secrets (AWS Secrets Manager) |
| `correlationTracker.service.ts` | Distributed tracing correlation IDs |
| `requestValidation.service.ts` | Utility functions for schema-based request validation |
| `httpsValidation.service.ts` | HTTPS-related validation utilities |
Located in code-snippets/repositories/, this layer abstracts all direct database access via Prisma ORM.
- RepositoryFactory (`repository.factory.ts`) — Central mechanism for creating all repositories with consistent `PrismaClient`, `LoggerService`, and `PerformanceMonitoringService` injection.
- Per-entity repositories (e.g., `user.repository.ts`, `consumption.repository.ts`, `health-sample.repository.ts`, `sync-operation.repository.ts`) — Hide ORM details, enforce `userId`-scoped authorization to prevent IDOR, implement optimistic locking via `version` fields, and encapsulate complex query logic.
| Package | Purpose |
|---|---|
| `code-snippets/core/` | DI framework utilities — controller-registry.ts, middleware-factory.ts, route-registry.ts, common type definitions |
| `code-snippets/utils/` | Helpers for authentication (auth.utils.ts), error handling (error-handler.ts), crypto, and device ID generation |
| `packages/shared/` | Cross-application type safety — canonical entity types (entity-types.ts), Zod contracts (health.contract.ts, sync-lease.contract.ts), health config (metric types, units, normalization, precision, payload hashing), sync config (relation graph, cursor formats, conflict strategies) |
bootstrap.ts is the single composition root — the sole, authoritative place responsible for instantiating every service, wiring all dependencies via constructor injection, and orchestrating initialization and shutdown in the correct order.
Initialization Stages
| Phase | Components | Purpose |
|---|---|---|
| 0 | OpenTelemetry SDK (`instrumentation.ts`) | Tracing and metrics collection before any app code loads — instruments Node.js, Express, Redis, PostgreSQL |
| 1 | LoggerService, S3Service | Foundational — LoggerService is a dependency for almost all other components |
| 2 | ConfigSecurityService, AuthConfig | Load and validate all configuration, including secrets from AWS Secrets Manager. Immutable Config object frozen after loading |
| 3 | DatabaseService, CacheService, PerformanceMonitoringService, ProjectionCheckpointRepository, HealthProjectionCoordinatorService, HealthAggregationService, HealthProjectionReadService, HealthInsightEngineService | Core infrastructure — database connections, caching, metrics, projection coordination |
| 4 | RepositoryFactory + all entity repositories | Data access layer — PrismaClient from DatabaseService injected into all repositories |
| 5 | DomainEventService, OutboxService, CognitoService, SocketService, SecurityLoggerService, AuthRateLimitService, SessionSecurityService, SyncLeaseService | Core platform services bridging infrastructure and business logic |
| 6 | ConsumptionService, JournalService, HealthSampleService, etc. | Domain-specific business services |
| 7 | UserConsumptionProfileService, TemporalPatternService, InventoryPredictionService | User profiling and ML services (depend on Phase 6 business services) |
| 8 | ControllerRegistry, MiddlewareFactory, RouteRegistry, Express App | API layer assembly and route registration |
- Constructor injection — Pure classes receiving all dependencies through constructors. Enforces explicit contracts and simplifies unit testing.
- `initialize()` methods — Async setup (database connections, event handler registration, configuration loading) that cannot run in synchronous constructors.
- `shutdown()` methods — Graceful cleanup of external resources (database connections, Redis clients, timers) orchestrated in reverse dependency order by `bootstrap.ts`.
Incoming requests pass through an ordered middleware chain before reaching controller logic. The chain handles correlation tracking, security, authentication, rate limiting, caching, and validation.
HTTP Request Sequence Diagram
```mermaid
sequenceDiagram
    actor Client
    participant LoadBalancer as Load Balancer
    participant ExpressApp as Express App
    participant Middleware as Middleware Chain
    participant Controller
    participant Service
    participant Repository
    participant Database as PostgreSQL
    participant Cache as Redis Cache
    participant Logger as Logger Service
    participant PerfMon as Performance Monitoring
    Client->>+LoadBalancer: HTTP Request (GET /data)
    LoadBalancer->>+ExpressApp: Forward Request
    ExpressApp->>+Middleware: correlationContext
    Middleware->>Middleware: serverTime
    Middleware->>Middleware: apiGateway
    Middleware->>Middleware: requestLogging
    Middleware->>Middleware: securityContext
    Middleware->>Middleware: suspiciousPatterns
    Middleware->>Middleware: jsonBodyParser
    Middleware->>Middleware: requestValidation
    Middleware->>Middleware: auth.middleware (JWT Validation)
    Middleware->>Middleware: userContext (User Info)
    Middleware->>Middleware: rateLimitQueue (Check/Queue)
    Middleware->>Middleware: rate-limit-monitoring (Metrics/Headers)
    Middleware->>+Cache: apiCache (Lookup CacheKey)
    Cache-->>-Middleware: Cache Hit/Miss
    alt Cache Hit
        Middleware->>Client: 200/304 Cached Response
        Middleware->>PerfMon: Record CACHE_HIT
    else Cache Miss
        Middleware->>+Controller: Invoke Controller Method
        Controller->>+Service: Call Business Logic
        Service->>+Repository: Request Data
        Repository->>+Database: Query Data
        Database-->>-Repository: Query Result
        Repository-->>-Service: Data
        Service-->>-Controller: Processed Data
        Controller-->>+ExpressApp: Send Response
        ExpressApp->>Middleware: Intercept Response
        Middleware->>Cache: apiCache (Store Response)
        Middleware->>PerfMon: Record CACHE_MISS, API_RESPONSE_TIME
        Middleware->>Logger: Log API_RESPONSE
        Middleware-->>-LoadBalancer: Send Response (HTTP 200)
        LoadBalancer->>-Client: Final Response
    end
```
Key insight: On cache hit, responses are served without touching PostgreSQL. On cache miss, the full layer stack executes and the response is cached for subsequent requests.
The ingestion pipeline moves health data from mobile devices through the backend's trust boundary into PostgreSQL, with transactional outbox events for downstream projection processing.
Client-Side Ingestion (packages/app):
| Component | Responsibility |
|---|---|
| `HealthKitContext.tsx` | HealthKit permissions and raw data access |
| `HealthKitService.ts` | Abstracts HealthKit operations (availability, permissions, data reads) |
| `HealthKitAdapter.ts` | Implements HealthDataProviderAdapter for iOS HealthKit — metric configurations and category code mapping |
| `DirtyKeyComputer.ts` | Computes dirty projection keys from ingested samples |
| `HealthIngestionEngine.ts` | Three-lane pipeline (HOT for recent UI data, COLD for historical backfill, CHANGE for deletions/edits) — enforces cursor updates, validation, and idempotent writes |
| `HealthSyncCoordinationState.ts` | Manages ingestion intervals, backoff, and single-operation enforcement |
| `HealthUploadEngine.ts` | Batch staging with staged_batch_id for atomic claims and crash recovery, serialization, payload size limits |
| `HealthUploadHttpClientImpl.ts` | HTTP client wrapping BackendAPIClient for auth and retry |
Backend Ingestion (packages/backend):
| Component | Responsibility |
|---|---|
| `health.controller.ts` | Receives POST /health/samples/batch-upsert, delegates to healthIngestQueue.service.ts for async processing and healthSample.service.ts for idempotency |
| `healthIngestQueue.service.ts` | Pre-processing queue — enqueues large batches into BullMQ to prevent HTTP timeouts |
| `HealthSampleService.ts` | Core ingestion logic (see details below) |
| `HealthSampleRepository` | Persists HealthSample entities to PostgreSQL |
| `HealthIngestRequest` table | Tracks request-level idempotency — ensures the same batch isn't processed twice |
HealthSampleService internals:
- Two-layer idempotency: `requestId` + `payloadHash` for request-level (`HealthIngestRequest` table), and a `(userId, sourceId, sourceRecordId, startAt)` unique constraint for sample-level deduplication
- Privacy gating: Enforces `allowHealthDataUpload` and `blockedMetrics` from user privacy settings (via `UserRepository`)
- Unit normalization: Ensures all values and units conform to canonical forms (`unit-normalization.ts`, `metric-types.ts`)
- Transactional outbox: Triggers `health.samples.changed` events via `OutboxService` within the same transaction as `HealthSample` persistence
Guarantee: Zero dual-writes. Raw samples and outbox events are committed in a single atomic transaction.
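The sample-level layer can be illustrated with an in-memory map keyed by the same composite fields the unique constraint covers (upsert semantics assumed for illustration):

```typescript
// Sketch of sample-level deduplication mirroring the composite unique
// constraint (userId, sourceId, sourceRecordId, startAt): replaying a batch
// inserts nothing new, so client retries are safe.
interface Sample {
  userId: string;
  sourceId: string;
  sourceRecordId: string;
  startAt: string;
  value: number;
}

function dedupKey(s: Sample): string {
  return [s.userId, s.sourceId, s.sourceRecordId, s.startAt].join("|");
}

function upsertBatch(store: Map<string, Sample>, batch: Sample[]): number {
  let inserted = 0;
  for (const sample of batch) {
    const key = dedupKey(sample);
    if (!store.has(key)) inserted++;
    store.set(key, sample); // conflict => overwrite in place (upsert semantics)
  }
  return inserted; // replayed batches report zero new rows
}
```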
For detailed ingestion logic, three-lane contract, and backend processing, see HEALTH-INGESTION-PIPELINE.MD.
After ingestion, raw health data changes are asynchronously transformed into pre-computed read models with watermark-based freshness tracking. This implements the read side of the CQRS pattern.
Projection Handlers:
| Handler | Output Table | Logic |
|---|---|---|
| `HealthRollupProjectionHandler` | `user_health_rollup_day` | Daily numeric aggregates (avg heart rate, total steps) via HealthAggregationService for ValueKind-aware logic |
| `SleepSummaryProjectionHandler` | `user_sleep_night_summary` | Nightly sleep summaries — clustering, nap detection, canonical source selection |
| `SessionImpactProjectionHandler` | `user_session_impact_summary` | Before/during/after health metric deltas around consumption sessions |
| `ProductImpactProjectionHandler` | `user_product_impact_rollup` | Per-product impact aggregation (depends on session-impact completing) |
| `TelemetryCacheProjectionHandler` | `session_telemetry_cache` | Marks entries STALE when underlying samples change — triggers lazy recomputation on read |
Coordination Mechanisms:
- Checkpoint tracking — `ProjectionCheckpointRepository` tracks per-projection status (PENDING → PROCESSING → COMPLETED/FAILED), enabling idempotent retries and independent failure isolation
- Lease-based concurrency — Prevents multiple workers from processing the same projection for the same event
- Watermark freshness — `HealthProjectionReadService` overrides a row's status to STALE if its `sourceWatermark` is behind the current `UserHealthWatermark`
- Event coalescing — `outbox-coalescing.ts` groups related events for processing efficiency
- Read-time insights — `health-insight-engine.service.ts` generates deterministic insights from projection data using pure rule-based logic (`health-insight-rules.ts`)
Guarantee: Individual projection failures don't block other projections. Watermarks provide explicit freshness tracking — no silent staleness.
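The watermark override reduces to a one-line rule. In this sketch the types are simplified (watermarks shown as plain monotonically increasing numbers; the real service reads `UserHealthWatermark` from PostgreSQL):

```typescript
// Sketch of watermark-based freshness: a projection row is served as FRESH
// only if its sourceWatermark has caught up to the user's current watermark.
type Status = "FRESH" | "STALE";

interface ProjectionRow { status: Status; sourceWatermark: number; }

function effectiveStatus(row: ProjectionRow, userWatermark: number): Status {
  // Even a row persisted as FRESH is reported STALE when newer samples exist.
  return row.sourceWatermark < userWatermark ? "STALE" : row.status;
}
```

This is what makes staleness explicit: readers always learn whether a projection reflects the latest ingested data, rather than silently receiving outdated aggregates.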
For detailed projection coordination, checkpointing, and watermark logic, see PROJECTION-PIPELINE.MD.
The sync pipeline enables bidirectional data synchronization between multiple offline-first mobile clients and the backend, with config-driven conflict resolution and structural integrity enforcement.
Key Mechanisms:
| Mechanism | Implementation |
|---|---|
| Offline-first | Clients create and modify data offline using client-generated UUIDs (clientConsumptionId, clientEntryId) |
| Cursor-based pull | CompositeCursor from packages/shared/src/sync-config/cursor.ts, topologically sorted by ENTITY_SYNC_ORDER (parents before children) to prevent FK violations |
| Push idempotency | syncOperationId + cached resultPayload in SyncOperation table — repeated pushes return cached results without reprocessing |
| Optimistic locking | version fields on syncable entities detect conflicts when clientVersion < serverVersion |
| Conflict resolution | Config-driven via ENTITY_CONFLICT_CONFIG from conflict-configs.ts — strategies: LAST_WRITE_WINS, CLIENT_WINS, SERVER_WINS, MERGE, MANUAL |
| Field-level policies | For MERGE strategy: LOCAL_WINS (notes), MERGE_ARRAYS (tags), MAX_VALUE (counters), MONOTONIC (status). Pure merge via entity-merger.ts |
| Entity handlers | code-snippets/services/sync/handlers/ — entity-specific create(), update(), delete(), fetchServerVersion(), merge() with Zod validation and transactional writes |
| Change tracking | Every mutation writes to SyncChange table (source for pull sync). SyncState tracks lastSyncCursor per (userId, deviceId) |
| Distributed locks | SyncLeaseService (Redis) for admission control + SyncStateRepository (PostgreSQL) for per-device serialization via lockOwner/lockAcquiredAt |
Guarantee: Cursors never advance past corrupted state. Conflict resolution is deterministic, auditable, and testable in isolation.
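The field-level policies for the MERGE strategy can be sketched as a pure function (the config shape below is an assumption for illustration; policy names are taken from the document):

```typescript
// Sketch of config-driven, field-level merging: each field resolves by a
// declared policy instead of ad-hoc merge code.
type FieldPolicy = "LOCAL_WINS" | "MERGE_ARRAYS" | "MAX_VALUE";

type Entity = Record<string, unknown>;

function mergeEntities(local: Entity, server: Entity, policies: Record<string, FieldPolicy>): Entity {
  const merged: Entity = { ...server }; // unlisted fields default to server values
  for (const [field, policy] of Object.entries(policies)) {
    switch (policy) {
      case "LOCAL_WINS": // e.g., free-text notes
        merged[field] = local[field];
        break;
      case "MERGE_ARRAYS": // e.g., tags — union without duplicates
        merged[field] = [...new Set([...(server[field] as unknown[]), ...(local[field] as unknown[])])];
        break;
      case "MAX_VALUE": // e.g., monotonically increasing counters
        merged[field] = Math.max(local[field] as number, server[field] as number);
        break;
    }
  }
  return merged;
}
```

Because the function is pure, each policy can be exercised in isolation with plain unit tests, which is what makes the resolution deterministic and auditable.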
For detailed sync mechanics, entity handlers, and conflict algorithms, see SYNC-ENGINE.MD.
A dedicated worker process handles all background tasks, isolating compute-heavy operations from the web service's request path.
Key Components:
| Component | Responsibility |
|---|---|
| `JobManagerService` | Manages BullMQ Queue and Worker instances. Dedicated Redis with noeviction. Provides enqueueJob() for producers, getQueueDepth() for backpressure. Per-queue config (maxQueueLen, concurrency, removeOnComplete/Fail) |
| `JobProcessor` | Core dispatch logic in worker processes — delegates to specific services based on the JobNames enum. Fault-isolated via try/catch with structured error logging |
| `schedules.ts` | Recurring cron jobs via BullMQ repeatable jobs — analytics refresh, health ingest reaper, inventory reconciliation, stale session reconciliation |
Job Types:
| Job | Purpose |
|---|---|
| `HEALTH_INGEST_BATCH` | Async processing of large health uploads |
| `SESSION_TELEMETRY_COMPUTE` | Derived health data for session visualizations |
| `REFRESH_ANALYTICS_MVS` | PostgreSQL materialized view refresh for analytics |
| `HEALTH_INGEST_REAPER` | Stale ingestion request cleanup |
| `HEALTH_SAMPLE_SOFT_DELETE_PURGER` | Hard-delete old soft-deleted samples |
| `INVENTORY_RECONCILIATION` | Link unlinked consumptions to inventory items |
| `STALE_SESSION_RECONCILIATION` | Complete sessions whose end time has passed |
Guarantee: Jobs survive worker crashes via durable Redis queues. All jobs are idempotent and retry-safe.
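The backpressure contract can be sketched without BullMQ — the real JobManagerService delegates to BullMQ queues, so this in-memory stand-in only illustrates the maxQueueLen check producers rely on:

```typescript
// Sketch of queue-depth backpressure: producers check depth against a
// configured ceiling and shed load instead of growing the queue unbounded.
interface QueueConfig { maxQueueLen: number; }

class BoundedQueue<T> {
  private readonly jobs: T[] = [];
  constructor(private readonly config: QueueConfig) {}

  getQueueDepth(): number {
    return this.jobs.length;
  }

  enqueueJob(job: T): boolean {
    if (this.jobs.length >= this.config.maxQueueLen) return false; // backpressure: reject
    this.jobs.push(job);
    return true;
  }

  dequeue(): T | undefined {
    return this.jobs.shift(); // FIFO consumption by workers
  }
}
```

A producer that receives `false` can return HTTP 503 or fall back to synchronous processing, keeping queue latency bounded under burst load.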
For BullMQ configuration, concurrency control, and failure modes, see WORKER-SCALABILITY.MD.
The backend supports real-time updates via Socket.IO WebSockets for live session events, sync notifications, and instant UI feedback.
Connection Flow:
- Mobile/Web clients establish WebSocket connections to the backend's web service instances
- JWT authenticated via `CognitoService.validateToken()`, Cognito sub mapped to internal `userId` for consistency with the HTTP API
- Authenticated sockets join rooms: `user:{userId}`, `device:{deviceId}`, `session:{sessionId}`
- Domain events (e.g., `consumption.created`, `session.ended`) emitted by `DomainEventService`
- `WebSocketEventSubscriber` converts events to minimal, type-safe `RealtimeEnvelopeV1` (from `packages/shared/src/realtime/contracts/events.ts`) — includes `userId`, `entityId`, `eventType`, and a cache-patch payload
- `WebSocketBroadcaster` emits to appropriate rooms via `socket.service.ts`
- Clients patch local React Query caches directly from the payload — no full API refetch
Reliability:
| Concern | Mechanism |
|---|---|
| Horizontal scaling | @socket.io/redis-adapter on the dedicated BullMQ/WS Redis instance broadcasts events across all web instances |
| Authentication | JWT validation via CognitoService on every connection |
| Tenancy enforcement | WebSocketBroadcaster validates userId match — no cross-user data leakage |
| Deduplication | dedupeCache keyed by (eventType, entity, entityId) prevents duplicate emissions during failover/retry |
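A TTL-based sketch of that dedupe cache (the shape is an assumption for illustration, not the actual implementation):

```typescript
// Sketch of emission deduplication: a short-lived cache keyed by
// (eventType, entity, entityId) suppresses duplicate broadcasts that can
// occur during failover or retry.
class DedupeCache {
  private readonly seen = new Map<string, number>(); // key -> expiry timestamp
  constructor(private readonly ttlMs: number) {}

  shouldEmit(eventType: string, entity: string, entityId: string, now: number): boolean {
    const key = `${eventType}:${entity}:${entityId}`;
    const expiresAt = this.seen.get(key);
    if (expiresAt !== undefined && expiresAt > now) return false; // duplicate within TTL
    this.seen.set(key, now + this.ttlMs);
    return true;
  }
}
```

Passing `now` explicitly keeps the logic deterministic and testable; production code would also evict expired keys to bound memory.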
The canonical, primary, and single source of truth for all application data. Managed via Prisma ORM with schema defined in prisma/schema.prisma.
Schema Categories
| Category | Tables |
|---|---|
| Core Business | User, Consumption, ConsumptionSession, JournalEntry, Purchase, Product, InventoryItem, Goal, Achievement, UserAchievement, Device, SafetyRecord |
| Sync Metadata | SyncOperation, SyncConflict, SyncChange, SyncState |
| Event Outbox | OutboxEvent (central to the Transactional Outbox Pattern) |
| Health Data | HealthSample (raw samples — time-series, likely optimized with TimescaleDB), HealthIngestRequest (idempotency tracking) |
| Health Projections | UserHealthRollupDay, UserSleepNightSummary, UserSessionImpactSummary, UserProductImpactRollup (optimized read models) |
| AI | AiUsageRecord, AiChatThread, AiChatMessage, AiAnalysis, AiRecommendationSet, AiRecommendationItem, AiResponseCache |
| Telemetry | SessionTelemetryCache (precomputed session health data), UserHealthWatermark, ProjectionCheckpoint |
| Real-time | WebSocketEvent, LiveConsumption, SessionMessage (raw events for audit/reconciliation) |
Key PostgreSQL Features:
| Feature | Usage |
|---|---|
| JSONB columns | Flexible schemas for User.privacySettings, Device.specifications, HealthSample.metadata, SessionTelemetryCache.metricsJson — schema evolution without costly migrations |
| Decimal & BigInt | Precise numerics for Consumption.quantity, AiUsageRecord.totalCost, UserHealthWatermark.sequenceNumber — no floating-point inaccuracies |
| Partial unique indexes | Critical for idempotency (e.g., Purchase_userId_productId_active_unique) |
| Optimistic locking | version fields (@default(1)) for concurrency control across many models |
| TimescaleDB (inferred) | HealthSample table's time-series nature strongly implies hypertable, compression, and retention policies |
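The version-based optimistic locking mentioned above reduces to a compare-and-bump; in SQL this is typically `UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?`, with zero affected rows signalling a conflict. An in-memory sketch:

```typescript
// Sketch of version-based optimistic locking: an update applies only when the
// caller's expected version matches the stored one, then bumps the version.
interface Versioned { version: number; data: string; }

function optimisticUpdate(row: Versioned, expectedVersion: number, newData: string): boolean {
  if (row.version !== expectedVersion) return false; // concurrent modification detected
  row.data = newData;
  row.version += 1;
  return true;
}
```

A rejected update tells the caller to re-read the row and retry (or hand the conflict to the merge machinery described in the sync pipeline).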
DatabaseService manages PrismaClient connections, transactions, and retry logic optimized for serverless environments (e.g., Neon cold starts).
Two strictly separated instances prevent eviction policy conflicts:
| Instance | Policy | Usage |
|---|---|---|
| API Cache | `volatile-ttl` | Response caching (apiCache.middleware.ts), distributed advisory locks (e.g., SyncLeaseService, UserConsumptionProfileService.performEMALearning), rate limit counters (authRateLimit.middleware.ts, rateLimitQueue.middleware.ts) |
| BullMQ / WebSocket | `noeviction` | BullMQ job data, queues, and locks (job-manager.service.ts); Socket.IO Redis adapter for horizontal WebSocket scaling (socket.service.ts) |
Durable blob storage via S3Service:
- User-uploaded content — Journal entry photos (`journal.service.ts`)
- System-generated artifacts — Analytics reports, user data exports
- Database backups — `DatabaseService.performPgDumpBackup()` uploads PostgreSQL backups to S3
- Authentication — AWS Cognito as IdP via `CognitoService`. JWT validation by `auth.middleware.ts` (HTTP) and `socket.service.ts` (WebSocket). Federated Google OAuth supported
- Authorization — RBAC enforced via `authorization.middleware.ts` and controller-level guards (e.g., `createRequireAdmin`). `UserRepository` maps Cognito IDs to internal user IDs for consistent authorization checks
- Protection — `auth-monitoring.middleware.ts` logs suspicious authentication patterns. `authRateLimit.middleware.ts` provides brute-force protection, account lockout, and CAPTCHA integration via `AuthRateLimitService` (Redis-backed)
| Concern | Implementation |
|---|---|
| Secrets | SecretsService fetches from AWS Secrets Manager; ConfigSecurityService validates, enforces minimum JWT secret lengths, and freezes configuration |
| HTTPS | httpsEnforcement.middleware.ts redirects HTTP to HTTPS in production, injects HSTS |
| Input validation | Strict Zod schemas (packages/shared/src/contracts/*, code-snippets/api/v1/schemas/*) at API boundaries via requestValidation.middleware.ts. Payload size limits enforced by jsonBodyParser.middleware.ts |
| Session integrity | sessionSecurity.middleware.ts performs device fingerprinting and X-Session-ID validation for anomaly detection |
| PHI redaction | ai-phi-redaction.service.ts strips Protected Health Information before AI model submission |
| Data integrity | Optimistic locking, transactional outbox, strict schema validation |
For detailed data integrity guarantees, see DATA-INTEGRITY-GUARANTEES.MD.
| Pillar | Implementation |
|---|---|
| Logging | LoggerService — structured JSON, log levels, categories. Integrates with CloudWatchLogsService for centralized aggregation |
| Metrics | PerformanceMonitoringService — API response times, DB query durations, cache hit rates, job processing times. OpenTelemetry-compatible export |
| Tracing | CorrelationContextManager via correlationContext.middleware.ts — X-Correlation-ID propagation across HTTP, service calls, and async events |
| Health checks | /health, /health/rate-limit, /api/v1/monitoring/health — real-time service status and detailed metrics |
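The tracing row can be sketched with Node's built-in AsyncLocalStorage, which is a common way to implement a CorrelationContextManager-style component; the function names and context shape here are assumptions, not the actual implementation.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Sketch of X-Correlation-ID propagation in the spirit of
// CorrelationContextManager. AsyncLocalStorage carries the ID across awaits
// and callbacks without threading it through every function signature.
const store = new AsyncLocalStorage<{ correlationId: string }>();

// Reuse an incoming X-Correlation-ID header value, or mint a new one.
function runWithCorrelation<T>(correlationId: string | undefined, fn: () => T): T {
  return store.run({ correlationId: correlationId ?? randomUUID() }, fn);
}

// Anywhere inside the run() scope — service calls, logger, event handlers —
// the current ID is retrievable without explicit parameter passing.
function currentCorrelationId(): string | undefined {
  return store.getStore()?.correlationId;
}
```

A middleware would call runWithCorrelation at the top of each request; the logger and event publishers then read currentCorrelationId() so every log line and domain event carries the request's ID.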
- Centralized & secure — ConfigSecurityService loads from environment and AWS Secrets Manager (via SecretsService)
- Immutable — All Config objects frozen after loading (initializeConfig) to prevent runtime tampering
- Validated — Zod schemas enforce security policies at startup (JWT secret strength, CORS origins in production)
- Shared — packages/shared defines common configuration contracts for frontend/backend consistency
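The validate-then-freeze pattern behind initializeConfig can be sketched as follows. The real code validates with Zod and pulls secrets via SecretsService; the Config shape, environment variable names, and the 32-character minimum below are illustrative assumptions.

```typescript
// Sketch of validate-then-freeze configuration loading. The field names,
// env vars, and length threshold are assumptions for illustration.
interface Config {
  jwtSecret: string;
  corsOrigins: string[];
}

function initializeConfigSketch(env: Record<string, string | undefined>): Readonly<Config> {
  const jwtSecret = env.JWT_SECRET ?? "";
  // Fail fast at startup on weak secrets rather than discovering it at runtime.
  if (jwtSecret.length < 32) throw new Error("JWT_SECRET must be at least 32 characters");
  const config: Config = {
    jwtSecret,
    corsOrigins: (env.CORS_ORIGINS ?? "").split(",").filter(Boolean),
  };
  Object.freeze(config.corsOrigins);
  // Freezing makes later mutation attempts no-ops (or TypeErrors in strict mode).
  return Object.freeze(config);
}
```

Freezing after validation gives the rest of the process a configuration object that is both known-good and tamper-proof for its entire lifetime.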
| Integration | Service | Purpose |
|---|---|---|
| AWS Cognito | cognito.service.ts | User identity management (user pools), authentication, token validation |
| AWS Secrets Manager | secrets.service.ts | Secure storage and retrieval of API keys, DB credentials, sensitive config |
| AWS S3 | s3.service.ts | Durable blob storage for user content, system reports, and database backups |
| AWS CloudWatch | cloudwatch-logs.service.ts | Centralized log aggregation, monitoring, and analysis |
| OpenTelemetry SDK | instrumentation.ts | Node.js instrumentation for traces and metrics collection |
| OpenTelemetry Collector | (inferred) | Agent for exporting traces (Jaeger/Datadog) and metrics (Prometheus/Grafana) |
| Google OAuth | auth.service.ts | Federated authentication with server-side Google ID token validation |
| Anthropic AI | ai.service.ts | LLM-powered journal analysis, weekly reports, product recommendations. Key securely loaded from Secrets Manager |
| HealthKit / Health Connect | (client-side only) | Mobile clients collect data via OS SDKs, then batch-upload to /health/samples/batch-upsert |
Architectural Trade-off Analysis
Core business logic and complex transformations are encapsulated in pure, deterministic, side-effect-free functions (e.g., computeBinIndex in temporal-pattern.service.ts, calculateQuantityFromDuration in personalized-consumption-rate.service.ts, computeProductImpactAggregate in product-impact-compute.ts, canonicalizePayload in payload-hash.ts). The "impure shell" (services, controllers) orchestrates these pure functions and handles I/O, state, and side effects. This enhances testability and predictability at the cost of maintaining a clear separation discipline.
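A minimal sketch of this functional-core style, in the spirit of canonicalizePayload in payload-hash.ts: the actual canonicalization rules are not shown in this document, so key-sorted JSON plus SHA-256 is an assumed (but common) choice for deterministic payload hashing.

```typescript
import { createHash } from "node:crypto";

// Functional-core sketch: pure, deterministic, side-effect-free functions
// that the impure shell (services, controllers) merely orchestrates.
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Pure: identical inputs always yield the identical string, regardless of
// the insertion order of object keys.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
}

// Hashing the canonical form gives a stable payload fingerprint — useful
// for the idempotent retry checks described elsewhere in this document.
function payloadHash(value: Json): string {
  return createHash("sha256").update(canonicalize(value)).digest("hex");
}
```

Because both functions are pure, they can be exhaustively unit-tested without mocks, database fixtures, or network stubs — which is the testability payoff the paragraph above describes.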
bootstrap.ts centralizes all instantiation and wiring via constructor injection. This eliminates hidden dependencies and promotes explicit contracts. The trade-off is verbosity for very large applications — the entire dependency graph lives in one file.
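The composition-root pattern can be sketched in a few lines. The JournalRepository and JournalService names below are illustrative placeholders, not the real interfaces; the point is only that every dependency arrives via the constructor and the wiring lives in one function.

```typescript
// Minimal sketch of the bootstrap.ts composition-root pattern.
// Names are illustrative; the pattern, not the types, is the point.
interface JournalRepository {
  findTitlesByUser(userId: string): string[];
}

class JournalService {
  // Dependencies are injected, never imported as ambient singletons.
  constructor(private readonly repo: JournalRepository) {}

  listTitles(userId: string): string[] {
    return this.repo.findTitlesByUser(userId);
  }
}

// The composition root: the single place where the dependency graph is wired.
function bootstrap(repo: JournalRepository): { journalService: JournalService } {
  return { journalService: new JournalService(repo) };
}
```

In tests, a fake repository is passed to bootstrap (or directly to the service constructor), so no real database is needed — which is exactly the testability benefit constructor injection is buying.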
Separating health write and read paths enables independent optimization of each for their specific workload characteristics. The trade-off is eventual consistency between raw data and projections, managed by the transactional outbox and watermark-based staleness detection.
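The watermark mechanism on the read path can be sketched as a pure staleness check. The result shape and the lag threshold below are assumptions for illustration; HealthProjectionReadService's actual contract is not shown in this document.

```typescript
// Sketch of watermark-based staleness detection on the read path.
// The projection result shape and lag budget are illustrative assumptions.
interface ProjectionReadResult<T> {
  data: T;
  // Watermark: the latest raw-sample event time the projection has absorbed.
  projectedThrough: Date;
  stale: boolean;
}

function withStaleness<T>(
  data: T,
  projectedThrough: Date,   // projection watermark
  latestIngestedAt: Date,   // newest raw sample on the write path
  maxLagMs = 60_000,        // tolerated projection lag; value is illustrative
): ProjectionReadResult<T> {
  const lag = latestIngestedAt.getTime() - projectedThrough.getTime();
  return { data, projectedThrough, stale: lag > maxLagMs };
}
```

Exposing the watermark (rather than hiding the lag) lets clients decide whether a slightly stale projection is acceptable or whether to fall back to the raw write path.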
Atomic event recording within the same transaction as primary data changes adds write-path complexity and requires a dedicated OutboxProcessorService in workers. The payoff is structurally eliminating the dual-write problem — strong data integrity guarantees without distributed transactions.
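The outbox write path can be sketched against a transaction abstraction. The Tx interface is a stand-in for a real PostgreSQL transaction, and the table and column names below are assumptions, not the actual schema used by OutboxService.

```typescript
// Sketch of the transactional-outbox write path. Tx stands in for a real
// PostgreSQL transaction; table/column names are illustrative assumptions.
interface Tx {
  insert(table: string, row: Record<string, unknown>): void;
}

interface DomainEvent {
  type: string;
  payload: Record<string, unknown>;
}

// The primary write and the outbox write share one transaction, so either
// both commit or neither does — structurally eliminating the dual-write
// problem without a distributed transaction.
function saveSampleWithEvent(tx: Tx, sample: { id: string; value: number }, event: DomainEvent): void {
  tx.insert("health_samples", sample);
  tx.insert("outbox_events", {
    event_type: event.type,
    payload: JSON.stringify(event.payload),
    created_at: new Date().toISOString(),
  });
  // A separate OutboxProcessorService later polls outbox_events and
  // dispatches to subscribers with at-least-once delivery semantics.
}
```

Because the event row only exists if the sample row committed, a crash between "save data" and "publish event" can no longer drop the downstream side effects.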
Centralizing definitions in packages/shared prevents semantic drift and ensures consistent behavior (unit conversions, entity types, conflict rules). The trade-off is strict adherence to shared contracts and potentially coordinated updates for breaking changes across packages.
Domain events via DomainEventService enable asynchronous processing and system extensibility — new subscribers can be added without modifying event producers. The trade-off is added complexity in tracing event flows (mitigated by correlation IDs) and ensuring delivery guarantees (mitigated by transactional outbox).
Services organized around business domains reduce cognitive load and enable independent evolution. The trade-off is potential overhead for cross-domain queries, often addressed by denormalized read models or specialized queries.
Version fields enable optimistic locking, preventing lost updates during concurrent sync. All critical writes are safely retryable (requestId + payloadHash for health; clientSyncOperationId for sync). The trade-off is conflict resolution complexity on both client and server, centralized in shared contracts to keep it manageable.
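The version-check itself reduces to a small pure function. The row shape below is illustrative, and the real conflict-resolution rules live in the shared contracts; this sketch only shows the reject-stale-write mechanism.

```typescript
// Sketch of version-based optimistic locking during sync.
// Row shape is illustrative; real conflict rules live in shared contracts.
interface Versioned {
  id: string;
  version: number;
  body: string;
}

type UpdateResult =
  | { ok: true; row: Versioned }
  | { ok: false; reason: "version_conflict" };

// Apply an update only if the caller saw the current version. On conflict,
// the client must re-fetch, merge per the shared conflict rules, and retry.
function applyUpdate(
  current: Versioned,
  incoming: { expectedVersion: number; body: string },
): UpdateResult {
  if (incoming.expectedVersion !== current.version) {
    return { ok: false, reason: "version_conflict" };
  }
  return { ok: true, row: { ...current, body: incoming.body, version: current.version + 1 } };
}
```

In SQL terms this is the familiar `UPDATE ... SET version = version + 1 WHERE id = $1 AND version = $2` pattern, where zero affected rows signals a conflict.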
Gaps, Risks, and Future Work
- code-snippets/models/dynamodb-schemas.ts — Remnant of a previous DynamoDB phase. All services now use PostgreSQL. Recommendation: Remove from codebase.
- Multi-device health sync — healthSourceDeviceScope flag exists in HealthSourceRegistry.ts, but comprehensive multi-device merging for health samples (beyond basic deduplication) is not fully implemented. Users with multiple health sources may experience inconsistencies.
- Health Connect background delivery — Android background ingestion is explicitly deferred in HealthKitContext.tsx. Android users may experience less frequent health data sync compared to iOS.
- Client token refresh — The explicit refresh loop within HealthSyncService when auth fails (401) is not fully visible. BackendAPIClient may handle this transparently, but if not, retry loops with invalid tokens could cause unnecessary network traffic.
- Endpoint-level idempotency — Core batch endpoints (health/samples/batch-upsert, sync/push) have explicit idempotency. Standard CRUD endpoints may rely on simpler implicit guarantees. Recommendation: Conduct an idempotency audit across all endpoints.
- Trace context propagation — OpenTelemetry is initialized and correlation IDs propagate via middleware, but ensuring context propagation across all async boundaries (e.g., OutboxProcessorService dispatching) and external integrations requires continuous verification.
- TimescaleDB configuration — HealthSample strongly implies TimescaleDB usage (hypertable, compression, retention), but explicit configuration in database.service.ts or bootstrap.ts is not documented. Without proper setup, time-series query performance could degrade at scale.
- ADRs — The DECISIONS/ folder structure implies Architectural Decision Records, but no actual ADR files are provided. Key architectural decisions should be formally documented to prevent architecture drift.