
Backend Architecture

Overview

This document provides a comprehensive architecture overview of the AppPlatform backend — a cloud-native, event-driven Node.js/TypeScript platform built on Express.js and PostgreSQL. It serves as the central hub for health data ingestion, multi-device synchronization, asynchronous projection pipelines, AI-powered analytics, and real-time communication.

The backend is engineered for the real world, not the happy path. It handles unreliable mobile networks, extended offline periods, bursty health data ingestion, and concurrent multi-device modifications. The architecture prioritizes data integrity, idempotency, and explicit consistency tracking across all operating conditions.

Scope: This document covers the backend repository's internal architecture, external interfaces, and deployment topology. Client-side and firmware architectures are excluded unless they directly impact backend design.


Core Principles

1. Layered architecture with strict separation of concerns

A clear separation into API, Services, Repositories, and Persistence layers. Each layer has a single responsibility and communicates only with adjacent layers through well-defined interfaces.

Goal: Modularity, independent testability, and predictable change propagation.

2. Explicit dependency injection via a single composition root

bootstrap.ts is the sole composition root. Every service receives its dependencies through constructor injection. No service locators, no ambient singletons, no hidden coupling. The entire dependency graph is visible in one file.

Goal: Maximum testability, minimal coupling, and complete transparency of system wiring.

3. Event-driven architecture with transactional outbox

Domain events (DomainEventService) drive analytics, achievements, projections, and notifications via decoupled subscribers. The OutboxService writes events to a dedicated table within the same database transaction as the primary data change, guaranteeing at-least-once delivery.

Goal: No data change silently drops its downstream side effects.
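The outbox principle above can be sketched in-memory. This is an illustrative stand-in (FakeDb, ingestSample are hypothetical names, not the actual OutboxService API): the domain row and its outbox event commit together or not at all.

```typescript
// Illustrative in-memory sketch of the transactional-outbox idea:
// the sample row and its outbox event commit atomically.
type OutboxEvent = { type: string; payload: unknown };

class FakeDb {
  samples: unknown[] = [];
  outbox: OutboxEvent[] = [];

  // All writes made via `tx` are applied atomically; any throw rolls back both.
  transaction(work: (tx: { samples: unknown[]; outbox: OutboxEvent[] }) => void): void {
    const tx = { samples: [...this.samples], outbox: [...this.outbox] };
    work(tx); // if this throws, this.samples / this.outbox stay untouched
    this.samples = tx.samples;
    this.outbox = tx.outbox;
  }
}

function ingestSample(db: FakeDb, sample: unknown, failBeforeCommit = false): void {
  db.transaction((tx) => {
    tx.samples.push(sample);
    tx.outbox.push({ type: "health.samples.changed", payload: sample });
    if (failBeforeCommit) throw new Error("simulated crash before commit");
  });
}
```

Because both writes share one transaction, a downstream subscriber can never observe a data change without its event, and vice versa.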

4. CQRS for the health data pipeline

Write operations (raw HealthSample ingestion via HealthSampleService) are separated from read operations (pre-computed projections via HealthProjectionReadService). The write path is high-throughput and append-only. The read path serves fast, pre-computed results with explicit staleness via watermarks.

Goal: API reads are fast, writes are durable, and staleness is always explicit.

5. Idempotency at every trust boundary

Request-level idempotency uses requestId + payloadHash in a persistent tracking table. Sample-level deduplication relies on composite unique constraints. Sync operations use clientSyncOperationId with cached results. Retries are always safe.

Goal: Network unreliability, client bugs, and background job retries never produce duplicate data.
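A minimal sketch of the request-level half of this principle, assuming an in-memory stand-in for the persistent tracking table (IdempotencyTracker and its method names are illustrative, not the real service API): a (requestId, payloadHash) record makes retries safe and surfaces conflicting reuse of a requestId.

```typescript
// Sketch of request-level idempotency via requestId + payloadHash.
import { createHash } from "node:crypto";

type Outcome = { status: "applied" | "replayed" | "conflict"; result?: unknown };

class IdempotencyTracker {
  private seen = new Map<string, { payloadHash: string; result: unknown }>();

  execute(requestId: string, payload: unknown, handler: () => unknown): Outcome {
    const payloadHash = createHash("sha256").update(JSON.stringify(payload)).digest("hex");
    const prior = this.seen.get(requestId);
    if (prior) {
      // Same requestId + same payload: return the cached result, do not re-run.
      if (prior.payloadHash === payloadHash) return { status: "replayed", result: prior.result };
      // Same requestId but different payload: a client bug, reject loudly.
      return { status: "conflict" };
    }
    const result = handler();
    this.seen.set(requestId, { payloadHash, result });
    return { status: "applied", result };
  }
}
```

The payload hash is what distinguishes a safe retry from an erroneous requestId collision.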

6. Fail-fast and fail-visible

Immediate error detection via explicit checks for invalid states, malformed data, and missing configurations. Errors are wrapped in AppError and logged with rich context via LoggerService. No silent failures, no swallowed exceptions.

Goal: Every failure is observable, traceable, and actionable.

7. Single source of truth via shared contracts

Critical definitions — health metric types, sync entity types, conflict resolution policies, API contracts — are centralized in packages/shared and consumed by both backend and mobile. No ad-hoc logic duplication.

Goal: Prevent semantic drift and ensure consistent behavior across the entire stack.


Key Architectural Decisions

| Decision | Rationale |
| --- | --- |
| PostgreSQL as canonical data store | All data (including time-series via JSONB) consolidated into one store, simplifying persistence |
| Separate Web and Worker processes | Independent scaling, resource isolation, distinct entry points (src/index.ts / src/worker.ts) |
| Three-lane health ingestion | HOT (recent), COLD (backfill), CHANGE (deletions) — each with distinct QoS and freshness guarantees |
| Dedicated Redis for BullMQ | noeviction policy for job durability, isolated from API cache's volatile-ttl policy |
| Cursor-based sync | O(log n) keyset pagination, stable under concurrent writes, enabling incremental pulls |
| Config-driven conflict resolution | Per-entity, per-field merge policies declared in shared contracts (conflict-configs.ts) — no ad-hoc merge logic |

System Context & Boundaries

The AppPlatform backend operates within a broader ecosystem of clients, cloud infrastructure, and third-party services.

External Interactions

| Category | Systems | Integration |
| --- | --- | --- |
| Mobile Clients | iOS, Android | HTTP API, WebSocket, HealthKit/Health Connect data push |
| Web Clients | Browser | HTTP API, WebSocket |
| AWS Infrastructure | Cognito, Secrets Manager, S3, CloudWatch, RDS/Neon, ElastiCache | Auth, secrets, storage, logging, persistence, caching |
| Third-Party APIs | Google OAuth, Anthropic AI | Federated auth, LLM analysis and recommendations |
| Health SDKs | Apple HealthKit, Google Health Connect | Client-side collection, then batch upload — no direct backend integration |

System Context and Boundary Diagram

Deployment Model

The backend deploys as two distinct process types:

  • Web Service (src/index.ts) — Handles synchronous HTTP API requests and WebSocket connections. Stateless, horizontally scalable behind a load balancer.
  • Worker Service (src/worker.ts) — Processes asynchronous BullMQ jobs (projections, analytics, reconciliation). Independent scaling and resource isolation from the web service.

Deployment Topology

Web Service

Multiple stateless instances run behind a load balancer. socket.service.ts uses @socket.io/redis-adapter for horizontal WebSocket scaling — events emitted from any instance reach all connected clients via Redis Pub/Sub.

Worker Service

Multiple instances consume jobs from shared BullMQ queues via job-manager.service.ts. Redis ensures job durability across worker crashes. Per-queue concurrency controls resource utilization.

Horizontal Scaling

| Concern | Strategy |
| --- | --- |
| API & WebSockets | Stateless web instances + Redis adapter for Socket.IO |
| Background Jobs | Multiple worker instances + BullMQ load distribution |
| Resource Isolation | Web and worker processes are separate — heavy projections never block API latency |

Persistence Topology

| Store | Role | Configuration |
| --- | --- | --- |
| PostgreSQL | Canonical data store | Managed (RDS/Neon), strong consistency, JSONB support. database.service.ts manages connections with serverless retry logic |
| Redis (API Cache) | Response caching, rate limiting, distributed locks | volatile-ttl eviction policy via cache.service.ts |
| Redis (BullMQ/WS) | Job queues, Socket.IO adapter | noeviction policy — strict separation prevents eviction conflicts |
| AWS S3 | Blob storage | Journal photos, analytics reports, database backups via s3.service.ts |

Key insight: The architecture strictly separates Redis instances for API caching (volatile-ttl) and durable job queues (noeviction) to prevent eviction policy conflicts and ensure data integrity for background jobs.


Application Architecture

The backend follows a four-layer architecture with cross-cutting concerns spanning multiple layers.

```mermaid
graph TD
    Client["Client (Mobile/Web)"]

    subgraph BM["Backend Monolith"]
        A[1. API Layer] --> B[2. Services Layer]
        B --> C[3. Repositories Layer]
        C --> D[4. Persistence Layer]
    end

    Client -- HTTP/WS --> A

    subgraph CC["Cross-Cutting Concerns"]
        Security(Security)
        Observability(Observability)
        Config(Configuration)
        Async(Async Processing)
        Realtime(Realtime Comm.)
    end

    A -- "Cross-Cutting" --> Security
    A -- "Cross-Cutting" --> Observability
    B -- "Cross-Cutting" --> Async
    B -- "Cross-Cutting" --> Realtime
    C -- "Cross-Cutting" --> Observability

    D -- "Data Storage" --> PostgreSQL[(PostgreSQL)]
    D -- "Cache/Queues" --> Redis[(Redis)]
    D -- "Blob Storage" --> S3[(S3)]
```

API Layer

Located in code-snippets/api/v1/, this layer handles incoming requests, applies cross-cutting concerns, and routes to business logic.

| Component | Location | Responsibility |
| --- | --- | --- |
| Controllers | code-snippets/api/v1/controllers/ | Thin request handlers — extract params, call services, format responses, delegate errors |
| Middleware | code-snippets/api/v1/middleware/ | Cross-cutting concerns via MiddlewareFactory for consistent DI |
| Routes | code-snippets/api/v1/routes/ | Endpoint definitions with ordered middleware chains, registered via RouteRegistry in code-snippets/core/route-registry.ts |
| Schemas | code-snippets/api/v1/schemas/ + packages/shared/src/contracts/ | Zod schemas for request/response validation — canonical SSOT for API contracts |

Middleware Stack
| Middleware | Purpose |
| --- | --- |
| correlationContext | Generates and propagates X-Correlation-ID for end-to-end tracing |
| serverTime | Injects Server-Time header for client clock offset calculation |
| apiGateway | Circuit breakers and request/response transformations for external service calls |
| logging | Request-specific logging contexts and request/response details |
| securityLogging | Security context initialization and suspicious pattern detection |
| jsonBodyParser | JSON parsing with size-limit enforcement on inflated (decompressed) payloads |
| requestValidation | Zod schema validation for params, query, and body |
| auth | JWT validation via Cognito, populates req.user with authenticated context |
| userContext | Extracts and validates user context from req.user |
| rateLimitQueue | API rate limiting with request queuing |
| rate-limit-monitoring | Rate limit metrics collection and response headers |
| apiCache | Redis-based response caching with ETags, conditional requests, and race condition prevention |
| sessionSecurity | Device fingerprinting and X-Session-ID validation for session integrity |
| health-error | Contract-compliant error formatting for health batch endpoints |
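The middleware chain's ordering is load-bearing (auth must run before userContext, for example). A minimal sketch of how such a chain composes — the compose helper and middleware names here are illustrative, not the actual MiddlewareFactory API:

```typescript
// Sketch of an ordered middleware chain: each middleware enriches a shared
// context and passes control onward via next().
type Ctx = { trace: string[]; user?: string };
type Middleware = (ctx: Ctx, next: () => void) => void;

function compose(chain: Middleware[]): (ctx: Ctx) => void {
  return (ctx) => {
    const run = (i: number): void => {
      if (i < chain.length) chain[i](ctx, () => run(i + 1));
    };
    run(0);
  };
}

const correlationContext: Middleware = (ctx, next) => { ctx.trace.push("correlation"); next(); };
const auth: Middleware = (ctx, next) => { ctx.trace.push("auth"); ctx.user = "user-123"; next(); };
const userContext: Middleware = (ctx, next) => {
  if (!ctx.user) throw new Error("auth must run first"); // ordering is load-bearing
  ctx.trace.push("userContext");
  next();
};

const handle = compose([correlationContext, auth, userContext]);
```

Reordering the array (e.g., userContext before auth) would throw, which is exactly the fail-fast behavior the architecture aims for.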

Services Layer

Located in code-snippets/services/, this layer encapsulates all core business logic. Services coordinate repositories, other services, and external integrations.

Service Catalog

Core Business Services

| Service | Domain |
| --- | --- |
| user.service.ts | User management |
| consumption.service.ts | Consumption tracking |
| session.service.ts | Session management |
| purchase.service.ts | Purchase management |
| journal.service.ts | Journaling |
| product.service.ts | Product catalog |
| inventory.service.ts | Inventory management |
| goals.service.ts / achievements.service.ts | Goals and achievements |
| safety.service.ts | Safety and anomaly detection |
| analytics.service.ts | Analytics and reporting |

Health Data Pipeline

| Service | Responsibility |
| --- | --- |
| healthSample.service.ts | API ingestion with dual-layer idempotency, triggers outbox events |
| healthIngestQueue.service.ts | Large batch queuing for async worker processing |
| health-aggregation.service.ts | ValueKind-aware aggregation logic for raw samples |
| health-projection-coordinator.service.ts | Fan-out of health.samples.changed events to projection handlers |
| health-projection-read.service.ts | Queries derived read models with watermark freshness overrides |
| health-insight-engine.service.ts | Deterministic, evidence-backed insights at read time via pure rules |

AI Integration

| Service | Responsibility |
| --- | --- |
| ai.service.ts | Core orchestration for external AI providers (Anthropic) |
| ai-context-aggregation.service.ts | Rich AI prompt construction from aggregated user data |
| ai-phi-redaction.service.ts | PHI redaction before AI model submission |
| ai-product-recommendation.service.ts | Personalized product recommendations |
| ai-journal-analysis.service.ts | Journal entry analysis |
| ai-weekly-report.service.ts | Weekly health reports |
| aiCostTracking.service.ts | AI API usage and cost monitoring |

User Profiling & ML

| Service | Responsibility |
| --- | --- |
| user-consumption-profile.service.ts | EMA learning from purchase cycles, consumption prediction |
| temporal-pattern.service.ts | Dirichlet-multinomial temporal histogram engine for routine detection |
| user-routine.service.ts | User routine profile management |
| inventory-prediction.service.ts | Multi-factor depletion prediction and purchase recommendations |

Cross-System Concerns

| Service | Responsibility |
| --- | --- |
| auth.service.ts / cognito.service.ts | Authentication orchestration, Cognito integration |
| cache.service.ts | Redis-based caching and distributed advisory locks |
| logger.service.ts | Structured, context-rich logging |
| securityLogger.service.ts | Security event recording for audit and alerting |
| performanceMonitoring.service.ts | Application performance metrics collection |
| outbox.service.ts / outbox-processor.service.ts | Transactional outbox pattern and async event processing |
| sync.service.ts / syncLease.service.ts | Bidirectional sync, conflict resolution, admission control |
| deviceTelemetry.service.ts | Device-specific sensor data collection and retrieval |
| sessionTelemetryQueue.service.ts | Session telemetry computation job queuing |
| job-manager.service.ts | BullMQ job queue orchestration |
| secrets.service.ts | Secure access to application secrets (AWS Secrets Manager) |
| correlationTracker.service.ts | Distributed tracing correlation IDs |
| requestValidation.service.ts | Utility functions for schema-based request validation |
| httpsValidation.service.ts | HTTPS-related validation utilities |

Repositories Layer

Located in code-snippets/repositories/, this layer abstracts all direct database access via Prisma ORM.

  • RepositoryFactory (repository.factory.ts) — Central mechanism for creating all repositories with consistent PrismaClient, LoggerService, and PerformanceMonitoringService injection.
  • Per-entity repositories (e.g., user.repository.ts, consumption.repository.ts, health-sample.repository.ts, sync-operation.repository.ts) — Hide ORM details, enforce userId-scoped authorization to prevent IDOR, implement optimistic locking via version fields, and encapsulate complex query logic.

Shared Contracts & Utilities

| Package | Purpose |
| --- | --- |
| code-snippets/core/ | DI framework utilities — controller-registry.ts, middleware-factory.ts, route-registry.ts, common type definitions |
| code-snippets/utils/ | Helpers for authentication (auth.utils.ts), error handling (error-handler.ts), crypto, and device ID generation |
| packages/shared/ | Cross-application type safety — canonical entity types (entity-types.ts), Zod contracts (health.contract.ts, sync-lease.contract.ts), health config (metric types, units, normalization, precision, payload hashing), sync config (relation graph, cursor formats, conflict strategies) |

Bootstrap & Dependency Injection

bootstrap.ts is the single composition root — the sole, authoritative place responsible for instantiating every service, wiring all dependencies via constructor injection, and orchestrating initialization and shutdown in the correct order.

Initialization Stages
| Phase | Components | Purpose |
| --- | --- | --- |
| 0 | OpenTelemetry SDK (instrumentation.ts) | Tracing and metrics collection before any app code loads — instruments Node.js, Express, Redis, PostgreSQL |
| 1 | LoggerService, S3Service | Foundational — LoggerService is a dependency for almost all other components |
| 2 | ConfigSecurityService, AuthConfig | Load and validate all configuration, including secrets from AWS Secrets Manager. Immutable Config object frozen after loading |
| 3 | DatabaseService, CacheService, PerformanceMonitoringService, ProjectionCheckpointRepository, HealthProjectionCoordinatorService, HealthAggregationService, HealthProjectionReadService, HealthInsightEngineService | Core infrastructure — database connections, caching, metrics, projection coordination |
| 4 | RepositoryFactory + all entity repositories | Data access layer — PrismaClient from DatabaseService injected into all repositories |
| 5 | DomainEventService, OutboxService, CognitoService, SocketService, SecurityLoggerService, AuthRateLimitService, SessionSecurityService, SyncLeaseService | Core platform services bridging infrastructure and business logic |
| 6 | ConsumptionService, JournalService, HealthSampleService, etc. | Domain-specific business services |
| 7 | UserConsumptionProfileService, TemporalPatternService, InventoryPredictionService | User profiling and ML services (depend on Phase 6 business services) |
| 8 | ControllerRegistry, MiddlewareFactory, RouteRegistry, Express App | API layer assembly and route registration |

Service Lifecycle

  • Constructor injection — Pure classes receiving all dependencies through constructors. Enforces explicit contracts and simplifies unit testing.
  • initialize() methods — Async setup (database connections, event handler registration, configuration loading) that cannot run in synchronous constructors.
  • shutdown() methods — Graceful cleanup of external resources (database connections, Redis clients, timers) orchestrated in reverse dependency order by bootstrap.ts.
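The lifecycle above can be sketched with a toy bootstrap (Bootstrap, makeService, and the service names are illustrative, not the real bootstrap.ts API): services start in dependency order and stop in reverse, so nothing is torn down while a dependent still needs it.

```typescript
// Sketch of the composition-root lifecycle: init in dependency order,
// shutdown in reverse.
interface Lifecycle { name: string; initialize(): void; shutdown(): void }

const log: string[] = [];
const makeService = (name: string): Lifecycle => ({
  name,
  initialize: () => log.push(`init:${name}`),
  shutdown: () => log.push(`stop:${name}`),
});

class Bootstrap {
  private started: Lifecycle[] = [];

  start(services: Lifecycle[]): void {
    for (const s of services) { s.initialize(); this.started.push(s); }
  }

  stop(): void {
    // Reverse dependency order: the last service started is the first stopped.
    for (const s of [...this.started].reverse()) s.shutdown();
    this.started = [];
  }
}
```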

Data Flows & Pipelines

HTTP Request Lifecycle

Incoming requests pass through an ordered middleware chain before reaching controller logic. The chain handles correlation tracking, security, authentication, rate limiting, caching, and validation.

HTTP Request Sequence Diagram
```mermaid
sequenceDiagram
    actor Client
    participant LoadBalancer as Load Balancer
    participant ExpressApp as Express App
    participant Middleware as Middleware Chain
    participant Controller
    participant Service
    participant Repository
    participant Database as PostgreSQL
    participant Cache as Redis Cache
    participant Logger as Logger Service
    participant PerfMon as Performance Monitoring

    Client->>+LoadBalancer: HTTP Request (GET /data)
    LoadBalancer->>+ExpressApp: Forward Request
    ExpressApp->>+Middleware: correlationContext
    Middleware->>Middleware: serverTime
    Middleware->>Middleware: apiGateway
    Middleware->>Middleware: requestLogging
    Middleware->>Middleware: securityContext
    Middleware->>Middleware: suspiciousPatterns
    Middleware->>Middleware: jsonBodyParser
    Middleware->>Middleware: requestValidation
    Middleware->>Middleware: auth.middleware (JWT Validation)
    Middleware->>Middleware: userContext (User Info)
    Middleware->>Middleware: rateLimitQueue (Check/Queue)
    Middleware->>Middleware: rate-limit-monitoring (Metrics/Headers)
    Middleware->>+Cache: apiCache (Lookup CacheKey)
    Cache-->>-Middleware: Cache Hit/Miss
    alt Cache Hit
        Middleware->>Client: 200/304 Cached Response
        Middleware->>PerfMon: Record CACHE_HIT
    else Cache Miss
        Middleware->>+Controller: Invoke Controller Method
        Controller->>+Service: Call Business Logic
        Service->>+Repository: Request Data
        Repository->>+Database: Query Data
        Database-->>-Repository: Query Result
        Repository-->>-Service: Data
        Service-->>-Controller: Processed Data
        Controller-->>+ExpressApp: Send Response
        ExpressApp->>Middleware: Intercept Response
        Middleware->>Cache: apiCache (Store Response)
        Middleware->>PerfMon: Record CACHE_MISS, API_RESPONSE_TIME
        Middleware->>Logger: Log API_RESPONSE
        Middleware-->>-LoadBalancer: Send Response (HTTP 200)
        LoadBalancer->>-Client: Final Response
    end
```

Key insight: On cache hit, responses are served without touching PostgreSQL. On cache miss, the full layer stack executes and the response is cached for subsequent requests.
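A minimal sketch of the cache-hit path's ETag behavior, assuming a simple in-memory stand-in for the Redis cache (respond and its signature are illustrative, not the actual apiCache middleware): a matching If-None-Match yields a bodyless 304, otherwise the cached or freshly computed body is served.

```typescript
// Sketch of ETag-based conditional responses backed by a response cache.
import { createHash } from "node:crypto";

type CachedResponse = { status: 200 | 304; etag: string; body?: string };

const cache = new Map<string, { etag: string; body: string }>();

function respond(key: string, ifNoneMatch: string | undefined, compute: () => string): CachedResponse {
  let entry = cache.get(key);
  if (!entry) {
    const body = compute(); // cache miss: the full layer stack executes
    entry = { body, etag: createHash("sha1").update(body).digest("hex") };
    cache.set(key, entry);
  }
  // Conditional request: the client's copy is still fresh, send no body.
  if (ifNoneMatch === entry.etag) return { status: 304, etag: entry.etag };
  return { status: 200, etag: entry.etag, body: entry.body };
}
```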


Health Data Ingestion

The ingestion pipeline moves health data from mobile devices through the backend's trust boundary into PostgreSQL, with transactional outbox events for downstream projection processing.

Health Data Ingestion Pipeline

Client-Side Ingestion (packages/app):

| Component | Responsibility |
| --- | --- |
| HealthKitContext.tsx | HealthKit permissions and raw data access |
| HealthKitService.ts | Abstracts HealthKit operations (availability, permissions, data reads) |
| HealthKitAdapter.ts | Implements HealthDataProviderAdapter for iOS HealthKit — metric configurations and category code mapping |
| DirtyKeyComputer.ts | Computes dirty projection keys from ingested samples |
| HealthIngestionEngine.ts | Three-lane pipeline (HOT for recent UI data, COLD for historical backfill, CHANGE for deletions/edits) — enforces cursor updates, validation, and idempotent writes |
| HealthSyncCoordinationState.ts | Manages ingestion intervals, backoff, and single-operation enforcement |
| HealthUploadEngine.ts | Batch staging with staged_batch_id for atomic claims and crash recovery, serialization, payload size limits |
| HealthUploadHttpClientImpl.ts | HTTP client wrapping BackendAPIClient for auth and retry |

Backend Ingestion (packages/backend):

| Component | Responsibility |
| --- | --- |
| health.controller.ts | Receives POST /health/samples/batch-upsert, delegates to healthIngestQueue.service.ts for async processing and healthSample.service.ts for idempotent ingestion |
| healthIngestQueue.service.ts | Pre-processing queue — enqueues large batches into BullMQ to prevent HTTP timeouts |
| HealthSampleService.ts | Core ingestion logic (see details below) |
| HealthSampleRepository | Persists HealthSample entities to PostgreSQL |
| HealthIngestRequest table | Tracks request-level idempotency — ensures the same batch isn't processed twice |

HealthSampleService internals:

  • Two-layer idempotency: requestId + payloadHash for request-level (HealthIngestRequest table), and (userId, sourceId, sourceRecordId, startAt) unique constraint for sample-level deduplication
  • Privacy gating: Enforces allowHealthDataUpload and blockedMetrics from user privacy settings (via UserRepository)
  • Unit normalization: Ensures all values and units conform to canonical forms (unit-normalization.ts, metric-types.ts)
  • Transactional outbox: Triggers health.samples.changed events via OutboxService within the same transaction as HealthSample persistence

Guarantee: Zero dual-writes. Raw samples and outbox events are committed in a single atomic transaction.
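The sample-level half of the two-layer idempotency above can be sketched as an upsert keyed by the composite constraint (SampleStore is an in-memory illustration, not the actual repository): re-uploading the same sample never creates a duplicate row.

```typescript
// Sketch of sample-level deduplication via the composite unique key
// (userId, sourceId, sourceRecordId, startAt).
type Sample = {
  userId: string;
  sourceId: string;
  sourceRecordId: string;
  startAt: string;
  value: number;
};

class SampleStore {
  private rows = new Map<string, Sample>();

  private key(s: Sample): string {
    return [s.userId, s.sourceId, s.sourceRecordId, s.startAt].join("|");
  }

  upsert(s: Sample): "inserted" | "deduplicated" {
    const k = this.key(s);
    if (this.rows.has(k)) return "deduplicated"; // unique-constraint hit: skip quietly
    this.rows.set(k, s);
    return "inserted";
  }

  get count(): number { return this.rows.size; }
}
```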

For detailed ingestion logic, three-lane contract, and backend processing, see HEALTH-INGESTION-PIPELINE.MD.


Health Projection Pipeline

After ingestion, raw health data changes are asynchronously transformed into pre-computed read models with watermark-based freshness tracking. This implements the read side of the CQRS pattern.

Health Projection Pipeline

Projection Handlers:

| Handler | Output Table | Logic |
| --- | --- | --- |
| HealthRollupProjectionHandler | user_health_rollup_day | Daily numeric aggregates (avg heart rate, total steps) via HealthAggregationService for ValueKind-aware logic |
| SleepSummaryProjectionHandler | user_sleep_night_summary | Nightly sleep summaries — clustering, nap detection, canonical source selection |
| SessionImpactProjectionHandler | user_session_impact_summary | Before/during/after health metric deltas around consumption sessions |
| ProductImpactProjectionHandler | user_product_impact_rollup | Per-product impact aggregation (depends on session-impact completing) |
| TelemetryCacheProjectionHandler | session_telemetry_cache | Marks entries STALE when underlying samples change — triggers lazy recomputation on read |

Coordination Mechanisms:

  • Checkpoint tracking — ProjectionCheckpointRepository tracks per-projection status (PENDING → PROCESSING → COMPLETED/FAILED), enabling idempotent retries and independent failure isolation
  • Lease-based concurrency — Prevents multiple workers from processing the same projection for the same event
  • Watermark freshness — HealthProjectionReadService overrides a row's status to STALE if its sourceWatermark is behind the current UserHealthWatermark
  • Event coalescing — outbox-coalescing.ts groups related events for processing efficiency
  • Read-time insights — health-insight-engine.service.ts generates deterministic insights from projection data using pure rule-based logic (health-insight-rules.ts)

Guarantee: Individual projection failures don't block other projections. Watermarks provide explicit freshness tracking — no silent staleness.
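The watermark override described above reduces to a small pure check (field names mirror the text, but this is an illustrative sketch, not the actual HealthProjectionReadService logic): a row claiming COMPLETED is downgraded to STALE at read time whenever its source watermark lags the user's current watermark.

```typescript
// Sketch of watermark-based freshness: newer raw samples exist that this
// projection row has not yet incorporated, so it reads as STALE.
type ProjectionRow = { status: "COMPLETED" | "STALE"; sourceWatermark: number };

function effectiveStatus(row: ProjectionRow, userWatermark: number): "COMPLETED" | "STALE" {
  if (row.sourceWatermark < userWatermark) return "STALE"; // row lags raw data
  return row.status;
}
```

Because the check happens on every read, clients always learn about staleness explicitly instead of silently receiving outdated aggregates.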

For detailed projection coordination, checkpointing, and watermark logic, see PROJECTION-PIPELINE.MD.


Data Synchronization

The sync pipeline enables bidirectional data synchronization between multiple offline-first mobile clients and the backend, with config-driven conflict resolution and structural integrity enforcement.

Bidirectional Sync Architecture

Key Mechanisms:

| Mechanism | Implementation |
| --- | --- |
| Offline-first | Clients create and modify data offline using client-generated UUIDs (clientConsumptionId, clientEntryId) |
| Cursor-based pull | CompositeCursor from packages/shared/src/sync-config/cursor.ts, topologically sorted by ENTITY_SYNC_ORDER (parents before children) to prevent FK violations |
| Push idempotency | syncOperationId + cached resultPayload in the SyncOperation table — repeated pushes return cached results without reprocessing |
| Optimistic locking | version fields on syncable entities detect conflicts when clientVersion < serverVersion |
| Conflict resolution | Config-driven via ENTITY_CONFLICT_CONFIG from conflict-configs.ts — strategies: LAST_WRITE_WINS, CLIENT_WINS, SERVER_WINS, MERGE, MANUAL |
| Field-level policies | For the MERGE strategy: LOCAL_WINS (notes), MERGE_ARRAYS (tags), MAX_VALUE (counters), MONOTONIC (status). Pure merge via entity-merger.ts |
| Entity handlers | code-snippets/services/sync/handlers/ — entity-specific create(), update(), delete(), fetchServerVersion(), merge() with Zod validation and transactional writes |
| Change tracking | Every mutation writes to the SyncChange table (source for pull sync). SyncState tracks lastSyncCursor per (userId, deviceId) |
| Distributed locks | SyncLeaseService (Redis) for admission control + SyncStateRepository (PostgreSQL) for per-device serialization via lockOwner/lockAcquiredAt |
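The cursor-based pull mechanism can be sketched with keyset pagination over (updatedAt, id) — the encoding and field names here are illustrative, not the actual CompositeCursor format: the cursor records the last row seen, so pulls resume exactly where they left off and stay stable under concurrent inserts.

```typescript
// Sketch of cursor-based (keyset) pagination for incremental pull sync.
type Cursor = { updatedAt: number; id: string };
type Row = { id: string; updatedAt: number };

const encodeCursor = (c: Cursor): string => Buffer.from(JSON.stringify(c)).toString("base64");
const decodeCursor = (s: string): Cursor => JSON.parse(Buffer.from(s, "base64").toString());

function pullPage(rows: Row[], cursor: string | null, limit: number): { page: Row[]; next: string | null } {
  const after = cursor ? decodeCursor(cursor) : null;
  const page = rows
    // Keyset predicate: strictly after the cursor position, never an OFFSET.
    .filter((r) => !after || r.updatedAt > after.updatedAt || (r.updatedAt === after.updatedAt && r.id > after.id))
    .sort((a, b) => a.updatedAt - b.updatedAt || (a.id < b.id ? -1 : 1))
    .slice(0, limit);
  const last = page[page.length - 1];
  return { page, next: last ? encodeCursor({ updatedAt: last.updatedAt, id: last.id }) : null };
}
```

The tie-breaking id column is what keeps the cursor stable when many rows share one updatedAt value.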

Guarantee: Cursors never advance past corrupted state. Conflict resolution is deterministic, auditable, and testable in isolation.
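A minimal sketch of the field-level MERGE policies, under the assumption of a toy entity with three fields (the config shape and merge function are illustrative, not the actual conflict-configs.ts / entity-merger.ts API): each field declares a policy, and the merge is a pure function of (local, server).

```typescript
// Sketch of config-driven, field-level conflict merging.
type Policy = "LOCAL_WINS" | "MERGE_ARRAYS" | "MAX_VALUE";
type Entity = { notes: string; tags: string[]; count: number };

const fieldPolicies: Record<keyof Entity, Policy> = {
  notes: "LOCAL_WINS",    // the client's free text is authoritative
  tags: "MERGE_ARRAYS",   // union of both sides, deduplicated
  count: "MAX_VALUE",     // counters never regress
};

function mergeField(policy: Policy, local: unknown, server: unknown): unknown {
  switch (policy) {
    case "LOCAL_WINS": return local;
    case "MERGE_ARRAYS": return [...new Set([...(server as unknown[]), ...(local as unknown[])])];
    case "MAX_VALUE": return Math.max(local as number, server as number);
  }
}

function merge(local: Entity, server: Entity): Entity {
  const out = {} as Entity;
  for (const key of Object.keys(fieldPolicies) as (keyof Entity)[]) {
    (out as Record<string, unknown>)[key] = mergeField(fieldPolicies[key], local[key], server[key]);
  }
  return out;
}
```

Because the merge is pure and driven entirely by declared policies, it can be unit-tested in isolation and audited by reading the config.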

For detailed sync mechanics, entity handlers, and conflict algorithms, see SYNC-ENGINE.MD.


Async Job Processing

A dedicated worker process handles all background tasks, isolating compute-heavy operations from the web service's request path.

Backend Async Job Architecture

Key Components:

| Component | Responsibility |
| --- | --- |
| JobManagerService | Manages BullMQ Queue and Worker instances. Dedicated Redis with noeviction. Provides enqueueJob() for producers, getQueueDepth() for backpressure. Per-queue config (maxQueueLen, concurrency, removeOnComplete/Fail) |
| JobProcessor | Core dispatch logic in worker processes — delegates to specific services based on the JobNames enum. Fault-isolated via try/catch with structured error logging |
| schedules.ts | Recurring cron jobs via BullMQ repeatable jobs — analytics refresh, health ingest reaper, inventory reconciliation, stale session reconciliation |

Job Types:

| Job | Purpose |
| --- | --- |
| HEALTH_INGEST_BATCH | Async processing of large health uploads |
| SESSION_TELEMETRY_COMPUTE | Derived health data for session visualizations |
| REFRESH_ANALYTICS_MVS | PostgreSQL materialized view refresh for analytics |
| HEALTH_INGEST_REAPER | Stale ingestion request cleanup |
| HEALTH_SAMPLE_SOFT_DELETE_PURGER | Hard-delete old soft-deleted samples |
| INVENTORY_RECONCILIATION | Link unlinked consumptions to inventory items |
| STALE_SESSION_RECONCILIATION | Complete sessions whose end time has passed |

Guarantee: Jobs survive worker crashes via durable Redis queues. All jobs are idempotent and retry-safe.
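The enqueueJob/getQueueDepth backpressure idea can be sketched with an in-memory stand-in for BullMQ (JobManager and its names mirror the text but are illustrative): producers check queue depth against a per-queue maxQueueLen before enqueueing.

```typescript
// Sketch of producer-side backpressure on a bounded job queue.
type Job = { name: string; payload: unknown };

class JobManager {
  private queues = new Map<string, Job[]>();
  constructor(private maxQueueLen: number) {}

  getQueueDepth(queue: string): number {
    return this.queues.get(queue)?.length ?? 0;
  }

  enqueueJob(queue: string, job: Job): "enqueued" | "rejected" {
    // Backpressure: refuse new work instead of letting the queue grow unbounded.
    if (this.getQueueDepth(queue) >= this.maxQueueLen) return "rejected";
    const q = this.queues.get(queue) ?? [];
    q.push(job);
    this.queues.set(queue, q);
    return "enqueued";
  }
}
```

Rejecting at enqueue time pushes the retry decision back to the producer, which can surface a 429 or back off, rather than letting Redis memory grow without bound.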

For BullMQ configuration, concurrency control, and failure modes, see WORKER-SCALABILITY.MD.


Real-time Communication

The backend supports real-time updates via Socket.IO WebSockets for live session events, sync notifications, and instant UI feedback.

Connection Flow:

  1. Mobile/Web clients establish WebSocket connections to the backend's web service instances
  2. JWT authenticated via CognitoService.validateToken(), Cognito sub mapped to internal userId for consistency with HTTP API
  3. Authenticated sockets join rooms: user:{userId}, device:{deviceId}, session:{sessionId}
  4. Domain events (e.g., consumption.created, session.ended) emitted by DomainEventService
  5. WebSocketEventSubscriber converts events to minimal, type-safe RealtimeEnvelopeV1 (from packages/shared/src/realtime/contracts/events.ts) — includes userId, entityId, eventType, and cache-patch payload
  6. WebSocketBroadcaster emits to appropriate rooms via socket.service.ts
  7. Clients patch local React Query caches directly from the payload — no full API refetch

Reliability:

| Concern | Mechanism |
| --- | --- |
| Horizontal scaling | @socket.io/redis-adapter on the dedicated BullMQ/WS Redis instance broadcasts events across all web instances |
| Authentication | JWT validation via CognitoService on every connection |
| Tenancy enforcement | WebSocketBroadcaster validates userId match — no cross-user data leakage |
| Deduplication | dedupeCache keyed by (eventType, entity, entityId) prevents duplicate emissions during failover/retry |
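The deduplication row above reduces to a keyed guard (DedupeCache is an illustrative sketch; TTL and eviction details are omitted): during failover or retry the same event may be produced twice, but only the first emission goes out.

```typescript
// Sketch of emission deduplication keyed by (eventType, entity, entityId).
class DedupeCache {
  private emitted = new Set<string>();

  shouldEmit(eventType: string, entity: string, entityId: string): boolean {
    const key = `${eventType}:${entity}:${entityId}`;
    if (this.emitted.has(key)) return false; // already broadcast, suppress
    this.emitted.add(key);
    return true;
  }
}
```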

Data Persistence & Storage

PostgreSQL

The canonical, primary, and single source of truth for all application data. Managed via Prisma ORM with schema defined in prisma/schema.prisma.

Schema Categories
| Category | Tables |
| --- | --- |
| Core Business | User, Consumption, ConsumptionSession, JournalEntry, Purchase, Product, InventoryItem, Goal, Achievement, UserAchievement, Device, SafetyRecord |
| Sync Metadata | SyncOperation, SyncConflict, SyncChange, SyncState |
| Event Outbox | OutboxEvent (central to the Transactional Outbox Pattern) |
| Health Data | HealthSample (raw samples — time-series, likely optimized with TimescaleDB), HealthIngestRequest (idempotency tracking) |
| Health Projections | UserHealthRollupDay, UserSleepNightSummary, UserSessionImpactSummary, UserProductImpactRollup (optimized read models) |
| AI | AiUsageRecord, AiChatThread, AiChatMessage, AiAnalysis, AiRecommendationSet, AiRecommendationItem, AiResponseCache |
| Telemetry | SessionTelemetryCache (precomputed session health data), UserHealthWatermark, ProjectionCheckpoint |
| Real-time | WebSocketEvent, LiveConsumption, SessionMessage (raw events for audit/reconciliation) |

Key PostgreSQL Features:

| Feature | Usage |
| --- | --- |
| JSONB columns | Flexible schemas for User.privacySettings, Device.specifications, HealthSample.metadata, SessionTelemetryCache.metricsJson — schema evolution without costly migrations |
| Decimal & BigInt | Precise numerics for Consumption.quantity, AiUsageRecord.totalCost, UserHealthWatermark.sequenceNumber — no floating-point inaccuracies |
| Partial unique indexes | Critical for idempotency (e.g., Purchase_userId_productId_active_unique) |
| Optimistic locking | version fields (@default(1)) for concurrency control across many models |
| TimescaleDB (inferred) | The HealthSample table's time-series nature strongly implies hypertable, compression, and retention policies |
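The version-based optimistic locking used across these models can be sketched in-memory (VersionedStore is illustrative, not the actual Prisma-backed repository): an update applies only when the caller's expected version matches the stored one, and each success bumps the version.

```typescript
// Sketch of optimistic locking: a stale expectedVersion signals that a
// concurrent writer got there first.
type VersionedRow = { value: string; version: number };

class VersionedStore {
  private rows = new Map<string, VersionedRow>();

  create(id: string, value: string): void {
    this.rows.set(id, { value, version: 1 });
  }

  update(id: string, value: string, expectedVersion: number): "updated" | "conflict" {
    const row = this.rows.get(id);
    if (!row || row.version !== expectedVersion) return "conflict"; // someone else wrote first
    this.rows.set(id, { value, version: row.version + 1 });
    return "updated";
  }

  get(id: string): VersionedRow | undefined { return this.rows.get(id); }
}
```

In SQL terms this corresponds to `UPDATE ... WHERE id = ? AND version = ?` and treating zero affected rows as a conflict.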

DatabaseService manages PrismaClient connections, transactions, and retry logic optimized for serverless environments (e.g., Neon cold starts).

Redis

Two strictly separated instances prevent eviction policy conflicts:

| Instance | Policy | Usage |
| --- | --- | --- |
| API Cache | volatile-ttl | Response caching (apiCache.middleware.ts), distributed advisory locks (e.g., SyncLeaseService, UserConsumptionProfileService.performEMALearning), rate limit counters (authRateLimit.middleware.ts, rateLimitQueue.middleware.ts) |
| BullMQ / WebSocket | noeviction | BullMQ job data, queues, and locks (job-manager.service.ts); Socket.IO Redis adapter for horizontal WebSocket scaling (socket.service.ts) |

AWS S3

Durable blob storage via S3Service:

  • User-uploaded content — Journal entry photos (journal.service.ts)
  • System-generated artifacts — Analytics reports, user data exports
  • Database backups — DatabaseService.performPgDumpBackup() uploads PostgreSQL backups to S3

Cross-Cutting Concerns

Authentication & Authorization

Security & Compliance

| Concern | Implementation |
| --- | --- |
| Secrets | SecretsService fetches from AWS Secrets Manager; ConfigSecurityService validates, enforces minimum JWT secret lengths, and freezes configuration |
| HTTPS | httpsEnforcement.middleware.ts redirects HTTP to HTTPS in production, injects HSTS |
| Input validation | Strict Zod schemas (packages/shared/src/contracts/*, code-snippets/api/v1/schemas/*) at API boundaries via requestValidation.middleware.ts. Payload size limits enforced by jsonBodyParser.middleware.ts |
| Session integrity | sessionSecurity.middleware.ts performs device fingerprinting and X-Session-ID validation for anomaly detection |
| PHI redaction | ai-phi-redaction.service.ts strips Protected Health Information before AI model submission |
| Data integrity | Optimistic locking, transactional outbox, strict schema validation |

For detailed data integrity guarantees, see DATA-INTEGRITY-GUARANTEES.MD.

Observability

| Pillar | Implementation |
| --- | --- |
| Logging | LoggerService — structured JSON, log levels, categories. Integrates with CloudWatchLogsService for centralized aggregation |
| Metrics | PerformanceMonitoringService — API response times, DB query durations, cache hit rates, job processing times. OpenTelemetry-compatible export |
| Tracing | CorrelationContextManager via correlationContext.middleware.ts — X-Correlation-ID propagation across HTTP, service calls, and async events |
| Health checks | /health, /health/rate-limit, /api/v1/monitoring/health — real-time service status and detailed metrics |
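Node's built-in AsyncLocalStorage is the standard mechanism for propagating a correlation ID across await boundaries without threading it through every call. The sketch below illustrates the pattern; CorrelationContextManager's actual API is not shown here, so these names are assumptions:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Minimal sketch of correlation-ID propagation across async calls.
// (Illustrative only — the real CorrelationContextManager may differ.)
const correlationStore = new AsyncLocalStorage<{ correlationId: string }>();

// Middleware-style entry point: everything run inside fn (including awaited
// async work) sees the same correlation ID.
function runWithCorrelation<T>(correlationId: string, fn: () => T): T {
  return correlationStore.run({ correlationId }, fn);
}

// Any service or logger deep in the call stack can read the current ID.
function currentCorrelationId(): string | undefined {
  return correlationStore.getStore()?.correlationId;
}
```

A request middleware would call runWithCorrelation with the incoming X-Correlation-ID (or a fresh UUID), and the logger would attach currentCorrelationId() to every log line.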

Configuration Management

  • Centralized & secure — ConfigSecurityService loads from environment and AWS Secrets Manager (via SecretsService)
  • Immutable — All Config objects frozen after loading (initializeConfig) to prevent runtime tampering
  • Validated — Zod schemas enforce security policies at startup (JWT secret strength, CORS origins in production)
  • Shared — packages/shared defines common configuration contracts for frontend/backend consistency
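The validate-then-freeze flow can be sketched in a few lines. This is a simplified stand-in for initializeConfig — the field names, the 32-character minimum, and the plain validation (instead of Zod) are assumptions for illustration:

```typescript
// Sketch of validate-then-freeze configuration loading (illustrative names;
// the real initializeConfig uses Zod schemas and more fields).
interface AppConfig {
  jwtSecret: string;
  corsOrigins: string[];
}

function initializeConfigSketch(
  env: Record<string, string | undefined>,
): Readonly<AppConfig> {
  const jwtSecret = env.JWT_SECRET ?? "";
  // Fail fast at startup on weak secrets instead of running insecurely.
  if (jwtSecret.length < 32) {
    throw new Error("JWT_SECRET must be at least 32 characters");
  }
  const config: AppConfig = {
    jwtSecret,
    corsOrigins: (env.CORS_ORIGINS ?? "").split(",").filter(Boolean),
  };
  // Freeze so later code cannot tamper with configuration at runtime.
  return Object.freeze(config);
}
```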

External Integrations

| Integration | Service | Purpose |
| --- | --- | --- |
| AWS Cognito | cognito.service.ts | User identity management (user pools), authentication, token validation |
| AWS Secrets Manager | secrets.service.ts | Secure storage and retrieval of API keys, DB credentials, sensitive config |
| AWS S3 | s3.service.ts | Durable blob storage for user content, system reports, and database backups |
| AWS CloudWatch | cloudwatch-logs.service.ts | Centralized log aggregation, monitoring, and analysis |
| OpenTelemetry SDK | instrumentation.ts | Node.js instrumentation for traces and metrics collection |
| OpenTelemetry Collector | (inferred) | Agent for exporting traces (Jaeger/Datadog) and metrics (Prometheus/Grafana) |
| Google OAuth | auth.service.ts | Federated authentication with server-side Google ID token validation |
| Anthropic AI | ai.service.ts | LLM-powered journal analysis, weekly reports, product recommendations. Key securely loaded from Secrets Manager |
| HealthKit / Health Connect | (client-side only) | Mobile clients collect data via OS SDKs, then batch-upload to /health/samples/batch-upsert |

Design Trade-offs

Architectural Trade-off Analysis

Pure Functions / Impure Shell

Core business logic and complex transformations are encapsulated in pure, deterministic, side-effect-free functions (e.g., computeBinIndex in temporal-pattern.service.ts, calculateQuantityFromDuration in personalized-consumption-rate.service.ts, computeProductImpactAggregate in product-impact-compute.ts, canonicalizePayload in payload-hash.ts). The "impure shell" (services, controllers) orchestrates these pure functions and handles I/O, state, and side effects. This enhances testability and predictability at the cost of maintaining a clear separation discipline.
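The split can be sketched as follows. The bin computation is a hypothetical stand-in for functions like computeBinIndex (its real signature is not shown in this document), and the injected save callback represents the shell's I/O:

```typescript
// Pure core: deterministic, side-effect-free — trivially unit-testable.
// (Hypothetical stand-in; not the actual computeBinIndex implementation.)
function computeBinIndexSketch(timestampMs: number, binSizeMinutes: number): number {
  const minutesIntoDay = Math.floor(timestampMs / 60_000) % (24 * 60);
  return Math.floor(minutesIntoDay / binSizeMinutes);
}

// Impure shell: owns I/O and state, delegates all logic to the pure core.
async function recordSample(
  save: (bin: number) => Promise<void>, // injected side effect
  timestampMs: number,
): Promise<number> {
  const bin = computeBinIndexSketch(timestampMs, 30); // 30-minute bins
  await save(bin);
  return bin;
}
```

Tests for the core need no mocks at all; tests for the shell stub only the injected save function.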

Composition Root

bootstrap.ts centralizes all instantiation and wiring via constructor injection. This eliminates hidden dependencies and promotes explicit contracts. The trade-off is verbosity for very large applications — the entire dependency graph lives in one file.
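The pattern looks like this in miniature — every class names its dependencies in its constructor, and one bootstrap function assembles the graph. All class names here are illustrative, not the platform's actual services:

```typescript
// Minimal sketch of the composition-root pattern (illustrative classes,
// not the platform's real services).
class LoggerSketch {
  info(_msg: string): void {
    /* structured logging elided */
  }
}

class UserRepositorySketch {
  constructor(private readonly logger: LoggerSketch) {}
  findName(id: string): string {
    this.logger.info(`find ${id}`);
    return `user-${id}`;
  }
}

class UserServiceSketch {
  // Dependencies arrive only through the constructor — no globals, no lookups.
  constructor(
    private readonly repo: UserRepositorySketch,
    private readonly logger: LoggerSketch,
  ) {}
  greet(id: string): string {
    this.logger.info(`greet ${id}`);
    return `Hello, ${this.repo.findName(id)}`;
  }
}

// bootstrap(): the single place where the object graph is assembled.
function bootstrapSketch(): UserServiceSketch {
  const logger = new LoggerSketch();
  const repo = new UserRepositorySketch(logger);
  return new UserServiceSketch(repo, logger);
}
```

In tests, the same constructors accept fakes directly, with no framework or container required.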

CQRS

Separating health write and read paths enables independent optimization of each for their specific workload characteristics. The trade-off is eventual consistency between raw data and projections, managed by the transactional outbox and watermark-based staleness detection.
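Watermark-based staleness detection reduces to a timestamp comparison. The field names below are assumptions — the document does not show the actual projection schema:

```typescript
// Sketch of watermark-based staleness detection: a projection is stale when
// its watermark lags the latest raw write. (Field names are illustrative.)
interface ProjectionState {
  watermark: Date; // last raw-data timestamp folded into the projection
  latestRawWrite: Date; // most recent raw HealthSample write for this user
}

function isProjectionStale(state: ProjectionState, toleranceMs = 0): boolean {
  return state.latestRawWrite.getTime() - state.watermark.getTime() > toleranceMs;
}
```

A read path can use this check to either serve the projection as-is, flag it as stale to the client, or trigger a recompute.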

Transactional Outbox

Atomic event recording within the same transaction as primary data changes adds write-path complexity and requires a dedicated OutboxProcessorService in workers. The payoff is structurally eliminating the dual-write problem — strong data integrity guarantees without distributed transactions.
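The essential guarantee — domain row and outbox event commit together or not at all — can be shown with an in-memory stand-in for a database transaction. This is a sketch of the pattern, not OutboxService's real API:

```typescript
// In-memory sketch of the transactional-outbox pattern. The "transaction"
// snapshots state and rolls back on error — a stand-in for a real database
// transaction (OutboxService's actual API is not shown in this document).
interface OutboxEvent {
  type: string;
  payload: unknown;
}

class InMemoryDb {
  samples: Array<{ id: string; value: number }> = [];
  outbox: OutboxEvent[] = [];

  // Apply all writes, or none if the mutation throws.
  transaction(fn: (tx: InMemoryDb) => void): void {
    const samplesSnapshot = [...this.samples];
    const outboxSnapshot = [...this.outbox];
    try {
      fn(this);
    } catch (err) {
      this.samples = samplesSnapshot;
      this.outbox = outboxSnapshot;
      throw err;
    }
  }
}

function upsertSampleWithEvent(db: InMemoryDb, id: string, value: number): void {
  db.transaction((tx) => {
    tx.samples.push({ id, value });
    // Event written in the SAME transaction as the primary data change,
    // so the dual-write problem cannot occur.
    tx.outbox.push({ type: "health.sample.upserted", payload: { id } });
  });
}
```

A separate worker (here, the OutboxProcessorService) then drains the outbox table and dispatches events, giving at-least-once delivery.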

Single Source of Truth

Centralizing definitions in packages/shared prevents semantic drift and ensures consistent behavior (unit conversions, entity types, conflict rules). The trade-off is strict adherence to shared contracts and potentially coordinated updates for breaking changes across packages.

Event-Driven Architecture

Domain events via DomainEventService enable asynchronous processing and system extensibility — new subscribers can be added without modifying event producers. The trade-off is added complexity in tracing event flows (mitigated by correlation IDs) and ensuring delivery guarantees (mitigated by transactional outbox).
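The decoupling works because producers emit without knowing who listens. A minimal bus shows the shape DomainEventService implies — method names and types here are assumptions:

```typescript
// Minimal publish/subscribe sketch of the shape DomainEventService implies:
// new subscribers register without the producer changing. (Illustrative API.)
type Handler = (payload: unknown) => void;

class DomainEventBusSketch {
  private handlers = new Map<string, Handler[]>();

  subscribe(eventType: string, handler: Handler): void {
    const list = this.handlers.get(eventType) ?? [];
    list.push(handler);
    this.handlers.set(eventType, list);
  }

  publish(eventType: string, payload: unknown): void {
    for (const handler of this.handlers.get(eventType) ?? []) {
      handler(payload);
    }
  }
}
```

Adding an achievements or analytics subscriber is then a one-line subscribe call at wiring time, with no change to the code that publishes the event.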

Bounded Contexts

Services organized around business domains reduce cognitive load and enable independent evolution. The trade-off is potential overhead for cross-domain queries, often addressed by denormalized read models or specialized queries.

Optimistic Locking & Idempotency

version fields prevent lost updates during concurrent sync. All critical writes are safely retryable (requestId + payloadHash for health; clientSyncOperationId for sync). The trade-off is conflict resolution complexity on both client and server, centralized in shared contracts to keep it manageable.
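The version check itself is small; the complexity lives in what happens after a rejection. A sketch of the server-side check, with illustrative types (the real conflict-resolution rules live in the shared contracts):

```typescript
// Sketch of optimistic locking: an update carries the version it read, and
// the write is rejected if another device committed in between.
// (Types and names are illustrative, not the actual sync contracts.)
interface VersionedRow {
  id: string;
  value: string;
  version: number;
}

class VersionConflictError extends Error {}

function applyUpdate(
  row: VersionedRow,
  newValue: string,
  expectedVersion: number,
): VersionedRow {
  if (row.version !== expectedVersion) {
    // A concurrent writer bumped the version; the caller must re-read,
    // resolve per the shared conflict rules, and retry.
    throw new VersionConflictError(
      `expected v${expectedVersion}, found v${row.version}`,
    );
  }
  return { ...row, value: newValue, version: row.version + 1 };
}
```

In SQL this becomes `UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?`, with zero affected rows signaling the conflict.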


Known Gaps & Evolving Areas

Gaps, Risks, and Future Work

Deprecated Components

  • code-snippets/models/dynamodb-schemas.ts — Remnant of a previous DynamoDB phase. All services now use PostgreSQL. Recommendation: Remove from codebase.

Incomplete Features

  • Multi-device health sync — healthSourceDeviceScope flag exists in HealthSourceRegistry.ts, but comprehensive multi-device merging for health samples (beyond basic deduplication) is not fully implemented. Users with multiple health sources may experience inconsistencies.
  • Health Connect background delivery — Android background ingestion is explicitly deferred in HealthKitContext.tsx. Android users may experience less frequent health data sync compared to iOS.

Unclear / Inferred Behaviors

  • Client token refresh — The explicit refresh loop within HealthSyncService when auth fails (401) is not fully visible. BackendAPIClient may handle this transparently, but if not, retry loops with invalid tokens could cause unnecessary network traffic.
  • Endpoint-level idempotency — Core batch endpoints (health/samples/batch-upsert, sync/push) have explicit idempotency. Standard CRUD endpoints may rely on simpler implicit guarantees. Recommendation: Conduct an idempotency audit across all endpoints.
  • Trace context propagation — OpenTelemetry is initialized and correlation IDs propagate via middleware, but ensuring context propagation across all async boundaries (e.g., OutboxProcessorService dispatching) and external integrations requires continuous verification.

Operational Risks

  • TimescaleDB configuration — HealthSample strongly implies TimescaleDB usage (hypertable, compression, retention), but explicit configuration in database.service.ts or bootstrap.ts is not documented. Without proper setup, time-series query performance could degrade at scale.

Missing Documentation

  • ADRs — The DECISIONS/ folder structure implies Architectural Decision Records, but no actual ADR files are provided. Key architectural decisions should be formally documented to prevent architecture drift.