This document defines the architectural contract for data integrity, concurrency control, and consistency in the AppPlatform backend. It details the precise mechanisms used to prevent data loss, duplication, race conditions, and corruption in a highly concurrent, offline-first, distributed environment.
The backend data integrity model is built on four core principles:
- Fail-Fast at Trust Boundaries: Invalid, unauthenticated, or malformed data is rejected at the earliest possible middleware layer, preventing it from reaching business logic.
- ACID for Primary Mutations: All state-changing operations on core domain entities (Consumptions, Purchases, Sessions) are wrapped in strict PostgreSQL transactions.
- Eventual Consistency for Projections: Analytical rollups, aggregations, and derived health data are computed asynchronously, decoupled from the critical write path.
- No Dual-Writes: Emitting events to external systems (WebSockets, Analytics, AI queues) directly from business logic is strictly prohibited. All domain events are staged via a Transactional Outbox.
Integrity begins before data reaches the database. The system enforces strict validation at the API boundary.
- Zod Strictness: All API payloads are validated using Zod schemas configured with `.strict()`. This prevents parameter pollution and ensures that unrecognized fields are explicitly rejected rather than silently ignored.
- Cryptographic Payload Integrity: Health sync operations (e.g., `/health/samples/batch-upsert`) enforce request integrity via a SHA-256 `payloadHash`. The `validateBatchUpsertRequestWithHash` middleware recomputes the hash of the incoming canonicalized payload and strictly rejects mismatches (`PAYLOAD_HASH_MISMATCH`), preventing in-transit tampering or client-side serialization bugs.
- Deterministic Normalization: Inputs are normalized deterministically before processing. Timezone offsets (`timezoneOffsetMinutes`) are extracted to ensure precise UTC-to-local date bucketing, and platform-specific units (e.g., HealthKit's `degC`) are resolved to canonical SI units (°C) via `tryNormalizeToCanonicalUnit`.
Guarantee: No unvalidated, unnormalized, or tampered payload reaches the persistence layer. Validation failures are surfaced immediately with structured error codes.
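As a concrete illustration of the hash check, here is a sketch of recomputing a SHA-256 `payloadHash` over a canonicalized payload. The sorted-key canonicalization rule below is an assumption for illustration, not the production algorithm:

```typescript
import { createHash } from "node:crypto";

// Canonicalize by recursively sorting object keys so that two semantically
// identical payloads always serialize to the same bytes.
// (Assumed canonicalization rule; the real middleware may differ.)
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function computePayloadHash(payload: unknown): string {
  return createHash("sha256").update(canonicalize(payload)).digest("hex");
}

// Middleware-style check: reject when the client-supplied hash does not
// match the hash recomputed from the canonicalized body.
function verifyPayloadHash(payload: unknown, claimedHash: string): void {
  if (computePayloadHash(payload) !== claimedHash) {
    throw new Error("PAYLOAD_HASH_MISMATCH");
  }
}
```

Because key order is normalized before hashing, a client that serializes fields in a different order still produces a matching hash; only genuine payload differences are rejected.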
For core entity mutations, the system relies on PostgreSQL's ACID guarantees via Prisma's `$transaction` API.
Complex workflows that span multiple entities are executed as a single atomic unit. A prime example is PurchaseService.endAndCreatePurchase:
- Verifies the old purchase exists and is active.
- Updates the old purchase's `finishedDate`.
- Backfills and zeroes out the corresponding `InventoryItem` records.
- Creates a new `Purchase` record.
- Stages `purchase.finished` and `purchase.created` events into the outbox.
If any step fails (e.g., due to an inventory validation error), the entire operation rolls back. The user is never left in a partial state (e.g., an ended purchase without a new active one).
Database-level constraints are used as a final safety net. Unique constraint violations (P2002) are caught and mapped to semantic application errors or used to trigger retry loops. For example, a P2002 on the partial unique index Purchase_userId_productId_active_unique is intercepted and transformed into an ACTIVE_PURCHASE_EXISTS error, prompting the frontend to safely resolve the state.
Guarantee: All multi-entity mutations are atomic. Partial state is structurally impossible — every compound operation either commits completely or rolls back entirely.
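The all-or-nothing behavior can be sketched without a database: a minimal in-memory stand-in for `prisma.$transaction`, in which every step mutates a draft copy and the shared state is swapped in only if all steps succeed. Entity shapes and error codes below are simplified for illustration:

```typescript
// Illustrative stand-in for transactional semantics -- not the real
// Prisma client. Any throw before the final swap leaves state untouched.
interface Purchase { id: string; productId: string; finishedDate: string | null }
interface Store { purchases: Purchase[] }

function runAtomically(store: Store, steps: Array<(draft: Store) => void>): void {
  // Work on a deep copy; commit by replacement, roll back by discarding.
  const draft: Store = JSON.parse(JSON.stringify(store)) as Store;
  for (const step of steps) step(draft); // any throw aborts before commit
  store.purchases = draft.purchases;     // "commit"
}

function endAndCreatePurchase(store: Store, oldId: string, next: Purchase): void {
  runAtomically(store, [
    (d) => {
      const old = d.purchases.find((p) => p.id === oldId && p.finishedDate === null);
      if (!old) throw new Error("PURCHASE_NOT_ACTIVE");
      old.finishedDate = new Date().toISOString();
    },
    (d) => {
      if (d.purchases.some((p) => p.productId === next.productId && p.finishedDate === null)) {
        throw new Error("ACTIVE_PURCHASE_EXISTS");
      }
      d.purchases.push(next);
    },
  ]);
}
```

If the second step throws, the first step's `finishedDate` update is discarded along with the draft, mirroring how a failed Prisma transaction rolls back every prior statement.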
To reliably notify downstream consumers (analytics, WebSockets, projection builders) without falling victim to the "Dual-Write" distributed systems problem, the backend utilizes the Transactional Outbox Pattern.
Business services never call domainEventService.emitEvent() directly after a mutation. Instead, they pass the active transaction client tx to outboxService.addEvent(tx, payload). The event is inserted into the outbox_events table in the exact same transaction as the primary mutation.
The OutboxProcessorService runs as a background worker, polling the outbox_events table for PENDING records and guaranteeing at-least-once delivery. If the server crashes immediately after the database commit but before the WebSocket broadcast fires, the outbox processor picks the event up again on restart and delivers it. Downstream subscribers must therefore be idempotent.
Outbox Delivery Sequence Diagram
sequenceDiagram
participant Client
participant API as API / Service
participant DB as PostgreSQL (TX)
participant Outbox as Outbox Processor
participant Subs as Subscribers (WS/Projections)
Client->>API: POST /consumptions
activate API
API->>DB: BEGIN TX
API->>DB: INSERT consumption
API->>DB: INSERT outbox_events (consumption.created)
API->>DB: COMMIT TX
DB-->>API: OK
deactivate API
API-->>Client: 200 OK
Note over Outbox,Subs: Asynchronous Delivery
activate Outbox
Outbox->>DB: SELECT * FROM outbox_events WHERE status = 'PENDING'
Outbox->>Subs: Process Event
Subs-->>Outbox: Ack
Outbox->>DB: UPDATE outbox_events SET status = 'COMPLETED'
deactivate Outbox
Guarantee: Zero dual-writes. The primary mutation and its domain event are committed in a single atomic transaction. If the process crashes after commit, the outbox processor resumes delivery on restart.
Idempotency ensures that retried network requests or duplicated outbox events do not result in corrupted or duplicated state.
Large batch operations utilize a dedicated HealthIngestRequest table to track request states.
Request Idempotency State Machine
stateDiagram-v2
[*] --> NEW_REQUEST : First seeing Request ID
NEW_REQUEST --> PROCESSING : Lock acquired
PROCESSING --> CACHED_RESPONSE : Success
PROCESSING --> FAILED : Unrecoverable Error
state "Duplicate Request" as Dup {
CACHED_RESPONSE --> Return_200 : Return cached result
PROCESSING --> Return_409 : "Still Processing"
FAILED --> NEW_REQUEST : Allow Retry
}
If a client retries a request with the exact same requestId and payloadHash, the system identifies it as CACHED_RESPONSE and returns the exact JSON response of the original successful request without re-processing the data.
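A minimal sketch of this request-level idempotency, with the `HealthIngestRequest` table replaced by an in-memory map (the HTTP status codes and the mismatched-hash error code are assumptions for illustration):

```typescript
// Request states mirror the state machine above; storage is illustrative.
type RequestState =
  | { status: "PROCESSING" }
  | { status: "CACHED_RESPONSE"; response: unknown }
  | { status: "FAILED" };

const requests = new Map<string, { payloadHash: string; state: RequestState }>();

function handleBatchUpsert(
  requestId: string,
  payloadHash: string,
  process: () => unknown,
): { code: number; body: unknown } {
  const existing = requests.get(requestId);
  if (existing) {
    if (existing.payloadHash !== payloadHash) {
      // Same requestId with a different payload is a client bug, not a retry.
      return { code: 422, body: "REQUEST_ID_REUSED_WITH_DIFFERENT_PAYLOAD" };
    }
    if (existing.state.status === "PROCESSING") return { code: 409, body: "STILL_PROCESSING" };
    if (existing.state.status === "CACHED_RESPONSE") {
      return { code: 200, body: existing.state.response }; // replay original result
    }
    // FAILED: fall through and allow a retry.
  }
  requests.set(requestId, { payloadHash, state: { status: "PROCESSING" } });
  try {
    const response = process();
    requests.set(requestId, { payloadHash, state: { status: "CACHED_RESPONSE", response } });
    return { code: 200, body: response };
  } catch {
    requests.set(requestId, { payloadHash, state: { status: "FAILED" } });
    return { code: 500, body: "FAILED" };
  }
}
```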
For offline-first creation, clients generate UUIDs (e.g., clientConsumptionId, clientPurchaseId). The backend defines unique constraints on (userId, clientConsumptionId). If a sync push is interrupted and re-sent, INSERT ... ON CONFLICT safely skips or merges the duplicate record rather than creating a second consumption.
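The entity-level guard can be sketched the same way, modeling the `(userId, clientConsumptionId)` unique constraint as a composite map key, in the spirit of `INSERT ... ON CONFLICT DO NOTHING`:

```typescript
// Illustrative model of the composite unique constraint; the real check
// is enforced by PostgreSQL, not application code.
interface Consumption { userId: string; clientConsumptionId: string; amount: number }

const byClientId = new Map<string, Consumption>();

function upsertConsumption(row: Consumption): "inserted" | "skipped" {
  const key = `${row.userId}:${row.clientConsumptionId}`;
  if (byClientId.has(key)) return "skipped"; // conflict: keep the original row
  byClientId.set(key, row);
  return "inserted";
}
```

Scoping the key by `userId` means two users may legitimately reuse the same client UUID without colliding, while a replayed push from one user is silently deduplicated.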
Guarantee: Network unreliability, client retries, and background job replays never produce duplicate data. Idempotency is enforced at both the request level and the individual entity level.
The backend employs a three-tier locking strategy depending on the scale and isolation requirements of the operation.
| Strategy Type | Mechanism | Primary Use Case | Codebase Example |
|---|---|---|---|
| Optimistic | `@version` column + retry loop | High-throughput, low-contention entity updates | `UserRoutineProfileRepository.update` (retries up to 3 times on P2002/P2034) |
| Pessimistic / Advisory | `pg_advisory_xact_lock` | Serializing operations on specific entity keys within a single database instance | `SyncService.createSyncChangeRecord` (prevents `syncVersion` collisions) |
| Distributed | Redis `SET NX PX` | Cross-node exclusion for long-running or computationally expensive async jobs | `CacheService.acquireLock` (used in EMA learning `performEMALearning` and Session Telemetry) |

Note: Distributed locks include a TTL (`PX`) to prevent permanent deadlocks if a worker node crashes while holding the lock.
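The optimistic tier can be sketched as a version-checked update plus a bounded retry loop. This is an in-memory stand-in: the real repository re-reads through Prisma and retries on P2002/P2034 errors, whereas the names below are illustrative:

```typescript
// A row carries a version that must match at write time (the @version column).
interface Versioned<T> { version: number; data: T }

class VersionConflictError extends Error {}

function updateIfVersionMatches<T>(
  row: Versioned<T>, expectedVersion: number, mutate: (data: T) => T,
): void {
  if (row.version !== expectedVersion) throw new VersionConflictError("stale version");
  row.data = mutate(row.data);
  row.version += 1; // successful write bumps the version
}

// Retry loop: re-read the current version each attempt, give up after 3.
function updateWithRetry<T>(row: Versioned<T>, mutate: (data: T) => T, maxAttempts = 3): number {
  for (let attempt = 1; ; attempt++) {
    const observed = row.version; // stand-in for a fresh read
    try {
      updateIfVersionMatches(row, observed, mutate);
      return attempt; // number of attempts taken
    } catch (e) {
      if (!(e instanceof VersionConflictError) || attempt >= maxAttempts) throw e;
    }
  }
}
```

The design trade-off is the usual one: no lock is held between read and write, so concurrent writers occasionally pay a retry, which is cheap when contention is low.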
Health dashboards, charts, and product impacts are derived asynchronously from raw samples to keep the ingestion path fast.
To prevent the frontend from displaying stale data silently, the system uses a monotonic UserHealthWatermark sequence number.
- When a raw health sample is ingested, the user's watermark sequence increments.
- Projection rows (e.g., `UserHealthRollupDay`) store a `sourceWatermark` representing the sequence number they were computed against.
- Read-Time Enforcement: During a `GET` request, the `HealthProjectionReadService` runs `applyWatermarkFreshness()`. If the user's current watermark is greater than the row's `sourceWatermark`, the data is intercepted and its status is overridden to `STALE`. The UI reacts by showing an "Updating..." badge while the background outbox processor catches up.
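The read-time override can be sketched in a few lines (the row shape and status values are illustrative, not the production schema):

```typescript
// A projection row remembers the watermark it was computed against.
interface RollupRow { date: string; status: "READY" | "STALE"; sourceWatermark: number }

// If ingestion has moved the user's watermark past a row's sourceWatermark,
// the row is flagged STALE at read time rather than served silently.
function applyWatermarkFreshness(rows: RollupRow[], currentWatermark: number): RollupRow[] {
  return rows.map((row) =>
    currentWatermark > row.sourceWatermark ? { ...row, status: "STALE" } : row,
  );
}
```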
The HealthProjectionCoordinatorService uses independent checkpoints for each projection handler (e.g., health-rollup, sleep-summary). If the health-rollup handler times out or fails, its checkpoint is marked FAILED, but sleep-summary continues processing. On the next outbox retry, only the FAILED handlers are re-run, ensuring idempotency and fault isolation.
Guarantee: Staleness is explicit, never silent. Watermark comparison at read-time ensures the frontend always knows whether data is current, stale, or computing. Projection failures are isolated per-handler — one failure never blocks another.
The offline-first Sync Engine coordinates data merging between devices and the server.
Conflicts are resolved deterministically using a configured strategy defined in sync-config/conflict-configs.ts:
- `LAST_WRITE_WINS`: Used for user-editable timestamps. Compares `version` or `updatedAt`.
- `LOCAL_WINS`: Client is strictly authoritative (e.g., user `notes` on a consumption).
- `SERVER_WINS`: Server is strictly authoritative (e.g., backend-derived aggregates).
- `MERGE`: Deep-merges arrays (e.g., Journal `tags`) using the `MERGE_ARRAYS` policy, creating a deduplicated union.
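Two of these strategies can be sketched directly (record shape and field names are illustrative):

```typescript
interface VersionedRecord { updatedAt: string; tags: string[] }

// LAST_WRITE_WINS: the side with the later updatedAt timestamp wins outright.
function lastWriteWins(local: VersionedRecord, server: VersionedRecord): VersionedRecord {
  return local.updatedAt > server.updatedAt ? local : server;
}

// MERGE_ARRAYS: deduplicated union of both sides' array values.
function mergeArrays(local: string[], server: string[]): string[] {
  return [...new Set([...server, ...local])];
}
```

Both functions are pure and deterministic, which is what makes conflict outcomes reproducible and auditable: the same two inputs always resolve the same way.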
AppPlatform uses the PRIMARY_KEY_IS_SERVER_ID strategy. When a client pushes a new entity, it supplies a temporary client UUID. The backend inserts the record and may assign a new Server UUID. The response payload maps clientId -> serverId. Any foreign keys (e.g., a Consumption referencing a Session) pushed in the same batch are resolved in-memory (resolveMappedIdsInChange) to ensure referential integrity before insertion.
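The in-batch resolution step can be sketched as a map lookup over each change's foreign keys. This is a simplified stand-in for `resolveMappedIdsInChange`; the change shape and field names are assumptions:

```typescript
// A pushed change referencing a parent entity by a (possibly client-side) id.
interface ConsumptionChange { clientConsumptionId: string; sessionId: string }

// Rewrite any foreign key that appears in the clientId -> serverId map;
// ids already known to the server pass through untouched.
function resolveMappedIds(
  change: ConsumptionChange,
  idMap: Map<string, string>,
): ConsumptionChange {
  return { ...change, sessionId: idMap.get(change.sessionId) ?? change.sessionId };
}
```

Resolution runs after parents in the batch are inserted (populating the map) and before dependent rows are written, so no row is ever persisted pointing at a client-only UUID.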
Guarantee: Every conflict outcome is reproducible, auditable, and config-driven. Foreign key integrity is maintained across ID translation boundaries within a single push batch.
To preserve audit trails and sync tombstoning, entities like HealthSample are soft-deleted (isDeleted = true). However, to prevent unbounded storage growth, a background worker (healthSampleSoftDeletePurger) periodically hard-deletes records that have exceeded the 30-day retention period. This shifts heavy deletion I/O out of the critical path.
If an outbox event fails repeatedly and exceeds maxRetries (default 3, or 5 for critical health events), the OutboxProcessorService transitions its status to DEAD_LETTER. These events cease blocking the queue and remain safely isolated for manual engineering triage or automated administrative replay.
Guarantee: Failed events never block the processing pipeline. Dead-lettered events are preserved for forensic analysis and safe replay — no data is silently dropped.
For related architectural context, see System Architecture, Failure Modes & Resilience, and Health Ingestion Pipeline.