RFC: Add a first-class self-hosted Docker runtime while preserving the current Cloudflare deployment #2

@kavinsood

Description

Summary

This RFC proposes adding a first-class self-hosted server runtime for YAOS, packaged for Docker, while preserving the current
Cloudflare Worker deployment as the default zero-ops path.

The key architectural decision is:

  • Keep Cloudflare as the canonical zero-terminal deployment.
  • Add a separate Node-based self-host runtime for Docker/public VM/homelab users.
  • Do not treat y-partyserver as the universal server abstraction.
  • Instead, define YAOS-owned core abstractions for room lifecycle, persistence, blobs, snapshots, and auth, then implement:
    • a Cloudflare adapter using y-partyserver + Durable Objects
    • a Node adapter using WebSocket + SQLite + local disk

This gives us vendor portability without destabilizing the current Cloudflare path.

Motivation

YAOS already solves the main problem for most users:

  • zero-ops deployment
  • real-time sync
  • no full database stack
  • good mobile story through Cloudflare-managed TLS and routing

However, there is a meaningful self-hosted segment that specifically wants:

  • a server they physically own
  • air-gapped or private-network deployments
  • a Docker image they can run on a VM, NAS, or homelab box
  • a path that does not depend on Cloudflare as the operator

This is not just a deployment preference. For part of the Obsidian self-host community, “self-deployed on Cloudflare” and “self-hosted
on my own machine” are not equivalent.

Problem Statement

The current server is portable in some places, but not portable as a whole.

Portable pieces already exist:

  • the plugin mostly speaks a generic HTTP/WebSocket contract
  • the CRDT/persistence model is YAOS-owned
  • the chunked checkpoint+journal store is already abstracted over a storage interface

Cloudflare-specific pieces are still deeply embedded:

  • room addressing and room ownership are Durable Object-shaped
  • y-partyserver is built on Durable Objects
  • claim/config storage uses a singleton DO
  • room debug tracing is stored in DO storage
  • snapshots/blobs use R2 directly
  • the setup/docs/UI are explicitly Cloudflare-first

So this is not “put the current server in Docker.”
It is “build a second runtime for the same YAOS server contract.”

Current Architecture Reality

Relevant grounding in the repo:

  • Main Worker routing and auth: server/src/index.ts
  • Per-vault room server on YServer: server/src/server.ts
  • Chunked checkpoint+journal persistence: server/src/chunkedDocStore.ts
  • Claim/config singleton storage: server/src/config.ts
  • Snapshots and blob key model: server/src/snapshot.ts
  • Client provider construction: src/sync/vaultSync.ts:233
  • Current Cloudflare-first settings/docs: src/settings.ts, README.md, server/README.md

Also worth noting:

  • npm run test:integration:worker currently passes end-to-end for the Cloudflare runtime.

Goals

  • Provide an official self-hosted deployment path for YAOS.
  • Support Docker deployment on public VMs and homelabs.
  • Preserve the current plugin UX and wire protocol as much as possible.
  • Preserve the current Cloudflare deployment unchanged for users who want zero-ops.
  • Minimize risk to the current stable path.
  • Make the server architecture explicitly runtime-agnostic at the core layer.

Non-Goals

  • Replacing Cloudflare as the recommended default deployment.
  • Supporting multi-node/distributed self-hosting in v1.
  • Building a new actor framework or replatforming to Rust/Rivet.
  • Solving arbitrary custom reverse proxy setups beyond documented best-effort guidance.
  • Achieving perfect implementation symmetry between Cloudflare and self-host runtimes.

Core Architectural Decision

Decision

Use a dual-runtime architecture.

Why

y-partyserver is a Durable Object abstraction, not a generic app server abstraction. It should remain the Cloudflare adapter, not
become the center of the entire portability strategy.

The portability boundary should be YAOS-owned and include:

  • room registry / room ownership
  • room persistence
  • claim/config storage
  • blob storage
  • snapshot storage
  • auth and schema gating
  • debug/trace hooks

Consequence

We should preserve:

  • the plugin’s HTTP/WS contract
  • the room semantics
  • the CRDT/persistence invariants

We should not preserve:

  • Durable Object runtime assumptions as a cross-platform abstraction

Proposed Design

1. Introduce a core server layer

Create runtime-agnostic interfaces for:

  • RoomStore
  • RoomRegistry
  • ConfigStore
  • BlobStore
  • SnapshotStore
  • TraceStore
  • AuthService

The core layer should own:

  • token verification
  • claim semantics
  • schema-version checks
  • room load/save policy
  • checkpoint+journal persistence policy
  • blob existence/upload/download semantics
  • snapshot creation/list/download semantics
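
A minimal sketch of what that boundary could look like in TypeScript. The interface names follow the list above; the method shapes are assumptions, not the final contract, and the in-memory class is only a reference point for adapter authors:

```typescript
// Sketch of the runtime-agnostic core boundary. Method shapes here are
// illustrative assumptions, not the final YAOS contract.
export interface ConfigStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

export interface BlobStore {
  has(hash: string): Promise<boolean>;
  put(hash: string, data: Uint8Array): Promise<void>;
  get(hash: string): Promise<Uint8Array | undefined>;
}

export interface RoomStore {
  // persisted checkpoint+journal state for one vault room
  loadCheckpoint(vaultId: string): Promise<Uint8Array | undefined>;
  appendJournal(vaultId: string, update: Uint8Array): Promise<void>;
}

// Trivial in-memory ConfigStore: useful in tests, and a reference
// implementation showing how thin an adapter can be.
export class InMemoryConfigStore implements ConfigStore {
  private map = new Map<string, string>();
  async get(key: string) { return this.map.get(key); }
  async set(key: string, value: string) { this.map.set(key, value); }
}
```

The Cloudflare adapter would implement the same interfaces over Durable Object storage and R2; the Node adapter over SQLite and local disk.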

2. Keep Cloudflare as an adapter

The existing Worker runtime becomes runtime-cloudflare.

It keeps:

  • y-partyserver
  • Durable Objects
  • R2
  • current setup flow
  • current deploy button story

This path should change only as needed to plug into core.

3. Add a Node self-host adapter

Create runtime-node using:

  • Node LTS
  • SQLite for room/config persistence
  • local filesystem for blobs and snapshot payloads
  • standard WebSocket server for /vault/sync/:vaultId
  • standard HTTP routes matching the current contract

Important constraint for v1:

  • single process
  • single instance
  • one authoritative in-memory room owner per vault

No horizontal scaling in v1.
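
The "one authoritative in-memory room owner per vault" rule falls out of a plain in-memory registry, which only holds because v1 is a single process. A sketch, where `Room` is a stand-in for whatever the core room type ends up being:

```typescript
// Stand-in for the real room type owned by the core layer.
class Room {
  constructor(public readonly vaultId: string) {}
}

// One authoritative in-memory room per vault. This invariant is only
// sound in a single process; a second replica would silently break it.
export class InMemoryRoomRegistry {
  private rooms = new Map<string, Room>();

  getOrCreate(vaultId: string): Room {
    let room = this.rooms.get(vaultId);
    if (!room) {
      room = new Room(vaultId);
      this.rooms.set(vaultId, room);
    }
    return room;
  }

  // Called when a room unloads, e.g. after the last client disconnects.
  evict(vaultId: string): void {
    this.rooms.delete(vaultId);
  }
}
```

This is the Node-side analogue of what Durable Objects give the Cloudflare runtime for free.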

4. Add a tiny provider abstraction in the plugin

Right now the plugin constructs YSyncProvider in one place.
We should wrap that in a small local interface so the client can swap transports later if needed.

Target state:

  • Cloudflare path still uses y-partyserver/provider
  • self-host path can use a compatible provider or custom adapter later
  • plugin business logic stays unchanged
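
One possible shape for that wrapper. `SyncTransport`, the factory, and the URL heuristic are all hypothetical; the Cloudflare branch would keep constructing YSyncProvider exactly as it does today:

```typescript
// Hypothetical client-side abstraction over the sync transport.
export interface SyncTransport {
  readonly kind: "cloudflare" | "self-host";
  connect(): void;
  disconnect(): void;
}

// Factory the plugin would call instead of constructing YSyncProvider
// directly. The suffix check is a placeholder heuristic; a real
// implementation would likely use an explicit setting.
export function createTransport(serverUrl: string): SyncTransport {
  const selfHost = !serverUrl.endsWith(".workers.dev");
  const kind: SyncTransport["kind"] = selfHost ? "self-host" : "cloudflare";
  return {
    kind,
    connect() { /* wrap YSyncProvider (Cloudflare) or a ws-based provider (self-host) */ },
    disconnect() { /* tear down the underlying provider */ },
  };
}
```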

Networking and Deployment Strategy

The main lesson here is:

  • Docker is easy
  • secure mobile networking is the real product problem

Public VM deployments

These are the cleanest self-host story.

Officially support:

  • Coolify
  • Dokploy

These are real solutions when the box has public ingress because they handle:

  • deployment
  • container management
  • reverse proxy
  • TLS
  • domain routing

Private homelab / NATed deployments

Coolify and Dokploy do not solve NAT traversal by themselves.

If the server lives on a home LAN, we should officially recommend:

  • Tailscale
  • Cloudflare Tunnel

These solve the actual hard problem:

  • routable connectivity
  • valid TLS
  • mobile-safe access
  • avoiding manual port-forward hell

Reverse proxy setups

Support as best-effort only:

  • Caddy
  • Nginx
  • Traefik

These are fine for advanced users, but we should not let “debug my proxy” become the support burden for the project.

Official Support Matrix

Supported:

  • Cloudflare Worker deployment
  • Public VM + Docker + Coolify
  • Public VM + Docker + Dokploy
  • Private homelab + Docker + Tailscale
  • Private homelab + Docker + Cloudflare Tunnel

Best-effort:

  • Custom reverse proxy setups
  • Manual TLS setups
  • Port-forwarded home router setups without overlay/tunnel tools

Not supported in v1:

  • multi-replica deployments
  • load-balanced stateless replicas
  • distributed room ownership
  • arbitrary Kubernetes installs

Storage Model for Self-Host

Text sync

Use SQLite-backed persistence that preserves the current YAOS checkpoint+journal model.

Do not rewrite the persistence model around full-document rewrites.

Blobs

Use local disk storage with the current content-addressed model.

Suggested layout:

  • /data/yaos.db
  • /data/blobs//
  • /data/snapshots//...
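
The elided path components above are presumably content hashes. One common way to realize a content-addressed layout on disk is to shard by hash prefix to bound directory fan-out; the sharding scheme and `/data` root below are assumptions, not the current YAOS scheme:

```typescript
import { createHash } from "node:crypto";
import { join } from "node:path";

const DATA_DIR = "/data"; // assumed container data root

// Content addressing: the storage key is derived from the blob bytes.
export function blobHash(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Shard by the first two hex chars so no single directory grows unbounded.
export function blobPath(hash: string): string {
  return join(DATA_DIR, "blobs", hash.slice(0, 2), hash);
}
```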

Snapshots

Use the same logical model as today:

  • snapshot metadata index
  • compressed CRDT payload
  • referenced blob hashes recorded in the snapshot index

The backend may be local disk instead of R2, but the feature contract should remain the same.

Risks

1. Runtime drift

Two runtimes can diverge in behavior over time.

Mitigation:

  • one shared core
  • one shared conformance test suite
  • only thin adapters per runtime
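
The shared conformance suite could literally be functions that take an adapter factory and run identical behavioral assertions against it, so the DO-backed and SQLite-backed stores cannot drift. A sketch for one store (the interface is a stand-in; the real suite would cover rooms, blobs, and snapshots too):

```typescript
// Stand-in interface; the real suite would cover RoomStore, BlobStore, etc.
export interface ConfigStoreLike {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// Run identical behavioral checks against every runtime's adapter.
export async function runConfigStoreConformance(
  make: () => ConfigStoreLike,
): Promise<void> {
  const store = make();
  await store.set("k", "v1");
  await store.set("k", "v2"); // last write wins
  if ((await store.get("k")) !== "v2") throw new Error("overwrite semantics violated");
  if ((await store.get("absent")) !== undefined) throw new Error("missing keys must read as undefined");
}
```

Each runtime's integration suite then calls the same functions with its own factory.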

2. Hidden Cloudflare assumptions

There are currently implicit assumptions around:

  • room ownership
  • hibernation/wakeup semantics
  • object-local storage
  • internal room-to-room fetch patterns

Mitigation:

  • surface these assumptions as explicit interfaces
  • document room lifecycle invariants clearly

3. Support burden explosion

Self-host users often turn deployment issues into app issues.

Mitigation:

  • brutally clear documentation
  • narrow support matrix
  • recommend Coolify/Dokploy/Tailscale/Tunnel first
  • avoid supporting arbitrary proxy stacks as first-class

4. Accidental multi-instance breakage

If someone runs multiple app replicas, room ownership semantics break.

Mitigation:

  • explicitly mark self-host v1 as single-instance only
  • document that scaling replicas is unsupported
  • fail loudly if possible
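
"Fail loudly" can be as simple as taking an exclusive lock file at startup. A sketch; it deliberately does not handle stale locks left behind by a crash, which a real implementation would need to address:

```typescript
import { openSync, writeSync, closeSync } from "node:fs";

// "wx" creates the file exclusively and fails with EEXIST if it already
// exists, so a second replica pointed at the same /data dies on boot.
export function acquireSingleInstanceLock(lockPath: string): void {
  let fd: number;
  try {
    fd = openSync(lockPath, "wx");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") {
      throw new Error(
        `another instance appears to own ${lockPath}; running multiple replicas is unsupported in v1`,
      );
    }
    throw err;
  }
  writeSync(fd, String(process.pid));
  closeSync(fd);
}
```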

Rollout Plan

Phase 0: Refactor boundary only

  • extract core interfaces
  • wrap current Cloudflare runtime around them
  • introduce client-side provider abstraction
  • no product behavior change

Phase 1: Text-only self-host MVP

  • Node runtime
  • WebSocket sync
  • SQLite room persistence
  • claim/config flow
  • Docker image
  • Docker Compose example
  • no blobs/snapshots yet, or mark them unavailable
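
The Compose example could be as small as one service and one volume. Everything here is a placeholder (image name, port, env var); it only illustrates the single-instance, single-volume shape:

```yaml
services:
  yaos:
    image: example/yaos:latest   # placeholder image name
    ports:
      - "8787:8787"              # placeholder port
    volumes:
      - yaos-data:/data          # SQLite db, blobs, snapshots all live here
    environment:
      - YAOS_DATA_DIR=/data      # hypothetical env var
    restart: unless-stopped

volumes:
  yaos-data:
```

Note `deploy.replicas` is intentionally absent: v1 is single-instance only.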

Phase 2: Blob and snapshot parity

  • local-disk blob store
  • local-disk snapshot store
  • capability reporting
  • restore flows

Phase 3: Self-host polish

  • Coolify guide
  • Dokploy guide
  • Tailscale guide
  • Cloudflare Tunnel guide
  • clearer support policy
  • operational troubleshooting docs

Acceptance Criteria

Architecture

  • Cloudflare runtime still works unchanged from a user perspective.
  • Self-host runtime serves the same public API contract.
  • Core sync invariants are shared across runtimes.

Product

  • A user can run YAOS on a public VM via Docker with TLS handled by Coolify or Dokploy.
  • A user can run YAOS on a private machine via Docker with Tailscale or Cloudflare Tunnel.
  • Mobile clients can connect successfully in supported configurations.

Testing

  • Cloudflare integration suite remains green.
  • A new self-host integration suite exists for:
    • claim flow
    • schema guard
    • sync smoke test
    • reconnect behavior
    • snapshot flows when enabled
    • blob existence/upload/download when enabled

Open Questions

  • Should text-only self-host ship before blob/snapshot parity?
  • Should the self-host runtime keep the same claim/setup UX, or use a simpler operator-provided token model first?
  • Should snapshots in self-host be stored in SQLite metadata + filesystem payloads, or fully filesystem-native?
  • Do we want one Docker image only, or also a Compose bundle with optional tunnel sidecars?
  • Should the plugin expose deployment-mode-specific copy, or keep the UI generic?
