Summary
This RFC proposes adding a first-class self-hosted server runtime for YAOS, packaged for Docker, while preserving the current
Cloudflare Worker deployment as the default zero-ops path.
The key architectural decision is:
- Keep Cloudflare as the canonical zero-terminal deployment.
- Add a separate Node-based self-host runtime for Docker/public VM/homelab users.
- Do not treat y-partyserver as the universal server abstraction.
- Instead, define YAOS-owned core abstractions for room lifecycle, persistence, blobs, snapshots, and auth, then implement:
- a Cloudflare adapter using y-partyserver + Durable Objects
- a Node adapter using WebSocket + SQLite + local disk
This gives us vendor portability without destabilizing the current Cloudflare path.
Motivation
YAOS already solves the main problem for most users:
- zero-ops deployment
- real-time sync
- no full database stack
- good mobile story through Cloudflare-managed TLS and routing
However, there is a meaningful self-hosted segment that specifically wants:
- a server they physically own
- air-gapped or private-network deployments
- a Docker image they can run on a VM, NAS, or homelab box
- a path that does not depend on Cloudflare as the operator
This is not just a deployment preference. For part of the Obsidian self-host community, “self-deployed on Cloudflare” and “self-hosted
on my own machine” are not equivalent.
Problem Statement
The current server is portable in some places, but not portable as a whole.
Portable pieces already exist:
- the plugin mostly speaks a generic HTTP/WebSocket contract
- the CRDT/persistence model is YAOS-owned
- the chunked checkpoint+journal store is already abstracted over a storage interface
Cloudflare-specific pieces are still deeply embedded:
- room addressing and room ownership are Durable Object-shaped
- y-partyserver is built on Durable Objects
- claim/config storage uses a singleton DO
- room debug tracing is stored in DO storage
- snapshots/blobs use R2 directly
- the setup/docs/UI are explicitly Cloudflare-first
So this is not “put the current server in Docker.”
It is “build a second runtime for the same YAOS server contract.”
Current Architecture Reality
Relevant grounding in the repo:
- Main Worker routing and auth: server/src/index.ts
- Per-vault room server on YServer: server/src/server.ts
- Chunked checkpoint+journal persistence: server/src/chunkedDocStore.ts
- Claim/config singleton storage: server/src/config.ts
- Snapshots and blob key model: server/src/snapshot.ts
- Client provider construction: src/sync/vaultSync.ts:233
- Current Cloudflare-first settings/docs: src/settings.ts, README.md, server/README.md
Also worth noting:
- npm run test:integration:worker currently passes end-to-end for the Cloudflare runtime.
Goals
- Provide an official self-hosted deployment path for YAOS.
- Support Docker deployment on public VMs and homelabs.
- Preserve the current plugin UX and wire protocol as much as possible.
- Preserve the current Cloudflare deployment unchanged for users who want zero-ops.
- Minimize risk to the current stable path.
- Make the server architecture explicitly runtime-agnostic at the core layer.
Non-Goals
- Replacing Cloudflare as the recommended default deployment.
- Supporting multi-node/distributed self-hosting in v1.
- Building a new actor framework or replatforming to Rust/Rivet.
- Solving arbitrary custom reverse proxy setups beyond documented best-effort guidance.
- Achieving perfect implementation symmetry between Cloudflare and self-host runtimes.
Core Architectural Decision
Decision
Use a dual-runtime architecture.
Why
y-partyserver is a Durable Object abstraction, not a generic app server abstraction. It should remain the Cloudflare adapter, not
become the center of the entire portability strategy.
The portability boundary should be YAOS-owned and include:
- room registry / room ownership
- room persistence
- claim/config storage
- blob storage
- snapshot storage
- auth and schema gating
- debug/trace hooks
Consequence
We should preserve:
- the plugin’s HTTP/WS contract
- the room semantics
- the CRDT/persistence invariants
We should not preserve:
- Durable Object runtime assumptions as a cross-platform abstraction
Proposed Design
1. Introduce a core server layer
Create runtime-agnostic interfaces for:
- RoomStore
- RoomRegistry
- ConfigStore
- BlobStore
- SnapshotStore
- TraceStore
- AuthService
The core layer should own:
- token verification
- claim semantics
- schema-version checks
- room load/save policy
- checkpoint+journal persistence policy
- blob existence/upload/download semantics
- snapshot creation/list/download semantics
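The interfaces above can be sketched to make the boundary concrete. The interface names come from this RFC; all method names and signatures below are illustrative assumptions, not a final API:

```typescript
// Runtime-agnostic core interfaces. Each runtime (Cloudflare, Node)
// supplies its own implementations; the core layer only sees these.

export interface BlobStore {
  has(hash: string): Promise<boolean>;
  put(hash: string, data: Uint8Array): Promise<void>;
  get(hash: string): Promise<Uint8Array | null>;
}

export interface ConfigStore {
  getClaim(): Promise<string | null>; // hashed claim token, if set
  setClaim(tokenHash: string): Promise<void>;
}

export interface RoomStore {
  loadCheckpoint(vaultId: string): Promise<Uint8Array | null>;
  appendJournal(vaultId: string, update: Uint8Array): Promise<void>;
  writeCheckpoint(vaultId: string, state: Uint8Array): Promise<void>;
}

export interface AuthService {
  verifyToken(token: string): Promise<boolean>;
  checkSchemaVersion(clientVersion: number): boolean;
}

// A runtime bundles its adapters into one object the core consumes.
export interface RuntimeAdapters {
  rooms: RoomStore;
  config: ConfigStore;
  blobs: BlobStore;
  auth: AuthService;
}

// Minimal in-memory BlobStore, useful for conformance tests of the core.
export class MemoryBlobStore implements BlobStore {
  private blobs = new Map<string, Uint8Array>();
  async has(hash: string) {
    return this.blobs.has(hash);
  }
  async put(hash: string, data: Uint8Array) {
    this.blobs.set(hash, data);
  }
  async get(hash: string) {
    return this.blobs.get(hash) ?? null;
  }
}
```

An in-memory implementation like `MemoryBlobStore` is what a shared conformance suite would run first, before exercising the real R2 and local-disk adapters.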
2. Keep Cloudflare as an adapter
The existing Worker runtime becomes runtime-cloudflare.
It keeps:
- y-partyserver
- Durable Objects
- R2
- current setup flow
- current deploy button story
This path should change only as needed to plug into core.
3. Add a Node self-host adapter
Create runtime-node using:
- Node LTS
- SQLite for room/config persistence
- local filesystem for blobs and snapshot payloads
- standard WebSocket server for /vault/sync/:vaultId
- standard HTTP routes matching the current contract
Important constraint for v1:
- single process
- single instance
- one authoritative in-memory room owner per vault
No horizontal scaling in v1.
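The `/vault/sync/:vaultId` route shape above comes from the current plugin contract; how the Node adapter matches it is an implementation detail. A minimal sketch of that matcher (the helper name is hypothetical):

```typescript
// Hypothetical route matcher the Node runtime could use when handling
// HTTP upgrade requests before handing the socket to the room owner.
// Expects exactly /vault/sync/<vaultId> with a non-empty id segment.

export function matchSyncRoute(pathname: string): string | null {
  const m = /^\/vault\/sync\/([^/]+)$/.exec(pathname);
  return m ? decodeURIComponent(m[1]) : null;
}
```

In the single-process v1, the matched `vaultId` would index into one in-memory map of live rooms, which is exactly the "one authoritative room owner per vault" constraint stated above.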
4. Add a tiny provider abstraction in the plugin
Right now the plugin constructs YSyncProvider in one place.
We should wrap that in a small local interface so the client can swap transports later if needed.
Target state:
- Cloudflare path still uses y-partyserver/provider
- self-host path can use a compatible provider or custom adapter later
- plugin business logic stays unchanged
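The seam can stay very small. The interface and function names below are illustrative assumptions; the real plugin would wrap its existing `YSyncProvider` construction behind something like this:

```typescript
// Sketch of the client-side transport seam. Plugin business logic only
// ever sees SyncTransport; which concrete provider gets built is decided
// once, by the factory chosen at startup.

export interface SyncTransport {
  connect(): void;
  disconnect(): void;
  readonly connected: boolean;
}

export type TransportFactory = (serverUrl: string, vaultId: string) => SyncTransport;

// Stand-in transport, useful for tests of plugin logic without a server.
export function createStubTransport(): SyncTransport {
  let connected = false;
  return {
    connect() {
      connected = true;
    },
    disconnect() {
      connected = false;
    },
    get connected() {
      return connected;
    },
  };
}
```

The Cloudflare factory would keep returning the y-partyserver-based provider unchanged; a self-host factory can be added later without touching any caller.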
Networking and Deployment Strategy
The main lesson here is:
- Docker is easy
- secure mobile networking is the real product problem
Public VM deployments
This is the cleanest self-host story.
Officially support:
- Coolify
- Dokploy
These are real solutions when the box has public ingress because they handle:
- deployment
- container management
- reverse proxy
- TLS
- domain routing
Private homelab / NATed deployments
Coolify and Dokploy do not solve NAT traversal by themselves.
If the server lives on a home LAN, we should officially recommend:
- Tailscale
- Cloudflare Tunnel
These solve the actual hard problem:
- routable connectivity
- valid TLS
- mobile-safe access
- avoiding manual port-forward hell
Reverse proxy setups
Support as best-effort only:
- Caddy
- Nginx
- Traefik
These are fine for advanced users, but we should not let “debug my proxy” become the support burden for the project.
Official Support Matrix
Supported:
- Cloudflare Worker deployment
- Public VM + Docker + Coolify
- Public VM + Docker + Dokploy
- Private homelab + Docker + Tailscale
- Private homelab + Docker + Cloudflare Tunnel
Best-effort:
- Custom reverse proxy setups
- Manual TLS setups
- Port-forwarded home router setups without overlay/tunnel tools
Not supported in v1:
- multi-replica deployments
- load-balanced stateless replicas
- distributed room ownership
- arbitrary Kubernetes installs
Storage Model for Self-Host
Text sync
Use SQLite-backed persistence that preserves the current YAOS checkpoint+journal model.
Do not rewrite the persistence model around full-document rewrites.
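One way the checkpoint+journal split could map onto SQLite is sketched below. Table and column names are assumptions for illustration, not the shipped schema:

```typescript
// Illustrative SQLite schema preserving the checkpoint+journal model:
// a folded checkpoint per vault, plus an ordered journal of CRDT updates
// appended since that checkpoint.

export const SCHEMA = `
CREATE TABLE IF NOT EXISTS checkpoints (
  vault_id   TEXT PRIMARY KEY,
  seq        INTEGER NOT NULL,  -- highest journal seq folded into this checkpoint
  state      BLOB NOT NULL,     -- compressed CRDT state
  updated_at INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS journal (
  vault_id   TEXT NOT NULL,
  seq        INTEGER NOT NULL,  -- monotonically increasing per vault
  payload    BLOB NOT NULL,     -- a single CRDT update
  PRIMARY KEY (vault_id, seq)
);
`;
```

Loading a room is then "read checkpoint, replay journal rows with `seq` greater than the checkpoint's"; compaction folds the journal into a new checkpoint row and deletes the replayed rows, matching today's policy.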
Blobs
Use local disk storage with the current content-addressed model.
Suggested layout:
- /data/yaos.db
- /data/blobs/&lt;prefix&gt;/&lt;hash&gt;
- /data/snapshots/&lt;snapshotId&gt;/...
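A content-addressed on-disk layout can be sketched as a pure path helper. The two-character sharding prefix is an assumption (it keeps any single directory from accumulating millions of entries), as is the helper name:

```typescript
import * as path from "node:path";

// Hypothetical helper mapping a content hash to its local-disk location,
// e.g. blobPath("/data", "abcdef1234") -> "/data/blobs/ab/abcdef1234".

export function blobPath(dataDir: string, hash: string): string {
  // Reject anything that is not a plausible lowercase hex digest, so a
  // malformed hash can never escape the blobs directory.
  if (!/^[0-9a-f]{8,}$/.test(hash)) {
    throw new Error(`invalid content hash: ${hash}`);
  }
  return path.join(dataDir, "blobs", hash.slice(0, 2), hash);
}
```

Validating the hash before joining paths doubles as a traversal guard, which matters once the blob endpoint accepts client-supplied hashes.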
Snapshots
Use the same logical model as today:
- snapshot metadata index
- compressed CRDT payload
- referenced blob hashes recorded in the snapshot index
The backend may be local disk instead of R2, but the feature contract should remain the same.
Risks
1. Runtime drift
Two runtimes can diverge in behavior over time.
Mitigation:
- one shared core
- one shared conformance test suite
- only thin adapters per runtime
2. Hidden Cloudflare assumptions
There are currently implicit assumptions around:
- room ownership
- hibernation/wakeup semantics
- object-local storage
- internal room-to-room fetch patterns
Mitigation:
- surface these assumptions as explicit interfaces
- document room lifecycle invariants clearly
3. Support burden explosion
Self-host users often report deployment problems as application bugs.
Mitigation:
- brutally clear documentation
- narrow support matrix
- recommend Coolify/Dokploy/Tailscale/Tunnel first
- avoid supporting arbitrary proxy stacks as first-class
4. Accidental multi-instance breakage
If someone runs multiple app replicas, room ownership semantics break.
Mitigation:
- explicitly mark self-host v1 as single-instance only
- document that scaling replicas is unsupported
- fail loudly if possible
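"Fail loudly" can be as simple as an exclusive lock file taken at startup. This is a sketch under assumptions (lock file name, no stale-lock recovery), not a hardened implementation:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical single-instance guard for the Node runtime: create a lock
// file exclusively at startup and refuse to boot if it already exists.

export function acquireInstanceLock(dataDir: string): void {
  const lockFile = path.join(dataDir, "yaos.lock");
  try {
    // The "wx" flag fails with EEXIST if the file already exists.
    const fd = fs.openSync(lockFile, "wx");
    fs.writeSync(fd, String(process.pid));
    fs.closeSync(fd);
  } catch (err: any) {
    if (err.code === "EEXIST") {
      throw new Error(
        `another YAOS instance appears to own ${lockFile}; ` +
          "self-host v1 is single-instance only",
      );
    }
    throw err;
  }
  // Best-effort cleanup on normal shutdown; a crash leaves the file
  // behind, which is why real lock handling needs a staleness policy.
  process.on("exit", () => {
    try {
      fs.unlinkSync(lockFile);
    } catch {}
  });
}
```

A production version would also check whether the PID in a leftover lock file is still alive before refusing to start, but even this naive guard turns silent data corruption into an explicit startup error.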
Rollout Plan
Phase 0: Refactor boundary only
- extract core interfaces
- wrap current Cloudflare runtime around them
- introduce client-side provider abstraction
- no product behavior change
Phase 1: Text-only self-host MVP
- Node runtime
- WebSocket sync
- SQLite room persistence
- claim/config flow
- Docker image
- Docker Compose example
- no blobs/snapshots yet, or mark them unavailable
Phase 2: Blob and snapshot parity
- local-disk blob store
- local-disk snapshot store
- capability reporting
- restore flows
Phase 3: Self-host polish
- Coolify guide
- Dokploy guide
- Tailscale guide
- Cloudflare Tunnel guide
- clearer support policy
- operational troubleshooting docs
Acceptance Criteria
Architecture
- Cloudflare runtime still works unchanged from a user perspective.
- Self-host runtime serves the same public API contract.
- Core sync invariants are shared across runtimes.
Product
- A user can run YAOS on a public VM via Docker with TLS handled by Coolify or Dokploy.
- A user can run YAOS on a private machine via Docker with Tailscale or Cloudflare Tunnel.
- Mobile clients can connect successfully in supported configurations.
Testing
- Cloudflare integration suite remains green.
- A new self-host integration suite exists for:
- claim flow
- schema guard
- sync smoke test
- reconnect behavior
- snapshot flows when enabled
- blob existence/upload/download when enabled
Open Questions
- Should text-only self-host ship before blob/snapshot parity?
- Should the self-host runtime keep the same claim/setup UX, or use a simpler operator-provided token model first?
- Should snapshots in self-host be stored in SQLite metadata + filesystem payloads, or fully filesystem-native?
- Do we want one Docker image only, or also a Compose bundle with optional tunnel sidecars?
- Should the plugin expose deployment-mode-specific copy, or keep the UI generic?