A single-user, self-hosted RSS reader that:
- runs on Synology via Docker
- is accessed over Tailscale (PWA)
- auto-sorts feeds into auto folders (simple categories, not manual folderizing)
- dedups “same story across outlets” into one story card
- supports mute-with-breakout filtering (e.g., “hide Roblox unless it’s a major incident”)
- generates AI summaries + digests and learns ranking from your behavior
- Replace Feedly for daily reading.
- Minimal mental overhead: Folders are automatic and simple.
- Collapse duplicates across outlets into one view.
- Fast catch-up: digest mode when you’re away or behind.
- Preference learning: sort what you’ll likely care about higher.
- Maintain good UX: always show headline + hero image + source.
- Multi-user accounts
- Push notifications
- Full offline archive of every article body
- Perfect extraction for all paywalled sites
- Host: Synology NAS
- Runtime: Docker Compose
- Network: Tailscale (private access)
- Single user (you), single device priority (iPhone), but works on desktop browser too
- Frontend: Next.js PWA (TypeScript)
- Backend API: Node.js + Fastify (TypeScript)
- Worker: Node.js (TypeScript)
- Queue/jobs: Postgres-backed jobs via pg-boss in MVP1 (no Redis required)
- Database: Postgres latest stable major at deployment time
- Vector extension: no pgvector in MVP1
- AI default provider: OpenAI
Rationale:
- Use one language across web, API, and worker to reduce maintenance overhead.
- Keep a single repo with clearly separated services and shared contracts/types.
Suggested repository structure:
- apps/web (Next.js PWA, TypeScript)
- apps/api (Fastify HTTP API, TypeScript)
- apps/worker (polling, extraction, clustering, digests; TypeScript)
- packages/contracts (shared schemas/types and generated API client)
- infra (docker-compose, env templates, deployment scripts)
- db (migrations, seed data)
Add a Python sidecar only if the TypeScript implementation misses quality targets for two consecutive weeks after tuning:
- extraction success rate for priority sources < 90%
- cluster correction rate (manual split requests) > 12%
- worker CPU saturation causes p95 ingest latency to exceed configured target
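A minimal sketch of how these gates could be checked each week. The `WeeklyQuality` shape and its field names are hypothetical, and two things are assumptions rather than spec: missing any single gate counts as a failed week, and the CPU-saturation gate is reduced to a plain p95 latency comparison.

```typescript
// Hypothetical weekly quality snapshot; field names are illustrative, not from the spec.
interface WeeklyQuality {
  extractionSuccessRate: number; // 0..1, priority sources only
  clusterCorrectionRate: number; // 0..1, manual split requests per cluster
  p95IngestLatencyMs: number;
}

const EXTRACTION_MIN = 0.9;
const CORRECTION_MAX = 0.12;

// Assumption: a week fails when it misses at least one gate.
function weekFailsGates(w: WeeklyQuality, p95TargetMs: number): boolean {
  return (
    w.extractionSuccessRate < EXTRACTION_MIN ||
    w.clusterCorrectionRate > CORRECTION_MAX ||
    w.p95IngestLatencyMs > p95TargetMs
  );
}

// Weeks are ordered oldest to newest; only the two most recent weeks are considered.
function shouldAddPythonSidecar(weeks: WeeklyQuality[], p95TargetMs: number): boolean {
  if (weeks.length < 2) return false;
  return weeks.slice(-2).every((w) => weekFailsGates(w, p95TargetMs));
}
```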
- Nightly Postgres backup (pg_dump custom format)
- Retention: 7 daily + 4 weekly snapshots
- Store backups on NAS volume with optional encrypted off-device sync
- Run a restore verification at least monthly
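One possible reading of the retention rule, sketched as a pruning helper. It assumes one snapshot per night, treats "7 daily" as the seven newest snapshots, and treats "4 weekly" as the newest snapshot in each of the next four 7-day buckets; the function name and the bucket definition are illustrative, not part of the spec.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the snapshot timestamps (ms since epoch) to keep, newest first.
// Anything not returned is eligible for pruning.
function snapshotsToKeep(snapshotTimes: number[], daily = 7, weekly = 4): number[] {
  const newestFirst = [...snapshotTimes].sort((a, b) => b - a);
  const keep = newestFirst.slice(0, daily);

  // Assumption: a "week" is a fixed 7-day bucket counted from the Unix epoch.
  const seenWeeks: number[] = [];
  for (const t of newestFirst.slice(daily)) {
    const week = Math.floor(t / (7 * DAY_MS));
    if (seenWeeks.indexOf(week) !== -1) continue; // bucket already has its newest snapshot
    if (seenWeeks.length >= weekly) break;
    seenWeeks.push(week);
    keep.push(t);
  }
  return keep;
}
```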
- Home (All Stories)
- Folders (Auto folders tabs/list)
- Digest
- Saved
- Sources (manage feeds)
- Settings
Infinite scroll list of Story Cards (clusters)
Sort options:
- For You (default: personal score with recency floor)
- Latest (strict reverse chronological)
Card fields:
- headline
- hero image
- primary source name + time
- “+N outlets” (cluster size)
- folder label
- optional AI “1–2 sentence summary”
- optional badge: “Muted topic breakout” (with reason)
Card actions:
- Open (cluster detail)
- Save
- Mark read
- “Not interested”
- “Mute keyword…” (creates a mute rule)
- “Prefer this source” (source weight +)
- “Mute this source”
Header: headline + hero + primary source
Sections:
- AI “Story so far” summary (optional)
- Outlets list (members): each opens the article view
Actions:
- Save cluster
- Mark read
- Split cluster (escape hatch)
- Mute keyword/topic extracted from title (quick creation)
- Render via reader mode (extracted text) when available
- Fallback: embedded page view
- If embed is blocked by site CSP/X-Frame headers, show a clear "Open original" action
Capture analytics:
- time on article view
- scroll depth
- quick bounce (<10–15s)
Trigger banner on Home when conditions met (“You were away… View digest”)
Digest sections:
- Top picks for you
- Big stories (most outlets / high-rep sources)
- Quick scan (one-liners)
Tap entry → cluster detail
- List of saved clusters
- Sort by saved date; optional folder filter
List feeds with:
- assigned folder (single)
- “trial” flag (optional)
- weight slider (Prefer / Neutral / Deprioritize)
Actions:
- Add feed URL
- OPML import
On add: prompt “I categorized this as Gaming. Change?”
AI mode:
- Off
- Summaries + digest
- Full (summaries + auto foldering assist + smart ranking)
Digest triggers (defaults):
- Away ≥ 24h OR backlog ≥ 50 clusters
Retention: see section 10
Filters: manage mute rules
Provider selection: OpenAI / Claude / Local
AI budget cap:
- Monthly cap is configurable (default $20)
- On cap hit: fallback to local model only when local provider is configured
- If local provider is not configured, fallback option is not selectable and hosted AI is paused until reset
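The cap behavior can be sketched as a small routing decision. `BudgetState`, `AiRoute`, and `resolveAiRoute` are hypothetical names, and the sketch assumes that local inference never counts against the hosted budget.

```typescript
type Provider = "openai" | "anthropic" | "local";

interface BudgetState {
  monthlyCapUsd: number; // spec default: 20
  spentThisMonthUsd: number;
  localConfigured: boolean;
}

type AiRoute =
  | { kind: "use"; provider: Provider }
  | { kind: "paused"; reason: string };

function resolveAiRoute(preferred: Provider, budget: BudgetState): AiRoute {
  // Assumption: the local provider is free and therefore ignores the cap.
  if (preferred === "local") {
    return budget.localConfigured
      ? { kind: "use", provider: "local" }
      : { kind: "paused", reason: "local provider not configured" };
  }
  if (budget.spentThisMonthUsd < budget.monthlyCapUsd) {
    return { kind: "use", provider: preferred };
  }
  // Cap hit: fall back to local only when it is configured.
  if (budget.localConfigured) return { kind: "use", provider: "local" };
  return { kind: "paused", reason: "monthly cap reached; hosted AI paused until reset" };
}
```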
Feed polling:
- Poll interval is configurable (default 60 minutes)
Default folders (editable but keep small):
- Tech
- Gaming
- Security
- Business
- Politics
- Sports
- Design
- Local
- World
- Other
Inputs:
- feed title/description
- site title/description (if available)
- sample of last 10 titles
Process:
- Rules/keywords classifier (fast)
- If AI enabled, LLM classifier as tie-breaker
- Prompt user with suggestion + dropdown override
Store:
- feed.folder_id
- feed.folder_confidence
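A minimal sketch of the fast rules/keywords step, producing the folder and confidence that would be stored on the feed. The keyword lists are placeholders, and `classifyFeed` is a hypothetical name; a low confidence is where the LLM tie-breaker would presumably be consulted.

```typescript
interface FolderGuess {
  folder: string;
  confidence: number; // share of keyword hits that belong to the winning folder
}

// Illustrative keyword lists only; a real list would be larger and tunable.
const FOLDER_KEYWORDS: Record<string, string[]> = {
  Tech: ["software", "ai", "gadget", "startup"],
  Gaming: ["game", "gaming", "xbox", "playstation", "nintendo"],
  Security: ["breach", "malware", "vulnerability", "ransomware"],
  Sports: ["league", "match", "nba", "nfl"],
};

// texts: feed title/description, site title/description, and recent item titles.
function classifyFeed(texts: string[]): FolderGuess {
  const tokens = texts.join(" ").toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  let best = "Other";
  let bestHits = 0;
  let totalHits = 0;
  for (const folder of Object.keys(FOLDER_KEYWORDS)) {
    const words = FOLDER_KEYWORDS[folder];
    const hits = tokens.filter((t) => words.includes(t)).length;
    totalHits += hits;
    if (hits > bestHits) {
      best = folder;
      bestHits = hits;
    }
  }
  return { folder: best, confidence: totalHits === 0 ? 0 : bestHits / totalHits };
}
```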
Weekly job samples last 30 items:
- if >35% classify to a different folder → prompt:
- “This feed looks more like Tech lately. Move it?”
Actions: Keep / Move / Create folder (rare)
Note: This drift prompt is the only time the app asks to move a feed; folders are not reorganized continuously.
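The weekly drift check could look roughly like this. It assumes the 35% threshold applies to the single most common other folder (the spec could also be read as "all other folders combined"); `detectFolderDrift` and `DriftSuggestion` are hypothetical names.

```typescript
interface DriftSuggestion {
  suggestedFolder: string;
  share: number; // fraction of sampled items classified into suggestedFolder
}

// itemFolders: predicted folder for each sampled item (the spec samples the last 30).
function detectFolderDrift(
  currentFolder: string,
  itemFolders: string[],
  threshold = 0.35,
): DriftSuggestion | null {
  if (itemFolders.length === 0) return null;
  const counts: Record<string, number> = {};
  for (const f of itemFolders) {
    if (f !== currentFolder) counts[f] = (counts[f] ?? 0) + 1;
  }
  let best: string | null = null;
  for (const f of Object.keys(counts)) {
    if (best === null || counts[f] > counts[best]) best = f;
  }
  if (best === null) return null;
  const share = counts[best] / itemFolders.length;
  // Strictly greater than the threshold, matching ">35%".
  return share > threshold ? { suggestedFolder: best, share } : null;
}
```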
A cluster represents one story covered by multiple outlets.
- canonical_url
- title
- summary/excerpt
- extracted snippet (if available)
- published timestamp
- folder inherited from feed (site-first)
Time-windowed near-duplicate matching (48h window):
Candidate selection: items within 48h, same language
Similarity score:
- title simhash distance
- token Jaccard overlap
- optional embedding cosine (if AI mode full)
Decision:
- if score ≥ threshold → join cluster
- else create new cluster
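A sketch of the title-similarity part of the score, combining a 32-bit simhash distance with token Jaccard overlap. The FNV-1a token hash, the 50/50 blend, and the 0.75 threshold are all placeholder choices, not values from the spec; the optional embedding cosine is left out.

```typescript
function tokenize(title: string): string[] {
  return title.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// 32-bit FNV-1a hash of a token (ASCII-oriented; sufficient for a sketch).
function fnv1a(token: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    h ^= token.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Each token votes +1 or -1 on every bit; the sign of the tally sets the output bit.
function simhash32(tokens: string[]): number {
  const tally: number[] = new Array<number>(32).fill(0);
  for (const t of tokens) {
    const h = fnv1a(t);
    for (let bit = 0; bit < 32; bit++) {
      tally[bit] += ((h >>> bit) & 1) === 1 ? 1 : -1;
    }
  }
  let out = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (tally[bit] > 0) out |= 1 << bit;
  }
  return out >>> 0;
}

function hammingDistance(a: number, b: number): number {
  let x = (a ^ b) >>> 0;
  let n = 0;
  while (x !== 0) {
    n += x & 1;
    x >>>= 1;
  }
  return n;
}

function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((t) => {
    if (setB.has(t)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// Placeholder blend and threshold; both would be tuned from split-cluster corrections.
const JOIN_THRESHOLD = 0.75;

function titleSimilarity(titleA: string, titleB: string): number {
  const ta = tokenize(titleA);
  const tb = tokenize(titleB);
  const simhashSim = 1 - hammingDistance(simhash32(ta), simhash32(tb)) / 32;
  return 0.5 * jaccard(ta, tb) + 0.5 * simhashSim;
}

function shouldJoinCluster(titleA: string, titleB: string): boolean {
  return titleSimilarity(titleA, titleB) >= JOIN_THRESHOLD;
}
```

Candidate selection (48h window, same language) would run before this comparison so that only a small set of recent clusters is scored per item.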
Representative item selection:
- highest source weight
- else most complete extracted text
- else earliest
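The tie-break order above maps naturally onto a chained comparator. The `Member` shape is hypothetical, and "most complete extracted text" is approximated here by text length.

```typescript
interface Member {
  itemId: string;
  sourceWeight: number;
  extractedTextLength: number; // 0 when nothing was extracted
  publishedAt: number; // ms since epoch
}

// Highest source weight, then longest extracted text, then earliest published.
function pickRepresentative(members: Member[]): Member | undefined {
  return [...members].sort(
    (a, b) =>
      b.sourceWeight - a.sourceWeight ||
      b.extractedTextLength - a.extractedTextLength ||
      a.publishedAt - b.publishedAt,
  )[0];
}
```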
To keep UI simple:
- cluster folder = representative item’s feed folder
“Split cluster”:
- creates a new cluster and moves selected members
- logs a correction event (can tune thresholds later)
- Mute (default): hide matches unless breakout triggers
- Hard block: never show (rare)
Pre-filter on title + feed summary
Post-cluster filter on representative’s title + summary + extracted snippet
Running both passes prevents a muted story from leaking through via another outlet's coverage.
Muted matches are soft-hidden before clustering, not dropped. They still participate in clustering and breakout checks.
If a mute rule matches, allow through when any of:
- Severity keywords appear (e.g., hack/breach/0day/arrest/DOJ/CISA/state-backed/outage/porn)
- Source is “high reputation” list (user-configurable)
- Cluster size ≥ N outlets within 24h (default N=4)
Cluster size for breakout includes outlets that are muted/hidden by the same rule.
If allowed through:
- badge story as “Muted topic breakout” + reason
Rule: keyword="roblox", mode=mute
Normal Roblox content hidden
“Roblox hacked…” passes due to severity keyword + cluster size
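A sketch of the post-cluster decision for a single rule, using the Roblox example. Type and function names are hypothetical, only phrase rules are handled (regex rules are omitted), and the severity list is a small illustrative subset.

```typescript
interface MuteRule {
  pattern: string; // phrase match only in this sketch
  mode: "mute" | "block";
  breakoutEnabled: boolean;
}

interface ClusterSnapshot {
  title: string;
  summary: string;
  sourceNames: string[];
  outletsLast24h: number; // includes outlets hidden by the same rule
}

type FilterDecision =
  | { action: "show" }
  | { action: "hidden" }
  | { action: "breakout_shown"; reason: string };

// Illustrative subset of the severity keywords.
const SEVERITY_KEYWORDS = ["hack", "breach", "0day", "arrest", "outage"];

function evaluateMuteRule(
  rule: MuteRule,
  cluster: ClusterSnapshot,
  highReputationSources: string[],
  minOutlets = 4,
): FilterDecision {
  const text = `${cluster.title} ${cluster.summary}`.toLowerCase();
  if (!text.includes(rule.pattern.toLowerCase())) return { action: "show" };
  // Hard blocks never break out.
  if (rule.mode === "block" || !rule.breakoutEnabled) return { action: "hidden" };

  // Prefix match at a word boundary so "hack" also catches "hacked".
  const severity = SEVERITY_KEYWORDS.find((kw) => new RegExp(`\\b${kw}`).test(text));
  if (severity !== undefined) {
    return { action: "breakout_shown", reason: `severity keyword: ${severity}` };
  }
  const trusted = cluster.sourceNames.find((s) => highReputationSources.includes(s));
  if (trusted !== undefined) {
    return { action: "breakout_shown", reason: `high-reputation source: ${trusted}` };
  }
  if (cluster.outletsLast24h >= minOutlets) {
    return { action: "breakout_shown", reason: `${cluster.outletsLast24h} outlets in 24h` };
  }
  return { action: "hidden" };
}
```

The returned `reason` is what the "Muted topic breakout" badge would display, and the `action` values line up with the `filter_event` actions in the data model.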
Define interface:
- embed(texts[]) -> vectors[]
- summarize(text, style) -> summary
- classify(text, labels[]) -> label/confidence
Implement providers:
- OpenAI
- Anthropic (Claude)
- Local (Ollama / llama.cpp)
Routing:
- embeddings: local or cheapest
- summaries/digest: hosted by default
- classification for folders: hosted only when uncertain
Config:
- AI_PROVIDER=openai|anthropic|local
- optional per-task overrides
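The interface and config above might translate to TypeScript along these lines. The async signatures, the `style` values, and the helper names are assumptions; only the three method names and the `AI_PROVIDER` values come from the spec.

```typescript
interface Classification {
  label: string;
  confidence: number;
}

// Provider contract; each concrete provider (OpenAI, Anthropic, local) would implement it.
interface AiProvider {
  embed(texts: string[]): Promise<number[][]>;
  summarize(text: string, style: "card" | "story_so_far"): Promise<string>;
  classify(text: string, labels: string[]): Promise<Classification>;
}

type ProviderName = "openai" | "anthropic" | "local";
type AiTask = "embed" | "summarize" | "classify";

interface AiConfig {
  defaultProvider: ProviderName;
  overrides?: Partial<Record<AiTask, ProviderName>>; // optional per-task overrides
}

// Parses an AI_PROVIDER value, falling back to the spec's default provider.
function parseProviderName(value: string | undefined): ProviderName {
  if (value === "openai" || value === "anthropic" || value === "local") return value;
  return "openai";
}

function providerForTask(task: AiTask, config: AiConfig): ProviderName {
  return config.overrides?.[task] ?? config.defaultProvider;
}
```

With this shape, routing embeddings to a local model while keeping summaries hosted is a config change (`overrides: { embed: "local" }`) rather than a code change.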
Generate per cluster:
- 1–2 sentence “card summary”
- longer “story so far” in cluster detail
Cache and regenerate only when cluster materially changes.
Sensitive handling:
- if headline indicates sensitive content, generate a short sanitized summary or skip.
Goal: order clusters by “you’ll likely care”.
Signals:
- recency decay
- folder affinity (learned)
- source weight
- engagement history (opens, dwell, scroll, saves, not interested)
- diversity penalty (avoid same folder/source repeating)
Start with heuristic scoring; later upgrade to a lightweight learned model.
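A starting heuristic could be a weighted sum with a multiplicative diversity penalty. The weights, the 24-hour half-life, and the penalty factor below are placeholders to be tuned, and all signal values are assumed to be normalized to 0..1.

```typescript
interface RankSignals {
  ageHours: number;
  folderAffinity: number; // 0..1, learned per folder
  sourceWeight: number; // 0..1
  engagement: number; // 0..1, derived from opens, dwell, scroll, saves, not-interested
}

// Exponential decay: the recency contribution halves every halfLifeHours.
function recencyDecay(ageHours: number, halfLifeHours = 24): number {
  return Math.pow(0.5, ageHours / halfLifeHours);
}

// repeatsAbove: how many already-ranked clusters share this folder or source.
function personalScore(s: RankSignals, repeatsAbove: number): number {
  const base =
    0.4 * recencyDecay(s.ageHours) +
    0.25 * s.folderAffinity +
    0.2 * s.sourceWeight +
    0.15 * s.engagement;
  return base / (1 + 0.25 * repeatsAbove);
}
```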
Ranking guardrails:
- Add exploration quota to avoid permanent starvation of low-ranked stories
- Always provide a user-visible sort toggle (For You and Latest)
- Suggest-only initially (no auto-add)
- Based on folders you read + sources you prefer
- Trial feeds: add 1–3 per week if enabled, easy remove
- Promote/demote based on engagement
- Unread
- Read (hidden)
- Saved (persist)
- Keep Unread until read or older than max-age (optional; default: no max)
- When marked Read:
- hide from UI immediately
- keep lightweight record for ranking + dedup memory
- purge extracted text after N days (default 14–30) to save space
- Saved:
- keep indefinitely
- keep metadata + canonical link only (no guaranteed full text retention)
- RSS media fields (media:content, media:thumbnail, enclosure)
- Article HTML meta (og:image, twitter:image)
- Fallback: first large image in extracted content
Store:
- hero_image_url
- optionally hero_image_cached_path (download/cache)
- Use representative item’s hero; fallback to first available among members.
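The hero image fallback chain, sketched under the assumption that the source list above is in priority order. The `ImageCandidates` field names are hypothetical, and empty strings are treated as missing.

```typescript
interface ImageCandidates {
  rssMedia?: string; // media:content, media:thumbnail, or enclosure
  metaImage?: string; // og:image or twitter:image
  firstLargeContentImage?: string;
}

function firstNonEmpty(values: Array<string | undefined>): string | undefined {
  return values.find((v) => v !== undefined && v.length > 0);
}

function pickItemHero(c: ImageCandidates): string | undefined {
  return firstNonEmpty([c.rssMedia, c.metaImage, c.firstLargeContentImage]);
}

// Representative's hero first, then the first member that has one.
function pickClusterHero(
  representativeHero: string | undefined,
  memberHeroes: Array<string | undefined>,
): string | undefined {
  return firstNonEmpty([representativeHero, ...memberHeroes]);
}
```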
Core
- folder(id, name)
- feed(id, url, title, site_url, folder_id, folder_confidence, weight, muted, created_at, last_polled_at, etag, last_modified)
- item(id, feed_id, url, canonical_url, title, summary, published_at, author, guid, hero_image_url, extracted_text, extracted_at)
- cluster(id, rep_item_id, folder_id, created_at, updated_at, size)
- cluster_member(cluster_id, item_id, added_at)
- read_state(cluster_id, read_at, saved_at)
Auth
- user_account(id, username, password_hash, created_at, last_login_at)
- auth_session(id, user_id, refresh_token_hash, created_at, expires_at, last_seen_at, revoked_at)
Filtering
- filter_rule(id, pattern, type=phrase|regex, mode=mute|block, breakout_enabled, created_at)
- filter_event(rule_id, cluster_id, action=hidden|breakout_shown, ts)
Analytics
- event(id, ts, type, payload_json) (batched from PWA)
Digests
- digest(id, created_at, start_ts, end_ts, title, body, entries_json)
- feed.url_normalized unique
- item unique on (feed_id, guid) when guid exists
- fallback uniqueness for guid-less entries: (feed_id, canonical_url, published_at)
- cluster_member unique on (cluster_id, item_id)
- read_state.cluster_id is primary key
- event accepts client idempotency_key to dedupe retries
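The item uniqueness rule can be mirrored in the worker as a dedup key computed before the upsert. This is only an illustration of the guid / guid-less split; `itemDedupKey` and the key format are hypothetical, and the database constraints remain the source of truth.

```typescript
interface IncomingItem {
  feedId: string;
  guid?: string;
  canonicalUrl: string;
  publishedAt: string; // ISO 8601 timestamp
}

// Uses (feed_id, guid) when a guid exists, otherwise (feed_id, canonical_url, published_at).
function itemDedupKey(item: IncomingItem): string {
  if (item.guid !== undefined && item.guid.length > 0) {
    return `guid|${item.feedId}|${item.guid}`;
  }
  return `url|${item.feedId}|${item.canonicalUrl}|${item.publishedAt}`;
}
```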
- Poll feeds (conditional GET)
- Parse items → upsert
- Canonicalize URL
- Pre-filter soft gate (mute/block) using title+summary
- Selective extraction (policy-based)
- Compute features (simhash; embeddings if enabled)
- Cluster assignment
- Post-cluster filter (mute-with-breakout)
- Summary generation (optional)
- Digest generation (if triggers met)
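For the canonicalization step, one plausible approach uses the standard `URL` API. The spec does not say which normalizations to apply, so dropping the fragment, removing common tracking parameters, sorting the remaining query parameters, and trimming a trailing slash are all assumptions here.

```typescript
// Assumed tracking parameters; extend as needed.
const TRACKING_PARAMS = /^(utm_[a-z]+|fbclid|gclid)$/i;

function canonicalizeUrl(raw: string): string {
  const u = new URL(raw); // lowercases scheme and host while parsing
  u.hash = "";

  const kept: Array<[string, string]> = [];
  u.searchParams.forEach((value, key) => {
    if (!TRACKING_PARAMS.test(key)) kept.push([key, value]);
  });
  kept.sort((a, b) => a[0].localeCompare(b[0]));
  u.search = new URLSearchParams(kept).toString();

  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}
```

Two outlets rarely share a URL, so this mainly helps deduplicate the same article arriving through different feeds or with different tracking suffixes.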
- Retries with exponential backoff for poll/extract/AI stages
- Stage timeouts and per-feed circuit breaker
- Dead-letter queue/table for repeated failures
- Structured error logging with feed/item identifiers
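A generic retry wrapper with capped exponential backoff might look like the following. The names and default behavior are illustrative; pg-boss exposes its own retry configuration for jobs, so a wrapper like this would mostly apply to calls made inside a single job (for example one HTTP fetch).

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// attempt is 1-based: 1 -> base, 2 -> 2x base, 3 -> 4x base, capped at maxDelayMs.
function backoffDelayMs(attempt: number, o: RetryOptions): number {
  return Math.min(o.maxDelayMs, o.baseDelayMs * Math.pow(2, attempt - 1));
}

async function withRetry<T>(
  fn: () => Promise<T>,
  o: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= o.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; the error is rethrown below.
      if (attempt < o.maxAttempts) await sleep(backoffDelayMs(attempt, o));
    }
  }
  // The caller decides whether this goes to the dead-letter table.
  throw lastError;
}
```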
Extract if any:
- summary missing or < N chars (e.g., 280)
- title matches generic patterns (“briefing”, “top stories”, “update”)
- item becomes cluster representative
- source weight is high
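The extraction policy reduces to a single predicate. The 0..1 weight scale and the 0.8 "high weight" cutoff are assumptions; the 280-character minimum and the generic title patterns come from the list above.

```typescript
interface ExtractionInput {
  title: string;
  summary?: string;
  isClusterRepresentative: boolean;
  sourceWeight: number; // assumed 0..1
}

const GENERIC_TITLE = /\b(briefing|top stories|update)\b/i;

function shouldExtract(
  item: ExtractionInput,
  minSummaryChars = 280,
  highWeight = 0.8,
): boolean {
  const summaryLength = item.summary === undefined ? 0 : item.summary.trim().length;
  return (
    summaryLength < minSummaryChars ||
    GENERIC_TITLE.test(item.title) ||
    item.isClusterRepresentative ||
    item.sourceWeight >= highWeight
  );
}
```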
Default triggers:
- Away ≥ 24h OR backlog ≥ 50 unread clusters
Manual “Generate digest now”
Away is defined from last_active_at, updated on app foreground and interaction events.
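The default trigger is a simple OR over the two conditions. A minimal sketch, with `shouldOfferDigest` as a hypothetical name and both thresholds passed in so they can come from settings:

```typescript
const HOUR_MS = 60 * 60 * 1000;

function shouldOfferDigest(
  lastActiveAt: Date,
  now: Date,
  unreadClusters: number,
  awayHours = 24,
  backlogThreshold = 50,
): boolean {
  const hoursAway = (now.getTime() - lastActiveAt.getTime()) / HOUR_MS;
  return hoursAway >= awayHours || unreadClusters >= backlogThreshold;
}
```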
Digest generation:
- rank clusters
- produce multi-section digest
- cache for the session/day
- GET /v1/clusters?folder_id=&cursor=&limit=&state=unread|saved|all&sort=personal|latest
- GET /v1/clusters/{id}
- POST /v1/clusters/{id}/read
- POST /v1/clusters/{id}/save
- POST /v1/clusters/{id}/split
- POST /v1/clusters/{id}/feedback (not_interested, split_request)
- GET /v1/folders
- GET /v1/feeds
- POST /v1/feeds (add)
- PATCH /v1/feeds/{id} (folder, weight, muted, trial)
- POST /v1/opml/import
- GET /v1/filters
- POST /v1/filters
- PATCH /v1/filters/{id}
- DELETE /v1/filters/{id}
- GET /v1/digests
- POST /v1/events (batch)
- GET /v1/settings
- POST /v1/settings
- POST /v1/auth/login
- POST /v1/auth/logout
- POST /v1/auth/refresh
Auth:
- single-user login (password) with access + refresh tokens
- web (Next.js PWA)
- api (Node.js Fastify)
- worker (Node.js polling + processing)
- postgres (recommended)
- optional redis (future caching/rate-limit use, not required for MVP1)
MVP1: Replacement
- OPML import + add feed
- Auto folder assignment prompt
- Story clustering
- Home + Cluster detail + Saved
- Login screen + single-user auth session
- Sort toggle: For You / Latest
- Hero images
- Read/hide behavior
- Pre/post filters (mute-with-breakout)
MVP2: Catch-up
- Digest view + triggers
- AI summaries (provider switchable)
MVP3: Your algorithm
- Preference learning ranking improvements
- Recommendations (trial feeds)
- Auto folders = site-first, one folder per feed, minimal UI concepts
- Dedup = cluster stories across outlets
- Roblox-like filtering = mute-with-breakout
- Hero image + headline always captured and stored
- Retention = unread persists; read hidden but lightweight history kept; saved persists
- Chosen stack direction: TypeScript-only (Next.js + Fastify + Node worker)
- Queue recommendation accepted: Postgres-backed jobs first (pg-boss)
- DB recommendation accepted: latest stable Postgres major
- pgvector deferred
- OpenAI is default provider
- AI monthly budget cap starts at $20 and is configurable
- Auth model is login screen (single user)
- Ranking default is personal score with sort fallback to latest
- Muted stories still count for breakout conditions
- Polling interval is configurable; default 60 minutes with conditional GET and backoff
- Saved entries retain metadata + canonical link only
- Python is deferred unless quality gates in section 3.2 fail