This document translates the PRD into an implementation-ready design for v1 (CLI-first, local-first, single-user with multiple agents).

In scope for v1:
- CLI for save, find, annotate, tag, brief, related, status, retry.
- Local SQLite store with FTS5 lexical search.
- Async ingestion pipeline with explicit states.
- Source adapters for article first, then X/YouTube/PDF.
- JSON output contract for agent integration.
Out of scope for v1:

- Multi-user auth and permissions.
- Hosted service and sync.
- Full web UI.
- Semantic embeddings in the first implementation.
Technology choices:

- Language/runtime: TypeScript on Node.js 22+.
- CLI framework: `commander`.
- SQLite driver: `better-sqlite3`.
- Queue/background jobs: local SQLite-backed job table + worker loop.
- HTML/article extraction: `@mozilla/readability` + `jsdom`.
- PDF extraction: adapter abstraction (exact library selected in M2).
- Logging: structured JSON logs to stdout/stderr.
- Packaging: npm package exposing CLI binary.
Architecture layers:

- CLI command layer
  - Validates input, calls application services, formats output.
- Application services
  - `SaveService`, `IngestService`, `AnnotationService`, `SearchService`, `BriefService`.
- Adapter layer
  - `FetchAdapter`, `ArticleAdapter`, `XAdapter`, `YouTubeAdapter`, `PdfAdapter`.
- Storage layer
  - SQLite access with repositories and migrations.
  - FTS5 index maintenance.
- Worker loop
  - Pulls queued ingest tasks.
  - Transitions item status and persists parse/enrichment artifacts.
Ingestion flow:

- `save` inserts the item with status `metadata_saved` and enqueues an ingest job.
- The worker fetches/normalizes text and writes chunks.
- The worker updates status to `parsed`.
- Optional enrichment writes a summary/key claims and sets `enriched`.
- `find` and `brief` query FTS plus relational filters and return ranked, compact output.
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
PRAGMA busy_timeout=5000;
CREATE TABLE items (
id TEXT PRIMARY KEY,
canonical_url TEXT NOT NULL UNIQUE,
original_url TEXT NOT NULL,
source_type TEXT NOT NULL,
title TEXT,
author TEXT,
published_at TEXT,
fetched_at TEXT,
ingest_status TEXT NOT NULL,
ingest_error TEXT,
checksum TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
CREATE TABLE content_chunks (
id TEXT PRIMARY KEY,
item_id TEXT NOT NULL REFERENCES items(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
text TEXT NOT NULL,
token_count INTEGER,
created_at TEXT NOT NULL,
UNIQUE(item_id, chunk_index)
);
CREATE TABLE annotations (
id TEXT PRIMARY KEY,
item_id TEXT NOT NULL REFERENCES items(id) ON DELETE CASCADE,
chunk_id TEXT REFERENCES content_chunks(id) ON DELETE SET NULL,
type TEXT NOT NULL,
text TEXT NOT NULL,
actor TEXT NOT NULL,
confidence REAL,
pinned INTEGER NOT NULL DEFAULT 0,
created_at TEXT NOT NULL
);
CREATE TABLE tags (
id TEXT PRIMARY KEY,
item_id TEXT NOT NULL REFERENCES items(id) ON DELETE CASCADE,
tag TEXT NOT NULL,
actor TEXT NOT NULL,
created_at TEXT NOT NULL,
UNIQUE(item_id, tag, actor)
);
CREATE TABLE artifacts (
id TEXT PRIMARY KEY,
item_id TEXT NOT NULL REFERENCES items(id) ON DELETE CASCADE,
summary TEXT,
key_claims_json TEXT,
created_by TEXT NOT NULL,
created_at TEXT NOT NULL
);
CREATE TABLE ingest_jobs (
id TEXT PRIMARY KEY,
item_id TEXT NOT NULL REFERENCES items(id) ON DELETE CASCADE,
status TEXT NOT NULL,
attempts INTEGER NOT NULL DEFAULT 0,
last_error TEXT,
scheduled_at TEXT NOT NULL,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);

CREATE INDEX idx_items_source_fetched ON items(source_type, fetched_at);
CREATE INDEX idx_items_status ON items(ingest_status);
CREATE INDEX idx_annotations_item_type_created ON annotations(item_id, type, created_at);
CREATE INDEX idx_tags_tag_item ON tags(tag, item_id);
CREATE INDEX idx_chunks_item_index ON content_chunks(item_id, chunk_index);

CREATE VIRTUAL TABLE search_fts USING fts5(
item_id UNINDEXED,
title,
chunk_text,
annotation_text,
tokenize='porter unicode61'
);

- Rebuild strategy: trigger-based updates for M1, with a periodic reconcile command as a recovery fallback.
- Ranking approach: `score = bm25(search_fts, 2.5, 1.0, 2.0)`.
- Apply boosts/penalties:
  - Boost when the item has a pinned annotation.
  - Penalize when agent annotation confidence is below 0.6.
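One way to implement the boost/penalty pass is as a pure post-processing step over the base BM25 score. The multipliers below (1.25 boost, 0.75 penalty) are illustrative assumptions, as is the convention that higher scores rank better; note that SQLite's `bm25()` reports lower (more negative) values as more relevant, so a real implementation would negate or normalize the raw value before this step:

```typescript
// Inputs for the adjustment pass; field names are illustrative.
interface RankingInput {
  bm25Score: number;          // base relevance; assumed higher = better here
  hasPinnedAnnotation: boolean;
  agentConfidence?: number;   // undefined for human annotations
}

// Apply the pinned-annotation boost and the low-confidence penalty.
function adjustedScore(r: RankingInput): number {
  let score = r.bm25Score;
  if (r.hasPinnedAnnotation) score *= 1.25;                                  // pinned boost
  if (r.agentConfidence !== undefined && r.agentConfidence < 0.6) {
    score *= 0.75;                                                           // low-confidence penalty
  }
  return score;
}
```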
CLI surface:

- `linkledger save <url> [--note] [--tags] [--json]`
- `linkledger annotate <item-id> --highlight|--lowlight|--note <text> [--actor] [--confidence] [--json]`
- `linkledger tag <item-id> --add <tags> [--actor] [--json]`
- `linkledger find <query> [--tags] [--type] [--since] [--limit] [--json]`
- `linkledger brief <query> [--max-items] [--expand-chunks] [--json]`
- `linkledger related <item-id> [--max-items] [--json]`
- `linkledger status <item-id> [--json]`
- `linkledger retry <item-id> [--json]`
Common envelope:
{
"ok": true,
"data": {},
"meta": {
"timestamp": "2026-02-24T17:00:00Z",
"version": "0.1.0"
}
}

Error envelope:
{
"ok": false,
"error": {
"code": "ITEM_NOT_FOUND",
"message": "No item found for id abc123",
"retryable": false
}
}

Allowed transitions:
- `metadata_saved` -> `parsed`
- `metadata_saved` -> `failed`
- `parsed` -> `enriched`
- `parsed` -> `failed`
- `failed` -> `metadata_saved` (via `retry`)
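A small guard over the transition table keeps status updates honest; a minimal sketch:

```typescript
// Allowed transitions from the table above; retry is the failed -> metadata_saved edge.
const ALLOWED: Record<string, string[]> = {
  metadata_saved: ['parsed', 'failed'],
  parsed: ['enriched', 'failed'],
  enriched: [],                     // terminal success state
  failed: ['metadata_saved'],       // re-entered via retry
};

// Returns true only for edges present in the transition table.
function canTransition(from: string, to: string): boolean {
  return (ALLOWED[from] ?? []).includes(to);
}
```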
Rules:
- `save` is idempotent on the canonical URL.
- `retry` increments the attempt counter and captures the last error.
- Max attempts defaults to 3 before the `failed` state becomes terminal.
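Idempotency hinges on a stable canonical URL. A sketch of one possible normalization; the specific rules here (stripping fragments and `utm_*` params, relying on the URL parser's host lowercasing) are assumptions, since the design only requires that the result be deterministic:

```typescript
// Canonicalize a URL so save can dedupe on items.canonical_url.
function canonicalizeUrl(raw: string): string {
  const u = new URL(raw);          // also lowercases the hostname
  u.hash = '';                     // fragments never change the resource
  for (const key of [...u.searchParams.keys()]) {
    if (key.startsWith('utm_')) u.searchParams.delete(key); // drop tracking params
  }
  return u.toString();
}
```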
interface SourceAdapter {
supports(url: string): boolean;
detectType(url: string): 'article' | 'x' | 'youtube' | 'pdf' | 'unknown';
fetchAndParse(input: { url: string }): Promise<{
metadata: { title?: string; author?: string; publishedAt?: string };
chunks: Array<{ text: string; tokenCount?: number }>;
checksum?: string;
}>;
}

Design notes:
- Adapters are pure and independently testable with fixtures.
- All adapters must return normalized UTF-8 text and deterministic chunk ordering.
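A deterministic chunker satisfying those two constraints might look like this; the 1200-character window is an illustrative default, not a number fixed by this design:

```typescript
// Split normalized text into fixed-size chunks with deterministic ordering,
// matching the adapter contract above.
function toChunks(text: string, size = 1200): Array<{ text: string; index: number }> {
  // Normalize to NFC and unify line endings so checksums and ordering are stable.
  const normalized = text.normalize('NFC').replace(/\r\n/g, '\n').trim();
  const chunks: Array<{ text: string; index: number }> = [];
  for (let i = 0; i * size < normalized.length; i++) {
    chunks.push({ text: normalized.slice(i * size, (i + 1) * size), index: i });
  }
  return chunks;
}
```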
- Use a single writer worker process by default.
- Reads can run concurrently.
- Handle `SQLITE_BUSY` with bounded exponential backoff.
- Commands must return deterministic error codes.
- The background worker is restart-safe because ingest job rows are persisted.
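The `SQLITE_BUSY` handling can be a small retry wrapper with bounded exponential backoff. The base delay, cap, and attempt limit below are illustrative, and the check assumes the driver surfaces the SQLite error string on `err.code` (as `better-sqlite3` does):

```typescript
// Delay grows as base * 2^attempt, capped so a long busy spell cannot stall a command.
function backoffDelayMs(attempt: number, baseMs = 50, capMs = 2000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Run a synchronous DB operation, retrying only on SQLITE_BUSY.
async function withBusyRetry<T>(fn: () => T, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return fn();
    } catch (err: any) {
      if (err?.code !== 'SQLITE_BUSY' || attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```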
- Structured logs include `command`, `item_id`, `duration_ms`, and `result`.
- Track counters: `ingest_success_total`, `ingest_failure_total`, `find_latency_ms`, `brief_latency_ms`.
- Add a `linkledger doctor` command later if operational complexity grows.
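A log-line helper covering those fields, with a naive secret-redaction pass (the key-name regex is an illustrative heuristic, not a rule from this design):

```typescript
// Emit one structured JSON log line per command.
function formatLogLine(fields: {
  command: string;
  item_id?: string;
  duration_ms: number;
  result: 'ok' | 'error';
  [key: string]: unknown;
}): string {
  // Redact any field whose name looks secret-bearing before serializing.
  const redacted = Object.fromEntries(
    Object.entries(fields).map(([k, v]) =>
      /token|secret|password/i.test(k) ? [k, '[redacted]'] : [k, v],
    ),
  );
  return JSON.stringify({ ts: new Date().toISOString(), ...redacted });
}
```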
Performance targets:

- `save`: p50 under 3 seconds for the metadata-only path.
- `find`: p95 under 250 ms at 10k items.
- `brief`: p95 under 1.5 seconds with at most 20 candidate items.
Performance method:
- Seed dataset generator for 1k, 5k, 10k items.
- Benchmark script run in CI nightly and locally pre-release.
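The seed-dataset generator can be a pure, deterministic function so benchmark runs are reproducible. Shapes mirror the `items` table; all values are synthetic:

```typescript
// Generate n deterministic synthetic rows for the 1k/5k/10k benchmarks.
function seedItems(count: number) {
  return Array.from({ length: count }, (_, i) => ({
    id: `seed-${i.toString().padStart(6, '0')}`,
    canonical_url: `https://example.com/articles/${i}`,
    original_url: `https://example.com/articles/${i}?utm_source=seed`,
    source_type: 'article',
    title: `Seed article ${i}`,
    ingest_status: 'parsed',
  }));
}
```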
- Local-only by default; no automatic outbound sync.
- Redact secrets from logs.
- Respect robots/terms/rate limits where applicable per adapter.
- Do not execute remote scripts/content.
- All M0/M1 acceptance criteria pass.
- Article adapter stable.
- Search and ranking meet latency goals on 10k-item benchmark.
- JSON contract documented and stable for agent integration.