Working codename (temporary): Pocket for Agents
Planned product name: linkledger-cli
Type: CLI-first personal knowledge capture and retrieval system for human+agent workflows
One-liner: Save links fast, extract reusable highlights/notes, and return compact evidence packs for agents with minimal token overhead.
Pocket for Agents is a temporary ideation codename only and must not be used as a shipping name due to potential trademark infringement risk.
Current workflow:
- User shares links and initial thoughts with an agent.
- Agent creates outlines and drafts for blog, LinkedIn, X, and Bluesky.
- Agent opens a PR with a draft blog post.
Current gap:
- Useful references, highlights, and rationale are not consistently persisted.
- Agents repeatedly re-read full sources across tasks.
- Research context is fragmented across chats/tools.
Opportunity:
- Build a simpler, reliable "save + annotate + retrieve" system that acts as long-lived memory for agents.
- Keep v0 CLI-first to maximize speed of iteration and agent ergonomics.
Problem:
Knowledge artifacts (links, highlights, commentary, confidence, source quality) are not stored in a way that agents can efficiently reuse later. This causes:
- Lost context between tasks.
- Lower drafting quality due to weak evidence reuse.
- Unnecessary token spend from repeated long-form reading.
- Poor traceability for why a source was used.
Goals:
- Fast capture: Save a URL + optional context in seconds.
- Durable memory: Persist normalized content, tags, notes, highlights/lowlights.
- Agent-native retrieval: Return compact, high-signal context for drafting/research tasks.
- Source breadth: Support articles, X links, YouTube, and PDFs.
- Search-first foundation: Design for fast and capable retrieval as corpus size grows.
Non-goals:
- Full consumer UI with heavy interaction patterns.
- Team collaboration, permissions, billing, or SaaS multi-tenancy.
- Replacing existing content publishing pipeline.
- Perfect archival fidelity of every source format.
Target user:
- Single human operator (you) and your internal agents/sub-task agents.
User stories:
- "When I find a useful source, I can save it immediately with minimal friction."
- "When drafting, agents can find prior high-value evidence and highlights quickly."
- "When a claim is made, I can trace source and annotation provenance."
Design principles:
- CLI first, API friendly - optimize for agent calls and low bandwidth.
- Highlights over full text - retrieval should prefer high-signal snippets first.
- Provenance always - every annotation stores actor/time/confidence.
- Search is core infrastructure - index strategy is part of v0 architecture, not a later patch.
- Deterministic and debuggable - canonical IDs, dedupe rules, explicit failure states.
Capture and ingestion:
- Save URL with optional note and tags.
- Canonicalize URL and detect duplicates.
- Fetch and parse source content into normalized text chunks.
- Capture metadata (title, author, publication date, content type).
- Mark ingestion status with explicit stages: `metadata_saved` → `parsed` → `enriched` / `failed` (with error reason).
- Provide operational commands for ingest visibility and recovery: `linkledger status <item-id>`, `linkledger retry <item-id>`.
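The canonicalize-and-dedupe step above can be sketched in Python. This is a minimal sketch, not the final rule set: the tracking-parameter list, the trailing-slash handling, and deriving the stable ID from a SHA-256 of the canonical URL are all assumptions.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative list of parameters to strip; the real set is a product decision.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content", "fbclid", "gclid"}

def canonicalize(url: str) -> str:
    """Normalize a URL so equivalent links compare equal:
    lowercase scheme/host, drop tracking params and fragments,
    and trim trailing slashes."""
    parts = urlsplit(url.strip())
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if k not in TRACKING_PARAMS]
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))

def item_id(canonical_url: str) -> str:
    # Stable, deterministic ID derived from the canonical URL.
    return hashlib.sha256(canonical_url.encode()).hexdigest()[:16]
```

With this normalization, `https://example.com/post/?utm_source=x` and `https://example.com/post` canonicalize to the same string and therefore the same item ID.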
Annotations:
- Support `highlight`, `lowlight`, and `note` annotations.
- Annotation actor types: `human`, `agent:<name>`.
- Store confidence for agent-generated annotations (0.0-1.0).
- Allow tags from both human and agent with actor provenance.
Search and retrieval:
- Query by keyword, tags, source type, date range, actor.
- Return ranked results with highlight-first snippets.
- Provide compact "brief" output for agent consumption:
- canonical URL
- source metadata
- top highlights/lowlights
- ranking reason metadata (`why_ranked`)
- optional summary/key claims
- Default retrieval contract is compact/high-signal only; full chunk expansion requires an explicit flag.
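One possible shape for a single entry in `linkledger brief --json` output, shown as a Python literal. The field names mirror the brief contents listed above, but the nested structure and the `lexical_score` / `matched_fields` details are illustrative assumptions, not a committed contract.

```python
# Hypothetical brief entry; exact JSON contract is TBD.
brief_entry = {
    "id": "a1b2c3d4e5f6a7b8",                       # stable item ID
    "canonical_url": "https://example.com/post",
    "source": {
        "title": "Example post",
        "author": "A. Writer",
        "type": "article",
        "published_at": "2024-01-15",
    },
    "highlights": ["High-signal snippet one.", "High-signal snippet two."],
    "lowlights": [],
    "why_ranked": {"matched_fields": ["title", "highlight"], "lexical_score": 4.2},
    "summary": None,  # populated only when a summary artifact exists
}
```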
Reliability and automation:
- JSON output mode for all commands.
- Stable IDs for items/annotations to support downstream automation.
- Retry strategy for fetch/parse failures.
- Idempotent saves (same canonical URL should not create duplicates).
- Basic observability via status and ingest timestamps.
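Idempotent saves can lean on the unique canonical-URL constraint. A minimal sketch using SQLite's upsert syntax (table trimmed to the relevant columns; the real schema has more fields):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id TEXT PRIMARY KEY, "
    "canonical_url TEXT NOT NULL UNIQUE, note TEXT)"
)

def save(conn, item_id, canonical_url, note=None):
    # The UNIQUE constraint plus ON CONFLICT DO NOTHING makes
    # repeated saves of the same canonical URL a no-op.
    conn.execute(
        "INSERT INTO items (id, canonical_url, note) VALUES (?, ?, ?) "
        "ON CONFLICT(canonical_url) DO NOTHING",
        (item_id, canonical_url, note),
    )

save(conn, "a1", "https://example.com/post", "first save")
save(conn, "a1", "https://example.com/post", "duplicate save")
item_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The second `save` call leaves the table unchanged, so `item_count` is 1.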
CLI command surface:
- `linkledger save <url> [--note "..."] [--tags tag1,tag2]`
- `linkledger annotate <item-id> --highlight "..." [--confidence 0.82] [--actor agent:researcher]`
- `linkledger annotate <item-id> --lowlight "..." [--confidence ...]`
- `linkledger tag <item-id> --add tag1,tag2 [--actor ...]`
- `linkledger find "query" [--tags ...] [--type article|x|youtube|pdf] [--since YYYY-MM-DD]`
- `linkledger brief "topic or task" [--max-items N] [--json] [--expand-chunks]`
- `linkledger related <item-id> [--max-items N]`
- `linkledger status <item-id>`
- `linkledger retry <item-id>`
Data model:
- `items`: id, canonical_url, original_url, source_type, title, author, published_at, fetched_at, ingest_status, checksum
- `content_chunks`: id, item_id, chunk_index, text, token_count
- `annotations`: id, item_id, chunk_id (nullable), type (highlight|lowlight|note), text, actor, confidence, created_at
- `tags`: id, item_id, tag, actor, created_at
- `artifacts`: id, item_id, summary, key_claims_json, created_by, created_at
Indexes:
- `items(canonical_url)` unique
- `items(source_type, fetched_at)`
- `tags(tag, item_id)`
- `annotations(item_id, type, created_at)`
- `content_chunks(item_id, chunk_index)`
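The tables and indexes map directly to SQLite DDL. A sketch follows; the spec lists only column names, so the column types, NOT NULL constraints, and index names here are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (
  id TEXT PRIMARY KEY,
  canonical_url TEXT NOT NULL UNIQUE,
  original_url TEXT NOT NULL,
  source_type TEXT NOT NULL,
  title TEXT, author TEXT, published_at TEXT, fetched_at TEXT,
  ingest_status TEXT NOT NULL DEFAULT 'metadata_saved',
  checksum TEXT
);
CREATE TABLE content_chunks (
  id TEXT PRIMARY KEY,
  item_id TEXT NOT NULL REFERENCES items(id),
  chunk_index INTEGER NOT NULL,
  text TEXT NOT NULL,
  token_count INTEGER
);
CREATE TABLE annotations (
  id TEXT PRIMARY KEY,
  item_id TEXT NOT NULL REFERENCES items(id),
  chunk_id TEXT REFERENCES content_chunks(id),
  type TEXT NOT NULL CHECK (type IN ('highlight','lowlight','note')),
  text TEXT NOT NULL,
  actor TEXT NOT NULL,
  confidence REAL,
  created_at TEXT NOT NULL
);
CREATE TABLE tags (
  id TEXT PRIMARY KEY,
  item_id TEXT NOT NULL REFERENCES items(id),
  tag TEXT NOT NULL,
  actor TEXT NOT NULL,
  created_at TEXT NOT NULL
);
CREATE TABLE artifacts (
  id TEXT PRIMARY KEY,
  item_id TEXT NOT NULL REFERENCES items(id),
  summary TEXT,
  key_claims_json TEXT,
  created_by TEXT,
  created_at TEXT
);
-- the unique canonical_url constraint above creates its index implicitly
CREATE INDEX idx_items_type_fetched ON items(source_type, fetched_at);
CREATE INDEX idx_tags_tag_item ON tags(tag, item_id);
CREATE INDEX idx_ann_item_type_created ON annotations(item_id, type, created_at);
CREATE INDEX idx_chunks_item_index ON content_chunks(item_id, chunk_index);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```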
Search strategy:
As the corpus grows, retrieval quality and latency become product-critical. v0 will include explicit search foundations:
- SQLite + FTS5 virtual table for full-text search over:
- item titles
- normalized chunk text
- annotation text
- BM25 ranking with weighted fields (title/highlight text weighted above body text).
- Tag filtering and source/date filters applied at query time.
- Snippet generation returns compact context windows.
- Retrieval ranking down-weights low-confidence agent annotations.
Post-v0 (deferred):
- Hybrid retrieval:
- lexical (FTS/BM25) + semantic embedding similarity.
- Re-ranking for "brief" output quality.
- Cached topic-level memory packs for recurring tasks.
Performance targets:
- `save`: p50 < 3s for metadata path; full parse async.
- `find`: p95 < 250ms at 10k items on local machine.
- `brief`: p95 < 1.5s with max 20 candidate items.
Architecture:
- Local-first: SQLite database in local workspace/app data.
- CLI package: main interaction surface for human and agents.
- Ingestion adapters: pluggable parsers per source type (article/x/youtube/pdf).
- Async ingestion pipeline: save event creates record immediately, parsing/refinement can complete in background.
- Stateless command execution: commands return deterministic output, with optional JSON for machine clients.
- Concurrency defaults: SQLite WAL mode + bounded retry/backoff for write-lock contention.
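The concurrency defaults above can be sketched as a small helper pair: open the database in WAL mode, and wrap writes in bounded exponential backoff for lock contention. The attempt count and delays are placeholder values.

```python
import os
import sqlite3
import tempfile
import time

def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=0.1)
    # WAL lets readers proceed while a single writer holds the lock.
    conn.execute("PRAGMA journal_mode=WAL;")
    return conn

def with_retry(fn, attempts=5, base_delay=0.05):
    """Run fn(), retrying with exponential backoff on write-lock contention."""
    for attempt in range(attempts):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

db_path = os.path.join(tempfile.mkdtemp(), "linkledger.db")
conn = open_db(db_path)
journal_mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
```

Note that WAL only applies to file-backed databases; an in-memory connection silently stays in `memory` journal mode.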
Provenance and guardrails:
- Every annotation/tag/artifact stores actor and timestamp.
- Agent annotations require confidence value (defaulted if omitted).
- Agent annotation generation is controlled:
- max 3-7 highlights per item (configurable)
- human pin/unpin override for highlights
- Duplicate/near-duplicate detection via canonical URL + checksum.
- Source parsing failures are explicit and queryable.
- Staleness policy: items older than 30 days trigger lightweight revalidation on access.
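The checksum and staleness guardrails can be sketched as two small helpers. The SHA-256 checksum over normalized text catches exact re-fetch duplicates (near-duplicate detection would need fuzzier hashing); the 30-day window comes from the policy above.

```python
import hashlib
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=30)

def content_checksum(normalized_text: str) -> str:
    # Identical normalized text -> identical checksum, so a re-fetch of an
    # unchanged page is detected without diffing chunk tables.
    return hashlib.sha256(normalized_text.encode()).hexdigest()

def needs_revalidation(fetched_at, now=None):
    # Items fetched more than 30 days ago trigger lightweight
    # revalidation on access.
    now = now or datetime.now(timezone.utc)
    return now - fetched_at > STALE_AFTER
```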
POC considered successful if, over 2-3 weeks:
- >=30% of new drafts reuse at least one stored source or highlight.
- `save` median time <=10 seconds end-to-end.
- At least 70% of "brief" responses judged useful by human review.
- Measurable reduction in repeated full-source reads by agents.
Risks and open questions:
- Parsing reliability: X/YouTube/PDF extraction quality can vary.
- Noise risk: Over-generated agent highlights may reduce signal.
- Ranking quality ceiling: lexical-only may underperform for fuzzy topic recall before hybrid retrieval.
- Governance: how to deprecate stale/low-quality annotations over time.
- Naming/legal: selecting a non-infringing final product name before implementation.
Milestones:
- SQLite schema, canonical save, dedupe, basic fetch metadata.
- CLI: `save`, `find` (metadata only), `--json` support.
- Annotation commands.
- FTS5 index and weighted lexical ranking.
- Highlights-first snippets in `find`.
- Add adapters for X, YouTube, PDFs.
- Improved parse robustness + retries.
- `brief` command outputs compact evidence packs.
- Integration into existing content drafting agent workflow.
Decisions:
- Single-user first: yes (you + your agents).
- Storage model: local-first SQLite.
- Ingestion storage: canonical metadata + normalized text chunks; optional raw fetch artifact for debugging only.
- Annotation model: human + agent annotations with provenance/confidence, plus capped agent highlights per item.
- Retrieval model: BM25 lexical/tag first with weighted fields; hybrid semantic later.
- Freshness policy: lightweight revalidation on access after 30 days.
- Integration interface: CLI-first with stable JSON contract (no local HTTP service in v1).
- Differentiator: agent-native CLI with low token and bandwidth overhead.
UI can be introduced after retrieval quality is stable, focused on:
- Reviewing/editing highlights.
- Curating high-signal evidence sets.
- Human override of agent tags/confidence.
- Lightweight browse/search experience for manual curation.
Version: v0 draft
Status: Ready for iteration