- Validate correctness of capture, annotation, retrieval, and retry flows.
- Prevent regressions in search relevance and latency.
- Ensure stable machine-readable outputs for agent integration.
- URL canonicalization and deterministic item ID generation.
- Ranking score composition and confidence penalties.
- Ingestion state transition guard logic.
- Command argument validation and error code mapping.
- CLI command to DB behavior (`save/find/annotate/tag/brief/status/retry`).
- Worker processing with queued ingest jobs.
- FTS indexing and filter behavior.
- Deduplication behavior across repeated saves.
- Simulate real user flow:
  - save URL with note
  - parse to chunks
  - add highlights
  - query brief for a topic
save
- New URL -> `metadata_saved` item created.
- Duplicate canonical URL -> same item returned.
- Invalid URL -> deterministic validation error.
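The `save` cases above depend on two properties: equivalent URLs collapse to one canonical form, and the item ID is derived only from that form. A minimal sketch of how a test could pin both down, assuming hypothetical helpers `canonicalize_url` and `item_id_for`, a hypothetical `invalid_url` error code, and an illustrative tracking-parameter list (none of these names come from the plan itself):

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: which query parameters count as tracking noise is not specified here.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}
DEFAULT_PORTS = {"http": 80, "https": 443}


class InvalidURLError(ValueError):
    """Validation failure with a fixed code so tests can assert on it."""
    code = "invalid_url"  # hypothetical error code


def canonicalize_url(raw: str) -> str:
    parts = urlsplit(raw.strip())
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or not parts.hostname:
        raise InvalidURLError(f"unsupported or malformed URL: {raw!r}")
    try:
        port = parts.port
    except ValueError as exc:  # non-numeric or out-of-range port
        raise InvalidURLError(f"invalid port in URL: {raw!r}") from exc
    host = parts.hostname.lower()
    # Keep the port only when it differs from the scheme default.
    netloc = host if port in (None, DEFAULT_PORTS[scheme]) else f"{host}:{port}"
    # Drop tracking parameters and sort the rest so parameter order never matters.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(sorted(p for p in pairs if p[0] not in TRACKING_PARAMS))
    path = parts.path.rstrip("/") or "/"
    # The fragment is discarded because it does not change the fetched document.
    return urlunsplit((scheme, netloc, path, query, ""))


def item_id_for(raw: str) -> str:
    # Same canonical URL always hashes to the same ID, which is what makes dedup testable.
    return hashlib.sha256(canonicalize_url(raw).encode("utf-8")).hexdigest()[:16]
```

With helpers of this shape, the duplicate-save case reduces to asserting that two spellings of one URL yield the same ID, and the invalid-URL case to asserting on the error code rather than the message text.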
find
- Keyword match in title.
- Keyword match in annotation text.
- Combined filters (`--tags`, `--type`, `--since`).
- Stable sort when scores tie.
- Reddit-specific source filtering (`--type reddit`) against post/comment text.
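The stable-sort case is easiest to test when ranking uses a total order, so that tied scores cannot reorder between runs. A minimal sketch, assuming a hypothetical `Hit` record and `item_id` as the tie-breaker (the plan does not say which tie-breaker is actually used):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    item_id: str
    score: float


def rank(hits: list[Hit]) -> list[Hit]:
    # Highest score first; ties fall back to item_id so the order is fully determined.
    return sorted(hits, key=lambda h: (-h.score, h.item_id))


def test_ties_are_order_independent() -> None:
    hits = [Hit("b", 1.0), Hit("a", 1.0), Hit("c", 2.0)]
    expected = ["c", "a", "b"]
    # The same result must come back regardless of input order.
    assert [h.item_id for h in rank(hits)] == expected
    assert [h.item_id for h in rank(list(reversed(hits)))] == expected
```

Feeding the same hits in different input orders is the important part of the test: a sort that relies only on score would pass for one ordering and fail for another.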
annotate
- Reject missing confidence for agent actor.
- Enforce highlight cap per item.
- Allow human pin/unpin operations.
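The first two `annotate` rules can be expressed as a small pure validator, which keeps the unit tests independent of the database. A minimal sketch, assuming hypothetical actor names `"agent"` and `"human"`, a hypothetical cap of 20 highlights, and hypothetical error codes (the real cap and codes are not stated in this plan):

```python
from typing import Optional

MAX_HIGHLIGHTS_PER_ITEM = 20  # assumption: the actual cap is defined elsewhere


class AnnotationError(ValueError):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def validate_highlight(actor: str, confidence: Optional[float], existing_count: int) -> None:
    """Raise AnnotationError if a new highlight would break the annotate rules."""
    # Agents must state a confidence; humans may omit it (assumed from the rule wording).
    if actor == "agent" and confidence is None:
        raise AnnotationError("confidence_required", "agent highlights need a confidence value")
    # The cap counts highlights already stored for the item.
    if existing_count >= MAX_HIGHLIGHTS_PER_ITEM:
        raise AnnotationError("highlight_cap_reached", "item already has the maximum highlights")
```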
brief
- Returns compact snippets by default.
- Includes `why_ranked` and provenance fields.
- `--expand-chunks` includes chunk text.
status/retry
- `status` reflects current ingest state.
- `retry` only allowed for `failed` (or explicitly supported transitional states).
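The `retry` rule is a state-transition guard, so it can be tested as a lookup against an explicit allow-list. A minimal sketch, assuming a hypothetical `queued` target state and hypothetical error code; only `failed` and `metadata_saved` appear in this plan, and any other state names are placeholders:

```python
# States from which retry is allowed. Extend this set only for transitional
# states that are explicitly supported.
RETRYABLE_STATES = frozenset({"failed"})


class TransitionError(ValueError):
    code = "invalid_state_transition"  # hypothetical error code


def retry(current_state: str) -> str:
    """Return the state an item moves to on retry, or raise if retry is not allowed."""
    if current_state not in RETRYABLE_STATES:
        raise TransitionError(f"retry not allowed from state {current_state!r}")
    return "queued"  # assumption: retry re-enqueues the ingest job
```

Keeping the allow-list as data means the test can iterate over every known state and assert that exactly the listed ones are accepted.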
- Article fixtures:
  - clean article page
  - noisy page with nav/ads
- X fixtures:
  - single post
  - thread style content
- YouTube fixtures:
  - metadata-only case
  - transcript-available case
- PDF fixtures:
  - text-native PDF
  - image-heavy/low-text PDF
- Reddit fixtures:
  - post with self text + top comments
  - fallback behavior when listing API fails
- 1,000 items
- 5,000 items
- 10,000 items
- `save` metadata path p50
- `find` p95 latency
- `brief` p95 latency with `--max-items 20`
- `save` p50 < 3s
- `find` p95 < 250ms at 10k items
- `brief` p95 < 1.5s at 10k items
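The latency targets above can be checked mechanically once a benchmark run has collected per-command timings. A minimal sketch using the nearest-rank percentile definition; the function names and the choice of nearest-rank are assumptions, while the thresholds are the ones listed:

```python
import math

# (percentile, limit in seconds) per command, taken from the targets above.
TARGETS = {
    "save": (50, 3.0),
    "find": (95, 0.250),
    "brief": (95, 1.5),
}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_target(command: str, samples_seconds: list[float]) -> bool:
    pct, limit = TARGETS[command]
    return percentile(samples_seconds, pct) < limit
```

Nearest-rank always returns an observed sample rather than an interpolated value, which makes failures easier to trace back to a specific slow invocation.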
- Kill/restart worker during ingest and verify job recovery.
- Simulate transient fetch errors and verify retry/backoff.
- Simulate SQLite busy locks with concurrent command invocations.
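The transient-fetch case can be tested without real network calls or real sleeping by injecting both the fetch function and the sleep function. A minimal sketch, assuming exponential backoff with hypothetical defaults (4 attempts, 0.5s base delay) and a hypothetical `TransientFetchError`; the plan does not specify the actual backoff policy:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientFetchError(Exception):
    """Stand-in for errors worth retrying, such as timeouts or 5xx responses."""


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TransientFetchError:
            # Give up after the final attempt and let the caller mark the job failed.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

A test passes a fake `fetch` that fails a fixed number of times and a `sleep` that records its arguments, then asserts on both the result and the recorded delays.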
- Snapshot tests for `--json` output per command.
- Versioned schema validation for response envelopes.
- Error contract tests (`code`, `message`, `retryable`).
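An error contract test can be as small as checking that the three fields are present with the expected types. A minimal sketch, assuming `code` and `message` are strings and `retryable` is a boolean; the field names come from the list above, but the types and the helper name are assumptions:

```python
# Assumed field types for the error object.
ERROR_CONTRACT = {"code": str, "message": str, "retryable": bool}


def contract_violations(error: dict) -> list[str]:
    """Return human-readable problems; an empty list means the error matches the contract."""
    problems = []
    for field, expected_type in ERROR_CONTRACT.items():
        if field not in error:
            problems.append(f"missing field: {field}")
        elif not isinstance(error[field], expected_type):
            problems.append(f"wrong type for {field}: {type(error[field]).__name__}")
    return problems
```

Returning a list of problems instead of raising on the first one makes a failing snapshot easier to diagnose, since every broken field is reported in a single run.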
- Unit and integration tests must pass.
- Lint and type-check must pass.
- JSON contract snapshots must pass.
- Performance smoke benchmark must not regress >15% from baseline.
- Save 10 mixed sources and confirm status transitions.
- Add human and agent highlights; verify cap and confidence rules.
- Run topic `brief` and verify evidence quality manually.
- Re-run after 30+ day staleness simulation and verify revalidation behavior.
- Validate Reddit URL canonicalization (`redd.it`, `old.reddit.com`) and backfill dry run.