
Add canonical recipe deduplication for URL imports #55

Merged: windoze95 merged 3 commits into main from feat/canonical-recipe-dedup on Mar 8, 2026

@windoze95 (Owner)

Summary

  • Adds a CanonicalRecipe model to cache extracted recipes by normalized URL, preventing redundant LLM extraction calls when multiple users import the same recipe
  • Implements URL normalization (stripping tracking params and fragments, standardizing the scheme and host) to maximize cache hits
  • Adds an ImportFromCanonical endpoint so the client can import directly from a cached canonical recipe
  • Adds an HNSW vector-similarity index on canonical recipe embeddings
  • Adds unit tests for URL normalization, the canonical service, the import handler, and the import service
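A minimal Go sketch of the normalization step, assuming the rules listed above. NormalizeURL, the trackingParams set, upgrading http to https, and dropping a leading "www." and a trailing slash are illustrative assumptions, not the code merged in this PR.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// trackingParams is a hypothetical set of non-utm query keys treated as tracking noise.
var trackingParams = map[string]bool{"fbclid": true, "gclid": true, "mc_cid": true, "mc_eid": true}

// NormalizeURL (hypothetical name) maps equivalent recipe links to one cache key.
func NormalizeURL(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", err
	}
	scheme := strings.ToLower(u.Scheme)
	if (scheme != "http" && scheme != "https") || u.Host == "" {
		return "", fmt.Errorf("not an absolute http(s) URL: %q", raw)
	}

	// Standardize the host: lowercase and without a leading "www.".
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	port := u.Port()
	if port == "" || (scheme == "http" && port == "80") || (scheme == "https" && port == "443") {
		u.Scheme = "https" // assumption: http and https serve the same recipe page
	} else {
		u.Scheme = scheme // non-default port: keep scheme and port as given
		host = net.JoinHostPort(host, port)
	}
	u.Host = host

	// Fragments never reach the server, so they cannot change the recipe.
	u.Fragment, u.RawFragment = "", ""

	// Drop utm_* and other tracking keys; Encode also sorts the remaining keys.
	q := u.Query()
	for key := range q {
		if lk := strings.ToLower(key); strings.HasPrefix(lk, "utm_") || trackingParams[lk] {
			q.Del(key)
		}
	}
	u.RawQuery = q.Encode()

	// Treat "/pancakes/" and "/pancakes" as the same page.
	if len(u.Path) > 1 {
		u.Path = strings.TrimRight(u.Path, "/")
	}
	u.RawPath = ""
	return u.String(), nil
}

func main() {
	key, err := NormalizeURL("HTTP://www.Example.com/recipes/pancakes/?utm_source=x&b=2&a=1#step-3")
	if err != nil {
		panic(err)
	}
	fmt.Println(key) // https://example.com/recipes/pancakes?a=1&b=2
}
```

Sorting the surviving query keys matters as much as stripping tracking params: two links that differ only in parameter order then share one canonical record.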

Test plan

  • Verify importing a recipe by URL caches a canonical record
  • Verify importing the same URL again hits the canonical cache instead of re-extracting
  • Verify POST /v1/recipes/import/canonical creates a user recipe from a canonical record
  • Verify URL normalization strips tracking params and fragments and standardizes the scheme and host
  • Run existing and new unit tests (go test ./...)

🤖 Generated with Claude Code

Introduce a CanonicalRecipe model to cache extracted recipes by normalized URL,
avoiding redundant LLM calls when multiple users import the same recipe. Includes
URL normalization, vector similarity indexing, and an ImportFromCanonical endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ea3d863b7

windoze95 and others added 2 commits March 7, 2026 18:20
ImportFromURL was returning canonical cache hits without checking
FetchedAt against canonicalTTL, unlike PreviewFromURL, which enforced
the 7-day freshness window. This could serve stale recipe data for up
to 90 days, until the cleanup job removed the entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove DeleteStale and the cleanup goroutine: canonical entries are never
deleted. Replace GetHotEntries with GetStaleEntries so the background task
refreshes all stale canonicals (not just popular ones), keeping recipe data
current when source pages change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
windoze95 merged commit a06943f into main on Mar 8, 2026
1 check passed
windoze95 deleted the feat/canonical-recipe-dedup branch on Mar 8, 2026 at 00:50