fix: YouTube re-ingest should refresh metadata for duplicate sources #109

@krisoye

Context

PR #108 added early duplicate detection to /ingest_youtube (#107). When a video URL is re-submitted, the endpoint now returns status: "duplicate" immediately without calling yt-dlp.

Problem

The old behavior intentionally refreshed metadata (view counts, publish date, engagement rate) when re-ingesting an existing video (see kb_server.py around the update_metadata call). The new early-exit bypasses this path entirely.

A user who re-submits a URL to refresh stale metadata now gets status: "duplicate", and the stored metadata is left unchanged.

Fix Options

  1. Preserve early-exit + add refresh flag: Accept refresh=true query param to force the full pipeline for known duplicates
  2. Split the concern: Keep fast duplicate detection as default, add a separate PATCH /source/{id}/refresh endpoint
  3. Remove early-exit: Go back through the full yt-dlp pipeline for duplicates (slower but preserves old behavior)
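A minimal sketch of option 1, to make the intended control flow concrete. This is not the code in kb_server.py: the function signature, the in-memory `store`, the injected `fetch_metadata` callable (standing in for the yt-dlp call), the `"ingested"` status, and the `refreshed` response field are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory stand-in for the knowledge-base store; the real
# storage layer is not shown in this issue.
Store = Dict[str, Dict[str, Any]]


def ingest_youtube(
    url: str,
    store: Store,
    fetch_metadata: Callable[[str], Dict[str, Any]],
    refresh: bool = False,
) -> Dict[str, Any]:
    """Sketch of option 1: early exit by default, opt-in metadata refresh."""
    existing = store.get(url)

    if existing is not None:
        if refresh:
            # Only the opt-in path pays for the slow fetch
            # (assumed to be the yt-dlp call in the real service).
            existing["metadata"] = fetch_metadata(url)
        # Known videos always report "duplicate"; the hypothetical
        # "refreshed" field says whether metadata was updated.
        return {"status": "duplicate", "refreshed": refresh}

    # New video: run the full pipeline (reduced here to a metadata fetch).
    store[url] = {"metadata": fetch_metadata(url)}
    return {"status": "ingested", "refreshed": False}


if __name__ == "__main__":
    fetch_calls = []

    def fake_fetch(url: str) -> Dict[str, Any]:
        # Records each call so the fast path can be checked.
        fetch_calls.append(url)
        return {"view_count": 100 * len(fetch_calls)}

    store: Store = {}
    url = "https://www.youtube.com/watch?v=example"

    print(ingest_youtube(url, store, fake_fetch))                # first ingest
    print(ingest_youtube(url, store, fake_fetch))                # duplicate, no fetch
    print(ingest_youtube(url, store, fake_fetch, refresh=True))  # duplicate, refreshed
    print(len(fetch_calls))                                      # fetches performed
```

In this sketch the default path for a known URL never invokes the fetcher, so the fast duplicate check from PR #108 is preserved, while `refresh=True` restores the old metadata-update behavior on request. Option 2 would move the `refresh` branch into its own endpoint instead of a flag.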

Acceptance Criteria

  • Re-ingesting an existing YouTube URL can refresh metadata when the caller explicitly requests it
  • Default duplicate detection remains fast (no yt-dlp call)
  • status: "duplicate" still returned for known videos

Part of krisoye/admin-dashboard#72 (integration audit)
