EPIC: Parent/chunk data model for YouTube videos (Option C)

## Problem

The KB stores each YouTube transcript chunk as a separate ChromaDB document with its own source ID (e.g., `youtube_IDRbItj4RGg_chunk_0` through `_chunk_40`), so a single 63-minute video becomes 41 separate "sources." This inflates stats (the 1,067 `youtube_video` "sources" are really ~50-70 unique videos) and confuses the admin dashboard, which shows 41 rows per video.

## Solution: Option C — Parent/Chunk Data Model

Introduce a **parent source document** per video (metadata only, no embedding), link its chunks via `related_source_ids` (as books already do with chapters), and update stats, listing, and display to count parents, not chunks.
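To make "count parents, not chunks" concrete, here is a minimal sketch that collapses chunk-level source IDs into per-video counts. It assumes only the ID pattern shown in the Problem section (`youtube_<video_id>_chunk_<n>`); the helper name `count_videos` is hypothetical and is not part of `VectorKB`.

```python
import re
from collections import Counter

# Pattern taken from the example IDs above. The video ID is matched greedily
# because YouTube IDs may themselves contain underscores.
CHUNK_ID = re.compile(r"^youtube_(?P<video_id>.+)_chunk_(?P<index>\d+)$")


def count_videos(source_ids):
    """Return a Counter mapping video_id -> number of chunk documents."""
    per_video = Counter()
    for sid in source_ids:
        match = CHUNK_ID.match(sid)
        if match:
            per_video[match.group("video_id")] += 1
    return per_video


# The 63-minute video from the Problem section: 41 chunk documents.
ids = [f"youtube_IDRbItj4RGg_chunk_{i}" for i in range(41)]
counts = count_videos(ids)
print(len(counts), counts["IDRbItj4RGg"])
```

With the parent model in place, stats would report `len(counts)` (one per video) rather than the total number of chunk documents.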

### Existing Pattern to Follow

`BookExtractor` already implements parent + chapter via `extract_multi()` — one parent `ExtractionResult` plus per-chapter results linked through `related_source_ids`. YouTube should follow this same pattern.
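A rough sketch of what the parent + chunk shape could look like for a video follows. Everything except the `related_source_ids` field name and the chunk ID format is an assumption made for illustration: the `SourceDoc` dataclass, the `build_youtube_documents` helper, the parent ID format `youtube_<video_id>`, and the choice to link in both directions are hypothetical and may not match `ExtractionResult` or what `BookExtractor` actually does.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SourceDoc:
    """Hypothetical stand-in for a KB source document."""
    source_id: str
    source_type: str
    text: Optional[str]  # None for the parent: metadata only, no embedding
    metadata: dict = field(default_factory=dict)
    related_source_ids: List[str] = field(default_factory=list)


def build_youtube_documents(
    video_id: str, title: str, chunk_texts: List[str]
) -> Tuple[SourceDoc, List[SourceDoc]]:
    """Build one parent document plus one document per transcript chunk."""
    parent_id = f"youtube_{video_id}"  # assumed parent ID format
    chunks = [
        SourceDoc(
            source_id=f"{parent_id}_chunk_{i}",
            source_type="youtube_video",
            text=text,
            metadata={"chunk_index": i, "title": title},
            related_source_ids=[parent_id],  # chunk points back to its parent
        )
        for i, text in enumerate(chunk_texts)
    ]
    parent = SourceDoc(
        source_id=parent_id,
        source_type="youtube_video",
        text=None,
        metadata={"title": title, "chunk_count": len(chunks)},
        related_source_ids=[c.source_id for c in chunks],  # parent lists chunks
    )
    return parent, chunks


parent, chunks = build_youtube_documents(
    "IDRbItj4RGg", "Example talk", ["intro ...", "middle ...", "outro ..."]
)
```

Storing `chunk_count` on the parent is one way the dashboard could render a chunk count badge without fetching every chunk.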

## Sub-Issues

- [ ] #79 — KB Core: Add parent source document support to VectorKB (`src/vector_kb.py`)
- [ ] #80 — KB API: YouTube ingestion creates parent + linked chunks (`src/kb_server.py`)
- [ ] #81 — KB API: Stats and list_sources respect parent/chunk model (`src/vector_kb.py`, `src/kb_server.py`)
- [ ] #82 — Data migration: Create parent documents for existing YouTube chunks (`src/migrate_youtube_parents.py`)
- [ ] #83 — MCP: Update tools for parent/chunk model (`src/kb_mcp_server.py`)
- [ ] krisoye/admin-dashboard#19 — Admin Dashboard: Group chunks under parent videos

## Suggested Implementation Order

1. #79 (VectorKB core changes — foundation for everything else)
2. #80 (YouTube ingestion pipeline)
3. #81 (Stats and list_sources — enables accurate counts)
4. #83 (MCP tool updates — expose new API surface to Claude)
5. #82 (Migration script — run against production after #79-#81 are deployed)
6. krisoye/admin-dashboard#19 (Dashboard — run after KB API is deployed and migration complete)

## Key Code Locations

- `src/vector_kb.py` — ChromaDB interface, `add_source()` (~line 277), `get_stats()` (~line 994)
- `src/kb_server.py` — YouTube ingest pipeline (lines 881-1180), chunk loop (lines 997-1064)
- `src/transcript_chunking.py` — 2-min window chunking (lines 21-126)
- `src/kb_mcp_server.py` — MCP tool wrappers
- `admin-dashboard/src/backends/knowledge_bank.py` — KB API client
- `admin-dashboard/src/templates/kb/sources.html` — list page
- `admin-dashboard/src/templates/kb/partials/source_row.html` — row renderer

## Acceptance Criteria

- [ ] A 63-minute YouTube video produces 1 parent document + N chunk documents in ChromaDB
- [ ] `GET /stats` reports ~50-70 `youtube_video` sources (not 1,067)
- [ ] `POST /list_sources` returns parent documents by default (not chunks)
- [ ] Admin dashboard shows one row per video with a chunk count badge
- [ ] Migration script is idempotent and handles existing data
- [ ] All existing tests pass; new tests cover parent/chunk round-trip
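The idempotency criterion for the migration can be illustrated with a small planning function, sketched here against a plain set of source IDs rather than ChromaDB. The function name `plan_parents` and the parent ID format `youtube_<video_id>` are assumptions for illustration; the real script in `src/migrate_youtube_parents.py` may be structured differently.

```python
import re

CHUNK_ID = re.compile(r"^youtube_(?P<video_id>.+)_chunk_(?P<index>\d+)$")


def plan_parents(existing_ids):
    """Map each missing parent ID to its chunk IDs, ordered by chunk index.

    Parents that already exist are skipped, so re-running the plan after a
    successful migration yields nothing to do.
    """
    existing = set(existing_ids)
    groups = {}
    for sid in existing:
        match = CHUNK_ID.match(sid)
        if match:
            parent_id = f"youtube_{match.group('video_id')}"  # assumed format
            groups.setdefault(parent_id, []).append(
                (int(match.group("index")), sid)
            )
    return {
        parent_id: [sid for _, sid in sorted(members)]
        for parent_id, members in groups.items()
        if parent_id not in existing
    }


store = {
    "youtube_IDRbItj4RGg_chunk_10",
    "youtube_IDRbItj4RGg_chunk_0",
    "youtube_IDRbItj4RGg_chunk_2",
}
first_run = plan_parents(store)
store.update(first_run)           # simulate creating the planned parents
second_run = plan_parents(store)  # nothing left to create
```

Sorting by the numeric chunk index (not the string ID) keeps `_chunk_2` ahead of `_chunk_10` in the parent's chunk list.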
