feat: support scheduled ingestion for all source types (github, reddit, youtube, sessions) #46
Open
Description
Problem
The refresh schedule subaction currently accepts only url/urls (web pages). There is no way to schedule periodic re-ingestion for GitHub repos, Reddit feeds, YouTube channels, or session exports.
Current Behavior
// ingest has no schedule subaction
{ "action": "ingest", "subaction": "start", "source_type": "github", "target": "owner/repo" }
// refresh schedule only accepts url/urls
{ "action": "refresh", "subaction": "schedule", "url": "https://..." }
// unknown field `source_type` error if you try to pass it
Desired Behavior
Scheduling should work for any ingestion source type:
{ "action": "ingest", "subaction": "schedule", "source_type": "github", "target": "owner/repo", "interval": "24h" }
{ "action": "ingest", "subaction": "schedule", "source_type": "reddit", "target": "rust", "interval": "6h" }
{ "action": "ingest", "subaction": "schedule", "source_type": "youtube", "target": "https://youtube.com/...", "interval": "12h" }
Or alternatively, extend refresh schedule to accept source_type + target in addition to url:
{ "action": "refresh", "subaction": "schedule", "schedule_subaction": "create", "source_type": "github", "target": "owner/repo", "interval": "24h" }
Use Case
Keep a GitHub repo's indexed content fresh without manual re-ingestion. For example, a project repo that's actively developed should be re-indexed daily so queries reflect the latest code.
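To make the daily re-index concrete, here is a minimal sketch of how an interval string such as "24h" could be turned into a next-run time. It is written in Python purely for illustration and is not the project's code. The names parse_interval and next_run are hypothetical, and the "m" and "d" units are assumptions, since only hour values ("6h", "12h", "24h") appear in this issue.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit table: only hour-style intervals appear in the issue;
# minutes and days are included here for illustration.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_interval(text: str) -> timedelta:
    """Parse strings like "6h" or "24h" into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    amount = int(match.group(1))
    if amount == 0:
        raise ValueError("interval must be positive")
    return timedelta(**{_UNITS[match.group(2)]: amount})


def next_run(last_run: datetime, interval: str) -> datetime:
    """Return the time at which the next scheduled ingestion would be due."""
    return last_run + parse_interval(interval)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(next_run(last, "24h").isoformat())  # 2024-01-02T09:00:00+00:00
```

Rejecting unknown units up front keeps a typo like "24hr" from silently becoming a schedule that never fires.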
Suggested Approach
- Add a schedule subaction to ingest (mirrors the existing refresh schedule pattern)
- Or generalize the refresh schedule storage model to store { source_type, target } pairs in addition to URL lists
- Scheduled jobs should appear in ingest list / refresh schedule list with their next-run time
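One way to picture the generalized storage model is to normalize both request shapes into the same stored record. The sketch below is in Python for illustration only and does not reflect the project's actual types. ScheduleEntry, entries_from_request, and the "web" label for existing url/urls schedules are all hypothetical, and it assumes every schedule request carries an interval.

```python
from __future__ import annotations

from dataclasses import dataclass

# Source types named in the issue title, plus an assumed "web" label
# for the existing url/urls schedules.
SOURCE_TYPES = {"web", "github", "reddit", "youtube", "sessions"}


@dataclass(frozen=True)
class ScheduleEntry:
    """One stored schedule: a { source_type, target } pair plus its interval."""
    source_type: str
    target: str
    interval: str


def entries_from_request(req: dict) -> list[ScheduleEntry]:
    """Normalize a url/urls request or a source_type + target request."""
    interval = req.get("interval")
    if not interval:
        raise ValueError("interval is required")

    # Existing shape: a single url and/or a list of urls.
    urls = list(req.get("urls", []))
    if "url" in req:
        urls.append(req["url"])

    has_pair = "source_type" in req or "target" in req
    if urls and has_pair:
        raise ValueError("pass either url/urls or source_type + target, not both")
    if urls:
        return [ScheduleEntry("web", url, interval) for url in urls]

    # Proposed shape: an explicit source type and target.
    source_type, target = req.get("source_type"), req.get("target")
    if source_type not in SOURCE_TYPES or not target:
        raise ValueError("expected a known source_type and a non-empty target")
    return [ScheduleEntry(source_type, target, interval)]


if __name__ == "__main__":
    print(entries_from_request(
        {"source_type": "github", "target": "owner/repo", "interval": "24h"}
    ))
```

Treating URLs as just another source type would let a single list view show every scheduled job, whichever subaction created it.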