feat(refresh): scheduled axon refresh for keeping indexed docs fresh #39
Summary
axon refresh is fully implemented (schedules, 304 detection, SHA256 change detection, auto-embed on change, MCP integration) but it's not being used to keep our own indexed documentation fresh. Wire it up: create schedules for all major doc sites we depend on, ensure the scheduler worker runs as a proper service, and surface refresh status in the web UI.
Current State (fully implemented, just not deployed)
The refresh system has everything:
- `axon_refresh_jobs`: job table with status/result tracking
- `axon_refresh_targets`: per-URL ETag + Last-Modified + SHA256 state
- `axon_refresh_schedules`: named schedules with interval-based firing (`every_seconds`)
- Tiers: `high` (30 min), `medium` (6 h), `low` (24 h)
- 304 Not Modified support: zero re-embedding if content is unchanged
- SHA256 fallback for servers without ETag/Last-Modified
- `axon refresh schedule worker`: polling loop (30 s tick, configurable via `AXON_REFRESH_SCHEDULER_TICK_SECS`)
- MCP tool: `{ "action": "refresh", "subaction": "schedule", "schedule_subaction": "create", ... }`
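The 304-then-SHA256 decision listed above can be sketched as a pure function. This is a minimal sketch under assumptions: `StoredState`, `FetchOutcome`, `Change`, `conditional_headers` and `classify` are hypothetical names, not the actual types in the refresh processor, and the SHA256 digest is assumed to be computed by the caller (for example with a hashing crate) and passed in as a hex string.

```rust
/// Hypothetical per-URL validator state, mirroring what `axon_refresh_targets`
/// is described as storing (ETag, Last-Modified, SHA256).
#[derive(Debug, Clone, Default)]
pub struct StoredState {
    pub etag: Option<String>,
    pub last_modified: Option<String>,
    pub sha256: Option<String>,
}

/// Hypothetical result of one conditional GET.
pub struct FetchOutcome<'a> {
    pub status: u16,
    /// Hex SHA256 of the response body, computed by the caller; `None` when there is no body.
    pub body_sha256: Option<&'a str>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    NotModified, // 304: skip re-embedding
    SameHash,    // 2xx but body hash unchanged: skip re-embedding
    Changed,     // new or different content: re-embed
    Failed,      // any other status, or a 2xx without a body hash
}

/// Conditional request headers built from whichever validators were stored.
pub fn conditional_headers(state: &StoredState) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(etag) = &state.etag {
        headers.push(("If-None-Match", etag.clone()));
    }
    if let Some(last_modified) = &state.last_modified {
        headers.push(("If-Modified-Since", last_modified.clone()));
    }
    headers
}

/// 304 first, then the SHA256 comparison as the fallback for servers without validators.
pub fn classify(state: &StoredState, outcome: &FetchOutcome<'_>) -> Change {
    match outcome.status {
        304 => Change::NotModified,
        200..=299 => match (outcome.body_sha256, state.sha256.as_deref()) {
            (Some(new), Some(old)) if new == old => Change::SameHash,
            (Some(_), _) => Change::Changed,
            (None, _) => Change::Failed,
        },
        _ => Change::Failed,
    }
}

fn main() {
    let state = StoredState {
        etag: Some("\"v1\"".to_string()),
        last_modified: None,
        sha256: Some("aa".to_string()),
    };
    println!("{:?}", conditional_headers(&state));
    let fresh = FetchOutcome { status: 200, body_sha256: Some("aa") };
    println!("{:?}", classify(&state, &fresh));
}
```

Only `Changed` should lead to re-embedding; both `NotModified` and `SameHash` are the "zero re-embedding" paths.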
What's missing:
- The scheduler worker is NOT in `docker-compose.yaml`; it only runs if started manually.
- No pre-configured schedules for the doc sites we actually depend on.
- No web UI surface for managing refresh schedules.
- The `refresh-schedule-worker` s6 service referenced in the codebase doesn't exist yet in compose.
Work Items
1. Add refresh-schedule-worker to docker-compose.yaml
```yaml
services:
  axon-refresh-scheduler:
    # same image as axon-workers
    command: ["axon", "refresh", "schedule", "worker"]
    environment:
      AXON_REFRESH_SCHEDULER_TICK_SECS: "30"
    depends_on: [axon-postgres, axon-rabbitmq]
    restart: unless-stopped
```

Also ensure `axon refresh worker` (the job processor) is running: add it as a separate service or as a lane in the existing workers container.
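A rough sketch of what one scheduler tick decides, assuming each schedule row carries an `enabled` flag, `every_seconds` and a `next_run_at` Unix timestamp (field names other than `every_seconds` are assumptions, as are `Schedule`, `positive_or` and `fire_due`). It also shows a defensive way to read the tick interval and the worker lane count from the environment, falling back to the defaults this issue mentions (30 s tick, 2 lanes).

```rust
use std::env;

/// Hypothetical in-memory view of an `axon_refresh_schedules` row.
#[derive(Debug, Clone)]
pub struct Schedule {
    pub name: String,
    pub enabled: bool,
    pub every_seconds: u64,
    /// Unix seconds of the next planned run.
    pub next_run_at: u64,
}

/// Parse a positive integer; fall back to `default` when unset, zero or malformed.
pub fn positive_or(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|v| v.trim().parse::<u64>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

pub fn tick_secs() -> u64 {
    positive_or(env::var("AXON_REFRESH_SCHEDULER_TICK_SECS").ok().as_deref(), 30)
}

pub fn worker_lanes() -> u64 {
    positive_or(env::var("AXON_REFRESH_WORKER_LANES").ok().as_deref(), 2)
}

/// Names of enabled schedules that are due at `now`; their `next_run_at` is advanced.
pub fn fire_due(schedules: &mut [Schedule], now: u64) -> Vec<String> {
    let mut fired = Vec::new();
    for s in schedules.iter_mut().filter(|s| s.enabled && s.next_run_at <= now) {
        fired.push(s.name.clone());
        // Advance from `now`, not from the old next_run_at, so a long outage
        // yields one catch-up run instead of a burst of missed ones.
        s.next_run_at = now + s.every_seconds;
    }
    fired
}

fn main() {
    let mut schedules = vec![
        Schedule { name: "rust-std".into(), enabled: true, every_seconds: 86_400, next_run_at: 0 },
        Schedule { name: "claude-docs".into(), enabled: false, every_seconds: 21_600, next_run_at: 0 },
    ];
    println!("tick={}s lanes={}", tick_secs(), worker_lanes());
    println!("due: {:?}", fire_due(&mut schedules, 1_000));
}
```

The worker would call `fire_due` once per tick and sleep `tick_secs()` between calls; with a 30 s tick, a schedule can start up to about 30 s late, which is negligible next to the shortest (30 min) tier.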
2. Bootstrap schedules for core doc sites
Create a setup script or an `axon doctor`-triggered bootstrap that creates refresh schedules for the docs we actively use:
```bash
# Rust / core language docs
axon refresh schedule add rust-std https://doc.rust-lang.org/std/ --tier low
axon refresh schedule add rust-async https://rust-lang.github.io/async-book/ --tier low
axon refresh schedule add tokio-docs https://docs.rs/tokio/ --tier medium
# Our own stack
axon refresh schedule add axon-qdrant https://qdrant.tech/documentation/ --tier medium
axon refresh schedule add rmcp-docs https://docs.rs/rmcp/ --tier medium
axon refresh schedule add axum-docs https://docs.rs/axum/ --tier medium
# AI / Agent skills
axon refresh schedule add agentskills https://agentskills.io/home --tier high
axon refresh schedule add claude-docs https://docs.anthropic.com/ --tier medium
```

Store these in `scripts/bootstrap-refresh-schedules.sh` and make the script idempotent (skip any schedule that already exists).
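The idempotency rule reduces to a set difference between the desired schedules and the names already stored. A minimal sketch, assuming a schedule "already exists" when its name is registered (URL and tier are not compared); `DesiredSchedule` and `plan_bootstrap` are hypothetical names used only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical description of one schedule the bootstrap wants to exist.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DesiredSchedule {
    pub name: &'static str,
    pub url: &'static str,
    pub tier: &'static str,
}

/// Keep only schedules whose name is not registered yet, so a second run plans nothing.
pub fn plan_bootstrap<'a>(
    desired: &'a [DesiredSchedule],
    existing: &HashSet<String>,
) -> Vec<&'a DesiredSchedule> {
    desired.iter().filter(|d| !existing.contains(d.name)).collect()
}

fn main() {
    let desired = [
        DesiredSchedule { name: "rust-std", url: "https://doc.rust-lang.org/std/", tier: "low" },
        DesiredSchedule { name: "tokio-docs", url: "https://docs.rs/tokio/", tier: "medium" },
        DesiredSchedule { name: "axum-docs", url: "https://docs.rs/axum/", tier: "medium" },
    ];
    // Pretend rust-std was created by an earlier run.
    let mut existing = HashSet::new();
    existing.insert("rust-std".to_string());
    for d in plan_bootstrap(&desired, &existing) {
        println!("axon refresh schedule add {} {} --tier {}", d.name, d.url, d.tier);
    }
}
```

Matching on name only means a changed URL or tier for an existing name is silently kept as-is; updating those would need an explicit "update" path rather than "add".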
3. Seed URL + manifest integration
Currently, if a schedule has only a `seed_url` (no explicit `urls_json`), it looks for a crawl manifest at:
- `{output_dir}/domains/{domain}/latest/manifest.jsonl`
- `{output_dir}/domains/{domain}/sync/manifest.jsonl`
Gap: Many doc sites we crawl don't leave a manifest at these paths. Document the expected manifest format and ensure crawl jobs write manifests to these locations, or provide an alternative seed mechanism (e.g., axon sources --domain docs.rs --json → extract URLs for refresh).
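For reference, the seed-URL fallback can be sketched as: derive the domain from the seed URL, build the two candidate manifest paths, and use the first one that exists. Assumptions: `latest` is tried before `sync` (the order listed above), the domain is the URL host without port, and `domain_of`, `candidate_manifests` and `resolve_manifest` are hypothetical names. The URL handling is deliberately simplistic to stay dependency-free; real code should use a proper URL parser.

```rust
use std::path::{Path, PathBuf};

/// Host part of an http(s) URL: "https://docs.rs/axum/" -> "docs.rs".
/// Simplistic: no userinfo or IPv6 handling.
pub fn domain_of(seed_url: &str) -> Option<&str> {
    let rest = seed_url.split_once("://")?.1;
    let host_port = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// The two manifest locations, in the order listed in this issue (assumed lookup order).
pub fn candidate_manifests(output_dir: &Path, domain: &str) -> Vec<PathBuf> {
    let base = output_dir.join("domains").join(domain);
    vec![
        base.join("latest").join("manifest.jsonl"),
        base.join("sync").join("manifest.jsonl"),
    ]
}

/// First candidate for which `exists` is true; `exists` is injected so tests avoid the filesystem.
pub fn resolve_manifest(
    output_dir: &Path,
    seed_url: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let domain = domain_of(seed_url)?;
    candidate_manifests(output_dir, domain)
        .into_iter()
        .find(|p| exists(p.as_path()))
}

fn main() {
    // Hypothetical output directory.
    let output_dir = Path::new("/var/lib/axon/output");
    for p in candidate_manifests(output_dir, "docs.rs") {
        println!("{}", p.display());
    }
    let found = resolve_manifest(output_dir, "https://docs.rs/axum/", |p| p.exists());
    println!("{:?}", found);
}
```

Note that both `https://docs.rs/axum/` and `https://docs.rs/tokio/` resolve to the same `docs.rs` manifest under this layout, which is one reason a per-schedule URL list may be the more predictable seed mechanism.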
4. Web UI — Refresh schedule manager
New section in the Reboot settings page (or a /cortex/refresh route):
- Table: all refresh schedules (name, seed URL, interval, enabled, next run, last run, last result)
- Toggle enabled/disabled per schedule
- "Run now" button: triggers an immediate `run-due` for that schedule
- "Add schedule" form: name, URL, tier selector
- Job history: recent refresh jobs with changed/unchanged/failed counts
5. Refresh status in axon status and web UI
axon status should include a refresh section:
```
Refresh Schedules: 8 active, 2 disabled
Next due: rust-std in 4h 23m
Last run: axon-qdrant 12 minutes ago (checked 47 URLs, 3 changed, 3 re-embedded)
```
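The status block above can be rendered from a small summary struct. A minimal sketch; `LastRun`, `RefreshSummary`, `human_duration` and `format_refresh_status` are hypothetical names, durations are assumed to be whole seconds relative to now, and the "none"/"never" fallbacks are assumptions for an empty database.

```rust
/// Render seconds as "4h 23m", "12m" or "45s".
pub fn human_duration(secs: u64) -> String {
    let (h, m) = (secs / 3600, (secs % 3600) / 60);
    match (h, m) {
        (0, 0) => format!("{}s", secs),
        (0, m) => format!("{}m", m),
        (h, m) => format!("{}h {}m", h, m),
    }
}

/// Hypothetical summary of the most recent refresh job.
pub struct LastRun {
    pub schedule: String,
    pub secs_ago: u64,
    pub checked: u32,
    pub changed: u32,
    pub reembedded: u32,
}

/// Hypothetical input for the refresh section of `axon status`.
pub struct RefreshSummary {
    pub active: u32,
    pub disabled: u32,
    /// (schedule name, seconds until it is due)
    pub next_due: Option<(String, u64)>,
    pub last_run: Option<LastRun>,
}

pub fn format_refresh_status(s: &RefreshSummary) -> String {
    let header = format!("Refresh Schedules: {} active, {} disabled", s.active, s.disabled);
    let next = match &s.next_due {
        Some((name, secs)) => format!("Next due: {} in {}", name, human_duration(*secs)),
        None => "Next due: none".to_string(),
    };
    let last = match &s.last_run {
        // Minute granularity, matching the sample output in this issue.
        Some(r) => format!(
            "Last run: {} {} minutes ago (checked {} URLs, {} changed, {} re-embedded)",
            r.schedule, r.secs_ago / 60, r.checked, r.changed, r.reembedded
        ),
        None => "Last run: never".to_string(),
    };
    [header, next, last].join("\n")
}

fn main() {
    let summary = RefreshSummary {
        active: 8,
        disabled: 2,
        next_due: Some(("rust-std".to_string(), 4 * 3600 + 23 * 60)),
        last_run: Some(LastRun {
            schedule: "axon-qdrant".to_string(),
            secs_ago: 12 * 60,
            checked: 47,
            changed: 3,
            reembedded: 3,
        }),
    };
    println!("{}", format_refresh_status(&summary));
}
```

Keeping the formatter a pure function of a summary struct lets the same data feed both `axon status` and the web UI table.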
6. Refresh on crawl completion
After a crawl job completes, optionally auto-register a refresh schedule for the crawled domain:
```bash
axon crawl https://docs.rs/axum/ --auto-refresh medium
# → creates schedule: "auto-docs.rs/axum" every 6 hours
```

Flag: `--auto-refresh <tier>` on `axon crawl`. Store the schedule name as `auto-{domain}`.
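A sketch of how `--auto-refresh <tier>` could map to a schedule. Two assumptions: the tier intervals are the ones listed under Current State (high 30 min, medium 6 h, low 24 h), and the name follows the example (`auto-docs.rs/axum`, host plus path) rather than a bare `auto-{domain}`, because docs.rs hosts many crates and a domain-only name would make `axum` and `tokio` collide on one schedule. `Tier` and `auto_schedule` are hypothetical names.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    High,
    Medium,
    Low,
}

impl Tier {
    /// Interval stored as the schedule's `every_seconds`.
    pub fn every_seconds(self) -> u64 {
        match self {
            Tier::High => 30 * 60,
            Tier::Medium => 6 * 3600,
            Tier::Low => 24 * 3600,
        }
    }
}

impl FromStr for Tier {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "high" => Ok(Tier::High),
            "medium" => Ok(Tier::Medium),
            "low" => Ok(Tier::Low),
            other => Err(format!("unknown tier: {} (expected high, medium or low)", other)),
        }
    }
}

/// ("auto-<host><path>", every_seconds) for a crawled URL, or None if it has no scheme.
pub fn auto_schedule(crawl_url: &str, tier: Tier) -> Option<(String, u64)> {
    let rest = crawl_url.split_once("://")?.1;
    // Drop query/fragment and trailing slashes: "docs.rs/axum/" -> "docs.rs/axum".
    let path = rest
        .split(|c: char| c == '?' || c == '#')
        .next()?
        .trim_end_matches('/');
    if path.is_empty() {
        return None;
    }
    Some((format!("auto-{}", path), tier.every_seconds()))
}

fn main() {
    let tier: Tier = "medium".parse().expect("valid tier");
    println!("{:?}", auto_schedule("https://docs.rs/axum/", tier));
}
```

Because the name is derived deterministically from the URL, re-crawling the same docs with `--auto-refresh` could reuse the bootstrap script's skip-if-exists rule instead of creating duplicates.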
7. axon refresh schedule add — configurable concurrency
The current processor uses 2 hardcoded lanes. Add a per-schedule or global config:

```
AXON_REFRESH_WORKER_LANES=4  # default 2
```

Files
| File | Action |
|---|---|
| `docker-compose.yaml` | Add `axon-refresh-scheduler` service |
| `scripts/bootstrap-refresh-schedules.sh` | Idempotent schedule creation for core doc sites |
| `crates/jobs/refresh/processor.rs` | Ensure manifest written after crawl; `AXON_REFRESH_WORKER_LANES` env var |
| `apps/web/app/settings/` or `apps/web/app/cortex/refresh/` | Schedule manager UI |
| `crates/cli/commands/status.rs` | Add refresh section to `axon status` output |
| `crates/cli/commands/crawl.rs` | `--auto-refresh <tier>` flag |
| `docs/OPERATIONS.md` | Document refresh scheduler setup and bootstrap |
Acceptance Criteria
- `axon-refresh-scheduler` service in `docker-compose.yaml`, starts with infra
- `scripts/bootstrap-refresh-schedules.sh` creates schedules for core doc sites (idempotent)
- `axon refresh worker` lanes configurable via `AXON_REFRESH_WORKER_LANES`
- `axon status` includes refresh schedule summary (active count, next due, last result)
- Web UI shows refresh schedules table with enable/disable/run-now controls
- `axon crawl --auto-refresh <tier>` creates a refresh schedule on completion
- Seed URL + manifest path documented; crawl jobs write manifests to the expected location
- `cargo clippy` clean, all tests pass