
feat(refresh): scheduled axon refresh for keeping indexed docs fresh #39

@jmagar

Summary

axon refresh is fully implemented (schedules, 304 detection, SHA256 change detection, auto-embed on change, MCP integration) but it's not being used to keep our own indexed documentation fresh. Wire it up: create schedules for all major doc sites we depend on, ensure the scheduler worker runs as a proper service, and surface refresh status in the web UI.

Current State (fully implemented, just not deployed)

The refresh system has everything:

  • axon_refresh_jobs — job table with status/result tracking
  • axon_refresh_targets — per-URL ETag + Last-Modified + SHA256 state
  • axon_refresh_schedules — named schedules with interval-based firing (every_seconds)
  • Tiers: high (30min), medium (6h), low (24h)
  • 304 Not Modified support — zero re-embedding if content unchanged
  • SHA256 fallback for servers without ETag/Last-Modified
  • axon refresh schedule worker — polling loop (30s tick, configurable via AXON_REFRESH_SCHEDULER_TICK_SECS)
  • MCP tool: { "action": "refresh", "subaction": "schedule", "schedule_subaction": "create", ... }
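The tier intervals and interval-based firing described above can be sketched in Rust. This is a minimal illustration, not the code in the refresh crate: the Tier type, its every_seconds() method, and the is_due() helper are hypothetical names. The due rule (fire when the schedule is enabled and has never run, or when at least every_seconds have elapsed since its last run) is an assumption about how a scheduler tick decides.

```rust
use std::str::FromStr;

/// Refresh tiers and their firing intervals as listed in the issue:
/// high = 30 min, medium = 6 h, low = 24 h. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    High,
    Medium,
    Low,
}

impl Tier {
    /// Interval that would be stored in axon_refresh_schedules.every_seconds.
    pub fn every_seconds(self) -> u64 {
        match self {
            Tier::High => 30 * 60,
            Tier::Medium => 6 * 60 * 60,
            Tier::Low => 24 * 60 * 60,
        }
    }
}

impl FromStr for Tier {
    type Err = String;

    /// Parses a --tier value case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "high" => Ok(Tier::High),
            "medium" => Ok(Tier::Medium),
            "low" => Ok(Tier::Low),
            other => Err(format!("unknown refresh tier: {:?}", other)),
        }
    }
}

/// Assumed due check for one scheduler tick. Timestamps are Unix seconds.
pub fn is_due(enabled: bool, last_run: Option<u64>, every_seconds: u64, now: u64) -> bool {
    if !enabled {
        return false;
    }
    match last_run {
        None => true, // never run: fire on the first tick
        Some(last) => now.saturating_sub(last) >= every_seconds,
    }
}

fn main() {
    let tier: Tier = "medium".parse().unwrap();
    let now = 1_000_000;
    // Last run 7 hours ago; the medium tier fires every 6 hours, so it is due.
    println!("{}", is_due(true, Some(now - 7 * 3600), tier.every_seconds(), now));
}
```

Under this assumption, each tick (every AXON_REFRESH_SCHEDULER_TICK_SECS seconds) would evaluate is_due() for every schedule row and enqueue a refresh job for the ones that return true.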

What's missing:

  • The scheduler worker is NOT in docker-compose.yaml — it only runs if started manually
  • No pre-configured schedules for the doc sites we actually depend on
  • No web UI surface for managing refresh schedules
  • The refresh-schedule-worker s6 service referenced in the codebase doesn't exist yet in compose

Work Items

1. Add the refresh schedule worker to docker-compose.yaml (service name: axon-refresh-scheduler)

axon-refresh-scheduler:
  # same image as axon-workers
  command: ["axon", "refresh", "schedule", "worker"]
  environment:
    AXON_REFRESH_SCHEDULER_TICK_SECS: "30"
  depends_on: [axon-postgres, axon-rabbitmq]
  restart: unless-stopped

Also ensure the axon refresh worker (the job processor) is running, either as a separate service or as a lane in the existing workers container.

2. Bootstrap schedules for core doc sites

Create a setup script (or a bootstrap triggered by axon doctor) that creates refresh schedules for the docs we actively use:

# Rust / core language docs
axon refresh schedule add rust-std     https://doc.rust-lang.org/std/    --tier low
axon refresh schedule add rust-async   https://rust-lang.github.io/async-book/ --tier low
axon refresh schedule add tokio-docs   https://docs.rs/tokio/             --tier medium

# Our own stack
axon refresh schedule add axon-qdrant  https://qdrant.tech/documentation/  --tier medium
axon refresh schedule add rmcp-docs    https://docs.rs/rmcp/               --tier medium
axon refresh schedule add axum-docs    https://docs.rs/axum/               --tier medium

# AI / Agent skills
axon refresh schedule add agentskills  https://agentskills.io/home         --tier high
axon refresh schedule add claude-docs  https://docs.anthropic.com/         --tier medium

Store these in scripts/bootstrap-refresh-schedules.sh and make the script idempotent: skip any schedule that already exists.
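The idempotency requirement reduces to a set difference between the desired schedules and the names already registered. A minimal Rust sketch of that planning step follows; DesiredSchedule and schedules_to_create() are hypothetical names, and how the existing schedule names are fetched is deliberately left open rather than assumed.

```rust
use std::collections::HashSet;

/// One entry from the bootstrap list (name, seed URL, tier). Hypothetical type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DesiredSchedule {
    pub name: &'static str,
    pub seed_url: &'static str,
    pub tier: &'static str,
}

/// Keeps only schedules whose names are not registered yet, so a second
/// bootstrap run produces an empty plan.
pub fn schedules_to_create<'a>(
    desired: &'a [DesiredSchedule],
    existing_names: &HashSet<String>,
) -> Vec<&'a DesiredSchedule> {
    desired
        .iter()
        .filter(|s| !existing_names.contains(s.name))
        .collect()
}

fn main() {
    let desired = [
        DesiredSchedule { name: "rust-std", seed_url: "https://doc.rust-lang.org/std/", tier: "low" },
        DesiredSchedule { name: "tokio-docs", seed_url: "https://docs.rs/tokio/", tier: "medium" },
    ];
    // Pretend rust-std was created by an earlier run.
    let existing: HashSet<String> = ["rust-std".to_string()].into_iter().collect();
    for s in schedules_to_create(&desired, &existing) {
        // The real script would run: axon refresh schedule add <name> <url> --tier <tier>
        println!("create {} {} --tier {}", s.name, s.seed_url, s.tier);
    }
}
```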

3. Seed URL + manifest integration

Currently, if a schedule has only a seed_url (no explicit urls_json), refresh looks for a crawl manifest at:

  • {output_dir}/domains/{domain}/latest/manifest.jsonl
  • {output_dir}/domains/{domain}/sync/manifest.jsonl

Gap: many doc sites we crawl don't leave a manifest at these paths. Document the expected manifest format and make crawl jobs write manifests to these locations, or provide an alternative seed mechanism (e.g., extract the URLs to refresh from axon sources --domain docs.rs --json).
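The manifest lookup can be sketched as a pure path computation plus an existence check. This minimal Rust sketch assumes the two locations are tried in the order listed above (the actual precedence in the processor is not stated); manifest_candidates() and find_manifest() are hypothetical names, and the manifest contents are not parsed here.

```rust
use std::path::{Path, PathBuf};

/// Candidate manifest locations for a seed-only schedule, in the order listed
/// in the issue: {output_dir}/domains/{domain}/latest, then .../sync.
pub fn manifest_candidates(output_dir: &Path, domain: &str) -> [PathBuf; 2] {
    let base = output_dir.join("domains").join(domain);
    [
        base.join("latest").join("manifest.jsonl"),
        base.join("sync").join("manifest.jsonl"),
    ]
}

/// First candidate that exists as a file, or None when the crawl left no
/// manifest (the gap described above).
pub fn find_manifest(output_dir: &Path, domain: &str) -> Option<PathBuf> {
    manifest_candidates(output_dir, domain)
        .into_iter()
        .find(|p| p.is_file())
}

fn main() {
    // Hypothetical output_dir used only for illustration.
    let out = Path::new("/var/lib/axon/output");
    for p in manifest_candidates(out, "docs.rs") {
        println!("{}", p.display());
    }
    println!("{:?}", find_manifest(out, "docs.rs"));
}
```

Keeping the path computation separate from the filesystem check makes it easy to document the expected layout and to test both the "no manifest" and "sync only" cases.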

4. Web UI — Refresh schedule manager

New section in the Reboot settings page (or a /cortex/refresh route):

  • Table: all refresh schedules (name, seed URL, interval, enabled, next run, last run, last result)
  • Toggle enabled/disabled per schedule
  • "Run now" button — triggers immediate run-due for that schedule
  • "Add schedule" form: name, URL, tier selector
  • Job history: recent refresh jobs with changed/unchanged/failed counts

5. Refresh status in axon status and web UI

axon status should include a refresh section:

Refresh Schedules: 8 active, 2 disabled
  Next due:  rust-std in 4h 23m
  Last run:  axon-qdrant 12 minutes ago (checked 47 URLs, 3 changed, 3 re-embedded)
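The duration text in the sample output ("in 4h 23m") needs only integer arithmetic. A minimal Rust sketch, where format_remaining(), summary_line(), and next_due_line() are hypothetical helpers and the inputs (schedule name, seconds until due) are assumed to be computed elsewhere from each schedule's last run and every_seconds:

```rust
/// Formats seconds-until-due like the sample output ("4h 23m", or "12m").
pub fn format_remaining(secs: u64) -> String {
    let hours = secs / 3600;
    let minutes = (secs % 3600) / 60;
    if hours > 0 {
        format!("{}h {}m", hours, minutes)
    } else {
        format!("{}m", minutes)
    }
}

/// Header line; the counts would come from axon_refresh_schedules.
pub fn summary_line(active: usize, disabled: usize) -> String {
    format!("Refresh Schedules: {} active, {} disabled", active, disabled)
}

/// "Next due" line for the enabled schedule with the least time remaining.
pub fn next_due_line(schedules: &[(&str, u64)]) -> Option<String> {
    schedules
        .iter()
        .min_by_key(|(_, remaining)| *remaining)
        .map(|(name, remaining)| format!("  Next due:  {} in {}", name, format_remaining(*remaining)))
}

fn main() {
    println!("{}", summary_line(8, 2));
    if let Some(line) = next_due_line(&[("rust-std", 15_780), ("tokio-docs", 20_000)]) {
        println!("{}", line);
    }
}
```

The same helpers could back the "next run" column in the web UI table, so the CLI and UI agree on wording.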

6. Refresh on crawl completion

After a crawl job completes, optionally auto-register a refresh schedule for the crawled domain:

axon crawl https://docs.rs/axum/ --auto-refresh medium
# → creates schedule: "auto-docs.rs/axum" every 6 hours

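Flag: --auto-refresh <tier> on axon crawl. Name the schedule auto-{domain}{path} with the trailing slash dropped, as in the example above (auto-docs.rs/axum), so that several crawls on a shared host such as docs.rs get distinct schedules.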
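Deriving the schedule name from the crawl URL is a small string transformation. This minimal Rust sketch reproduces the issue's example (https://docs.rs/axum/ becomes auto-docs.rs/axum); auto_schedule_name() is a hypothetical helper, and stripping the query string and fragment is an assumption rather than a stated requirement.

```rust
/// Derives an auto-refresh schedule name from a crawl URL.
/// Returns None for non-HTTP(S) URLs or URLs with no host.
pub fn auto_schedule_name(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // Drop any query string or fragment, then trailing slashes.
    let rest = rest.split(|c: char| c == '?' || c == '#').next().unwrap_or(rest);
    let rest = rest.trim_end_matches('/');
    if rest.is_empty() {
        return None;
    }
    Some(format!("auto-{}", rest))
}

fn main() {
    println!("{:?}", auto_schedule_name("https://docs.rs/axum/"));
}
```

Keeping the path in the name means docs.rs/axum and docs.rs/tokio map to different schedules instead of colliding on auto-docs.rs.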

7. Configurable concurrency for the axon refresh worker

The processor currently uses 2 hardcoded lanes. Make the lane count configurable, either per schedule or globally:

AXON_REFRESH_WORKER_LANES=4    # default 2
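A minimal sketch of the global variant, assuming that a missing, non-numeric, or zero value should fall back to the current default of 2 rather than abort startup. parse_lanes() and worker_lanes_from_env() are hypothetical helper names, not existing functions in crates/jobs/refresh/processor.rs.

```rust
use std::env;

/// Default lane count, matching the processor's current hardcoded value.
const DEFAULT_LANES: usize = 2;

/// Parses a lane count; missing, non-numeric, or zero values fall back to the
/// default. Kept separate from env access so it is easy to unit test.
pub fn parse_lanes(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_LANES)
}

/// Reads AXON_REFRESH_WORKER_LANES from the environment.
pub fn worker_lanes_from_env() -> usize {
    parse_lanes(env::var("AXON_REFRESH_WORKER_LANES").ok().as_deref())
}

fn main() {
    println!("refresh worker lanes: {}", worker_lanes_from_env());
}
```

Falling back silently is the lenient choice; a stricter processor could refuse to start on a malformed value so the misconfiguration is noticed.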

Files

  • docker-compose.yaml: add the axon-refresh-scheduler service
  • scripts/bootstrap-refresh-schedules.sh: idempotent schedule creation for core doc sites
  • crates/jobs/refresh/processor.rs: ensure a manifest is written after crawl; read the AXON_REFRESH_WORKER_LANES env var
  • apps/web/app/settings/ or apps/web/app/cortex/refresh/: schedule manager UI
  • crates/cli/commands/status.rs: add a refresh section to axon status output
  • crates/cli/commands/crawl.rs: add the --auto-refresh <tier> flag
  • docs/OPERATIONS.md: document refresh scheduler setup and bootstrap

Acceptance Criteria

  • axon-refresh-scheduler service in docker-compose.yaml, starts with infra
  • scripts/bootstrap-refresh-schedules.sh creates schedules for core doc sites (idempotent)
  • axon refresh worker lanes configurable via AXON_REFRESH_WORKER_LANES
  • axon status includes refresh schedule summary (active count, next due, last result)
  • Web UI shows refresh schedules table with enable/disable/run-now controls
  • axon crawl --auto-refresh <tier> creates a refresh schedule on completion
  • Seed URL + manifest path documented; crawl jobs write manifests to expected location
  • cargo clippy is clean and all tests pass

Metadata

  • Assignees: no one assigned
  • Labels: enhancement (New feature or request)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests
