Castclaw is a CLI-based AI agent framework for automated time-series forecasting research. It orchestrates three specialized agents — Planner, Forecaster, and Critic — that collaborate with you to design experiments, run models, and produce forecasting reports.
- Prerequisites
- Installation
- Quick Start
- Project Structure
- Forecasting Workflow
- Agent Roles
- Constraint File (CAST.md)
- Tools Reference
- Python Environment
- Configuration
- LLM Providers
| Requirement | Version | Notes |
|---|---|---|
| Bun | ≥ 1.3.11 | Runtime and package manager |
| Python | ≥ 3.10 | ML experiment backend |
| uv | latest | Python package manager |
| GPU (optional) | CUDA 12.8 | Required for transformer models; CPU-only is supported |
npm
npm install -g castclawAfter installation, verify the setup:
castclaw --version- Start the interactive TUI from your dataset directory:
cd /path/to/your/dataset
castclaw-
The TUI opens three tabs — Planner, Forecaster, and Critic — corresponding to the three research phases.
-
Switch between agents with
Ctrl+1,Ctrl+2,Ctrl+3. -
Begin a forecasting session in the Planner tab, for example:
Initialize a forecasting session for data/etth1.csv. Target: OT, time column: date,
horizon: 96 steps, lookback: 336. Use a 70/20/10 train/val/test split. Evaluate with MSE and MAE.
Sample dataset (datasets.zip) on Google Drive: https://drive.google.com/file/d/1HOCE20FQgLl0xCv_dOmLcTbN1RCZWwqd/view
Castclaw creates a .forecast/ directory in your project to manage all experiment state:
your-project/
├── CAST.md # (optional) forecasting constraints
├── .forecast/
│ ├── STATE.md # current phase and session metadata
│ ├── task.json # frozen task definition (immutable after creation)
│ ├── CONTEXT.md # compressed context for Critic handoff
│ ├── skills/ # generated model-family skill files
│ ├── runs/ # one directory per experiment run
│ │ └── <run_id>/
│ │ ├── train.log # raw training output
│ │ └── eval.json # evaluation metrics
│ ├── reports/
│ │ ├── qualitative.md # domain research report
│ │ ├── quantitative.json # statistical data analysis
│ │ └── pre-forecast.md # fused analysis report
│ └── viz/ # visualization outputs
├── history.jsonl # full experiment history
├── best.json # current best result
└── budget.json # experiment budget tracking
Castclaw follows a structured five-phase pipeline. Phase transitions are enforced by the forecast_state tool — you cannot skip phases.
Agent: Planner
Start by defining your forecasting task. The Planner will ask for:
- Dataset path (CSV file)
- Target column to forecast
- Time column (datetime index)
- Prediction horizon (how many steps ahead)
- Sequence/look-back length
- Train/val/test split ratios
- Evaluation metrics (MSE, MAE, WAPE, MASE)
- Model families to consider
Example prompt:
Initialize a forecasting session for data/etth1.csv.
Target: OT column, time column: date, horizon: 96 steps, lookback: 336.
Use 70/20/10 train/val/test split. Evaluate with MSE and MAE.
Consider all model families.
The Planner calls forecast_state init to create the .forecast/ directory, then forecast_task to freeze task.json.
Defining Constraints (optional)
Before or after init, you can define project-specific constraints in CAST.md. Use the built-in skill:
/cast-creation
This guides you through an interactive dialog to capture domain rules (forbidden models, resource limits, evaluation requirements, etc.).
Agent: Planner
The Planner runs parallel analysis to understand your data before any model runs:
- Qualitative branch: web research on the forecasting domain (industry context, events, risks)
- Quantitative branch: statistical analysis of your dataset (trend, seasonality, stationarity, volatility, anomalies)
Start analysis:
Run pre-forecast analysis.
Or use the built-in skill directly:
/pre-forecast-workflow
The Planner spawns two subagents, waits for both, and fuses the results into .forecast/reports/pre-forecast.md. This report drives all subsequent model selection.
Agent: Planner
Based on pre-forecast findings, the Planner generates Skill files — structured specifications for each recommended model family. Skills capture:
- Applicable conditions (when to use this family)
- Parameter search space
- Feature template (model configuration JSON)
- Risk warnings for this dataset
The Planner presents 2–4 skills for your review. You can request additions, removals, or modifications. Once you confirm:
Skills look good, confirm them.
The phase transitions to forecast.
Agent: Forecaster
The Forecaster runs experiments using the confirmed Skills. For each experiment:
- Reads the current best result and recent failure history
- Selects a model and configuration from the relevant skill
- Calls
generate_modelto train and evaluate the model - Reflects on the result (
forecast_reflect) - Decides keep or discard; updates state
- Repeats until budget is exhausted or a stop condition triggers
You can observe progress and provide domain feedback at any time:
The model seems to overfit on the last 30 days. Try reducing the lookback.
The Forecaster records your input as expert feedback, resets the no-improvement counter, and continues.
Budget controls (set via CAST.md or defaults):
- Max experiments
- Consecutive no-improvement threshold → triggers human-in-the-loop (HITL) pause
- Consecutive crash threshold → stops the loop
Agent: Critic
After the experiment loop ends (or you explicitly end it), switch to the Critic tab (Ctrl+3).
The Critic reads all experiment artifacts and produces:
- Per-model-family best results comparison
- Per-condition performance breakdown (trend/seasonality/stationarity conditions)
- Visualization scripts (time-series plots, error distributions)
- Final markdown forecast report
Generate the final report.
The report is written to .forecast/reports/final-report.md.
| Agent | Tab | Responsibility |
|---|---|---|
| Planner | Ctrl+1 |
Task definition, data analysis, skill generation, phase orchestration |
| Forecaster | Ctrl+2 |
Experiment loop — proposes, runs, and reflects on model experiments |
| Critic | Ctrl+3 |
Post-experiment analysis, visualization, and final report generation |
Switch between agents at any time with keyboard shortcuts. Each agent maintains independent context but shares the .forecast/ file protocol.
CAST.md is an optional Markdown file at your project root that defines forecasting constraints. It is automatically loaded into every agent's context.
Example CAST.md:
# Forecasting Constraints
## Domain Constraints
Energy consumption forecasting for a commercial building in Germany.
Data is subject to GDPR — no external data sharing during analysis.
## Model Restrictions
- No transformer-based models (GPU unavailable on deployment target)
- Must include at least one interpretable model (ARIMA or ETS)
## Resource Limits
- Maximum 30 minutes per experiment
- CPU-only execution
## Evaluation Preferences
- Prioritize MAE over MSE for business reporting
- Must beat naive baseline by at least 10%
## Additional Notes
Holiday effects are significant — German public holidays should be noted in qualitative analysis.Create interactively with /cast-creation, or write the file manually.
The ML backend lives in python/. It uses uv for dependency management.
# Install all Python dependencies
cd python
uv sync
# Verify the runner works
uv run python -c "from castclaw_ml import runner; print('OK')"Models are sourced from Time-Series-Library. Available model families:
- Statistical: ARIMA, ETS, Theta
- Deep Learning: DLinear, NLinear, PatchTST, TimesNet, iTransformer, Autoformer, and 30+ more
- Foundation: Chronos (Amazon), TimesFM (Google), Moirai (Salesforce)
The runner is invoked by generate_model automatically — you do not call it directly.
Castclaw reads configuration from castclaw.json at your project root (JSONC format):
Global config lives at ~/.config/castclaw/castclaw.json.
Castclaw supports 20+ LLM providers via the Vercel AI SDK. Set your API key as an environment variable:
# Anthropic (default)
export ANTHROPIC_API_KEY=sk-ant-...
# OpenAI
export OPENAI_API_KEY=sk-...
# Google Gemini
export GOOGLE_GENERATIVE_AI_API_KEY=...
# OpenRouter (access many models via one key)
export OPENROUTER_API_KEY=...Then specify the model in castclaw.json or at startup:
castclaw --model anthropic/claude-sonnet-4-6Provider prefix format: <provider>/<model-id> (e.g., anthropic/claude-opus-4-6, google/gemini-2.0-flash).
{ // LLM provider and model "model": "anthropic/claude-sonnet-4-6", // Additional skill scan paths "skills": { "paths": ["~/.my-skills/"] }, // Plugin list (npm package names or file:// paths) "plugins": [] }