Skip to content

Y-Research-SBU/TimeSeriesScientist

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

59 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis


Haokun Zhao1,2 ★  Xiang Zhang3 ★  Jiaqi Wei4 ★  Yiwei Xu5 
Yuting He6  Siqi Sun7  Chenyu You1

1 Stony Brook University    2 University of California, San Diego    3 University of British Columbia   
4 Zhejiang University    5 University of California, Los Angeles   
6 Case Western Reserve University    7 Fudan University    Equal Contribution

Paper Project Website LangGraph 0.4.8

Abstract

In this paper, we introduce TimeSeriesScientist (TSci), the first LLM-driven agentic framework for general time series forecasting.

The framework comprises four specialized agents:

  • Curator – performs LLM-guided diagnostics augmented by external tools that reason over data statistics to choose targeted preprocessing;
  • Planner – narrows the hypothesis space of model choice by leveraging multi-modal diagnostics and self-planning over the input;
  • Forecaster – performs model fitting and validation and based on the results to adaptively select the best model configuration as well as ensemble strategy to make final predictions;
  • Reporter – synthesizes the whole process into a comprehensive, transparent report.

TSci transforms the forecasting workflow into a white-box system that is both interpretable and extensible across tasks.

workflow

📢 News

  • 2025.10.02 Our paper is now available on arXiv! 📄
  • 2025.10.01 Code Released. TimeSeriesScientist now available! 🎉🎉

🚀 Quick Start

1. Prerequisites

  • Python 3.8 or higher
  • OpenAI API key
  • Sufficient disk space for results and models

2. Create Virtual Environment

# Create virtual environment
conda create -n TSci

conda activate TSci

3. Install Dependencies

pip install -r requirements.txt

4. Set Environment Variables

# Set your OpenAI API key
export OPENAI_API_KEY="your-openai-api-key-here"

# Optional: Set other environment variables
export GOOGLE_API_KEY="your-google-api-key-here"

🎯 Code Execution

1. Basic Usage

# Run with default configuration
python main.py

2. Custom Configuration

The customized dataset must be provided in .csv format with at least two columns: "date" and "OT". The "date" column contains the timestamps, while the "OT" column records the target values at each corresponding timestamp. An Example input file is provided at dataset/HTTh1.csv.

# Run with custom parameters
python main.py --data_path /your-customized-dataset \
                --num_slices 10 \
                --horizon 96 \
                --llm_model gpt-4o

3. Debug Mode

# Run with debug and verbose output
python main.py --debug --verbose --data_path your_data.csv

Agent Workflow

The system uses a sophisticated multi-agent workflow orchestrated by LangGraph. Here's how the agents work together:

1. PreprocessAgent

Purpose: Data cleaning and preparation

Responsibilities:

  • Load and validate time series data
  • Handle missing values using LLM-recommended strategies
  • Detect and handle outliers using rolling window IQR
  • Generate data quality reports
  • Create preprocessing visualizations

LLM Integration: Uses LLM to analyze data quality and recommend preprocessing strategies

Output: Cleaned data, quality metrics, and preprocessing visualizations

2. AnalysisAgent

Purpose: Comprehensive data analysis and insights

Responsibilities:

  • Analyze data characteristics (trend, seasonality, stationarity)
  • Generate statistical summaries
  • Identify patterns and anomalies
  • Provide forecasting readiness assessment
  • Create analysis visualizations

LLM Integration: Uses LLM to interpret data patterns and provide insights

Output: Analysis report with key findings and recommendations

3. ValidationAgent

Purpose: Model selection and hyperparameter optimization

Responsibilities:

  • Select best models based on data characteristics
  • Optimize hyperparameters for each selected model
  • Evaluate models on validation data
  • Rank models by performance
  • Generate validation reports

LLM Integration: Uses LLM to select appropriate models and hyperparameters

Output: Selected models with optimized hyperparameters and validation scores

4. ForecastAgent

Purpose: Generate predictions and ensemble forecasts

Responsibilities:

  • Train models with optimized hyperparameters
  • Generate individual model predictions
  • Create ensemble predictions using multiple methods
  • Calculate confidence intervals
  • Generate forecast visualizations
  • Compute performance metrics

LLM Integration: Uses LLM to decide ensemble integration strategies

Output: Individual and ensemble predictions, metrics, and visualizations

5. ReportAgent

Purpose: Generate comprehensive final reports

Responsibilities:

  • Synthesize all experiment results
  • Create executive summaries
  • Generate actionable recommendations
  • Produce final visualizations
  • Save results in multiple formats

LLM Integration: Uses LLM to generate human-readable reports

Output: Comprehensive experiment report with insights and recommendations

Workflow Execution

Data Input → PreprocessAgent → AnalysisAgent → ValidationAgent → ForecastAgent → ReportAgent → Final Output
     ↓              ↓              ↓              ↓              ↓              ↓
  Raw Data    Cleaned Data   Analysis Report   Selected Models   Predictions   Final Report

Key Features of the Workflow:

  1. State Management: Each agent receives and updates a shared state object
  2. Error Recovery: Built-in retry mechanisms and fallback strategies
  3. Rate Limiting: Automatic delays and retries for API rate limits
  4. Progress Monitoring: Real-time progress tracking and timing information
  5. Result Aggregation: Automatic aggregation of results across multiple data slices

Project Structure

time_series_agent/
│
├── agents/                    # Agent modules
│   ├── preprocess_agent.py    # Data preprocessing agent
│   ├── analysis_agent.py      # Data analysis agent
│   ├── validation_agent.py    # Model validation agent
│   ├── forecast_agent.py      # Forecasting agent
│   ├── report_agent.py        # Report generation agent
│   └── memory.py              # Memory management module
│
├── graph/                     # Workflow orchestration
│   └── agent_graph.py         # LangGraph workflow definition
│
├── utils/                     # Utility modules
│   ├── data_utils.py          # Data utilities
│   ├── visualization_utils.py # Visualization utilities
│   ├── file_utils.py          # File management utilities
│   └── model_library.py       # Model implementations
│
├── config/                    # Configuration
│   └── default_config.py      # Default configuration
│
├── main.py                    # Main entry point
├── requirements.txt           # Dependencies
└── README.md                  # This file

Output Structure

results/
├── reports/                   # Generated reports
│   ├── complete_time_series_report_YYYYMMDD_HHMMSS.json
│   └── aggregated_forecast_results_YYYYMMDD_HHMMSS.json
├── preprocess/                # Preprocessing outputs
│   ├── visualizations/
│   └── analysis_reports/
├── forecast/                  # Forecasting outputs
│   ├── visualizations/
│   └── prediction_results/
└── logs/                      # Execution logs

Supported Models

Statistical Models

  • ARMA, ARIMA
  • Exponential Smoothing
  • TBATS, Theta
  • Prophet

Machine Learning Models

  • Linear Regression
  • Random Forest
  • SVR (Support Vector Regression)
  • Gradient Boosting
  • XGBoost, LightGBM

Deep Learning Models

  • LSTM
  • Neural Networks
  • Transformer

Ensemble Methods

  • Simple Average
  • Weighted Average
  • Median
  • Trimmed Mean

Configuration Options

Basic Configuration

config = {
    "data_path": "../dataset/ETT-small/ETTh1.csv",
    "num_slices": 10,
    "input_length": 512,
    "horizon": 96,
    "k_models": 3,
    "llm_model": "gpt-4o",
    "debug": True
}

Advanced Configuration

config = {
    # Data processing
    "preprocess": {
        "missing_value_strategy": "interpolate",
        "outlier_strategy": "clip",
        "normalization": False
    },
    
    # Model selection
    "models": {
        "available_models": ["ARMA", "LSTM", "RandomForest"],
        "ensemble_method": "weighted_average",
        "hyperparameter_optimization": True
    },
    
    # Visualization
    "visualization": {
        "figure_size": (12, 8),
        "save_format": "png",
        "show_plots": False
    }
}

Troubleshooting

Common Issues

  1. API Rate Limiting

    • The system automatically handles rate limits with retries
    • Increase delays between slices if needed
    • Check your OpenAI API usage limits
  2. Memory Issues

    • Reduce num_slices parameter
    • Use smaller input_length
    • Monitor system memory usage
  3. Model Import Errors

    • Install missing dependencies: pip install prophet tbats
    • Some models have optional dependencies
    • Check the model library for requirements

Debug Mode

Enable debug mode for detailed logging:

python main.py --debug --verbose

Performance Optimization

For Large Datasets

  • Reduce num_slices to process fewer slices
  • Use smaller input_length for faster processing
  • Enable parallel processing in configuration

For API Rate Limits

  • Increase delays between agent calls
  • Use fallback strategies when API fails
  • Monitor API usage and adjust accordingly

Contact

For questions, feedback, or collaboration opportunities, please contact:

Email: haz118@ucsd.edu, chenyu.you@stonybrook.edu

License

MIT License

Citation

@misc{zhao2025timeseriesscientistgeneralpurposeaiagent,
      title={TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis}, 
      author={Haokun Zhao and Xiang Zhang and Jiaqi Wei and Yiwei Xu and Yuting He and Siqi Sun and Chenyu You},
      year={2025},
      eprint={2510.01538},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.01538}, 
}

Releases

No releases published

Packages

 
 
 

Contributors

Languages