This repository is the official implementation of STAR-PPO, a novel scheduling framework designed for Mixture-of-Experts (MoE) inference in geo-distributed edge-cloud environments.
STAR-PPO addresses the critical challenge of routing complex, multi-step inference workflows across heterogeneous servers connected by unstable networks. By integrating spatiotemporal awareness into the reinforcement learning state space and employing Dynamic Weight Adjustment (DWA), STAR-PPO achieves a superior balance between latency, inference cost, and service reliability, in particular robustness against "network traps" (links that intermittently exhibit high latency).
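The README describes DWA only at a high level. As a minimal illustrative sketch (the function names, the softmax-over-descent-rates rule, and the three objectives are assumptions for illustration, not the repository's implementation), dynamic weight adjustment can rebalance a multi-objective reward by upweighting objectives that are improving more slowly:

```python
import math

def dwa_weights(prev_losses, curr_losses, temperature=2.0):
    """Illustrative DWA: objectives whose loss is shrinking more slowly
    (descent rate closer to 1.0) receive a larger weight next round."""
    rates = [c / max(p, 1e-8) for p, c in zip(prev_losses, curr_losses)]
    exps = [math.exp(r / temperature) for r in rates]
    total = sum(exps)
    k = len(rates)
    return [k * e / total for e in exps]  # weights sum to K

def reward(latency, cost, failure, weights):
    # Weighted multi-objective reward: lower latency/cost/failure is better.
    w_l, w_c, w_f = weights
    return -(w_l * latency + w_c * cost + w_f * failure)
```

With equal weights this reduces to a plain sum of the three penalties; as training progresses, the objective making the least progress dominates the reward.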
- Code and sample dataset (Server1_Trap) released.
```
.
├── STAR_PPO/           # [Ours] STAR-PPO algorithm implementation
│   ├── agent.py        # PPO agent with spatiotemporal awareness
│   ├── model.py        # Actor-critic networks
│   └── train.py        # Training loop
├── PFAPPO/             # [Baseline] PF-PPO implementation
├── PPO_algorithm/      # [Baseline] Standard PPO implementation
├── Stark_Scheduler/    # [Baseline] STARK implementation
├── env.py              # Simulation environment (edge-cloud MoE)
├── data1/              # Dataset directory
│   └── Server1_Trap/   # Provided sample dataset (500 nodes)
├── total/              # Visualization & analysis scripts
└── requirements.txt    # Dependencies
```
We introduce a high-fidelity benchmark dataset for edge-cloud MoE inference scheduling.
Unlike the synthetic traces used in prior work, this dataset captures the complexity of real-world edge computing:
- Heterogeneity: Diverse server compute capabilities and cost models.
- Geo-Distribution: Latency based on physical distance (Haversine formula).
- Adversarial "Traps": Specific network links exhibit stochastic high latency/packet loss, simulating real-world network jitter. This is critical for testing robustness.
- MoE Workflows: Tasks require sequential processing by specific expert models distributed across the topology.
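As an illustration of the distance-based latency model mentioned above, the following sketch computes Haversine propagation delay with stochastic jitter on "trap" links (the fiber speed, spike probability, and spike magnitude are assumed values for illustration, not the dataset's actual parameters):

```python
import math
import random

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def link_latency_ms(lat1, lon1, lat2, lon2, is_trap=False, rng=random):
    """Propagation delay from distance, plus a stochastic spike on trap links."""
    base = haversine_km(lat1, lon1, lat2, lon2) / 200.0  # ~200 km/ms in fiber
    if is_trap and rng.random() < 0.3:                   # assumed 30% spike chance
        base += rng.uniform(50.0, 200.0)                 # assumed spike magnitude
    return base
```

Trap links are thus indistinguishable from normal links most of the time, which is what makes a scheduler without temporal awareness prone to routing through them.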
We provide the Server1_Trap configuration (500 servers) as a sample for reproducibility.
Note: The full dataset includes larger scales (1000 and 2000 servers) with varying densities and trap configurations. Due to the size and commercial value of the full topology data, only the sample is open-sourced.
Researchers requiring the complete dataset for comparison benchmarks should contact: gymorsiback@tju.edu.cn
- Clone the repository:

  ```shell
  git clone https://github.com/anonymous/STAR-PPO.git
  cd STAR-PPO
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```
To train STAR-PPO on the provided sample dataset:

```shell
python STAR_PPO/train.py \
    --data data1 \
    --regions Server1_Trap \
    --epochs 100 \
    --device cuda
```

You can also train baseline algorithms (e.g., PF-PPO, standard PPO):
```shell
# Train PF-PPO
python PFAPPO/train.py --data data1 --regions Server1_Trap --epochs 100
```

Load the trained model to evaluate performance:
```shell
python STAR_PPO/inference.py \
    --data data1 \
    --regions Server1_Trap \
    --model results/STAR_PPO/models/LATEST_actor.pt \
    --episodes 500
```

Generate comparison figures (learning curves, cost breakdown, latency CDFs):
```shell
# Generates all figures in the 'total/' folder
python total/plot_all_comparison.py
```

Our method consistently outperforms the baselines in reward, latency, and cost efficiency, especially in "Trap" environments.
(Visualization results are generated in the total/ directory)
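For reference, the empirical latency CDF underlying such figures can be computed in a few lines (a generic sketch, not the repository's plotting code in total/):

```python
def empirical_cdf(samples):
    """Return sorted latency values and their cumulative probabilities,
    suitable for a step plot of an empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps
```

Plotting `ps` against `xs` (e.g. with a matplotlib step plot) gives the latency CDF; a curve shifted left indicates lower latencies across the distribution.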
For any questions regarding the code or to request the full dataset, please email:
gymorsiback@tju.edu.cn