This repository is the official implementation of STAR-PPO, a novel scheduling framework designed for Mixture-of-Experts (MoE) inference in geo-distributed edge-cloud environments.
STAR-PPO addresses the critical challenge of routing complex, multi-step inference workflows across heterogeneous servers connected by unstable networks. By integrating spatiotemporal awareness into the reinforcement learning state space and employing Dynamic Weight Adjustment (DWA), STAR-PPO achieves a superior balance between latency, inference cost, and service reliability, in particular robustness against "network traps" (links that intermittently exhibit high latency).
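The README describes DWA only at a high level. As a minimal illustrative sketch (the function names, the softmax-over-descent-rates rule, and the three objectives are assumptions for illustration, not the repository's implementation), dynamic weight adjustment can rebalance a multi-objective reward by upweighting objectives that are improving more slowly:

```python
import math

def dwa_weights(prev_losses, curr_losses, temperature=2.0):
    """Illustrative DWA: objectives whose loss is shrinking more slowly
    (descent rate closer to 1.0) receive a larger weight next round."""
    rates = [c / max(p, 1e-8) for p, c in zip(prev_losses, curr_losses)]
    exps = [math.exp(r / temperature) for r in rates]
    total = sum(exps)
    k = len(rates)
    return [k * e / total for e in exps]  # weights sum to K

def reward(latency, cost, failure, weights):
    # Weighted multi-objective reward: lower latency/cost/failure is better.
    w_l, w_c, w_f = weights
    return -(w_l * latency + w_c * cost + w_f * failure)
```

With equal weights this reduces to a plain sum of the three penalties; as training progresses, the objective making the least progress dominates the reward.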
- Code and sample dataset (Server1_Trap) released.
```
.
├── STAR_PPO/           # [Ours] STAR-PPO algorithm implementation
│   ├── agent.py        # PPO agent with spatiotemporal awareness
│   ├── model.py        # Actor-critic networks
│   └── train.py        # Training loop
├── PFAPPO/             # [Baseline] PF-PPO implementation
├── PPO_algorithm/      # [Baseline] Standard PPO implementation
├── Stark_Scheduler/    # [Baseline] STARK implementation
├── env.py              # Simulation environment (edge-cloud MoE)
├── data1/              # Dataset directory
│   └── Server1_Trap/   # Provided sample dataset (500 nodes)
├── total/              # Visualization & analysis scripts
└── requirements.txt    # Dependencies
```
We introduce a high-fidelity benchmark dataset for edge-cloud MoE inference scheduling.
Unlike the synthetic traces used in prior work, this dataset captures the complexity of real-world edge computing:
- Heterogeneity: Diverse server compute capabilities and cost models.
- Geo-Distribution: Latency based on physical distance (Haversine formula).
- Adversarial "Traps": Specific network links exhibit stochastic high latency/packet loss, simulating real-world network jitter. This is critical for testing robustness.
- MoE Workflows: Tasks require sequential processing by specific expert models distributed across the topology.
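As an illustration of the distance-based latency model mentioned above, the following sketch computes Haversine propagation delay with stochastic jitter on "trap" links (the fiber speed, spike probability, and spike magnitude are assumed values for illustration, not the dataset's actual parameters):

```python
import math
import random

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def link_latency_ms(lat1, lon1, lat2, lon2, is_trap=False, rng=random):
    """Propagation delay from distance, plus a stochastic spike on trap links."""
    base = haversine_km(lat1, lon1, lat2, lon2) / 200.0  # ~200 km/ms in fiber
    if is_trap and rng.random() < 0.3:                   # assumed 30% spike chance
        base += rng.uniform(50.0, 200.0)                 # assumed spike magnitude
    return base
```

Trap links are thus indistinguishable from normal links most of the time, which is what makes a scheduler without temporal awareness prone to routing through them.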
We provide the Server1_Trap configuration (500 servers) as a sample for reproducibility.
Note: The full dataset includes larger scales (1000 and 2000 servers) with varying densities and trap configurations. Due to the size and commercial value of the full topology data, only the sample is open-sourced.
Researchers requiring the complete dataset for comparison benchmarks should contact: gymorsiback@tju.edu.cn
- Clone the repository:

  ```shell
  git clone https://github.com/anonymous/STAR-PPO.git
  cd STAR-PPO
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```
To train STAR-PPO on the provided sample dataset:

```shell
python STAR_PPO/train.py \
    --data data1 \
    --regions Server1_Trap \
    --epochs 100 \
    --device cuda
```

You can also train baseline algorithms (e.g., PF-PPO, standard PPO):
```shell
# Train PF-PPO
python PFAPPO/train.py --data data1 --regions Server1_Trap --epochs 100
```

Load the trained model to evaluate performance:
```shell
python STAR_PPO/inference.py \
    --data data1 \
    --regions Server1_Trap \
    --model results/STAR_PPO/models/LATEST_actor.pt \
    --episodes 500
```

Generate comparison figures (learning curves, cost breakdown, latency CDFs):
```shell
# Generates all figures in the 'total/' folder
python total/plot_all_comparison.py
```

Our method consistently outperforms the baselines in reward, latency, and cost efficiency, especially in "Trap" environments.
(Visualization results are generated in the total/ directory)
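For reference, the empirical latency CDF underlying such figures can be computed in a few lines (a generic sketch, not the repository's plotting code in total/):

```python
def empirical_cdf(samples):
    """Return sorted latency values and their cumulative probabilities,
    suitable for a step plot of an empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps
```

Plotting `ps` against `xs` (e.g. with a matplotlib step plot) gives the latency CDF; a curve shifted left indicates lower latencies across the distribution.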
For any questions regarding the code or to request the full dataset, please email:
gymorsiback@tju.edu.cn