[IEEE TMM ACCEPTED] Official implementation of Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming in IEEE Transactions on Multimedia.

Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming (IEEE TMM Accepted)


(Result figures: QoE CDF, bitrate vs. rebuffering, bitrate vs. smoothness, smoothness vs. rebuffering)

A reinforcement learning framework for adaptive video streaming that leverages Plasticity-Aware Mixture of Experts (PA-MoE) to handle Quality of Experience (QoE) shifts across different content types (documentary, live, news). This project implements various MoE architectures with PPO (Proximal Policy Optimization) to achieve robust bitrate adaptation under changing QoE preferences.

🎯 Features

  • Multiple MoE Architectures: Supports MLP, MoE, SparseMoE, and Plasticity-Aware MoE (PA-MoE) for both policy and value networks
  • QoE Shift Handling: Adapts to different content types with varying QoE preferences (bitrate, rebuffering, smoothness)
  • PPO-based Training: Implements PPO algorithm with GAE (Generalized Advantage Estimation) for stable learning
  • Comprehensive Evaluation: Includes comparison with baseline methods (BOLA, Buffer-based, Rate-based, RobustMPC, Pensieve, Merina)
  • Rich Visualizations: Generates performance plots including CDF, QoE components, bitrate-rebuffer trade-offs, and smoothness analysis
  • Plasticity Mechanisms: Implements expert noise injection and dormant unit detection for better adaptation

📋 Table of Contents

  • 🚀 Installation
  • 🏃 Quick Start
  • 📁 Project Structure
  • 📖 Usage
  • 📊 Results
  • 🏗️ Architecture
  • 🔬 Key Innovations
  • 📝 Citation
  • 🤝 Contributing
  • 📄 License
  • 🙏 Acknowledgments
  • 📧 Contact

🚀 Installation

Prerequisites

  • Python 3.9
  • CUDA-capable GPU (optional, but recommended)
  • Conda (recommended) or pip

Setup

  1. Clone the repository

    git clone https://github.com/tinyzqh/PA-MoE
    cd PA-MoE
  2. Create conda environment

    conda create -n pamoe python=3.9
    conda activate pamoe
  3. Install dependencies

    pip install -r requirements.txt
  4. Install OptiVerse (for performance evaluation)

    pip install git+https://github.com/tinyzqh/OptiVerse.git

πŸƒ Quick Start

Training

Train a PA-MoE agent with the default configuration (note: `dmoe`, short for DistillMoE, is the legacy name of the Plasticity-Aware MoE):

export PYTHONPATH=.
python src/ppo.py --envs_model change --seed 1 --policy_type dmoe --value_type dmoe
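To compare architectures or seeds, the same command can be generated programmatically. A minimal sweep sketch, assuming the CLI flags shown above (the helper names here are illustrative, not part of the repository):

```python
import itertools
import shlex
import subprocess  # uncomment the run() call below to actually launch jobs

SEEDS = [1, 2, 3]
ARCHS = ["mlp", "moe", "smoe", "dmoe"]  # network types accepted by src/ppo.py

def build_cmd(seed: int, arch: str, envs_model: str = "change") -> str:
    # Mirrors the flags from the example above; adjust if the CLI differs.
    return (f"python src/ppo.py --envs_model {envs_model} "
            f"--seed {seed} --policy_type {arch} --value_type {arch}")

for seed, arch in itertools.product(SEEDS, ARCHS):
    cmd = build_cmd(seed, arch)
    print(cmd)
    # subprocess.run(shlex.split(cmd), check=True)
```

Remember to `export PYTHONPATH=.` in the launching shell, as in the single-run example.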

Evaluation and Visualization

Run evaluation pipeline to compare different methods and generate plots:

python src/plots_perform/pipeline.py

This will generate performance plots in the assert/ directory:

  • cdf_test.pdf: Cumulative distribution function of QoE
  • qoe_bar_test.pdf: QoE component breakdown
  • bitrate_rebuf_test.pdf: Bitrate vs rebuffering trade-off
  • smo_rebuffer_test.pdf: Smoothness vs rebuffering trade-off
  • bitrate_smo_test.pdf: Bitrate vs smoothness trade-off
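The QoE CDF in `cdf_test.pdf` is a standard empirical CDF over per-session QoE scores. A minimal sketch of how such a curve is computed (independent of the repository's plotting code):

```python
import numpy as np

def empirical_cdf(samples):
    """Return sorted sample values and their cumulative probabilities."""
    x = np.sort(np.asarray(samples, dtype=float))
    # P(QoE <= x_i) = i / n for the i-th sorted sample (1-indexed)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Example: per-session QoE scores -> CDF points
x, y = empirical_cdf([2.0, 0.5, 1.0, 1.5])
# plt.step(x, y, where="post") would draw the curve
```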

πŸ“ Project Structure

PA-MoE/
├── src/
│   ├── ppo.py                 # Main training script
│   ├── configs/
│   │   └── ppo_config.py      # PPO configuration
│   ├── envs/
│   │   ├── StreamingEnv.py    # Adaptive streaming environment
│   │   ├── trace/             # Network trace data
│   │   └── envivio/           # Video chunk size data
│   ├── network/
│   │   ├── mlp/               # MLP baseline
│   │   ├── moe/               # Standard MoE
│   │   ├── sparse_moe/        # Sparse MoE
│   │   └── distill_moe/       # PA-MoE (DistillMoE)
│   ├── plots_perform/
│   │   ├── pipeline.py        # Evaluation pipeline
│   │   ├── agents/            # Agent implementations
│   │   │   ├── pamoe_agent.py
│   │   │   ├── ppo_agent.py
│   │   │   └── ...
│   │   └── weights/           # Pre-trained model weights
│   ├── plots/                 # Analysis plots
│   └── utils/                 # Utility functions
├── assert/                    # Generated plots and results
├── train.sh                   # Training script
├── requirements.txt           # Python dependencies
└── README.md

📖 Usage

Training

The training script supports various configurations:

# Environment mode: "normal" or "change"; network types: "mlp", "moe", "smoe", "dmoe"
python src/ppo.py \
    --envs_model change \
    --seed 1 \
    --policy_type dmoe \
    --value_type dmoe \
    --total_timesteps 2000000 \
    --learning_rate 1e-3 \
    --track    # enable wandb tracking

Key Parameters:

  • envs_model:
    • "normal": Fixed content type (documentary)
    • "change": Varying content types (documentary → live → news → ...)
  • policy_type / value_type: Network architecture
    • "mlp": Multi-layer perceptron baseline
    • "moe": Standard Mixture of Experts
    • "smoe": Sparse MoE
    • "dmoe": PA-MoE (DistillMoE with plasticity mechanisms)

Training Modes:

  • Normal mode: Train on a single content type (documentary)
  • Change mode: Train with QoE shifts across different content types
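In either mode, training optimizes PPO with GAE. The advantage recursion can be sketched in NumPy (illustrative, not the repository's implementation):

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (no episode resets)."""
    values = np.append(values, last_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    # PPO's value targets are advantages plus the value estimates
    returns = advantages + values[:-1]
    return advantages, returns
```

With `gamma` and `gae_lambda` as in `ppo_config.py` (0.99 and 0.95), this is the standard GAE(λ) estimator.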

Evaluation

The evaluation pipeline (src/plots_perform/pipeline.py) compares multiple methods:

  • Rule-based: Buffer-based, Rate-based, RobustMPC
  • Learning-based: Pensieve (PPO), Merina (Meta-RL), PA-MoE

Results are saved as PDF files in the assert/ directory.

Configuration

Modify src/configs/ppo_config.py to adjust hyperparameters:

@dataclass
class Config:
    # Training
    total_timesteps: int = 2_000_000
    learning_rate: float = 1e-3
    num_steps: int = 2000
    update_epochs: int = 5
    
    # PPO
    clip_coef: float = 0.2
    vf_coef: float = 5
    ent_coef: float = 0.0
    gamma: float = 0.99
    gae_lambda: float = 0.95
    
    # Plasticity
    redo_tau: float = 0.5        # Dormant unit threshold
    gradient_tau: float = 0.05   # Gradient threshold
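Because `Config` is a dataclass, per-run overrides can be made without editing the file. A sketch assuming the fields shown above (abridged here for brevity):

```python
from dataclasses import dataclass, replace

@dataclass
class Config:  # mirrors src/configs/ppo_config.py (abridged)
    total_timesteps: int = 2_000_000
    learning_rate: float = 1e-3
    clip_coef: float = 0.2
    redo_tau: float = 0.5

base = Config()
# Sweep plasticity thresholds without mutating the base config
variants = [replace(base, redo_tau=tau) for tau in (0.1, 0.5, 0.9)]
```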

📊 Results

Performance Comparison

PA-MoE demonstrates superior performance in handling QoE shifts:

  • Better QoE: Achieves higher average QoE across different content types
  • Robust Adaptation: Maintains performance when QoE preferences change
  • Efficient Expert Utilization: Activates relevant experts for different content types

Generated Plots

The evaluation generates comprehensive visualizations:

  1. CDF of QoE: Distribution of QoE values across sessions
  2. QoE Components: Breakdown of bitrate, rebuffering, and smoothness rewards
  3. Bitrate-Rebuffer Trade-off: Performance in bitrate vs rebuffering space
  4. Smoothness-Rebuffer Trade-off: Performance in smoothness vs rebuffering space
  5. Bitrate-Smoothness Trade-off: Performance in bitrate vs smoothness space

πŸ—οΈ Architecture

PA-MoE (Plasticity-Aware MoE)

PA-MoE extends standard MoE with plasticity mechanisms:

  1. Noisy Top-K Router: Routes inputs to experts with pseudo-noise for exploration
  2. Expert Noise Injection: Adds small noise to expert parameters to maintain plasticity
  3. Dormant Unit Detection: Identifies and reactivates dormant units using a ReDo-style (Recycling Dormant neurons) mechanism
  4. Gradient-based Adaptation: Monitors gradient information for expert utilization
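One common dormancy criterion (from the ReDo line of work; the repository's exact rule may differ) flags a unit as dormant when its mean activation magnitude, normalized by the layer average, falls below a threshold such as `redo_tau`. A NumPy sketch:

```python
import numpy as np

def dormant_mask(activations, tau=0.5):
    """activations: (batch, units) post-activation values of one layer.

    Returns a boolean mask of units whose normalized mean |activation|
    falls below tau -- candidates for reinitialization.
    """
    score = np.abs(activations).mean(axis=0)    # per-unit mean magnitude
    norm_score = score / (score.mean() + 1e-8)  # normalize by the layer mean
    return norm_score < tau

acts = np.array([[0.0, 1.0, 2.0],
                 [0.0, 1.0, 3.0]])
mask = dormant_mask(acts, tau=0.5)  # unit 0 never fires -> dormant
```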

Network Components

  • Router: Noisy Top-K gating network that selects experts
  • Experts: Specialized networks for different QoE regimes
  • Policy Head: Outputs action probabilities
  • Value Head: Estimates state values
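A noisy top-k gate of the kind described above can be sketched as follows (NumPy, illustrative only; the paper's pseudo-noise scheme and the repository's router may differ in detail):

```python
import numpy as np

def noisy_topk_gate(x, w_gate, w_noise, k=2, rng=None):
    """x: (features,); w_gate, w_noise: (features, experts).

    Returns softmax weights over the k selected experts and their indices.
    """
    rng = rng or np.random.default_rng(0)
    logits = x @ w_gate
    # Input-dependent noise scale; softplus keeps it positive
    noise_scale = np.log1p(np.exp(x @ w_noise))
    noisy = logits + rng.standard_normal(logits.shape) * noise_scale
    top = np.argsort(noisy)[-k:]                  # indices of the k largest logits
    gate = np.exp(noisy[top] - noisy[top].max())  # softmax over selected experts only
    gate /= gate.sum()
    return gate, top
```

Restricting the softmax to the selected experts keeps the combination weights sparse, so only k expert networks run per input.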

🔬 Key Innovations

  1. Plasticity Mechanisms: Expert noise injection and dormant unit reactivation enable adaptation to QoE shifts
  2. Content-Aware Routing: Router learns to select appropriate experts for different content types
  3. Stable Training: Pseudo-noise in router maintains PPO ratio stability while enabling exploration

πŸ“ Citation

If you use this code in your research, please cite:

@article{he2025plasticity,
  title={Plasticity-Aware Mixture of Experts for Learning Under QoE Shifts in Adaptive Video Streaming},
  author={He, Zhiqiang and Liu, Zhi},
  journal={arXiv preprint arXiv:2504.09906},
  year={2025}
}

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • OptiVerse for the video streaming environment
  • The open-source RL community for PPO implementations

📧 Contact

For questions or issues, please open an issue on GitHub.


Note: This project is part of research on adaptive video streaming with reinforcement learning. For detailed experimental results and analysis, please refer to the paper (arXiv:2504.09906).
