This is the repository of PreServe, accompanying the paper "PreServe: Intelligent Management for LMaaS Systems via Hierarchical Prediction".
In this work, we propose PreServe, a tailored LMaaS (Large Model as a Service) management framework based on hierarchical prediction.
├── LLMServe
│ ├── global_scheduler
│ │ ├── workload_predictor
│ │ │ └── predictor.py
│ │ ├── load_predictor
│ │ │ ├── data_loader.py
│ │ │ ├── model.py
│ │ │ └── predictor.py
│ │ ├── scaler.py
│ │ └── scheduler.py
│ ├── request_generater
│ │ ├── generator.py
│ │ ├── load.py
│ │ └── workload.py
│ ├── serve_instance
│ │ ├── instance.py
│ │ └── lookahead.py
│ ├── benchmark.py
│ ├── config.py
│ ├── logger.py
│ ├── test_bench.py
│ ├── test_request_predictor.py
│ └── util.py
├── data
│ ├── workloads/...
│ ├── datasets/...
│ ├── download_datasets.sh
│ ├── download_workloads.sh
│ ├── preprocess_datasets.py
│ ├── preprocess_workloads.py
│ ├── run.sh
│ └── README.md
├── experiments
│ ├── motivation_study/...
│ ├── RQ1/...
│ ├── RQ2/...
│ ├── RQ3/...
│ ├── RQ4/...
│ ├── download_gdrive.py
│ └── Reproducibility.md
├── results
│ ├── cases/
│ └── result_analysis_metrics.py
├── scripts
│ ├── ...
│ ├── benchmark_RQ2.sh
│ └── benchmark_RQ3_7b.sh
├── assets/...
├── instance_configurations_4.json
├── instance_configurations.json
├── offline_train.py
├── setup.py
├── requirements.txt
├── .gitignore
└── README.md
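The two predictor packages under LLMServe/global_scheduler reflect the hierarchical design: workload_predictor operates on the aggregate request stream, while load_predictor estimates the cost of individual requests. The sketch below only illustrates how such a two-level scheme can be composed; every class and method name in it is hypothetical and does not mirror the repository's actual API.

```python
# Purely illustrative two-level (hierarchical) prediction loop.
# All class and method names here are hypothetical; see
# LLMServe/global_scheduler/ for the actual implementation.
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str


class WorkloadPredictor:
    """Forecasts the aggregate request arrival rate (workload level)."""

    def forecast_qps(self, qps_history: list[float]) -> float:
        window = qps_history[-5:]
        return sum(window) / max(len(window), 1)  # e.g., a moving average


class LoadPredictor:
    """Estimates the load of a single request (request level)."""

    def predict_output_tokens(self, request: Request) -> int:
        return 256  # stand-in for a learned regressor over the prompt


def step(requests: list[Request], qps_history: list[float]) -> None:
    expected_qps = WorkloadPredictor().forecast_qps(qps_history)  # informs scaling
    per_request = [LoadPredictor().predict_output_tokens(r) for r in requests]  # informs routing
    print(f"expected QPS ~ {expected_qps:.1f}, predicted output tokens: {per_request}")


step([Request("Hello"), Request("Summarize this document ...")], [1.0, 2.0, 2.5])
```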
First, create a Conda environment with Python>=3.10:
conda create -n LLMServe python=3.10 -y
conda activate LLMServe
Next, install dependencies:
# clone the repository
cd PreServe
pip install -r requirements.txt
Finally, install the package locally:
pip install -e .
Download the workload and load datasets:
cd ./data/
./run.sh
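run.sh should populate the data/datasets and data/workloads directories shown in the tree above. A minimal sanity check (run from inside ./data/):

```python
# Check that run.sh populated the dataset and workload directories
# (paths assume you are still inside ./data/, as above).
from pathlib import Path

for name in ("datasets", "workloads"):
    path = Path(name)
    entries = list(path.rglob("*")) if path.is_dir() else []
    status = "OK" if entries else "missing or empty"
    print(f"{name}: {status} ({len(entries)} entries)")
```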
Example preprocessing:
# Authenticate with Hugging Face (replace {} with your token):
huggingface-cli login --add-to-git-credential --token hf_{}
# Preprocess the load dataset: ShareGPT
python preprocess_datasets.py \
--min_input_tokens 16 \
--min_out_tokens 16 \
--max_input_tokens 4096 \
--max_out_tokens 4096 \
--min_total_tokens 32 \
--max_total_tokens 4096 \
--tokenizer_name "meta-llama/Llama-2-7b-hf"
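The token-count flags above determine which ShareGPT records are kept. The snippet below sketches the kind of filtering they imply; the input file path and the "prompt"/"response" column names are assumptions for illustration, not the internals of preprocess_datasets.py:

```python
# Illustrative token-length filtering matching the flags above.
# The raw-file path and the "prompt"/"response" column names are
# assumptions for illustration, not preprocess_datasets.py internals.
import pandas as pd
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
df = pd.read_csv("datasets/ShareGPT/raw.csv")  # hypothetical input file


def num_tokens(text: str) -> int:
    return len(tokenizer(str(text)).input_ids)


in_len = df["prompt"].map(num_tokens)
out_len = df["response"].map(num_tokens)
total_len = in_len + out_len
kept = df[in_len.between(16, 4096) & out_len.between(16, 4096) & total_len.between(32, 4096)]
kept.to_csv("datasets/ShareGPT/cleaned_example.csv", index=False)
```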
# Preprocess the workload datasets: Azure_code & Azure_conv
python preprocess_workloads.py
Run offline training:
CUDA_VISIBLE_DEVICES=0 python offline_train.py --response_type 1 --use_prompt 1 --resample 1
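Conceptually, offline training fits the request-level load predictor on historical prompt/response data (hence the --use_prompt and --response_type flags; --resample presumably rebalances the training data). The toy baseline below only conveys that idea and is not the model implemented in offline_train.py:

```python
# Purely illustrative baseline for request-level load prediction, NOT the
# model trained by offline_train.py: bucket requests by prompt length and
# memorize the mean observed output length per bucket.
from statistics import mean


def fit_buckets(prompt_lens: list[int], output_lens: list[int], bucket: int = 256) -> dict[int, float]:
    buckets: dict[int, list[int]] = {}
    for p, o in zip(prompt_lens, output_lens):
        buckets.setdefault(p // bucket, []).append(o)
    return {k: float(mean(v)) for k, v in buckets.items()}


def predict(model: dict[int, float], prompt_len: int, bucket: int = 256, default: float = 256.0) -> float:
    return model.get(prompt_len // bucket, default)


model = fit_buckets([120, 300, 310, 900], [64, 200, 180, 512])
print(predict(model, 305))  # -> 190.0 (mean of the 256-511 bucket)
```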
Start the vLLM server.
Example command:
CUDA_VISIBLE_DEVICES=0 vllm serve "meta-llama/Llama-2-7b-hf" --port 8000
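Once the server is up, you can verify that it responds before launching the benchmark. A minimal check using only the Python standard library (assumes the default host and the --port 8000 used above):

```python
# Quick check that the vLLM OpenAI-compatible server above is reachable
# (assumes the default host and the --port 8000 used in the command).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    payload = json.load(resp)
print("served models:", [m["id"] for m in payload["data"]])
```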
Run the benchmark.
Example command:
cd ./LLMServe
python benchmark.py \
--request_num "2000" \
--model_name "meta-llama/Llama-2-7b-hf" \
--result_dir "../results/cases/" \
--load "ShareGPT" \
--load_dataset_path "../data/datasets/ShareGPT/cleaned.csv" \
--workload "poisson" \
--qps "2" \
--num_instances "1" \
--scheduler_policy "preserve" \
--scaler_policy "none" \
--req_predictor_policy "load_predictor" \
--max_model_len 4096 \
--max_num_seqs 128 \
--max_num_batched_tokens 8192
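When a run finishes, per-request measurements should appear under the directory passed via --result_dir (results/cases/ above) and can be analyzed with results/result_analysis_metrics.py. As a rough illustration of the kind of post-processing involved, the snippet below computes latency percentiles from a per-request CSV; the file name and the "latency" column are hypothetical:

```python
# Rough post-processing example; the file name and the "latency" column
# are hypothetical -- see results/result_analysis_metrics.py for the
# metrics actually reported in the paper.
import pandas as pd

df = pd.read_csv("../results/cases/example_run.csv")  # hypothetical output file
for p in (50, 90, 99):
    print(f"P{p} latency: {df['latency'].quantile(p / 100):.3f} s")
print(f"mean latency: {df['latency'].mean():.3f} s")
```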
The instructions to reproduce the experiment results in our paper can be found in experiments/Reproducibility.md.
(For ease of use, we adopt fixed LLM instance tables here; they can also be flexibly integrated with frameworks such as Ray.)
