The Colocation Benchmark is a modular and flexible framework designed to evaluate the end-to-end performance of (colocated) machine learning pipelines. The benchmark is built on the following core principles:
- End-to-End Performance: Quantifying performance across the entire pipeline, including data loading and preprocessing.
- Modularity: Stages like data loading and model execution are standalone building blocks, allowing for flexible pipeline construction.
- Colocation: Facilitating research into colocation schedulers and resource managers by providing scenarios with detailed time breakdowns for colocated workloads.
- Clone the repository:

  ```bash
  git clone https://github.com/your-repo/collocation-benchmark.git
  cd collocation-benchmark
  ```

- Install dependencies:

  ```bash
  conda env create -f environment.yml
  ```
To run the benchmark, use the `main.py` script with a configuration file:

```bash
python main.py <path_to_config.yml>
```

Example:

```bash
python main.py pipeline_configs/finetune_huggingface.yml
```

The benchmark consists of several modular components that work together to execute and measure ML pipelines.
```mermaid
sequenceDiagram
    participant Benchmark
    create participant LoadGen
    Benchmark->>LoadGen: Build
    create participant Logging
    Benchmark->>Logging: Build
    loop For each Pipeline
        create participant Pipeline
        Benchmark->>Pipeline: Build
        loop For each Stage
            create participant Stage
            Pipeline->>Stage: Build
        end
        activate Benchmark
        Benchmark->>+LoadGen: Start
        LoadGen-->>Logging: Log start of the test
        loop For each query
            LoadGen->>+Pipeline: Invoke Pipeline execution
            Pipeline-->>Logging: Log start of Pipeline execution
            loop For each Stage
                Pipeline-->>Logging: Log start of Stage execution
                Pipeline->>+Stage: Invoke Stage
                Stage->>-Pipeline: Return
                Pipeline-->>Logging: Log end of Stage execution
            end
            Pipeline-->>Logging: Log end of Pipeline execution
            Pipeline->>-LoadGen: Acknowledge finishing the query
        end
        LoadGen-->>Logging: Log end of the test
        LoadGen->>-Benchmark: Return
    end
```
- Benchmark (`main.py`): The entry point. It parses the configuration, initializes the global logger, and spawns processes for each pipeline's LoadGen.
- LoadGen (`loadgen/`): Generates queries and submits them to the pipeline based on a scheduler (e.g., offline, poisson). It is agnostic to the specific workload.
- Pipeline (`pipeline/`): Orchestrates a sequence of Stages. It manages communication between stages and logs execution times.
- Stage (`stages/`): The fundamental building block. A stage performs a specific task (e.g., DataLoader, Inference, Finetuning).
- Logging (`utils/logger.py`): Asynchronously records performance metrics to minimize interference with execution.
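The per-stage and per-pipeline timestamps emitted to the logger are what produce the detailed time breakdowns used for colocation analysis. The sketch below is illustrative only: the `run_query` helper, the `logger.log_event` calls, and the attribute names are hypothetical placeholders, not the repository's actual API.

```python
import time

def run_query(pipeline, query, logger):
    """Illustrative only: invoke each stage in order and record a time breakdown."""
    logger.log_event(pipeline.name, "pipeline_start", time.perf_counter())
    data = query
    for stage in pipeline.stages:          # stages run in the configured order
        logger.log_event(stage.name, "stage_start", time.perf_counter())
        data = stage.run(data)             # the base Stage class defines the run() hook
        logger.log_event(stage.name, "stage_end", time.perf_counter())
    logger.log_event(pipeline.name, "pipeline_end", time.perf_counter())
    return data
```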
The benchmark is configured via YAML files. A configuration defines the pipelines, their stages, and the load generation policy.
See `pipeline_configs/` for examples. A typical config includes:
```yaml
name: "benchmark_run_name"
pipelines:
  - name: "Pipeline Name"
    inputs: [0] # IDs of input stages
    outputs: [1] # IDs of output stages
    dataset_stage_id: 0
    loadgen:
      component: loadgen.OfflineLoadScheduler
      queue_depth: 10
      max_queries: 100
      config:
        rate: 5 # requests/sec
    stages:
      - name: "Stage 1 (e.g., DataLoader)"
        id: 0
        outputs: [1]
        component: stages.package.ClassName
        config:
          # Stage-specific configuration
          batch_size: 8
          ...
      - name: "Stage 2 (e.g., Model)"
        id: 1
        component: stages.package.ClassName
        config:
          model_name: "..."
          ...
```

The benchmark currently supports the following families of stages:
- LLM HuggingFace (`stages/llm_huggingface`):
  - `HuggingFaceDataLoader`: Loads text datasets.
  - `Finetune`: Instruction tuning using PEFT/LoRA.
  - `Inference`: Text generation.
- LLM Torchtune (`stages/llm_torchtune`): Finetuning and generation using the torchtune library.
- Self RAG (`stages/self_rag`): Retrieval-Augmented Generation workflows.
- Vision (`stages/torchvision_classification`): Standard computer vision classification tasks.
To add a new stage:

- Create a new directory in `stages/`.
- Inherit from the base `Stage` class (`stages/stage.py`).
- Implement the `run` method and initialization logic.
- Register your component path in the YAML config (e.g., `stages.my_new_stage.MyStage`), as in the sketch below.
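A minimal sketch of such a stage follows. It assumes only what the steps above state (a `Stage` base class in `stages/stage.py` with a `run` hook); the constructor signature, the `config` dictionary, and the toy transformation are illustrative placeholders, so check the base class for the real interface.

```python
# stages/my_new_stage.py -- hypothetical example, not part of the repository
from stages.stage import Stage


class MyStage(Stage):
    """Toy stage that multiplies every incoming value by a configurable factor."""

    def __init__(self, config):
        # Assumed constructor shape: stage-specific options arrive via the
        # YAML `config` block (see the configuration example above).
        super().__init__(config)
        self.factor = config.get("factor", 2)

    def run(self, inputs):
        # `inputs` is whatever the upstream stage produced; transform it and
        # pass the result downstream.
        return [x * self.factor for x in inputs]
```

The stage would then be referenced from a pipeline config as `component: stages.my_new_stage.MyStage`.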
To add a new load scheduler:

- Add a new scheduler class in `loadgen/schedulers/`.
- Inherit from the base `LoadScheduler`.
- Implement the generation logic.
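As a rough illustration under the same caveat (the import path, the `generate` method name and its callback argument, and the base constructor call below are assumptions, not the repository's actual `LoadScheduler` interface), a Poisson-style scheduler might look like:

```python
# loadgen/schedulers/my_poisson.py -- hypothetical example, not part of the repository
import random
import time

from loadgen.schedulers import LoadScheduler  # assumed import path


class MyPoissonScheduler(LoadScheduler):
    """Toy scheduler that submits queries with exponential inter-arrival times."""

    def __init__(self, config):
        super().__init__(config)                    # assumed base constructor
        self.rate = config.get("rate", 5)           # requests/sec, as in the YAML example
        self.max_queries = config.get("max_queries", 100)

    def generate(self, submit_query):
        # `submit_query` stands in for whatever callback hands a query to the pipeline.
        for query_id in range(self.max_queries):
            time.sleep(random.expovariate(self.rate))  # Poisson process: exponential gaps
            submit_query(query_id)
```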