A simple implementation of pipeline parallelism for Qwen 2.5 1.5B using CPU-only inference. The model is split across two nodes that communicate over an HTTP REST API.
[Client Request]
↓
[Node 1: Embedding + Layers 0-13]
↓ HTTP JSON
[Node 2: Layers 14-27 + Generation]
↓
[Generated Response]
Architecture overview:
- Node 1: Token embeddings, position handling, and the first 14 transformer layers (0-13)
- Node 2: The remaining 14 layers (14-27), final layer normalization, language modeling head, and text generation

Qwen2.5-1.5B-Instruct has 28 decoder layers in total, so the --split-layer 14 used below assigns half of the stack to each node.
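The "HTTP JSON" hop in the diagram means Node 1 serializes its intermediate activations and POSTs them to Node 2. A minimal sketch of that hop, assuming an illustrative /forward endpoint, port, and payload fields (the real route and field names live in node1.py/node2.py):

# Illustrative Node 1 -> Node 2 hop; endpoint, port, and field names are assumptions.
import requests
import torch

NODE2_URL = "http://localhost:5002/forward"  # assumed Node 2 address

def send_hidden_states(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> dict:
    # Tensors are not JSON-serializable, so convert them to nested lists first.
    payload = {
        "hidden_states": hidden_states.tolist(),
        "attention_mask": attention_mask.tolist(),
    }
    resp = requests.post(NODE2_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()  # Node 2 runs its layers + LM head and replies, e.g. with a token id

JSON-encoding float tensors is verbose, but it keeps the transport dependency-free, which suits an educational setup.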
python3 -m venv meta_pipeline_env
source meta_pipeline_env/bin/activate # On Windows: meta_pipeline_env\Scripts\activate
# Install PyTorch CPU version first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install remaining dependencies
pip install -r requirements.txt
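If you need to reconstruct requirements.txt, the commands in this README imply roughly the following packages beyond the CPU build of PyTorch installed above (the package list is an educated guess; the repo's own file is authoritative):

# Plausible requirements.txt contents (illustrative, not the repo's exact pins)
transformers
huggingface_hub
flask
requests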
# Log in to HF
huggingface-cli login
# Enter your token when prompted
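Optionally, confirm the login and model access before launching the nodes; this one-liner downloads only the small tokenizer files:

python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('Qwen/Qwen2.5-1.5B-Instruct')"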
Terminal 1 - Start Node 2 (Final layers + Generation):

source meta_pipeline_env/bin/activate  # IMPORTANT: Always activate first!
python node2.py --model Qwen/Qwen2.5-1.5B-Instruct --split-layer 14

Terminal 2 - Start Node 1 (First layers + Embeddings):
source meta_pipeline_env/bin/activate # IMPORTANT: Always activate first!
python node1.py --model Qwen/Qwen2.5-1.5B-Instruct --split-layer 14
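Conceptually, --split-layer partitions the 28-layer decoder stack between the two nodes. A minimal sketch of that partitioning, using a hypothetical load_partition helper (the actual node1.py/node2.py internals may differ):

# Conceptual sketch of the --split-layer partitioning; not the repo's exact code.
import torch
from transformers import AutoModelForCausalLM

def load_partition(model_name: str, split_layer: int, is_first_node: bool):
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
    layers = model.model.layers  # Qwen2.5-1.5B has 28 decoder layers
    if is_first_node:
        # Node 1: embeddings + layers [0, split_layer)
        model.model.layers = layers[:split_layer]
    else:
        # Node 2: layers [split_layer, 28) + final norm + LM head
        model.model.layers = layers[split_layer:]
    return model

With --split-layer 14, Node 1 keeps the embeddings plus layers 0-13 and Node 2 keeps layers 14-27 plus the final norm and LM head, matching the diagram above.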
OpenAI-Compatible API:

# Single completion
curl -X POST http://localhost:5001/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "Explain quantum computing in simple terms:",
"max_tokens": 100,
"temperature": 0.7
}'
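The same endpoint can also be called programmatically. A minimal Python client, assuming the default port 5001 from the example above:

# Minimal client for the OpenAI-compatible completions endpoint.
import requests

resp = requests.post(
    "http://localhost:5001/v1/completions",
    json={
        "prompt": "Explain quantum computing in simple terms:",
        "max_tokens": 100,
        "temperature": 0.7,
    },
    timeout=120,  # CPU-only generation can be slow
)
resp.raise_for_status()
print(resp.json())

This code is provided for educational and research purposes.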