- run_sg_offline.py: The demo script that executes the model inference pipeline.
- Dockerfile: Defines the container environment required to run the tutorial.
Customize execution by passing environment variables with Docker's -e flag:
- Inputs
PROMPT: The user query (e.g., "Who are you?").
SYSTEM_PROMPT: System instructions or persona (e.g., "You are a helpful assistant").
- Model Settings
MODEL_PATH: Location of the model files.
CONTEXT_LENGTH: Max context window size (Default: 4096).
- Generation Control
TEMPERATURE: Controls creativity (0.0 = deterministic).
FREQ_PENALTY: Controls repetition (set > 1.0 to prevent loops).
MAX_NEW_TOKENS: Maximum number of output tokens to generate.
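
The contents of run_sg_offline.py are not reproduced here; the following is a minimal sketch of how such a script might wire these variables into SGLang's offline Engine API. The fallback defaults and the plain-string prompt assembly are illustrative assumptions, not the tutorial's actual values:

import os

import sglang as sgl

# Read configuration from the environment; every name mirrors a -e flag above.
# The fallback values here are illustrative, not the script's real defaults.
model_path = os.environ.get("MODEL_PATH", "./Qwen3-0.6B")
prompt = os.environ.get("PROMPT", "Who are you?")
system_prompt = os.environ.get("SYSTEM_PROMPT", "You are a helpful assistant")
context_length = int(os.environ.get("CONTEXT_LENGTH", "4096"))

sampling_params = {
    "temperature": float(os.environ.get("TEMPERATURE", "0.0")),
    "frequency_penalty": float(os.environ.get("FREQ_PENALTY", "0.0")),
    "max_new_tokens": int(os.environ.get("MAX_NEW_TOKENS", "256")),
}

# Offline mode: the engine loads the model in-process; no server is launched.
llm = sgl.Engine(model_path=model_path, context_length=context_length)

# Naive prompt assembly; a real script would likely apply the model's
# chat template rather than concatenating strings.
full_prompt = f"{system_prompt}\n\nUser: {prompt}\nAssistant:"
output = llm.generate(full_prompt, sampling_params)
print(output["text"])

llm.shutdown()

For example, to run the demo against a local Qwen3-0.6B checkpoint with a custom prompt and frequency penalty:
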
docker run --gpus all -v "$PWD":/home/work -w /home/work \
-e MODEL_PATH="./Qwen3-0.6B" \
-e PROMPT="What is this project?" \
-e SYSTEM_PROMPT="You are the Hello AI tutorial." \
-e FREQ_PENALTY=1.2 \
lmsysorg/sglang:latest \
python3 run_sg_offline.py
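
Note that -v "$PWD":/home/work bind-mounts the current directory into the container and -w /home/work makes it the working directory, so run_sg_offline.py and the ./Qwen3-0.6B model directory must both be present in $PWD for the relative MODEL_PATH to resolve.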