@@ -139,15 +139,15 @@ curl http://localhost:8000/v1/completions \
 pip install qr-sampler

 # Start vLLM — qr-sampler registers automatically via entry points
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096 --gpu-memory-utilization 0.80
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80
 ```

 Configure the entropy source via environment variables:

 ``` bash
 export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
 export QR_GRPC_SERVER_ADDRESS=localhost:50051
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096 --gpu-memory-utilization 0.80
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80
 ```

 ### Apple Silicon (macOS)
@@ -579,7 +579,7 @@ Or configure directly via environment variables (bare-metal):
 ``` bash
 export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
 export QR_GRPC_SERVER_ADDRESS=localhost:50051
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096 --gpu-memory-utilization 0.80
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80
 ```

 The template handles all gRPC boilerplate (unary + bidirectional streaming, health checks, graceful shutdown). You only write the hardware-specific code.
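
This change edits the same `--max-model-len` value in several `vllm serve` commands, so it is easy to miss one. A minimal sketch for confirming that no stale `8096` remains in a working tree is shown here. The function name `find_stale_max_len` is hypothetical and not part of this repository; it only wraps a plain `grep`.

```bash
#!/usr/bin/env bash
# Hypothetical review helper (an assumption, not part of this repository):
# list every line under a directory that still passes the old value
# to --max-model-len.
find_stale_max_len() {
  local root="${1:-.}"      # directory to search, defaults to the current one
  local stale="${2:-8096}"  # value that should no longer appear
  # -r recurse, -n print line numbers, -F fixed-string match,
  # -e is needed because the pattern itself starts with "--".
  # "|| true" keeps "no match" from counting as a failure.
  grep -rnF -e "--max-model-len ${stale}" "$root" || true
}
```

Running `find_stale_max_len .` from the repository root should print nothing once every occurrence has been updated to `8192`.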
@@ -50,4 +50,4 @@ ENTRYPOINT []

 # Start vLLM. The qr-sampler plugin is auto-discovered via entry points.
 # Shell form so environment variables are resolved at runtime.
-CMD vllm serve ${HF_MODEL} --host 0.0.0.0 --port 8000 --dtype half --max-model-len 8096 --gpu-memory-utilization 0.80
+CMD vllm serve ${HF_MODEL} --host 0.0.0.0 --port 8000 --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80
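
The comment about shell form can be illustrated without Docker. A shell-form `CMD` is run through `/bin/sh -c`, so `${HF_MODEL}` is expanded when the container starts; an exec-form `CMD` (a JSON array) is executed directly, with no shell to expand it. The sketch below imitates both cases with `echo` standing in for the real `vllm` binary; the function names and the `HF_MODEL` value are placeholders chosen for the demonstration.

```bash
#!/usr/bin/env bash
# Placeholder value for the demonstration; any model name works.
export HF_MODEL="Qwen/Qwen2.5-1.5B-Instruct"

# Imitates shell form: the command string is handed to /bin/sh -c,
# which expands ${HF_MODEL} at run time.
shell_form() {
  /bin/sh -c 'echo vllm serve ${HF_MODEL}'
}

# Imitates exec form: the arguments go straight to the program with no
# shell in between, so the text ${HF_MODEL} is passed through literally.
exec_form() {
  env echo vllm serve '${HF_MODEL}'
}

shell_form  # prints: vllm serve Qwen/Qwen2.5-1.5B-Instruct
exec_form   # prints: vllm serve ${HF_MODEL}
```

With exec form, `vllm` would receive the literal string `${HF_MODEL}` as the model name, which is why the Dockerfile keeps the shell form here.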