Commit e5d169e

fix: correct max-model-len typo 8096 -> 8192 in docs and Dockerfile
1 parent cc73078 commit e5d169e

2 files changed

Lines changed: 4 additions & 4 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -139,15 +139,15 @@ curl http://localhost:8000/v1/completions \
 pip install qr-sampler
 
 # Start vLLM — qr-sampler registers automatically via entry points
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096 --gpu-memory-utilization 0.80
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80
 ```
 
 Configure the entropy source via environment variables:
 
 ```bash
 export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
 export QR_GRPC_SERVER_ADDRESS=localhost:50051
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096 --gpu-memory-utilization 0.80
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80
 ```
 
 ### Apple Silicon (macOS)
@@ -579,7 +579,7 @@ Or configure directly via environment variables (bare-metal):
 ```bash
 export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
 export QR_GRPC_SERVER_ADDRESS=localhost:50051
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096 --gpu-memory-utilization 0.80
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80
 ```
 
 The template handles all gRPC boilerplate (unary + bidirectional streaming, health checks, graceful shutdown). You only write the hardware-specific code.

examples/docker/Dockerfile.vllm

Lines changed: 1 addition & 1 deletion
@@ -50,4 +50,4 @@ ENTRYPOINT []
 
 # Start vLLM. The qr-sampler plugin is auto-discovered via entry points.
 # Shell form so environment variables are resolved at runtime.
-CMD vllm serve ${HF_MODEL} --host 0.0.0.0 --port 8000 --dtype half --max-model-len 8096 --gpu-memory-utilization 0.80
+CMD vllm serve ${HF_MODEL} --host 0.0.0.0 --port 8000 --dtype half --max-model-len 8192 --gpu-memory-utilization 0.80
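
Since the same typo appeared four times across two files, a quick scan can help confirm that no other copy of the old value remains. The following is a minimal sketch, not part of this commit: `find_stale_len` is a hypothetical helper name, and it assumes the flag and value are written on one line separated by a single space, as in the lines changed here.

```shell
# Hypothetical helper (not in the repository): list any line that still
# passes the old value 8096 to --max-model-len.
# Returns 0 if the files are clean, 1 if a stale value remains,
# 2 if grep itself failed (for example, a file is missing).
find_stale_len() {
    status=0
    # -e is required so grep does not read the leading "--" as an option.
    grep -n -e '--max-model-len 8096' "$@" || status=$?
    case $status in
        0) return 1 ;;  # grep matched: stale value still present
        1) return 0 ;;  # grep found nothing: files are clean
        *) return 2 ;;  # grep reported an error
    esac
}

# Example call against the two files touched by this commit:
#   find_stale_len README.md examples/docker/Dockerfile.vllm
```

Run from the repository root, the example call prints each offending line with its file name and line number, and prints nothing when the fix is complete.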
