@@ -108,15 +108,15 @@ See [deployments/README.md](deployments/README.md) for the full guide.
 pip install qr-sampler[grpc]

 # Start vLLM — qr-sampler registers automatically via entry points
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --gpu-memory-utilization 0.90
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096
 ```

 Configure the entropy source via environment variables:

 ``` bash
 export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
 export QR_GRPC_SERVER_ADDRESS=localhost:50051
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --gpu-memory-utilization 0.90
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096
 ```

 ### System entropy fallback
@@ -125,7 +125,7 @@ Without an external entropy source, qr-sampler falls back to `os.urandom()`. Thi

 ``` bash
 pip install qr-sampler
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --gpu-memory-utilization 0.90
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096
 ```

 ### Per-request parameter overrides
@@ -430,7 +430,7 @@ Or configure directly via environment variables (bare-metal):
 ``` bash
 export QR_ENTROPY_SOURCE_TYPE=quantum_grpc
 export QR_GRPC_SERVER_ADDRESS=localhost:50051
-vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --gpu-memory-utilization 0.90
+vllm serve Qwen/Qwen2.5-1.5B-Instruct --dtype half --max-model-len 8096
 ```

 The template handles all gRPC boilerplate (unary + bidirectional streaming, health checks, graceful shutdown). You only write the hardware-specific code.
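
The hunks above set `QR_ENTROPY_SOURCE_TYPE` and `QR_GRPC_SERVER_ADDRESS` before launching vLLM, and note that qr-sampler falls back to `os.urandom()` when no external source is available. As a rough illustration of that selection logic only, here is a minimal Python sketch. The helper names `resolve_entropy_config` and `system_entropy`, and the `"system"` label, are hypothetical and are not taken from qr-sampler; the real package may resolve its configuration differently.

```python
import os


def resolve_entropy_config(env=None):
    """Pick an entropy source from QR_* variables (hypothetical helper).

    Returns a (source_type, address) tuple. This is an illustrative sketch,
    not qr-sampler's actual configuration code.
    """
    env = os.environ if env is None else env
    source_type = env.get("QR_ENTROPY_SOURCE_TYPE")
    address = env.get("QR_GRPC_SERVER_ADDRESS")

    # Use the gRPC source only when both the type and the address are set.
    if source_type == "quantum_grpc" and address:
        return source_type, address

    # Assumption: "system" is just a label for the os.urandom() fallback here.
    return "system", None


def system_entropy(n_bytes):
    """Return n_bytes of operating-system entropy via os.urandom()."""
    return os.urandom(n_bytes)


if __name__ == "__main__":
    source, address = resolve_entropy_config()
    print(f"entropy source: {source}, address: {address}")
    print(f"sample bytes: {system_entropy(8).hex()}")
```

Running the script with both variables exported should report the gRPC source and its address; with either one unset it reports the fallback label.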