qr-sampler works on Apple Silicon via [vllm-metal](https://github.com/vllm-project/vllm-metal), a community-maintained vLLM plugin under the official `vllm-project` GitHub org. It uses MLX under the hood but exposes the same vLLM API and plugin system — same entry points, same endpoints, same `curl` commands.
vllm-metal works with MLX-format models from the [mlx-community](https://huggingface.co/mlx-community) collection on Hugging Face. These are pre-converted and quantized for Apple Silicon — pick one that fits your available memory.
> **Prerequisite:** vllm-metal currently does not load custom logits processors registered via entry points — it creates an empty `LogitsProcessors()` instead of calling `build_logitsprocs()`. [PR #124](https://github.com/vllm-project/vllm-metal/pull/124) fixes this with a 9-line patch that mirrors `GPUModelRunner`'s pattern. Until it is merged, you will need to apply the patch manually or install from the PR branch. Without it, qr-sampler's plugin will be silently skipped.
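Until that PR is merged, one option is to install vllm-metal directly from the PR's head ref. This is a sketch: it assumes GitHub's standard `refs/pull/<number>/head` ref for open PRs and that the package on that branch installs under the same name.

```shell
# Sketch: install vllm-metal from the unmerged PR #124 branch.
# GitHub exposes every open PR at refs/pull/<number>/head.
pip install "git+https://github.com/vllm-project/vllm-metal.git@refs/pull/124/head"
```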
This creates a virtual environment at `~/.venv-vllm-metal` with vLLM and all dependencies. Requires Python 3.12+.
#### 2. Install qr-sampler
```bash
source ~/.venv-vllm-metal/bin/activate
pip install qr-sampler
```
#### 3. Start the server
```bash
source ~/.venv-vllm-metal/bin/activate
vllm serve mlx-community/Qwen3-0.6B-4bit
```
qr-sampler registers automatically via the same `vllm.logits_processors` entry point — no additional configuration needed. Look for this line in the server logs to confirm the plugin is active:
"messages": [{"role": "user", "content": "Tell me about quantum randomness"}],
"max_tokens": 100
}'
```
All configuration (entropy sources, temperature strategies, per-request overrides) works identically to the NVIDIA setup. The only difference is how vLLM itself is installed.
> **Note:** The Docker deployment profiles are not compatible with Apple Silicon. Docker on macOS runs a Linux VM with no Metal GPU passthrough, so vllm-metal must run natively. To use Open WebUI on Apple Silicon, see the [Web UI](#web-ui) section.
### System entropy fallback
Without an external entropy source, qr-sampler falls back to `os.urandom()`. This is useful for development and testing but does not provide the quantum randomness needed for consciousness-research experiments. To use system entropy, set `QR_ENTROPY_SOURCE_TYPE=system` (this is the default).
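For intuition, the fallback simply draws from the operating system's CSPRNG. This sketch is illustrative only (not qr-sampler's internal code):

```python
import os

# os.urandom() reads from the OS CSPRNG
# (/dev/urandom on Linux, getentropy() on macOS).
raw = os.urandom(8)                 # 8 cryptographically random bytes
seed = int.from_bytes(raw, "big")   # usable as, e.g., a sampling seed
print(len(raw), seed.bit_length() <= 64)
```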
### Web UI

qr-sampler works with [Open WebUI](https://github.com/open-webui/open-webui), a
self-hosted ChatGPT-style interface that connects to vLLM's OpenAI-compatible
API.

**NVIDIA / Linux:** Every deployment profile includes Open WebUI as an optional service — add
`--profile ui` to start it alongside vLLM:
```bash
cd deployments/urandom
docker compose --profile ui up --build
```
Then open http://localhost:3000 to start chatting. Without `--profile ui`, Open
WebUI does not start and nothing changes.

**Apple Silicon:** The deployment profiles use NVIDIA GPU images, but Open WebUI itself is just a web app. Run it standalone in Docker and point it at your vllm-metal server:
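A minimal sketch of that standalone setup, assuming vllm-metal is serving on the host's default port 8000; the image tag and `OPENAI_API_BASE_URL` variable follow Open WebUI's own documentation, so adjust ports and paths to your setup:

```shell
# Run Open WebUI in Docker, pointed at the natively running vllm-metal
# server. host.docker.internal resolves to the macOS host from inside
# Docker Desktop's Linux VM.
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

Then open http://localhost:3000 as in the NVIDIA setup.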