Open WebUI Integration

Open WebUI provides a ChatGPT-style web interface for chatting with models served by vLLM. Every qr-sampler deployment profile includes it as an optional Docker Compose service.

This directory contains a filter function that lets you control qr-sampler parameters (temperature, top-k, top-p, sample count, etc.) directly from the Open WebUI admin panel, with no API calls or environment variable changes.

Starting Open WebUI

From any deployment profile directory, start the stack with --profile ui:

cd deployments/urandom          # or firefly-1, _template, your-profile
cp .env.example .env
docker compose --profile ui up --build

Then open http://localhost:3000 in a browser. Without --profile ui, the Open WebUI service does not start and the rest of the deployment runs unchanged.

Installing the filter function

The filter function ships as two files:

| File | Purpose |
| --- | --- |
| `qr_sampler_filter.py` | Human-readable source code |
| `qr_sampler_filter.json` | Open WebUI importable JSON |

Import steps

1. Open http://localhost:3000 and log in (first user becomes admin).
2. Go to Admin Panel > Functions (or Workspace > Functions).
3. Click Import (the upload icon).
4. Select qr_sampler_filter.json from this directory.
5. Toggle the imported function to Global so it applies to all models.

The filter is now active. Every chat request that Open WebUI sends to vLLM will carry the qr-sampler parameters.

Alternative: paste the source

If you prefer not to use the JSON import:

1. Go to Admin Panel > Functions and click Create a new function.
2. Set the type to Filter.
3. Copy the contents of qr_sampler_filter.py into the code editor.
4. Save and toggle to Global.

Configuring parameters (Valves)

After importing the filter, click the gear icon next to it to open the Valves panel. The two filter-control Valves affect the filter itself; every other Valve maps to a qr-sampler per-request parameter:

Filter control

| Valve | Default | Description |
| --- | --- | --- |
| `priority` | `0` | Filter execution order (lower runs first). |
| `enable_qr_sampling` | `true` | Master switch. Set to `false` to pass requests through unmodified. |

Token selection

| Valve | Default | Maps to | Description |
| --- | --- | --- | --- |
| `top_k` | `50` | `qr_top_k` | Keep only the k most probable tokens (0 disables). |
| `top_p` | `0.9` | `qr_top_p` | Nucleus sampling threshold (1.0 disables). |

Temperature

| Valve | Default | Maps to | Description |
| --- | --- | --- | --- |
| `temperature_strategy` | `fixed` | `qr_temperature_strategy` | `fixed` or `edt` (entropy-dependent). |
| `fixed_temperature` | `0.7` | `qr_fixed_temperature` | Constant temperature (fixed strategy). |
| `edt_base_temp` | `0.8` | `qr_edt_base_temp` | Base coefficient for EDT. |
| `edt_exponent` | `0.5` | `qr_edt_exponent` | Power-law exponent for EDT. |
| `edt_min_temp` | `0.1` | `qr_edt_min_temp` | EDT temperature floor. |
| `edt_max_temp` | `2.0` | `qr_edt_max_temp` | EDT temperature ceiling. |
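
To give a feel for how the four EDT Valves could interact, the sketch below assumes a power law of normalized Shannon entropy, `T = edt_base_temp * (H / H_max) ** edt_exponent`, clamped to `[edt_min_temp, edt_max_temp]`. Both the formula and the function name `edt_temperature_sketch` are assumptions made for illustration; the qr-sampler source is the authoritative definition and may differ.

```python
import math


def edt_temperature_sketch(probs, base_temp=0.8, exponent=0.5,
                           min_temp=0.1, max_temp=2.0):
    """Hypothetical EDT: power law of normalized entropy, then clamp.

    ASSUMPTION: this formula is illustrative only and is not taken
    from the qr-sampler source.
    """
    # Shannon entropy (natural log) of the next-token distribution.
    entropy = sum(-p * math.log(p) for p in probs if p > 0.0)
    # Largest possible entropy for this many tokens (uniform distribution).
    max_entropy = math.log(len(probs)) if len(probs) > 1 else 1.0
    # Normalize to [0, 1], guarding against floating-point overshoot.
    normalized = min(max(entropy / max_entropy, 0.0), 1.0)
    temperature = base_temp * normalized ** exponent
    # Apply the floor and ceiling Valves.
    return min(max(temperature, min_temp), max_temp)
```

Under this assumed formula, a confident (low-entropy) distribution is sampled near the floor, while a flat distribution is sampled near the base coefficient.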

Signal amplification

| Valve | Default | Maps to | Description |
| --- | --- | --- | --- |
| `signal_amplifier_type` | `zscore_mean` | `qr_signal_amplifier_type` | Amplification algorithm. |
| `sample_count` | `20480` | `qr_sample_count` | Entropy bytes fetched per token. |
| `population_mean` | `127.5` | `qr_population_mean` | Null-hypothesis mean for byte values. |
| `population_std` | `73.612...` | `qr_population_std` | Population std for uniform [0, 255]. |
| `uniform_clamp_epsilon` | `1e-10` | `qr_uniform_clamp_epsilon` | Clamp u to avoid degenerate CDF. |
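
The Valve names suggest a z-test on the mean of the fetched bytes, mapped through the normal CDF to a uniform value `u`. The sketch below shows one way that could work. It is an assumption based only on the names and descriptions in the table, not a copy of qr-sampler's amplifier; the function name `zscore_mean_sketch` is hypothetical, and `population_std` uses the truncated value shown in the table.

```python
import math
import os


def zscore_mean_sketch(raw: bytes, population_mean: float = 127.5,
                       population_std: float = 73.612,
                       clamp_epsilon: float = 1e-10) -> float:
    """Hypothetical z-score-of-the-mean amplifier (illustration only)."""
    n = len(raw)
    sample_mean = sum(raw) / n  # iterating bytes yields ints in 0..255
    # Standard error of the mean under the null hypothesis.
    standard_error = population_std / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    # Standard normal CDF maps z to a value in [0, 1].
    u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Keep u strictly inside (0, 1) so later inverse-CDF steps stay finite.
    return min(max(u, clamp_epsilon), 1.0 - clamp_epsilon)


if __name__ == "__main__":
    # 20480 bytes matches the documented sample_count default.
    print(zscore_mean_sketch(os.urandom(20480)))
```

Averaging many bytes shrinks the standard error by a factor of `sqrt(sample_count)`, so a small bias in the byte mean produces a large shift in `u`. The clamp matters when that shift saturates the CDF at exactly 0 or 1.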

Logging

| Valve | Default | Maps to | Description |
| --- | --- | --- | --- |
| `log_level` | `summary` | `qr_log_level` | `none`, `summary`, or `full`. |
| `diagnostic_mode` | `false` | `qr_diagnostic_mode` | Store all token records in memory. |

How it works

User types message in Open WebUI
  |
  +-> Open WebUI sends request to vLLM (/v1/chat/completions)
  |
  +-> Filter inlet() runs BEFORE the request reaches vLLM:
  |     - Reads current Valve values
  |     - Adds qr_top_k, qr_top_p, qr_temperature_strategy, etc.
  |       as top-level keys in the request body
  |
  +-> vLLM receives the request:
  |     - Unknown top-level keys become SamplingParams.extra_args
  |     - qr-sampler's resolve_config() reads qr_* from extra_args
  |     - Token sampling uses the parameters from the Valves
  |
  +-> Response streams back through Open WebUI to the user

Infrastructure settings (gRPC server address, fallback mode, etc.) are not exposed as Valves. They cannot be changed per request and are controlled by environment variables on the vLLM container.
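
The inlet step in the diagram above can be sketched as follows. This is a dependency-free illustration, not the shipped `qr_sampler_filter.py`: a real Open WebUI filter is a `Filter` class whose `Valves` are a pydantic model, whereas this sketch uses a plain dataclass and a free function. The names `ValvesSketch` and `inlet_sketch` are hypothetical, and only a subset of the Valves is included.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class ValvesSketch:
    """Subset of the documented Valves, with the documented defaults."""
    enable_qr_sampling: bool = True
    top_k: int = 50
    top_p: float = 0.9
    temperature_strategy: str = "fixed"
    fixed_temperature: float = 0.7
    sample_count: int = 20480


def inlet_sketch(body: Dict[str, Any], valves: ValvesSketch) -> Dict[str, Any]:
    """Return the request body with qr_* keys added at the top level."""
    if not valves.enable_qr_sampling:
        return body  # master switch off: pass the request through unmodified
    enriched = dict(body)  # shallow copy so the caller's dict is not mutated
    for name, value in asdict(valves).items():
        if name != "enable_qr_sampling":  # filter control, not a sampler parameter
            enriched[f"qr_{name}"] = value
    return enriched


if __name__ == "__main__":
    request = {"model": "example-model", "messages": []}
    print(inlet_sketch(request, ValvesSketch()))
```

Because the keys are added at the top level of the body, vLLM treats them as unknown fields and, as described above, forwards them to qr-sampler through `SamplingParams.extra_args`.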

What is NOT controlled by the filter

The filter only manages per-request sampling parameters. These settings are configured via environment variables in your .env file and apply to all requests:

- Entropy source type and gRPC server address
- gRPC transport mode, timeout, and retry count
- Fallback mode
- Circuit breaker thresholds
- API key authentication

See the configuration reference in the main README for the full list.

Disabling the filter

To stop injecting qr-sampler parameters without removing the filter:

1. Open the Valves panel (gear icon).
2. Set enable_qr_sampling to false.

Requests will pass through to vLLM unmodified, and qr-sampler will use its default configuration from environment variables.
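
The precedence this implies (per-request `qr_*` keys override the environment defaults, and a request without them falls back to those defaults) can be sketched as below. The function `resolve_config_sketch` and the `defaults` dictionary are hypothetical stand-ins; qr-sampler's actual `resolve_config()` may validate or restrict keys differently.

```python
from typing import Any, Dict, Optional


def resolve_config_sketch(defaults: Dict[str, Any],
                          extra_args: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Overlay per-request qr_* values on top of process-wide defaults."""
    resolved = dict(defaults)  # start from the environment-derived defaults
    for key, value in (extra_args or {}).items():
        if not key.startswith("qr_"):
            continue  # ignore keys that are not meant for qr-sampler
        name = key[len("qr_"):]
        if name in resolved:  # only override settings that are known
            resolved[name] = value
    return resolved


if __name__ == "__main__":
    env_defaults = {"top_k": 40, "top_p": 0.95}  # illustrative values only
    print(resolve_config_sketch(env_defaults, None))
    print(resolve_config_sketch(env_defaults, {"qr_top_k": 50}))
```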

Customizing the UI port

Set OPEN_WEBUI_PORT in your .env file:

OPEN_WEBUI_PORT=8080

Then access Open WebUI at http://localhost:8080.

Authentication

By default, Open WebUI runs without authentication (OPEN_WEBUI_AUTH=false). This is convenient for local development. For shared or public servers, enable authentication:

OPEN_WEBUI_AUTH=true

The first user to sign up becomes the admin.
