
fix: eliminate 60-second delay for OpenRouter Pipe requests (issue #378) #384

Open
cwawak wants to merge 15 commits into cogwheel0:main from cwawak:fix/direct-flag-pipe-model-delay

Conversation

@cwawak commented Mar 2, 2026

Summary

Fixes the ~60 second OpenRouter Pipe delay in Conduit and hardens SSE-only
streaming to restore usage/cost info and prevent duplicate output.

Root Cause

OpenWebUI routes /api/chat/completions through an async task queue when
session_id, chat_id, and id (message_id) are all present, adding ~60
seconds of latency. Conduit was sending all three; the web UI does not.

Fix

  1. Remove session_id, id, and chat_id from the request payload to bypass
    the async queue and force direct SSE streaming.
  2. Parse OpenWebUI SSE formats (event payloads + reasoning content).
  3. Consume SSE usage markers so the info/cost button is restored.
  4. Preserve whitespace chunks so tokens don’t concatenate and “Thinking…”
    content appears reliably.
  5. Skip channel socket subscriptions in SSE-only mode to avoid duplicate
    content from both HTTP and sockets.
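As a language-agnostic sketch of step 1 (illustrative Python, not the actual Dart in api_service.dart; the key names come from the root-cause description above):

```python
def strip_task_queue_ids(payload: dict) -> dict:
    """Remove the identifiers that make OpenWebUI route the request
    through its async task queue. Without all three present, the server
    streams the response directly over SSE instead."""
    return {k: v for k, v in payload.items()
            if k not in ("session_id", "chat_id", "id")}

# Example request as Conduit previously built it (field values made up):
request = {
    "model": "some-openrouter-pipe-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    "session_id": "abc123",
    "chat_id": "chat-1",
    "id": "msg-1",
}
direct = strip_task_queue_ids(request)
assert all(k not in direct for k in ("session_id", "chat_id", "id"))
```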

Changes

  • lib/core/services/api_service.dart
    • Remove identifiers from payload; add SSE parsing (usage + reasoning).
  • lib/core/services/streaming_helper.dart
    • SSE usage marker handling, whitespace preservation, and skip channel
      socket subscriptions when httpStreamOnly.
  • lib/features/chat/providers/chat_providers.dart
    • Pass httpStreamOnly to streaming helper.
  • lib/features/chat/widgets/assistant_message_widget.dart
    • Usage/cost display logic for info button.
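The SSE handling described above (usage markers plus whitespace-preserving deltas) can be sketched roughly like this — illustrative Python only; the real parsing lives in api_service.dart, and exact payload shapes vary with the OpenWebUI version:

```python
import json

def parse_sse_line(line: str):
    """Classify one 'data: ...' SSE line as ('delta'|'usage'|'done', value).

    Sketch under two assumptions: usage arrives as a top-level 'usage'
    object (which feeds the info/cost button), and text arrives as
    OpenAI-style choice deltas."""
    if not line.startswith("data: "):
        return None                            # comments, event: lines, etc.
    data = line[len("data: "):]
    if data.strip() == "[DONE]":
        return ("done", None)
    obj = json.loads(data)
    if obj.get("usage"):                       # usage marker -> info/cost button
        return ("usage", obj["usage"])
    choices = obj.get("choices") or [{}]
    content = choices[0].get("delta", {}).get("content")
    if content is not None:                    # keep whitespace-only chunks too
        return ("delta", content)
    return None
```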

Testing

  • HTTP request completes in ~2–3 seconds instead of ~60 seconds.
  • SSE response streams properly with reasoning and usage/cost info.
  • No duplicate content in SSE-only mode.

@cwawak (Author) commented Mar 2, 2026

Root cause → Fix → Results (please note: I’m not an expert here, just doing my best to help)

Root cause

  • OpenWebUI routes /api/chat/completions through a Redis async task queue when all three identifiers are present: session_id, chat_id, and id (message_id). That queue adds ~60s latency. Conduit was sending all three; the web UI does not.

Fix

  • Stop sending session_id, chat_id, and id in the Conduit request to avoid the async queue and force direct SSE streaming.
  • Update SSE parsing to handle OpenWebUI event format + reasoning blocks.
  • Consume SSE usage markers so the info/cost button works.
  • Preserve whitespace chunks and skip channel socket subscriptions in SSE-only mode to avoid concatenation/duplication.

Results

  • ✅ 60-second delay eliminated (responses start in ~2–3s).
  • ✅ SSE streaming works with reasoning + usage/cost info.
  • ✅ No duplicated output in SSE-only mode (channel socket unsubscribed).
  • ✅ Spacing/“Thinking…” content no longer drops when chunks are whitespace-only.

cwawak added 4 commits March 2, 2026 16:43
- Remove session_id, id, and chat_id from request payload to bypass
  OpenWebUI's async task queue (issue cogwheel0#378)
- Add SSE parsing to handle streaming response directly via HTTP
- Add isHttpStreamOnly flag to prevent duplicate content from WebSocket
- When using SSE-only mode, skip WebSocket subscriptions entirely

The 60-second delay was caused by OpenWebUI routing requests through an
async task queue when session_id, chat_id, and message_id (id) were all
present. By removing these identifiers, requests go directly through SSE
streaming instead.
cwawak force-pushed the fix/direct-flag-pipe-model-delay branch from 9c69496 to f57d791 on March 2, 2026 21:46
@cwawak (Author) commented Mar 2, 2026

Quick update: I added SSE replay‑dedupe + whitespace handling for SSE‑only mode.

  • Skip replayed SSE chunks when the server re‑sends already‑seen content.
  • Always append whitespace‑only chunks so spaces between tokens aren’t dropped.

This resolved the duplicated responses and the “Thesky” first‑word spacing issue in my testing.
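The replay-dedupe plus whitespace rule above can be sketched as follows — an illustrative Python reduction, assuming replays arrive as re-sends of content already at the end of the accumulated text:

```python
def append_chunk(buffer: str, chunk: str) -> str:
    """Append an SSE chunk to the accumulated message text.

    Two rules from the fix: never drop whitespace-only chunks (so
    'The' + ' ' + 'sky' keeps its space instead of becoming 'Thesky'),
    and skip chunks the server has replayed verbatim."""
    if chunk.strip() == "":
        return buffer + chunk        # always keep pure whitespace
    if chunk and buffer.endswith(chunk):
        return buffer                # replayed content: skip it
    return buffer + chunk
```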

@cwawak (Author) commented Mar 2, 2026

Update: I rebased the PR branch onto the refactor version.

What changed in the PR branch:

  • fix/direct-flag-pipe-model-delay now points to the refactor/sse-normalize history.
  • SSE content is normalized in api_service.dart into deltas before the UI sees it.
  • UI stream handler (streaming_helper.dart) no longer needs replay/duplicate heuristics.
  • Mixed SSE formats are handled by skipping OpenAI deltas once SSE event content is seen.

Result: no duplicated responses and first‑word spacing issues resolved in my tests.
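Normalizing SSE content into deltas before the UI sees it might look like this — an illustrative Python sketch, assuming (as the rebase description implies) that some OpenWebUI SSE events carry the full accumulated message rather than an increment:

```python
def normalize_to_delta(seen: str, event_content: str) -> tuple[str, str]:
    """Turn a possibly-cumulative SSE event into (new_seen, delta).

    If the event content extends what we've already seen, emit only the
    unseen suffix; the UI layer then never needs replay/duplicate
    heuristics. Shorter or out-of-order replays yield an empty delta."""
    if event_content.startswith(seen):
        return event_content, event_content[len(seen):]
    return seen, ""
```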

@cogwheel0 (Owner) commented

Hey @cwawak, thank you for the PR! Streaming via SSE has its own problems and is not very reliable for mobile clients. I struggled to balance it and keep it as a path in the initial days of the app. Duplication is just one of the issues you may have come across. OWUI web also primarily relies on websockets.

Anyways, could you try this and let me know if it alleviates the issue without relying on SSE?: https://docs.openwebui.com/troubleshooting/connection-error#websocket-troubleshooting

@cwawak (Author) commented Mar 3, 2026

Hi @cogwheel0 - thanks for being so kind. I don't think my patch is usable, but maybe helpful for someone who is troubleshooting this!

I have verified that websockets are working fine. Using Chrome on my Mac, I do not have any issues with WebSockets. Using Conduit with Anthropic-hosted or NIM-hosted models works fine with the latest couple of versions of the app.

I only encounter the 30s+30s=60s total delay when using models deployed via the "Open WebUI OpenRouter Pipe". This pipe allows easy toggling of "OpenRouter search" (where OpenRouter, for a few pennies, inserts search results into the model response), some enforcement of Zero Data Retention flags per model, and a nice little display of tokens consumed and total cost.

In the Web UI, I have no problems with WS or delays, only via the Conduit app. I don't necessarily think this is something that should be fixed in Conduit, but I couldn't figure out what was wrong with the OpenRouter Pipe itself.

Anyway, I greatly appreciate your kind words, I think your application is positively delightful and very easy to work on for a novice coder!

@cogwheel0 (Owner) commented

Ah I see! If you are just looking to use the web search for OpenRouter models, might I suggest another solution? If you put :online after any model in the OWUI connections with an OpenRouter endpoint, it would work just as well. It even returns the sources along with the number of tokens consumed in the newer versions.

I haven't tried out the pipe you linked so I might be missing something as well.
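For illustration, the :online suggestion amounts to appending the suffix to the model ID in the request. A hedged sketch in Python — the host, API key, and model ID below are placeholders, and the endpoint path is OpenWebUI's /api/chat/completions:

```python
import json
import urllib.request

# Hypothetical server URL, API key, and model ID -- adjust for your setup.
payload = {
    "model": "openai/gpt-4o-mini:online",  # ':online' enables OpenRouter web search
    "messages": [{"role": "user", "content": "What happened in tech news today?"}],
    "stream": False,
}
req = urllib.request.Request(
    "https://owui.example.com/api/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(req)  # uncomment against a real server
```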

@cwawak (Author) commented Mar 3, 2026

I'm too cheap to actually pay the $0.02 per request for web search, so it's not actually a feature I care about! :D

I think you're right that the best solution is to simply use the normal OpenAI model flow rather than a custom pipe, especially if there are strange issues that take days to figure out. I thought this would be a 15-minute fix.

The only thing I'd be missing is that I don't think the native OpenAI model API has a way to display the cost and token consumption like in this picture.

[Screenshot (2026-03-03): cost and token consumption display]

Thank you so much for your assistance!

@swever826 commented

The only thing I'd be missing is that I don't think the native OpenAI model API has a way to display the cost and token consumption like in this picture.

Hi @cwawak. Sorry, just wanted to share that to see the consumption in real time I am using LiteLLM, if you want to give it a try. It's open source and can be run locally.

It works like a proxy, so all the LLMs you use can be reached from one place, and it has a lot of cool functionality. One feature is that it shows the cost and token consumption of each LLM in real time.

@TeenBiscuits commented

I also use OpenRouter Pipe; it's very useful for integrating OpenRouter features into Open Web UI, but latency is a major issue. Is there a solution?
