
Conversation

@bnogas bnogas commented Dec 17, 2025

Proposed changes

Summary

While running Flux on a 3/7 MIG partition of our H200 GPUs, we observed that execution was limited to 5 streams per engine instance. To address this, I added an override to the default configuration to allow higher concurrency.

2025-12-16T20:56:51.380385855Z  WARN impeller::charmer::lib: Setting max-streams=5 because we couldn't read GPU RAM max_streams=5 error=the current user does not have permission to perform this operation

Notes

It’s currently unclear how Flux determines available memory. This limitation may be related to differences in the MIG-specific API or another underlying issue, and may require further investigation.
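As a rough way to narrow this down, the sketch below queries GPU memory through NVML both on the parent GPU handle and on the per-MIG-device handles. This is purely diagnostic and assumes the nvidia-ml-py (pynvml) bindings are available in the pod; it is not part of the Deepgram Engine or this chart, and which call Engine uses internally is not confirmed.

```python
# Diagnostic sketch (not part of the Deepgram engine): compare the NVML
# memory query on the parent GPU handle with the per-MIG-device query.
# Assumes the nvidia-ml-py package (import name pynvml) is installed.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    # Inside a MIG slice this call can fail with NVML_ERROR_NO_PERMISSION,
    # which would match the "Unable to obtain GPU maximum memory" warning.
    try:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print("parent GPU total memory (bytes):", mem.total)
    except pynvml.NVMLError as err:
        print("parent GPU memory query failed:", err)

    # If MIG is enabled, the per-slice memory is reported on the MIG device
    # handles rather than on the parent GPU handle.
    try:
        current_mode, _pending = pynvml.nvmlDeviceGetMigMode(handle)
    except pynvml.NVMLError:
        current_mode = pynvml.NVML_DEVICE_MIG_DISABLE
    if current_mode == pynvml.NVML_DEVICE_MIG_ENABLE:
        for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)):
            try:
                mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, i)
            except pynvml.NVMLError:
                continue  # MIG slot not populated
            mig_mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
            print(f"MIG device {i} total memory (bytes):", mig_mem.total)
finally:
    pynvml.nvmlShutdown()
```

If the parent-handle query fails with a permission error while the MIG device handles report the slice's memory correctly, that would support the MIG-specific API explanation above.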

Types of changes

What types of changes does your code introduce to the Deepgram self-hosted resources?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update or tests (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have tested my changes in my local self-hosted environment
    • by running this version of the helm chart and stress testing flux
  • I have added necessary documentation (if appropriate)

Further comments

@bnogas bnogas requested review from a team and therealevanhenry as code owners December 17, 2025 21:06
@jkroll-deepgram (Contributor) commented

@bnogas Deepgram doesn't officially support MIG partitions, and it looks like the underlying issue here is that your GPU isn't being detected, so Deepgram is falling back to our low CPU default of 5 streams.

Are you getting better performance out of raising the max_streams?

If you check your Engine logs, do you see a startup line like INFO impeller::config: Using devices: Gpu(0) (indicating the GPU is being used), or is it running on CPU?

@bnogas (Author) commented Jan 26, 2026

@jkroll-deepgram It uses the GPU:
kubectl logs -n core deepgram-engine-bd9768cc-pvrvr | grep -i Gpu

2026-01-26T23:13:47.516123726Z  INFO impeller::config: Using devices: Gpu(0)
2026-01-26T23:13:53.507403716Z  WARN impeller: Unable to obtain GPU maximum memory! err=NoPermission gpu_id=0 gpu_name="NVIDIA H100 80GB HBM3"
2026-01-26T23:13:53.507413208Z  INFO impeller: Setting GPU model cache size based on auto lookup table. gpu_id=Gpu(0) gpu_name="Unknown" gpu_memory_size=0 gpu_cache_size=2

I believe there is a difference in the API call used to get gpu_memory_size when MIG is enabled.

Are you getting better performance out of raising the max_streams?

Yes, we have stress tested up to 100 streams on a single engine with a 3/7 MIG partition of an H100.

The other PR adds support for MIG partitions.

