Conversation

@smallketchup82
Contributor

@smallketchup82 smallketchup82 commented Nov 10, 2025

Prerequisites:

Resources:

  • Original Discord Message
  • Relevant NVIDIA Reflex documentation can be found here by downloading the Reflex SDK and opening NVIDIA Reflex SDK Integration Guide.pdf. Due to the copyright license attached to the SDK I can't provide a direct link to the implementation docs.

Introduction

NVIDIA Reflex is an API for Windows + NVIDIA GPUs that lets the game report its internal render loop timings to the GPU. This allows the GPU to sync the render queue with the CPU, enabling just-in-time rendering, which results in lower input-to-image latency, improved frame consistency, and reduced stuttering. Additionally, Reflex allows us to gather detailed analytics on the game's render latency, paving the way for future optimization, both in terms of game performance and for end users attempting to lower system latency.

This video does a good job of explaining how Reflex works.

Implementation & Notes

Frame Limiting

From the NVIDIA Reflex Docs:

[...] The benefit of allowing the driver to be aware of the framerate limit that the application wishes to enforce (i.e. at a menu or load screen) is that the driver can compare it against other limits that may come from different components of the driver itself (low power mode for laptops, etc.). The driver can take the lowest limit and enforce it at the best place in the pipeline for ensuring the lowest latency, which is the NvAPI_D3D_Sleep call at the start of the main / simulation thread execution. This function will account for the limit that was passed into NvAPI_D3D_SetSleepMode as well as the appropriate timing for Reflex Low Latency mode (if enabled), and there is no need for the application to perform any additional Sleep on its own.

If you do not want any limit enforced, this minimumIntervalUs value should be 0. The requested framerate limit interval combined with the execution of the NvAPI_D3D_Sleep function can be used whether or not you have the Reflex Low Latency mode enabled. They are orthogonal, and you can safely use NvAPI_D3D_Sleep to replace an existing framerate limit Sleep implementation in your engine.

What this means is that we can use NVIDIA Reflex to limit FPS instead of limiting FPS ourselves on NVIDIA + Windows systems. This allows the GPU driver to automatically apply the lowest FPS limit. For example, if the user sets an FPS limit in their NVIDIA settings for when their PC is running on battery, offloading frame limiting to the GPU driver allows the game to respect that limit. The same applies if the user set a limit for when the game is out of focus, or a global frame limit in their settings. And importantly, it lets the GPU driver automatically cap the FPS to just under the user's refresh rate if they're using GSYNC, functioning much like an "Optimal" or otherwise intelligent FPS limiting mode.
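A minimal sketch of the arithmetic involved, not the PR's code: the quoted docs express the limit as an interval in microseconds (`minimumIntervalUs`, with 0 meaning no limit), and the driver enforces the lowest FPS limit, which corresponds to the longest interval. The helper names and the example limits below are hypothetical, and nothing here calls NVAPI.

```python
def fps_to_interval_us(fps: float) -> int:
    """Convert an FPS limit to a minimum frame interval in microseconds.

    An FPS of 0 (or less) is treated as "no limit", mirroring the quoted
    docs where a minimumIntervalUs of 0 means no limit is enforced.
    """
    if fps <= 0:
        return 0
    return round(1_000_000 / fps)


def lowest_limit_us(*intervals_us: int) -> int:
    """Pick the strictest limit: the longest non-zero interval (lowest FPS)."""
    active = [interval for interval in intervals_us if interval > 0]
    return max(active) if active else 0


game_limit = fps_to_interval_us(240)    # hypothetical in-game limit
battery_limit = fps_to_interval_us(60)  # hypothetical driver limit on battery
effective = lowest_limit_us(game_limit, battery_limit)
print(effective)  # the 60 FPS interval wins because it is the longer one
```

In this sketch the game never needs to know why the lower limit exists; it only passes its own interval and lets the strictest one apply.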

Also, speaking of capping FPS to just under the user's refresh rate, a goal of mine when it comes to this implementation is to completely disable our built-in FPS limiting system on NVIDIA + Windows systems. It has been stated that going forwards, the goal is to lock FPS to the monitor refresh rate with ideally no impact on latency. This feature of NVIDIA Reflex should hopefully achieve that.

As a result, this PR will try to move that vision forwards by completely disabling the FPS limit setting when NVIDIA Reflex is detected to be on and enabled. When Reflex is on, the game will limit its Draw FPS to the monitor's refresh rate. With Reflex off, the game will allow modification of the FPS limit (to what is currently available), but still attempt to use the GPU driver to set that limit, rather than our own FPS limiting logic.

Note: For any concerned players, lazer already does this on macOS (Metal renderer) and achieves low latency despite it. More frames do not mean better performance.


Update: After some testing on my part, and discussion on the topic, using NVIDIA Reflex's built-in frame limiter would introduce audio latency as a result of limiting the refresh rate of the Update thread in addition to the Draw thread. This is sub-optimal, and is a good reason not to offload frame limiting to Reflex.

The most likely direction with this feature going forwards is:

  • Not using Reflex's FPS limiter, forcing the Update thread to run at 1000hz for audio & input responsiveness
  • Using the built-in limiter to limit the Draw thread to refresh rate, as Reflex's just-in-time rendering allows this to be done without impacting latency

Markers & Cross Platform Behaviour

From the NVIDIA Reflex Docs:

Reflex Latency and PC Latency markers help to measure the time taken by each section of your app. These measurements are independent of whether or not Reflex Low Latency Mode (or Reflex Low Latency Boost Mode) is enabled. As a result, they can be used to measure and test the savings in latency that you get from using Reflex Low Latency Mode (or Reflex Low Latency Boost Mode). And the application must always call in to these markers, regardless of whether Reflex Low Latency Mode is enabled or disabled.

The last sentence is significant: continuing to report the markers even if Reflex is off enables the functionality described in the Front-End Render Latency Telemetry section below.

As for cross-platform behaviour, there is a rather clever way to ensure that the markers are only reported if the game is running on Windows. We initialize and set the default LowLatencyProvider in GameHost as a No-Op implementation. Then, in osu.Desktop, only if NVAPI is available, do we switch the LowLatencyProvider to the NVAPILowLatencyProvider. This basically means that the marker code does absolutely nothing on non-NVIDIA and non-Windows environments, and therefore has no overhead.
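The no-op default described above can be sketched roughly as follows. This is an illustration in Python rather than the PR's C# code: only the names `LowLatencyProvider`, `NVAPILowLatencyProvider` and `GameHost` come from the description, while the method names, the marker strings and the recording stand-in are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class LowLatencyProvider(ABC):
    """Interface the host calls every frame, whatever the platform."""

    @abstractmethod
    def set_marker(self, marker: str, frame_id: int) -> None:
        ...


class NoOpLowLatencyProvider(LowLatencyProvider):
    """Default provider: accepts every call and does nothing."""

    def set_marker(self, marker: str, frame_id: int) -> None:
        pass


class RecordingLowLatencyProvider(LowLatencyProvider):
    """Hypothetical stand-in for NVAPILowLatencyProvider.

    It records markers instead of forwarding them to NVAPI.
    """

    def __init__(self) -> None:
        self.markers = []

    def set_marker(self, marker: str, frame_id: int) -> None:
        self.markers.append((marker, frame_id))


class GameHost:
    def __init__(self) -> None:
        # The host always starts with the no-op provider.
        self.low_latency_provider = NoOpLowLatencyProvider()

    def run_frame(self, frame_id: int) -> None:
        # Marker calls are unconditional; the provider decides what happens.
        self.low_latency_provider.set_marker("SIMULATION_START", frame_id)
        self.low_latency_provider.set_marker("SIMULATION_END", frame_id)


def create_host(nvapi_available: bool) -> GameHost:
    """Swap in the real provider only when NVAPI is available."""
    host = GameHost()
    if nvapi_available:
        host.low_latency_provider = RecordingLowLatencyProvider()
    return host
```

The point of the pattern is that the frame loop contains no platform checks: the call sites stay identical and only the injected provider differs.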

As a small side note, if lazer's Vulkan API is brought back from deprecation, NVIDIA Reflex can (theoretically) be made to work on Linux by leveraging the Reflex Vulkan API.

Boost Mode

Boost Mode is an NVIDIA Reflex feature which aims to optimize the game in CPU-bound scenarios.

When a game is CPU-bound, the GPU will automatically lower its clocks to save on power consumption. This is typically fine, but it becomes an issue when the game transitions to being GPU-bound. The GPU now has to increase its clocks to keep up with the CPU, which causes stutters while the GPU spools up. Reflex's Boost Mode aims to fix this by constantly running the GPU at the maximum possible clock. This comes at the cost of significantly higher power consumption & possibly more latency (due to the overhead of running the GPU at max performance), but usually less stuttering as the GPU is always ready and in a high-performance state.

In this implementation, I added the option to use Boost as is recommended by the NVIDIA Reflex SDK Docs, but I added a warning about power draw and potential for it to actually backfire and reduce performance.

Render Latency Telemetry

Front-End

Reflex, even in its Off mode, will continue to set markers and gather telemetry on latency. This allows players to use the NVIDIA Overlay to measure render latency, which is objectively a better metric for players to obsess about rather than the current frametime metric in the "Show FPS" panel. Aside from the obvious benefit of transparency, this allows players to measure total system latency if they have a compatible 360hz GSYNC monitor.

Back-End

The Reflex latency telemetry is highly accurate, and according to NVIDIA, completely replaces the need for a high speed camera to measure latency. This can be useful for internal analysis for the core team, and can help shed light on areas of the game that drive up latency, and might require optimization.

Testing

Our own methodological testing can be found here

Warning

Please back up your osu!(lazer) data and read the contributing docs before doing this. Running lazer in the Release configuration can brick your typical lazer installation, but is necessary for a change like this.

  1. Clone my osu reflex branch
  2. Clone my osu-framework reflex branch
  3. Ensure the osu and osu-framework folders are under the same parent directory
  4. Run the UseLocalFramework.(ps1|sh) script in osu according to your system
  5. Run dotnet run --project osu.Desktop -c Release in osu
  6. Enable Reflex in the in-game settings under the Graphics section. Ensure you are using the DirectX renderer

To find the render latency metric, you can either use the NVIDIA Overlay that comes with the NVIDIA app or GeForce Experience app, or you can use the Reflex Testing HUD found in the Reflex SDK linked above.

What's Left?

  • Limit Draw FPS to refresh rate
    • Allow using VSYNC
  • Mark PR as ready for review
  • Final touches (improving logging, cleaning up comments, renaming symbols, etc.)
  • Turn Reflex on by default
  • (Optional) Adding in methods for programmatically grabbing render latency & other debugging stats
    • (Optional) Replace current frametime statistic in "Show FPS" panel with the current render latency

AMD has a Reflex-like SDK for reducing latency on AMD cards. This SDK is called AMD Radeon Anti-Lag 2. While I would like to implement this beside Reflex (in a different PR), I don't own an AMD GPU to be able to test it out and make sure it works.

@Spok5508

Spok5508 commented Nov 11, 2025

Studying the effects on latency

As I've invested time and money into my Reflex Latency Analyzer setup, I'm hoping I can help out!

Testing method:

Tests conducted on your reflex branch for osu and framework.

  • Reflex Latency Analyzer (MSI Oculux NXG253R 360hz, Razer Viper Ultimate)
  • Latency flash = hit300/hit100/hit50 etc skinned as fullscreen white flash
  • every test has a sample size of 300 inputs

ALL tests have pre-rendered frames set to 1 (nvidia low latency mode in driver)

Expected results:

  • Minor impact on latency in overdrawing scenarios (e.g. using lazer's 2x limiter)
  • Major impact on latency in gsync scenarios

Findings:

Tools used for analysis: https://eskezje.github.io/Frametime-Analysis/

D3D11 in-game 2x limiter, multithreading, Reflex OFF/ON/BOOST

Latency:

image

As expected there is a minor decrease in latency when using the Reflex Boost option.

Frame pacing:

image image

(Dataset A and B are Reflex Off and On respectively)

Pairing the in-game frame limiter with Reflex seems to promote increased consistency for frame pacing.

D3D11 Multithreading GSYNC Scenario # 1: Naive setup ("enable vsync+gsync and call it a day")

People love giving this advice for some reason, even though it's completely wrong.

Latency:

image

When ignoring driver settings and just enabling vsync in-game, gsync is not doing anything to help reduce latency.

Framepacing results are ignored for now, as it's expected of vsync to behave well in this case anyways.

D3D11 Multithreading GSYNC Scenario # 2: "Optimal" setup (Driver level vsync and Ultra Low Latency Mode)

In this scenario, NVIDIA limits our fps to a lower rate to support low latency vsync (327fps in my case)

image

It's quite apparent that gsync is only worth using when set up properly.

This scenario is how players currently have to set up gsync in osu!lazer. Its test results serve as a point of comparison for our new Reflex behaviour.

D3D11 Multithreading GSYNC Scenario # 3: Reflex setup

Latency:

image

Behaviour is as expected: players don't have to mess with any driver settings to engage low latency gsync when enabling the in-game Reflex option.

Boost seems to support a slightly lower latency average with an increase in deviation.

Frame pacing:

image

Reflex Boost seems to support more consistent frame pacing even if we are already vsync'd.

Conclusion:

Reflex seems incredibly useful to guide players into using gsync correctly, without having them mess with driver level changes which they might not understand.

@smallketchup82
Contributor Author

smallketchup82 commented Nov 11, 2025

@Spok5508 Thank you so much for the tests!!! Especially the GSYNC ones as I hadn't even considered Reflex's impact on GSYNC

First off I have a question:

When testing, did you run the game in the Release configuration? If not then the test data is likely going to be inaccurate due to the overhead involved with the Debug configuration. I forgot to mention this in the OP.

Questions aside, I did do some basic testing of my own using CapFrameX. My testing focused on frametime behaviour rather than latency. I found that overall, Reflex traded frame pacing and consistency for frame stability, meaning Reflex reduced worst-case stutters but introduced slightly more micro stutters as a result of its overhead.

I'm currently on the subway so I'm not able to pull up my charts but I'll send them when I can.

@Spok5508

When testing, did you run the game in the Release configuration?

Yes! I forgot to mention this as well.

@smallketchup82
Contributor Author

smallketchup82 commented Nov 13, 2025

A Report of Reflex's Effects on osu!(lazer)

Testing specs:

  • Ryzen 5 5600X
  • RTX 3060
  • Windows 11
  • NVIDIA driver version 581.80
  • 240hz monitor with freesync/gsync off
  • Software used: CapFrameX
  • Built with release build configuration

Glossary:

  • Render Latency = Time it takes for the game to render 1 frame
  • System Latency = How long it takes, across the entire system (including mouse input, game render latency, and monitor), for 1 frame to be rendered and displayed on the screen

TLDR

NVIDIA Reflex improves frame consistency by 6.27% in the "On" mode, making the game feel smoother, at the expense of a 0.42% increase in system latency. Reflex reduces system latency by 2.92% in the "Boost" mode, making the game feel more responsive, at the expense of a 4.41% reduction in frame consistency, meaning more stutters. Overall a net positive, with most benefits in the "On" mode. (Percentages sourced from @Spok5508)

My Thoughts on Reflex and Latency

While the effects of Reflex are going to vary from machine to machine, I believe that for osu! in specific, a game that is pretty much entirely CPU bottlenecked, realistically the perceivable effects of Reflex are going to be slim, if not imperceptible for most, when looking at render latency alone.

After playing the game with Reflex (boost) and the Reflex testing overlay on, I noticed an average decrease of 500 microseconds (0.5ms) on render latency, a difference of a couple hundred microseconds compared to Reflex (off). Going off of render latency alone, I doubt that difference will prove to accomplish much.

When including the testing done by @Spok5508 on system latency across the reflex modes without gsync, we don't see a sizable change in system latency across the different modes, we're still looking at a difference of hundreds of microseconds, and single digits in percentages.

The Reflex (boost) mode has the lowest system latency, a reduction of roughly 265 microseconds (0.265ms), or 2.92%. This is STATISTICALLY better than Reflex (on), which adds 38 microseconds (0.038ms) of latency, a 0.42% increase in system latency.

However, based on humanity's current understanding of human physiology and psychology, there is no plausible way an increase of 38 microseconds in latency should be at all perceivable to any human being. This number is so inconceivably low that I would go so far as to say it is completely impossible to detect.

The decrease of 265 microseconds from Reflex (boost) is a bit more plausible, but the number is still low enough that I would only call it marginally better than the baseline Reflex (off). Regardless, I would not say the latency decrease in Reflex (boost) is beneficial considering the mode's decrease in frame consistency (4.41%) and overall FPS (~10fps).
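A quick consistency check on the figures above, assuming each percentage is relative to the average system latency with Reflex (off): dividing each absolute change by its percentage should give roughly the same baseline. The function name is hypothetical, and the result is derived arithmetic, not a measured value.

```python
def implied_baseline_ms(delta_us: float, percent: float) -> float:
    """Baseline latency (ms) implied by an absolute change and its percentage."""
    return (delta_us / 1000) / (percent / 100)


boost_baseline = implied_baseline_ms(265, 2.92)  # Reflex (boost): -265 us, -2.92%
on_baseline = implied_baseline_ms(38, 0.42)      # Reflex (on):    +38 us, +0.42%

print(f"{boost_baseline:.2f} ms, {on_baseline:.2f} ms")
```

Both figures imply a baseline of about 9 ms, so the microsecond and percentage values quoted here agree with each other.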

Overall: Reflex doesn't really change much in terms of latency.

What Does Reflex Do Then?

Reflex (on) reduces larger frame drops, the ones that would be perceivable. Going off of Spok's tests, we see a 6.27% reduction in standard deviation from the Reflex (on) mode, roughly meaning a 6.27% reduction in stuttering compared to Reflex (off). My own testing also confirms this:

image

This chart shows the frametime variability across the 3 different reflex states. Top is off, middle is on, bottom is boost. Reflex (on) reduces the frames within the red section (stutters), but in turn has more frames in the orange section (micro stutters). Reflex (boost) has more variability and more stutters overall. This confirms Spok's findings, and confirms Reflex's improvements in frame pacing and consistency.
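For reference, the standard-deviation comparison used above can be sketched as follows. The frametime samples are made up purely to show the calculation and are not data from either set of tests; the function name is also hypothetical.

```python
from statistics import pstdev


def consistency_change_percent(baseline_ms, candidate_ms):
    """Percentage change in frametime standard deviation versus a baseline.

    A negative result means the candidate's frametimes vary less.
    """
    base = pstdev(baseline_ms)
    cand = pstdev(candidate_ms)
    return (cand - base) / base * 100


# Made-up frametimes in milliseconds, for illustration only.
reflex_off = [1.0, 1.0, 1.0, 3.0]  # one large spike
reflex_on = [1.0, 1.0, 1.0, 2.0]   # a smaller spike

change = consistency_change_percent(reflex_off, reflex_on)
print(f"{change:.1f}%")
```

A lower standard deviation only says the frametimes are more tightly grouped; it does not distinguish a few large stutters from many small ones, which is why the histogram above is read alongside it.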

image

This chart shows the FPS differences with the different Reflex modes. First off we observe a gradual reduction in FPS as we go through the 3 modes, consistent with their presumable overhead. The 1% percentile is higher in the Reflex (on) state which tells us that Reflex (on) reduced dropped and uneven frames, overall improving worst case stutters. Reflex (boost) is slightly worse than Reflex (on) when it comes to reducing dropped and uneven frames, but is seemingly still better than Reflex (off), likely because the Reflex (boost) mode has the most reductions to system latency, which is better than stuttering on high latency.

Overall: Reflex makes frames more consistent, reduces dropped frames, and reduces stuttering.

Overview

Overall it can be said that Reflex DOES improve the game's performance. Despite its overhead, we observe improvements in frame consistency, frame pacing, and reduced stuttering. This comes at the cost of possibly more micro stutters (however these are not likely to be noticeable), and reduced FPS. A decent trade-off.

I want to make the case that even though Reflex slightly increases latency (except in Boost mode, where it slightly reduces latency), the improved frame consistency alone should assist players. Better frame consistency allows players to accurately predict when new frames will be presented. Players should find it easier to predict when approach circles will meet the circles, and I would additionally expect improvements in Unstable Rate given the improvement in frame pacing, allowing the game to feel smoother and less jittery.

Big thanks to @Spok5508 for the system latency testing which I'm incapable of doing myself. Reflex is pretty promising and I hope it finds its way into players' hands, increasing their performance and their love for the game.

@Susko3
Member

Susko3 commented Nov 13, 2025

I assume most of this stuff is targeted at click to display latency. How does this affect click to audio latency? I would expect it to not change, but depending on how the syncing of the update and render threads and the GPU is implemented, it might increase audio latency.

the goal is to lock FPS to the monitor refresh rate with ideally no impact on latency

If I have a 60 Hz monitor and use Reflex, would this entail locking the update and draw threads at 60 FPS? If that's the case, it's really bad for audio latency: I click my mouse, and instead of the update thread quickly responding by playing a hitsound, it's waiting for some rendering synchronisation mechanism.

I think discussion about the impact of Reflex on audio latency is missing in your conversation.

@smallketchup82
Contributor Author

smallketchup82 commented Nov 13, 2025

I assume most of this stuff is targeted at click to display latency. How does this affect click to audio latency? I would expect it to not change, but depending on how the syncing of the update and render threads and the GPU is implemented, it might increase audio latency.
[...]
I think discussion about the impact of Reflex on audio latency is missing in your conversation.

No, Reflex should never directly touch audio whatsoever. The main goal of Reflex is to effectively sync the render submission phase, which is performed by the CPU, so that it finishes just in time for the GPU to render. Any audio and input work happens on its own threads, and is left completely untouched.

The only way I could possibly see any impact on audio latency would be purely in the case of Reflex increasing CPU contention or load in a manner where it slows down the audio thread, but this is a massive what-if and highly unlikely to occur in the real world.

I agree that we should measure audio latency with Reflex enabled to test your theory rather than dismiss it without any empirical basis, but I do remain firm in my belief that Reflex should have no measurable impact on audio latency.

the goal is to lock FPS to the monitor refresh rate with ideally no impact on latency

If I have a 60 Hz monitor and use Reflex, would this entail locking the update and draw threads at 60 FPS? If that's the case, it's really bad for audio latency: I click my mouse, and instead of the update thread quickly responding by playing a hitsound, it's waiting for some rendering synchronisation mechanism.

No, the only thread being locked would be the draw thread via the FrameSleep method, the update thread should be left untouched at its usual 1000hz.

I will clarify that most of my testing and explanations in this PR have been assuming a multi-threaded context, as I haven't gotten around to testing and understanding Reflex's effects on single-threaded games. I do believe that Reflex can negatively impact game performance on the single-threaded mode. To be honest, when implementing Reflex, I did get the feeling that Reflex was built with games being multi-threaded in mind. I don't think allowing Reflex on single-threaded environments would be beneficial, I'm skeptical at best.

@Susko3
Member

Susko3 commented Nov 13, 2025

No, the only thread being locked would be the draw thread via the FrameSleep method, the update thread should be left untouched at its usual 1000hz.

Then why is FrameSleep() called on the update thread? https://github.com/ppy/osu-framework/pull/6666/files#diff-c694d93cea53f76879738dce61918a64bc3110e7df817250864f1f4754607512L467-R470

@smallketchup82
Contributor Author

smallketchup82 commented Nov 13, 2025

I explained it incorrectly, allow me to revise:

According to the Reflex Docs:

Now, you'll need to place the NvAPI_D3D_Sleep function call at the start of the main / simulation thread's per-frame loop... before any mouse or keyboard input is gathered and used to simulate a new frame. The proper location for this may have already been identified for the sake of an existing framerate-limiting Sleep in your engine, so the NvAPI_D3D_Sleep would just replace that as long as it occurs before input is sampled (this is worth double checking).

The goal here is to have the freshest input data possible for any rendered frame that eventually hits the screen. You don't want to have backpressure in the middle of the pipeline making previously sampled input data and the resulting frame simulation unnecessarily stale. You want the buffers of work that make up the frame to arrive into the GPU render queue "just in time" to be executed by the GPU.

I placed FrameSleep at the top of the updateFrame method as I believe the Root.UpdateSubTree(); method is what goes ahead and captures input (now if I'm wrong about this, PLEASE correct me, it would be a big issue). As the Reflex docs state, the placement of FrameSleep must be at the top of the main/simulation thread's frame loop, and before any input is collected. Therefore my placement is correct here.

To put things another way, you fear this to be the pipeline: [Click -> Update Thread Tries to Play Audio -> BLOCKED by Render Sync -> Long Delay -> Hitsound Plays]

When in reality it should look more like this: [Sleep/Wait -> Poll Input (Click) -> Update Thread Plays Audio (No Delay) -> Simulation-for-Render Starts]

What I was attempting to convey in the original message is that no, Reflex does not lock the Update thread mid-input. It delays the start of the frame so that when input is sampled, it is as fresh as possible (as close to the rendering time as possible). I was definitely incorrect in saying it only locks the Draw thread, my bad 😅
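The intended ordering can be sketched as below. This is a simplified Python illustration, not the framework's `updateFrame` code: the function, the event log and the click list are all hypothetical, and the "sleep" entry merely stands in for where FrameSleep / NvAPI_D3D_Sleep would run.

```python
def run_update_frame(log: list, pending_clicks: list) -> None:
    """One update frame: wait first, then sample input, then act on it."""
    # 1. Stand-in for FrameSleep: the wait happens before any input is read.
    log.append("sleep")

    # 2. Input is sampled only after the wait, so it is as fresh as possible.
    clicks = list(pending_clicks)
    pending_clicks.clear()
    log.append(f"poll_input:{len(clicks)}")

    # 3. Audio feedback is triggered right after sampling; nothing blocks here.
    for _ in clicks:
        log.append("play_hitsound")

    # 4. The rest of the frame simulation follows.
    log.append("simulate")


events = []
run_update_frame(events, ["click"])
print(events)
```

Note that a click arriving during step 1 still has to wait for the sleep to finish before it is sampled, so the length of that wait bounds how late the hitsound can be.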

As for the placement of FrameSleep, you're right in observing that it doesn't lock the Draw thread. But this isn't an entirely bad thing. If we use Reflex to pace the Update thread to say, 60 FPS, the Draw thread will still be waiting on the Update thread (via drawRoots.GetForRead), so it effectively also gets limited down to 60 FPS, even if it's running at 1000hz. Whether this is desirable is up for discussion, but quite frankly this is what the NVIDIA Reflex docs want us to do if we decide to do frame limiting.

@Susko3
Member

Susko3 commented Nov 14, 2025

If we use Reflex to pace the Update thread to say, 60 FPS, [...]. Whether this is desirable is up for discussion, but quite frankly this is what the NVIDIA Reflex docs want us to do.

This will cause up to a 16.67 ms delay in processing input, judgements and hitsound playback (compare with 4.17 ms when the update thread is running at the default 240 Hz on a 60 Hz display). I would say that this is a noticeable difference and undesirable in a rhythm game like osu!. It's a trade-off between input to display and input to audio (+judgement) latency.
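The figures above follow from the update period: an input that arrives just after the update thread samples input waits up to one full period before it is processed. A small sketch of that arithmetic (the function name is hypothetical):

```python
def worst_case_delay_ms(update_rate_hz: float) -> float:
    """Longest wait before the next update frame can process an input."""
    return 1000.0 / update_rate_hz


for rate in (60, 240, 1000):
    print(f"{rate} Hz -> up to {worst_case_delay_ms(rate):.2f} ms")
```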

To avoid this problem, I propose not calling NvAPI_D3D_Sleep/FrameSleep. This is supported, avoids unnecessary audio latency, and still helps with display latency.

NVIDIA Reflex SDK Integration Guide.pdf, pp. 14–15

cc @smoogipoo for your opinion

@smallketchup82
Contributor Author

smallketchup82 commented Nov 14, 2025

To avoid this problem, I propose not calling NvAPI_D3D_Sleep/FrameSleep. This is supported, avoids unnecessary audio latency, and still helps with display latency.

I personally believe this is overkill. While the 16.67ms delay would exist if we locked both the update thread and draw thread to 60 FPS, we don't have to lock the update thread to be that low. We don't have to have the GPU driver touch framerate at all. I don't believe reducing the effectiveness of Reflex across the board is justifiable just to be able to include what was meant to be a cool side-project stemming from this feature.

I propose we scale down the goals of what this PR is trying to achieve for now. Instead of locking the Update & Draw threads to users' refresh rates, we can keep FrameSleep and use it as a Reflex-exclusive replacement for our own built-in frame limiting logic. This allows the Update thread to continue running at its typical x2 speed of the Draw thread, and doesn't condemn the user to large amounts of delay for having a low refresh rate monitor. There's also the option of just not having Reflex limit FPS at all, leaving FrameSleep to serve its one and only purpose, which is to improve frame consistency no matter the FPS. This would solve your audio latency concerns while allowing Reflex to perform at its fullest capacity.

@smallketchup82
Contributor Author

I've gone through and fixed some of the issues mentioned. To push this PR forwards, I decided not to add FPS limiting via Reflex. Instead, I used our built-in FPS limiter to limit the Draw thread to the monitor refresh rate, as anything over that is unnecessary given Reflex's just-in-time rendering. I've edited the "Frame Limiting" section in the OP to reflect this. This PR should now be ready for review.

@smallketchup82 smallketchup82 marked this pull request as ready for review November 20, 2025 20:13