-
Notifications
You must be signed in to change notification settings - Fork 6
Fix: Attempt delayed thread creation as long-term viable f3f lag spike fix #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Attempt delayed thread creation as long-term viable f3f lag spike fix #38
Conversation
|
Yet another note (yikes). I did a bit of benchmarking to properly compare this PR to the cleaner Jojoe one. Unfortunately (because this is jank af) or fortunately (because if this wasn't the case, I'd be very confused), this PR does seem to be pretty significantly faster, building 32rd of chunks in 24-27s on my PC compared to a wildly varying 33-40s on the Jojoe PR. This testing wasn't super extensive (maybe 4-6 tries each) but I didn't bother going too much further because the numbers were very consistent across restarts/different terrain. I think we could probably tune this PR further for F3+F use cases but at this point I've invested way too much time into investigating/tuning this so I'm going to leave this for comment. Let me know if you have a better idea or approach. |
| this.world = world; | ||
|
|
||
| this.builder = new ChunkBuilder<>(backend.getVertexType(), this.backend); | ||
| this.builder.SetupWithRender(renderDistance); // Split out for compat... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is split out because other mods love injecting into Sodium. Ideally we would just pass this into the constructor.
| this.backend.upload(RenderDevice.INSTANCE.createCommandList(), new FutureDequeDrain<>(futures)); | ||
| } | ||
|
|
||
| this.builder.maybeIncreaseThreadLimit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updateChunks is used to essentially tick the chunk render manager. It's possible that other mods modify how this works or call into the builders directly, in which case putting this here might cause performance limitations. I'm mainly thinking that this could cause issues with SeedQueue. But, based on preliminary testing, SeedQueue does seem to work fine - not confident on that though.
| } | ||
|
|
||
| this.stopWorkers(); | ||
| if (this.running.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this mostly for debugging purposes. However, unless another mod is directly calling in to init, I'm pretty sure this predicate is accurate. There is only one reference to init that I can find in IntelliJ and it's directly after the constructor call.
| } | ||
|
|
||
| // Temporary seedqueue compatibility - log less so that its inject works | ||
| // LOGGER.info("Stopping {} worker threads", this.threads.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SeedQueue suppresses some log messages, which limits our ability to change them. Ideally we would log the worker thread count here. Could just format a string then pass it in.
| // Sometimes, we want more threads :) | ||
| // In that case, let's slowly add them! | ||
| // :) Doesn't everyone love more threads? | ||
| if (!this.hasThreadSpace || (this.updates++ < 60 /* Arbitrary! */) || !this.running.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely arbitrary value. I'm not sure if calls of this function are tied to ticks or to render updates (i.e. fps). I think it's likely the latter. In that case, we almost want to just use a clock, but I did it this way because polling system clocks can be a bit less performant.
|
Attempted a different fix that keeps threads around and found that the issue is not threads as hypothesized. At this point I don't think there's really anything else to go off of unfortunately as any individual object being created on a per-thread basis could be the problem (although I suspect it might be the |
|
Closing in favour of #39 |
See #37
Ideally, I think some of the constant/arbitrary values here should be configurable.
I can do that later if it's agreed that this is a viable solution for the F3+F lag spikes.I think we actually need a solution like this (at least with thread scaling) in order to ensure compat with categories that require high render distance (e.g. AA)We do a very cursed update()-based runtime calculation to gradually create more render threads, up until the theoretical maximum (lcores). This (based on extremely preliminary testing) seems to have the upside seen in #36 of reducing/mostly eliminating the extreme lag spikes when using F3+F repeatedly, while also (theoretically, untested, but I don't see how this wouldn't be the case) removing the downside of slower chunk loading when not modifying render distance (e.g. while flying in AA).
Changes recommended (for myself or anyone else who picks this up, if this is agreed on as a viable route to take to fix this issue):
Concept: Pass information on render distance down into the ChunkBuilder, and change scaling / min / max threads based on render distance. (Note: This is based on my understanding that each ChunkBuilder has 1, and exactly 1, render distance - I'm not 100% certain of this, but it does seem to be the case)Note - I implemented this. Seems to lead to pretty decent behaviour in-game.However, I checked again with Jojoe's PR, and it doesn't seem like that PR is noticeably slower/faster than this one (with viewDistance cap). I think this implementation is better in theory, but it's hard to prove exactly.Tested against Jojoe's PR, this is more performant for loading 32 rd.Current Issues with PR:
Miscellaneous Notes: