@DesktopFolder DesktopFolder commented Nov 17, 2024

See #37

Ideally, I think some of the constant/arbitrary values here should be configurable. I can do that later if it's agreed that this is a viable solution for the F3+F lag spikes. I think we actually need a solution like this (at least with thread scaling) in order to ensure compatibility with categories that require high render distance (e.g. AA).

We do a very cursed update()-based runtime calculation to gradually create more render threads, up to the theoretical maximum (lcores). Based on extremely preliminary testing, this seems to share the upside seen in #36 of reducing or mostly eliminating the extreme lag spikes when using F3+F repeatedly. It should also remove the downside of slower chunk loading when not modifying render distance (e.g. while flying in AA); that part is untested, but I don't see how it wouldn't be the case.
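
A minimal sketch of the gradual thread growth described above, assuming a counter that is bumped once per update. `GradualThreadScaler`, `updatesPerThread` and `threadCount` are hypothetical names for illustration; only `maybeIncreaseThreadLimit` and the `running` flag appear in the PR, and this is not Sodium's actual ChunkBuilder code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the thread-growth logic; not Sodium's real ChunkBuilder. */
final class GradualThreadScaler {
    private final int maxThreads;        // theoretical maximum, e.g. logical cores
    private final int updatesPerThread;  // 60 in the PR, where it is called arbitrary
    private final AtomicBoolean running = new AtomicBoolean(true);
    private int threads;
    private int updates;

    GradualThreadScaler(int initialThreads, int maxThreads, int updatesPerThread) {
        this.threads = initialThreads;
        this.maxThreads = maxThreads;
        this.updatesPerThread = updatesPerThread;
    }

    /** Called once per update; returns true when one more worker should be started. */
    boolean maybeIncreaseThreadLimit() {
        if (threads >= maxThreads || !running.get()) {
            return false; // no thread space left, or the builder has been stopped
        }
        if (++updates < updatesPerThread) {
            return false; // not enough updates since the last thread was added
        }
        updates = 0;
        threads++;
        return true;
    }

    int threadCount() {
        return threads;
    }

    void stop() {
        running.set(false);
    }

    public static void main(String[] args) {
        int lcores = Runtime.getRuntime().availableProcessors();
        GradualThreadScaler scaler = new GradualThreadScaler(1, lcores, 60);
        for (int i = 0; i < 600; i++) {
            scaler.maybeIncreaseThreadLimit();
        }
        System.out.println(scaler.threadCount() + " threads after 600 updates");
    }
}
```

In the real change the caller would start an actual worker whenever the method returns true; the sketch only tracks the count.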

Changes recommended (for myself or anyone else who picks this up, if this is agreed on as a viable route to take to fix this issue):

  • Configurable hard thread limit as in #36 ("Added render threads option"); default to lcores, though
  • Configurable initial threads
  • Configurable (or just better) thread creation scaling. This creates 1 thread per 60 updates; I think it should probably be more like 60-30-15-7 (the interval halves each time, so threads are added faster and faster and we quickly hit max threads, e.g. when 32rd is hit)
  • Concept: Pass information on render distance down into the ChunkBuilder, and change scaling / min / max threads based on render distance. (This is based on my understanding that each ChunkBuilder has exactly one render distance. I'm not 100% certain of this, but it does seem to be the case.) Note: I implemented this, and it seems to lead to pretty decent behaviour in-game. However, I checked again against Jojoe's PR, and that PR didn't seem noticeably slower or faster than this one (with viewDistance cap). I think this implementation is better in theory, but it's hard to prove exactly. Update: tested against Jojoe's PR, this is more performant for loading 32 rd.
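
To put rough numbers on the 60-30-15-7 idea from the list above, here is a small sketch that halves the wait before each additional thread. `HalvingSchedule`, `intervalFor`, `updatesToReach` and the `minInterval` floor are all hypothetical; the PR itself only uses the fixed 60-update interval.

```java
/** Hypothetical schedule: the wait before each new thread is half the previous one. */
final class HalvingSchedule {
    /**
     * Updates to wait before adding the next thread, given how many extra
     * threads were already added. With baseInterval = 60 this yields
     * 60, 30, 15, 7, 3, 1, ... and never drops below minInterval.
     */
    static int intervalFor(int threadsAdded, int baseInterval, int minInterval) {
        int shift = Math.min(threadsAdded, 31); // Java masks int shift counts to 5 bits
        return Math.max(minInterval, baseInterval >> shift);
    }

    /** Total updates needed before extraThreads additional threads exist. */
    static int updatesToReach(int extraThreads, int baseInterval, int minInterval) {
        int total = 0;
        for (int n = 0; n < extraThreads; n++) {
            total += intervalFor(n, baseInterval, minInterval);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 0; n < 4; n++) {
            System.out.println("wait before extra thread " + (n + 1) + ": "
                    + intervalFor(n, 60, 1) + " updates");
        }
        System.out.println("halving, 4 extra threads: " + updatesToReach(4, 60, 1) + " updates");
        System.out.println("fixed 60, 4 extra threads: " + (4 * 60) + " updates");
    }
}
```

With these assumptions, four extra threads arrive after 60 + 30 + 15 + 7 = 112 updates instead of 240 with the fixed interval. Whether that is aggressive enough to help when 32rd is hit would still need testing in-game.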

Current Issues with PR:

  • May have side effects with SeedQueue performance / configuration options
  • Completely breaks SeedQueue (stops generating seeds); this is probably because it breaks SeedQueue worker injection somehow
  • The calculations are kind of mediocre, I think they could be tuned slightly for some improvement.
  • We still have small amounts of lag with F3+F here and there (although vastly better than anything else I've tested in identical conditions)

Miscellaneous Notes:

  • I ran [normal sodium] with VisualVM and was able to replicate 20+ second freezes with no apparent related behaviour in the GC. I think Windows might just hate our threads? I'm not sure. In any case, unless someone with an extremely deep knowledge of how JVM+GC+threads interact pops up, I don't think it'll be possible to find a "true" root cause fix for this.
  • Based on a frankly absurd amount of debugging and profiling, I have been completely unable to find a single culprit, despite the fact that we know the lag spikes are directly related to the number of threads we create (see doogile fix). I'm pretty sure (97% confidence) that this is just some weird GC behaviour that freezes us a ton when threads are in some unknown state. Therefore, this is really a suboptimal solution; ideally we would figure out how to manipulate the thread objects so that this doesn't occur. (Maybe reduce cycles..?)

@DesktopFolder

Yet another note (yikes).

I did a bit of benchmarking to properly compare this PR to the cleaner Jojoe one. Unfortunately (because this is jank af) or fortunately (because if this wasn't the case, I'd be very confused), this PR does seem to be pretty significantly faster, building 32rd of chunks in 24-27s on my PC compared to a wildly varying 33-40s on the Jojoe PR. This testing wasn't super extensive (maybe 4-6 tries each) but I didn't bother going too much further because the numbers were very consistent across restarts/different terrain.

I think we could probably tune this PR further for F3+F use cases but at this point I've invested way too much time into investigating/tuning this so I'm going to leave this for comment. Let me know if you have a better idea or approach.

this.world = world;

this.builder = new ChunkBuilder<>(backend.getVertexType(), this.backend);
this.builder.SetupWithRender(renderDistance); // Split out for compat...

This is split out because other mods love injecting into Sodium. Ideally we would just pass this into the constructor.

this.backend.upload(RenderDevice.INSTANCE.createCommandList(), new FutureDequeDrain<>(futures));
}

this.builder.maybeIncreaseThreadLimit();

updateChunks is used to essentially tick the chunk render manager. It's possible that other mods modify how this works or call into the builders directly, in which case putting this here might cause performance limitations. I'm mainly thinking that this could cause issues with SeedQueue. But, based on preliminary testing, SeedQueue does seem to work fine - not confident on that though.

}

this.stopWorkers();
if (this.running.get()) {

I added this mostly for debugging purposes. However, unless another mod is directly calling into init, I'm pretty sure this predicate is accurate. There is only one reference to init that I can find in IntelliJ, and it's directly after the constructor call.

}

// Temporary seedqueue compatibility - log less so that its inject works
// LOGGER.info("Stopping {} worker threads", this.threads.size());

SeedQueue suppresses some log messages, which limits our ability to change them. Ideally we would log the worker thread count here. Could just format a string then pass it in.

// Sometimes, we want more threads :)
// In that case, let's slowly add them!
// :) Doesn't everyone love more threads?
if (!this.hasThreadSpace || (this.updates++ < 60 /* Arbitrary! */) || !this.running.get()) {

Completely arbitrary value. I'm not sure if calls of this function are tied to ticks or to render updates (i.e. fps). I think it's likely the latter. In that case, we almost want to just use a clock, but I did it this way because polling system clocks can be a bit less performant.
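
For comparison, a clock-based variant could look roughly like the sketch below, which gates on elapsed time instead of the number of calls, so the rate no longer depends on whether the caller runs per tick or per frame. `ClockThrottle` and `tryFire` are hypothetical names; the clock is injected as a `LongSupplier` so it can be swapped for a fake one in tests. Whether polling `System.nanoTime()` on every update costs enough to matter here is an open question and would need measuring.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical time-based gate: fires at most once per interval, however often it is polled. */
final class ClockThrottle {
    private final LongSupplier nanoClock; // e.g. System::nanoTime
    private final long intervalNanos;
    private long lastFire;

    ClockThrottle(LongSupplier nanoClock, long intervalNanos) {
        this.nanoClock = nanoClock;
        this.intervalNanos = intervalNanos;
        this.lastFire = nanoClock.getAsLong();
    }

    /** Returns true if at least intervalNanos have passed since the last time it fired. */
    boolean tryFire() {
        long now = nanoClock.getAsLong();
        if (now - lastFire < intervalNanos) {
            return false;
        }
        lastFire = now;
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed interval of 50 ms, chosen only to keep the demo short.
        ClockThrottle throttle = new ClockThrottle(System::nanoTime, TimeUnit.MILLISECONDS.toNanos(50));
        int fired = 0;
        for (int i = 0; i < 20; i++) {
            if (throttle.tryFire()) {
                fired++;
            }
            Thread.sleep(10);
        }
        System.out.println("fired " + fired + " times in 20 polls");
    }
}
```

Comparing `now - lastFire` against the interval, rather than comparing two absolute readings, is how the `System.nanoTime()` documentation recommends measuring elapsed time.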

@DesktopFolder

Attempted a different fix that keeps threads around and found that the issue is not the threads themselves, contrary to the earlier hypothesis. At this point I don't think there's really anything else to go off of, unfortunately, as any individual object being created on a per-thread basis could be the problem (although I suspect it might be the ChunkBuildBuffers object that causes us issues).

@DesktopFolder

Closing in favour of #39
