-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
QueryData/QueryFilter refactors and query lookahead
#22500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
QueryData/QueryFilter refactors, and query lookaheadQueryData/QueryFilter refactors and query lookahead
|
It looks like your PR has been selected for a highlight in the next release blog post, but you didn't provide a release note. Please review the instructions for writing release notes, then expand or revise the content in the release notes directory to showcase your changes. |
No thanks 😅 I think this approach is clearer, and I really don't want to break user code that badly. |
|
I'd be very interested in seeing some rendering benchmarks for this too! |
Yeah, I thought not lol :)
Will do! I've been meaning to get one of the big glTF scenes set up locally to test |
chescock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is a cool idea! I'm always amazed by these cases where you can speed things up without doing less work just by doing it in a different order.
Hmm, this approach wouldn't work well with my plan for Nested Queries (#21557). The idea there is that we'd evaluate Would it work to leave (Or... do we ever need lookahead on |
I think I'm leaning towards something like option 1. What if we added a |
|
|
||
| #[doc(hidden)] | ||
| #[derive(Clone)] | ||
| pub struct AnyOfFetch<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't directly related to this PR, but I've been wondering whether we should change the implementation of AnyOf to delegate to Option and Or somehow. You wound up basically having to duplicate the find_table_chunk and find_archetype_chunk implementations between AnyOf and Or.
|
|
||
| /// A wrapper `Fetch` type for tuple query terms that caches the last seen valid chunk. | ||
| #[doc(hidden)] | ||
| pub struct TupleFetch<'w, T: WorldQuery> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I had missed that this affected tuples and not just Or! It makes sense that it's symmetric, though.
It looks like you only added the caching for tuples, though. I think you also need it for Or and AnyOf. And for the #[derive] macros, which should work like tuples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, mostly wanted to get your eyes on it before I added those, I'll do that this afternoon :)
| /// `table_row` must be in the range of the current table and archetype. | ||
| /// - There must not be simultaneous conflicting component access registered in `update_component_access`. | ||
| #[inline(always)] | ||
| unsafe fn try_fetch<'w, 's>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I see how this approach to try_fetch avoids to double-query problem with nested queries. We'll be able to override Parent::try_fetch to make a single call, which will help for get. But regular iteration will use find_table_chunk and fetch, which will call matches and fetch, which will wind up invoking the nested query twice.
I think we might want to keep the mechanisms separate, so that Query<Parent<&T>, Changed<&U>> can use lookahead to evaluate Changed<&U> but then make nested Parent<&T> queries one-by-one. So, add matches, but keep fetch returning Option, and query iteration code needs to handle both ways of filtering.
But also: Nested queries don't exist yet! I don't want you to feel like you need to handle my pet project in this PR, and I'm happy to rework this again if this PR gets merged and then we decide we want nested queries. It will just seem a little silly if I do a PR to make fetch return an Option, and then you remove the Option in this PR, and then I add it back again in a third PR :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, it won't speed up chunked iteration for &Parent. Though, currently chunked iteration only applies to for_each anyways! That was originally because of some perf issues I found, but I'd be fine keeping it that way tbh. It kind of makes sense to have different optimizations apply to different kinds of queries. It was kind of jank to shove all the chunking logic into QueryIterationCursor anyways lol.
Objective
Make query iteration faster, more flexible, and give query terms more control over how to iterate.
This PR implements "chunk-based query lookahead". Instead of checking the conditions for each row row-wise, we now check them column-wise, iterating forward for each term until we find the next contiguous "chunk" of valid rows. By passing the chunk returned by each query term to the next one, we can gradually narrow down our search area, like if we were short-circuiting in a row-wise check.
This works fine for normal query iteration (and is even faster by default in some cases!), but the main point is it gives query terms more control over how they iterate. In particular, this pattern is perfect for searching in something like a segment tree! The goal is for custom query terms (and changed detection eventually!) to be able to greatly speed up iteration where they need to.
It also might******* be more likely to auto-vectorize plain condition checking, there were vectorized loops in some of the asm dumps but I wasn't able to verify that it was condition checking in particular.
Solution
IS_ARCHETYPALontoWorldQueryWorldQuery::matches, leavingQueryFilterempty, andremoving the
Optionwrapper fromQueryData::fetch.WorldQuery::find_table_chunkandfind_archetype_chunkto let query terms determineif/how to seek ahead in the query.
QueryIter::fold/for_eachto use chunk-based iteration.NOTE: I had some strange performance issues with
QueryIter::next, so I left it as-is for now. I saw as much as a 50% performance hit in some benchmarks, and I was able to (mostly) fix it, but at the cost of a similar hit tofor_each. It feels like a weird bug though, so it seems worth coming back to eventually.Testing
Ran examples
Ran benchmarks:
none_changed_detection: 10-20% faster or no change vsmainfew_changed_detection: 40-55% faster vsmainall_changed_detection: ~20% faster vsmainfor table iteration, ~20% slower for sparse (that feels like a bug, will investigate)iter_simple: generally no change, 15% faster forwide_sparse_set, 3-4% slower forforeach_hybridandpar_hybriditer_frag: generally no change, a few were faster by 5-7% and a few were slower by 5-10%heavy_compute: no change vsmainNeeds real world testing! I don't have a good representative benchmark myself.
Future Work
AddedandChangedQuery::nextQueryinto one