-
Notifications
You must be signed in to change notification settings - Fork 6
Query Execution Algorithm
Steps in executing a query:
- Check if the query is equivalent to a multi-property, and if it is, swap out the query for its multi-property equivalent.
- Update any relevant indexes.
- Collect the indexed sets for any properties that are indexed.
Implementation diverges at this point depending on query method. Assume original query_people for simplicity.
- Choose a "source": the set that serves as the initial set of
PersonIds that will be intersected with subsequent sets.- 4.a. Sort the set of indexed sets from smallest to largest, and choose the smallest as the source.
- 4.b. If there are no indexed sets, create an iterator over the entire population.
- For each
PersonIdin the source set:- 5.1. check that it is contained in all indexed sets (bailing early)
- 5.2. for each unindexed property, check that the
PersonIdhas the right value for the property.
Observations:
- If it is faster to just look up a property value for a
PersonIdin the context and compare it to the query's value than it is to check for membership in the index set, then we should replace 5.1 with 5.2.
Input: A property/value pairs (e.g. InfectionStatus=Infected, AgeGroup=Child) Output: An iterator OR the full set (with_query_results)
SOURCE:
Index(property_id, value) -> Set of PersonIds
First, update the index.
returns Ref<HashSet<PersonId>>
e.g. Susceptible: [1, 2, 3]
ValueStore(property_id) -> Set of Values, where the index = PersonId lookup the property value storage vector. e.g. InfectionStatus: [S, None, I, S, I] Note: It's possible this is shorter than the length of the population if properties haven't been assigned past some ID.
Branch 0: EMPTY QUERY FOR ITERATOR: Return an iterator over the whole population
Branch 1: UNINDEXED Condition: No properties are indexed.
Retrieve the ValueStore for each property; Find the intersection of the ValueStores
FOR ITERATOR: Return an iterator over that intersection
Branch 2: INTERSECTION Condition: One or more the properties are indexed separately Update any relevant indexes.
Retrieve Index or ValueStore for each property
Sort the set of indexed sets from smallest to largest, and choose the smallest as the source. Find the intersection by iterating.
FOR ITERATOR: Return an iterator over that intersection
Branch 3: SINGLE INDEX Condition: There's an multi-property index that matches the given set, or there's a single property that is indexed.
Retrieve Index
FOR ITERATOR: Return an iterator over that intersection FOR FULL SET: Return reference directly
-
Likely break up the pieces of the algorithm into reusable methods:
Query::get_indexed_sets()Query::matches_unindexed()
-
Loops over component properties (steps 2, 3, maybe 5) get unrolled
-
(property type ID, property value hash) pairs are replaced with the original (tag type, property value) pair expressed in the query. This is not necessarily a win in the indexed case. In the unindexed case, this avoids a hash of the value.
-
When queries define their own querying logic, opens the door to more sophisticated queries, e.g. having predicates instead of values (
|age| age > 20). -
"Untyped" query variants (right now just
Vec<(TypeId, HashValueType))) are still possible, because the API toPeopleDataandUntypedIndexare unchanged.- BUT I'd like to get rid of the current impls, because we don't use them but we're spending maintenance / dev resources on them.
- HOWEVER, getting rid of the current impls probably means axing the WebAPI, which also isn't used.