Query Execution Algorithm

Query design

The algorithm

Steps in executing a query:

Check if the query is equivalent to a multi-property, and if it is, swap out the query for its multi-property equivalent.
Update any relevant indexes.
Collect the indexed sets for any properties that are indexed.

Implementation diverges at this point depending on query method. Assume original query_people for simplicity.

Choose a "source": the set that serves as the initial set of PersonIds that will be intersected with subsequent sets.
- 4.a. Sort the set of indexed sets from smallest to largest, and choose the smallest as the source.
- 4.b. If there are no indexed sets, create an iterator over the entire population.
For each PersonId in the source set:
- 5.1. check that it is contained in all indexed sets (bailing early)
- 5.2. for each unindexed property, check that the PersonId has the right value for the property.

Observations:

If it is faster to just look up a property value for a PersonId in the context and compare it to the query's value than it is to check for membership in the index set, then we should replace 5.1 with 5.2.

The algorithm 2

Input: A property/value pairs (e.g. InfectionStatus=Infected, AgeGroup=Child) Output: An iterator OR the full set (with_query_results)

SOURCE: Index(property_id, value) -> Set of PersonIds First, update the index. returns Ref<HashSet<PersonId>> e.g. Susceptible: [1, 2, 3]

ValueStore(property_id) -> Set of Values, where the index = PersonId lookup the property value storage vector. e.g. InfectionStatus: [S, None, I, S, I] Note: It's possible this is shorter than the length of the population if properties haven't been assigned past some ID.

Branch 0: EMPTY QUERY FOR ITERATOR: Return an iterator over the whole population

Branch 1: UNINDEXED Condition: No properties are indexed.

Retrieve the ValueStore for each property; Find the intersection of the ValueStores

FOR ITERATOR: Return an iterator over that intersection

Branch 2: INTERSECTION Condition: One or more the properties are indexed separately Update any relevant indexes.

Retrieve Index or ValueStore for each property

Sort the set of indexed sets from smallest to largest, and choose the smallest as the source. Find the intersection by iterating.

FOR ITERATOR: Return an iterator over that intersection

Branch 3: SINGLE INDEX Condition: There's an multi-property index that matches the given set, or there's a single property that is indexed.

Retrieve Index

FOR ITERATOR: Return an iterator over that intersection FOR FULL SET: Return reference directly

Move querying functionality into `Query`

Likely break up the pieces of the algorithm into reusable methods:
- Query::get_indexed_sets()
- Query::matches_unindexed()
Loops over component properties (steps 2, 3, maybe 5) get unrolled
(property type ID, property value hash) pairs are replaced with the original (tag type, property value) pair expressed in the query. This is not necessarily a win in the indexed case. In the unindexed case, this avoids a hash of the value.
When queries define their own querying logic, opens the door to more sophisticated queries, e.g. having predicates instead of values (|age| age > 20).
"Untyped" query variants (right now just Vec<(TypeId, HashValueType))) are still possible, because the API to PeopleData and UntypedIndex are unchanged.
- BUT I'd like to get rid of the current impls, because we don't use them but we're spending maintenance / dev resources on them.
- HOWEVER, getting rid of the current impls probably means axing the WebAPI, which also isn't used.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Query Execution Algorithm

Query design

The algorithm

The algorithm 2

Move querying functionality into `Query`

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

Query Execution Algorithm

Query design

The algorithm

The algorithm 2

Move querying functionality into Query

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

Move querying functionality into `Query`