Make task resource use atlas search #1203
base: main
Search Index:
API service:
In the future, the aim is to completely replace the current design of the query operators. Before that, as a transitional step, we have:
Remaining items:
Client:
Query pattern shift:
Previous query pipeline: all query keys are the field names, e.g.:
{'nelements': 4, 'composition_reduced.Si': {'$exists': True}, 'composition_reduced.O': {'$exists': True}, 'composition_reduced.K': {'$exists': True}}
These queries go under the “criteria” dictionary, e.g.:
{"criteria": {'nelements': 4, 'composition_reduced.Si': {'$exists': True}, 'composition_reduced.O': {'$exists': True}, 'composition_reduced.K': {'$exists': True}}}
They are later inserted into an aggregation pipeline. Because the actual path is the key, there is little to no risk of duplication in the query params, so the design of query_operator follows
Now, Atlas Search uses the actual operator as the key, e.g.:
{
equals: {
path: "nelements",
value: 3,
},
},
{
exists: {
path: "composition_reduced.Ac",
},
},
{
exists: {
path: "composition_reduced.Cu",
},
},
Due to this, the operator:
{
compound: {
must: [
{
equals: {
path: "nelements",
value: 3,
},
},
{
exists: {
path: "composition_reduced.Ac",
},
},
{
exists: {
path: "composition_reduced.Cu",
},
},
{
equals: {
path: "calc_type",
value: "GGA Static",
},
},
],
},
},
Note: because we are using the existing query infrastructure, the parameters are still defined as:
To be able to keep using the current implementation. The difference is that the “criteria” contents were translated to
Improvements:
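As a rough illustration of the translation described above (field-keyed criteria on one side, operator-keyed Atlas Search clauses on the other), here is a minimal sketch. The function names `criteria_to_search` and `build_pipeline` are hypothetical and are not taken from this PR or from maggma. The sketch assumes only two condition shapes, plain equality and `{'$exists': True}`, and raises on anything else rather than guessing.

```python
from typing import Any, Dict, List


def criteria_to_search(criteria: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a field-keyed criteria dict into an Atlas Search operator dict.

    Hypothetical helper: handles {"$exists": True} and scalar equality only.
    """
    clauses: List[Dict[str, Any]] = []
    for path, condition in criteria.items():
        if condition == {"$exists": True}:
            # Field-presence check maps to the "exists" operator.
            clauses.append({"exists": {"path": path}})
        elif isinstance(condition, dict):
            # Other Mongo-style operators are out of scope for this sketch.
            raise NotImplementedError(
                f"unsupported condition for {path!r}: {condition!r}"
            )
        else:
            # A bare value maps to the "equals" operator.
            clauses.append({"equals": {"path": path, "value": condition}})

    if not clauses:
        raise ValueError("criteria must contain at least one field")
    if len(clauses) == 1:
        # A lone clause is emitted as a bare operator, not wrapped in compound.
        return clauses[0]
    return {"compound": {"must": clauses}}


def build_pipeline(criteria: Dict[str, Any], index: str = "default") -> List[Dict[str, Any]]:
    """Wrap the translated operator in a single $search stage."""
    return [{"$search": {"index": index, **criteria_to_search(criteria)}}]


if __name__ == "__main__":
    example = {
        "nelements": 3,
        "composition_reduced.Ac": {"$exists": True},
        "composition_reduced.Cu": {"$exists": True},
    }
    print(build_pipeline(example))
```

Emitting a bare operator for a single clause follows the timing observation listed under "Other considerations"; whether the real query operators should do the same is a design decision for the PR, not something this sketch settles.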
Other considerations:
The single operator query is three times faster than the compound query with a single operator.
Using a $match aggregation pipeline stage after a [$search](https://www.mongodb.com/docs/atlas/atlas-search/query-syntax/#std-label-query-syntax-ref) stage can drastically slow down query results. If possible, design your
Using a $sort aggregation pipeline stage after a [$search](https://www.mongodb.com/docs/atlas/atlas-search/query-syntax/#std-label-query-syntax-ref) stage can drastically slow down query results.
Benchmarking
Direct query on MongoDB: V7 cluster + index vs. V8 cluster + atlas index:
import time

# V7 cluster + index: direct find() on the collection
query = {"formula_pretty": "NaCl"}
start_time = time.time()
results = list(v7_collection.find(query))
end_time = time.time()
print(f"Total API Query Time: {end_time - start_time:.4f} seconds")
print(f"Number of results: {len(results)}")
Total API Query Time: 1.6338 seconds
Number of results: 57
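# V8 cluster + atlas index: make a compound search from the same query
must = []
for k, v in query.items():
    must.append({"equals": {"path": k, "value": v}})
compound = {
    "must": must
}
# Single-operator form of the same query (defined for comparison; the pipeline below uses the compound form)
equals = {"path": list(query.keys())[0],
          "value": list(query.values())[0]
}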
pipeline = [
{"$search": {"index":"default",
"compound": compound,
"returnStoredSource": True
},
},
]
start_time = time.time()
results = list(v8_collection.aggregate(pipeline)) # Run the actual query, not explain()
end_time = time.time()
Total API Query Time: 0.6591 seconds
Number of results: 57
Other heavier queries:
v7 + index = 402s, 4198 results
Compound query: (current chemsys query)
v7 + index = TIME OUT
V7 cluster + regular index + DEV API Server:
(Compound search really sucks)
V8 cluster + Search Index + DEV API server:
FYI @kbuma
Major changes:
Designed to work with updated maggma release. Also, the roadmap is to use the same design pattern across all our collections.
TODOs: