@matthewpeterkort matthewpeterkort commented May 30, 2025

This PR focuses on improving the performance of the GRIDS driver.
See benchtop PR bmeg/benchtop#6

OLD GRIDS driver, 3 months old. Some of the queries in the old driver didn't resolve correctly. See the grip-benchmark repo for a more thorough benchmark comparison.

Loaded: META/DocumentReference.ndjson in 23.19 seconds
Loaded: META/Specimen.ndjson in 3.39 seconds
Loaded: META/Medication.ndjson in 2.18 seconds
Loaded: META/Observation.ndjson in 112.92 seconds
Loaded: META/ResearchStudy.ndjson in 0.45 seconds
Loaded: META/SubstanceDefinition.ndjson in 0.27 seconds
Loaded: META/Condition.ndjson in 5.68 seconds
Loaded: META/ResearchSubject.ndjson in 5.88 seconds
Loaded: META/Group.ndjson in 12.80 seconds
Loaded: META/MedicationAdministration.ndjson in 4.95 seconds
Loaded: META/Patient.ndjson in 2.55 seconds
Loaded: META/Substance.ndjson in 0.33 seconds
Loaded Test data in 174.59 seconds

query 0: #################################################################################
Successfully queried 427193 rows in 12.71 seconds
query 1: #################################################################################
Successfully queried 213530 rows in 84.17 seconds
query 2: #################################################################################
Successfully queried 85433 rows in 58.80 seconds
query 3: #################################################################################
Successfully queried 80 rows in 77.66 seconds
query 4: #################################################################################
Successfully queried 10000 rows in 79.29 seconds
query 5: #################################################################################
Successfully queried 10000 rows in 77.11 seconds
query 6: #################################################################################
Successfully queried 0 rows in 72.24 seconds
query 7: #################################################################################
Successfully queried 0 rows in 169.77 seconds
query 8: #################################################################################
Successfully queried 0 rows in 71.55 seconds
Total Benchmark Time:  877.90

NEW driver using sonic, local caching, and a different key structure. Note: some of the test cases have been expanded, so this is not an apples-to-apples test, but it gives a rough idea of the speed improvements:

Benchmark Start  2025-08-18 12:12:39.319253
Index added in 0.00 seconds
Loaded: META/DocumentReference.ndjson in 21.17 seconds
Loaded: META/Specimen.ndjson in 3.07 seconds
Loaded: META/Medication.ndjson in 0.71 seconds
Loaded: META/Observation.ndjson in 111.02 seconds
Loaded: META/ResearchStudy.ndjson in 1.38 seconds
Loaded: META/SubstanceDefinition.ndjson in 0.72 seconds
Loaded: META/Condition.ndjson in 5.19 seconds
Loaded: META/ResearchSubject.ndjson in 5.08 seconds
Loaded: META/Group.ndjson in 7.81 seconds
Loaded: META/MedicationAdministration.ndjson in 5.34 seconds
Loaded: META/Patient.ndjson in 1.25 seconds
Loaded: META/Substance.ndjson in 0.69 seconds
Loaded Test data in 163.47 seconds

query 0: #################################################################################
Successfully queried 427197 rows in 0.21 seconds
query 1: #################################################################################
Successfully queried 4 rows in 16.56 seconds
query 2: #################################################################################
Successfully queried 2 rows in 6.44 seconds
query 3: #################################################################################
Successfully queried 214843 rows in 37.65 seconds
query 4: #################################################################################
Successfully queried 85433 rows in 11.87 seconds
query 5: #################################################################################
Successfully queried 80 rows in 6.65 seconds
query 6: #################################################################################
Successfully queried 42968 rows in 10.34 seconds
query 7: #################################################################################
Successfully queried 18524 rows in 24.49 seconds
query 8: #################################################################################
Successfully queried 6789 rows in 9.60 seconds
query 9: #################################################################################
Successfully queried 73162 rows in 11.19 seconds
Successfully queried 0 rows in 9.28 seconds
Total Benchmark Time:  307.76 

Adds indexing to the GRIDS driver and the SQL-like drivers.
Fixes indexing in the Mongo driver so that it conforms to the existing conformance tests.

Refactors the optimizer to be less annoying via a MATCH/REPLACE pattern.

Implements a GRIDS optimizer. MATCH matches a specific pipeline of statements, and REPLACE specifies the custom GRIDS executor to be used for those statements.
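
As a rough illustration of the MATCH/REPLACE idea, here is a minimal sketch. It does not use the real `gripql.GraphStatement` type: `Step`, `Rule`, `Optimize`, and the `lookupVertsLabelIndex` op name are all hypothetical stand-ins chosen for this example, and the actual optimizer in this PR may be structured differently.

```go
package main

import "fmt"

// Step is a hypothetical, simplified stand-in for a pipeline statement
// (the real driver operates on []*gripql.GraphStatement).
type Step struct {
	Op  string // e.g. "V", "hasLabel", "has", "out"
	Arg string
}

// Rule pairs a MATCH predicate with a REPLACE function.
// Match reports how many leading steps of pipe it consumes (0 means no match).
// Replace turns the matched steps into their optimized replacement.
type Rule struct {
	Name    string
	Match   func(pipe []Step) int
	Replace func(matched []Step) []Step
}

// Optimize scans the pipeline left to right. At each position the first rule
// that matches rewrites the matched steps; otherwise the step is copied as-is.
func Optimize(pipe []Step, rules []Rule) []Step {
	out := make([]Step, 0, len(pipe))
	for i := 0; i < len(pipe); {
		applied := false
		for _, r := range rules {
			if n := r.Match(pipe[i:]); n > 0 {
				out = append(out, r.Replace(pipe[i:i+n])...)
				i += n
				applied = true
				break
			}
		}
		if !applied {
			out = append(out, pipe[i])
			i++
		}
	}
	return out
}

// indexedLabelLookup is an illustrative rule: V() followed by hasLabel(x)
// becomes a single hypothetical "lookupVertsLabelIndex" step.
var indexedLabelLookup = Rule{
	Name: "V().hasLabel -> index lookup",
	Match: func(pipe []Step) int {
		if len(pipe) >= 2 && pipe[0].Op == "V" && pipe[1].Op == "hasLabel" {
			return 2
		}
		return 0
	},
	Replace: func(m []Step) []Step {
		return []Step{{Op: "lookupVertsLabelIndex", Arg: m[1].Arg}}
	},
}

func main() {
	pipe := []Step{{Op: "V"}, {Op: "hasLabel", Arg: "Patient"}, {Op: "out", Arg: "specimen"}}
	for _, s := range Optimize(pipe, []Rule{indexedLabelLookup}) {
		fmt.Println(s.Op, s.Arg)
	}
}
```

Keeping each rule as a self-contained match/replace pair means a driver can register its own rules without touching the generic scan loop.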

Implements a separate Pebble index in GRIDS so that V().has() and V().hasLabel() expressions can be more performant.

Works for nested fields using JSON path filter notation. See the ot_index tests for details. The GRIDS optimizer also works on compound has statements.
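
To make the nested-field indexing concrete, here is a small sketch of one way a field index over an ordered key space could work. Everything in it is an assumption made for illustration: the key layout (label, field path, value, id joined by NUL bytes), the `Index` type, and the simplified dotted-path walker are not taken from the PR, and an in-memory sorted slice stands in for Pebble.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// lookupPath walks a decoded JSON document along a dotted path such as
// "$.subject.reference". Only object fields are handled in this sketch;
// array indexing and wildcards are out of scope.
func lookupPath(doc map[string]any, path string) (any, bool) {
	path = strings.TrimPrefix(path, "$.")
	var cur any = doc
	for _, part := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = obj[part]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// indexKey is a hypothetical key layout. Keys that share a
// (label, field, value) prefix sort next to each other.
func indexKey(label, field, value, id string) string {
	return strings.Join([]string{label, field, value, id}, "\x00")
}

// Index is an in-memory stand-in for an ordered key-value store.
type Index struct{ keys []string }

// Add indexes one field of one vertex, if the field is present.
func (ix *Index) Add(label, id string, raw []byte, field string) error {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return err
	}
	if v, ok := lookupPath(doc, field); ok {
		ix.keys = append(ix.keys, indexKey(label, field, fmt.Sprint(v), id))
		sort.Strings(ix.keys) // keep keys ordered so prefix scans work
	}
	return nil
}

// Has returns the ids of vertices with the given label whose field equals
// value, using a prefix scan over the sorted keys.
func (ix *Index) Has(label, field, value string) []string {
	prefix := strings.Join([]string{label, field, value}, "\x00") + "\x00"
	var ids []string
	for i := sort.SearchStrings(ix.keys, prefix); i < len(ix.keys) && strings.HasPrefix(ix.keys[i], prefix); i++ {
		ids = append(ids, strings.TrimPrefix(ix.keys[i], prefix))
	}
	return ids
}

func main() {
	ix := &Index{}
	ix.Add("Specimen", "s1", []byte(`{"subject":{"reference":"Patient/p1"}}`), "$.subject.reference")
	ix.Add("Specimen", "s2", []byte(`{"subject":{"reference":"Patient/p2"}}`), "$.subject.reference")
	fmt.Println(ix.Has("Specimen", "$.subject.reference", "Patient/p1"))
}
```

With a layout like this, a compound has statement could be answered by intersecting the id sets returned by several prefix scans, instead of decoding every vertex of that label.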

Adds caching on stored data indices so that the two benchtop lookups (one to figure out which table the data is in, and one to find where in the table the data is located) can be reduced to a single cache get.
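
The caching idea can be sketched as a read-through cache in front of the two lookups. This is only an illustration under assumptions: `Location`, `Store`, `LocCache`, and `memStore` are hypothetical names invented here, not benchtop's API, and the real cache may differ (eviction, persistence, and so on).

```go
package main

import (
	"fmt"
	"sync"
)

// Location is a hypothetical record of where a row lives.
type Location struct {
	Table  string
	Offset uint64
}

// Store stands in for the two separate lookups described above.
type Store interface {
	TableFor(id string) (string, error)
	OffsetIn(table, id string) (uint64, error)
}

// LocCache remembers resolved locations so repeat reads skip both lookups.
type LocCache struct {
	mu    sync.RWMutex
	locs  map[string]Location
	store Store
}

func NewLocCache(s Store) *LocCache {
	return &LocCache{locs: make(map[string]Location), store: s}
}

// Get returns the cached location, falling back to the two store lookups
// on a miss and caching the result.
func (c *LocCache) Get(id string) (Location, error) {
	c.mu.RLock()
	loc, ok := c.locs[id]
	c.mu.RUnlock()
	if ok {
		return loc, nil
	}
	table, err := c.store.TableFor(id)
	if err != nil {
		return Location{}, err
	}
	offset, err := c.store.OffsetIn(table, id)
	if err != nil {
		return Location{}, err
	}
	loc = Location{Table: table, Offset: offset}
	c.mu.Lock()
	c.locs[id] = loc
	c.mu.Unlock()
	return loc, nil
}

// memStore is a demo Store that counts lookups (not safe for concurrent use).
type memStore struct {
	tables  map[string]string
	offsets map[string]uint64
	calls   int
}

func (m *memStore) TableFor(id string) (string, error) {
	m.calls++
	t, ok := m.tables[id]
	if !ok {
		return "", fmt.Errorf("no table for %q", id)
	}
	return t, nil
}

func (m *memStore) OffsetIn(table, id string) (uint64, error) {
	m.calls++
	off, ok := m.offsets[id]
	if !ok {
		return 0, fmt.Errorf("no offset for %q in %q", id, table)
	}
	return off, nil
}

func main() {
	s := &memStore{
		tables:  map[string]string{"p1": "Patient"},
		offsets: map[string]uint64{"p1": 42},
	}
	c := NewLocCache(s)
	c.Get("p1") // miss: two store lookups
	c.Get("p1") // hit: no store lookups
	fmt.Println("store lookups:", s.calls)
}
```

In this sketch the second `Get` never touches the store, so the lookup counter stays at 2 after both calls.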

Removes the asoc index for deletes and stores the vertex label in value() for vertex key lookup.

Tries, then scraps, the idea of storing TO and FROM edge labels in the edge key, in favor of a caching-based solution. Storing the labels in the key would require all edges in a series of bulk load commands to be loaded after all vertices. With the way ETL is done currently, each bulk load command loads one vertex label type at a time, so when loading generated edges it is not possible to know the vertex label types that have not been loaded yet.

Reworks stored GRIDS keys to be more like Pebble keys. Removes the "grid" storage style in favor of a storage method that is easier to use and more time-performant, but less space-performant.

Implements cache batch loading to speed up loading saved indices from disk. This only applies to the first query after grip is restarted.

Also, feature/kafka still needs a review.

@matthewpeterkort matthewpeterkort changed the title [WIP] Feature/indexing Feature/indexing Jun 11, 2025
if len(auth) > 0 {
user, password, ok := parseBasicAuth(auth[0])
fmt.Printf("User: %s Password: %s OK: %s\n", user, password, ok)
log.Infof("User: %s Password: %s OK: %t\n", user, password, ok)
Member

We should probably start censoring the password
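
One possible way to do that, as a minimal sketch: format the log line through a helper that never prints the secret itself. The `redact` and `authLogLine` helpers are hypothetical names for this example, and the sketch builds the string with the standard library rather than the project's `log.Infof`, to which the result could be passed.

```go
package main

import "fmt"

// redact hides a secret while still showing whether one was supplied.
func redact(secret string) string {
	if secret == "" {
		return "<empty>"
	}
	return "<redacted>"
}

// authLogLine builds the log message without leaking the password.
// Note the %t verb for the boolean ok value.
func authLogLine(user, password string, ok bool) string {
	return fmt.Sprintf("User: %s Password: %s OK: %t", user, redact(password), ok)
}

func main() {
	fmt.Println(authLogLine("alice", "hunter2", true))
	// prints: User: alice Password: <redacted> OK: true
}
```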

return expanded
}

func GripOptimizer(pipe []*gripql.GraphStatement) []*gripql.GraphStatement {
Member

Maybe should be named GridsOptimizer?

matthewpeterkort commented Jun 16, 2025

Cleaned up a lot of the logic. Added support for optimized nested field filtering and compound filters. See the updates in the most recent commit.

@kellrott
Member

Can you add an 'Experimental' warning to the logging for when the GRIDS driver is used?

@kellrott kellrott changed the base branch from feature/kafka to updates/docs August 22, 2025 21:34
@kellrott kellrott merged commit 872d04c into updates/docs Aug 27, 2025
12 checks passed