feat: Add dataset store #24

Open
Ramlaoui wants to merge 4 commits into main from feat/dataset-store

@Ramlaoui
Collaborator

Why

For some benchmarks, we want to compare generated structures against reference datasets. For example, we might hash the generated materials and compare those hashes against LeMat-Bulk to check whether a material is already in the dataset according to a given hashing algorithm. This PR adds a new object to address that.

Method

This is implemented as a DatasetStore, which holds a hashing algorithm and stores embeddings. The embeddings are hashes when we use a hasher, or vectors when we use a similarity matcher that supports embeddings.

A list of structures can then be fitted: we compute the embeddings of all of these structures once and store them. Querying candidate materials is then faster, since we don't have to recompute the dataset's embeddings each time.
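
To make the fit-then-query idea concrete, here is a minimal sketch of the hash-based case. It is not the code in this PR: the method names `fit` and `contains`, the `toy_hasher` function, and the use of formula strings as stand-ins for structures are all assumptions made for illustration. A real hasher would fingerprint the crystal structure itself.

```python
import hashlib
from typing import Callable, Hashable, Iterable, List, Set


class DatasetStore:
    """Hypothetical sketch: caches one fingerprint per reference structure."""

    def __init__(self, hasher: Callable[[str], Hashable]) -> None:
        self.hasher = hasher
        self._fingerprints: Set[Hashable] = set()

    def fit(self, structures: Iterable[str]) -> "DatasetStore":
        # Hash every reference structure once and keep the results.
        self._fingerprints = {self.hasher(s) for s in structures}
        return self

    def contains(self, candidates: Iterable[str]) -> List[bool]:
        # Only the candidates are hashed here; the reference set is reused.
        return [self.hasher(c) in self._fingerprints for c in candidates]


def toy_hasher(structure: str) -> str:
    # Assumption: a structure is represented by a formula string, and
    # sorting its characters stands in for a real structure fingerprint.
    canonical = "".join(sorted(structure.replace(" ", "")))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


store = DatasetStore(toy_hasher).fit(["NaCl", "SiO2", "Fe2O3"])
result = store.contains(["ClNa", "TiO2"])
print(result)  # [True, False]
```

Keeping the hashes in a set makes each membership check a constant-time lookup on average. For the vector-embedding case, the set lookup would presumably be replaced by a nearest-neighbour search with a similarity threshold.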
