feat: Add dataset store by Ramlaoui · Pull Request #24 · LeMaterial/lematerial-hasher

Ramlaoui · 2025-04-23T20:29:24Z

Why

For some benchmarks, we want to compare structures against reference datasets. For example, we might want to compare the hashes of generated materials against LeMat-Bulk to check whether any of them is already in the dataset according to a given hashing algorithm. This new object addresses that use case.

Method

This is implemented through a DatasetStore, which holds a hashing algorithm and stores embeddings: hashes when using a hasher, or vectors when using a similarity matcher that supports embeddings.
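To make the two kinds of embedding concrete, here is a minimal sketch of how a stored entry might be compared against a candidate. The names `embeddings_match` and `cosine_similarity`, and the 0.99 threshold, are illustrative assumptions and not the API added in this PR.

```python
import math
from typing import Sequence, Union

# Assumption: a hash embedding is a string, a vector embedding is a sequence of floats.
Embedding = Union[str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embeddings_match(stored: Embedding, candidate: Embedding, threshold: float = 0.99) -> bool:
    """Hashes match on equality; vectors match when similarity reaches the threshold."""
    stored_is_hash = isinstance(stored, str)
    candidate_is_hash = isinstance(candidate, str)
    if stored_is_hash != candidate_is_hash:
        raise TypeError("cannot compare a hash embedding with a vector embedding")
    if stored_is_hash:
        return stored == candidate
    return cosine_similarity(stored, candidate) >= threshold
```

With hashes the comparison is exact, so a store can use set lookups. With vectors a threshold has to be chosen, and its value here is arbitrary.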

A list of structures can then be fitted: we compute the embeddings of all of these structures once and store them. Querying candidate materials is then faster, since the dataset's embeddings don't have to be recomputed each time.
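The fit-then-query flow described above can be sketched as follows for the hash case. This is a toy under stated assumptions: `ToyDatasetStore`, `fit`, `contains`, and `toy_hasher` are hypothetical names, and structures are represented as plain element-count dicts rather than real crystal structures. The actual class in this PR may look quite different.

```python
from typing import Callable, Dict, Hashable, Iterable, Set


class ToyDatasetStore:
    """Hypothetical stand-in for the DatasetStore described in this PR."""

    def __init__(self, hasher: Callable[[Dict[str, int]], Hashable]) -> None:
        self.hasher = hasher
        self.embeddings: Set[Hashable] = set()

    def fit(self, structures: Iterable[Dict[str, int]]) -> "ToyDatasetStore":
        # Embed every reference structure once and keep only the embeddings.
        self.embeddings.update(self.hasher(s) for s in structures)
        return self

    def contains(self, structure: Dict[str, int]) -> bool:
        # Only the candidate is embedded at query time; the dataset is not re-hashed.
        return self.hasher(structure) in self.embeddings


def toy_hasher(structure: Dict[str, int]) -> Hashable:
    # Placeholder hash: order-independent composition only (an assumption for the demo).
    return tuple(sorted(structure.items()))


reference = [{"Na": 1, "Cl": 1}, {"Si": 1, "O": 2}]
store = ToyDatasetStore(toy_hasher).fit(reference)

print(store.contains({"Cl": 1, "Na": 1}))  # True: same composition as a reference entry
print(store.contains({"Fe": 2, "O": 3}))   # False: not in the reference set
```

Storing hashes in a set makes each query a single hash computation plus a constant-time lookup. A vector-based store would instead need a similarity search over the stored embeddings.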

Ramlaoui added 4 commits April 23, 2025 22:23

feat: Add helper methods for dataset store with similarity

f4dfed2

feat: Add dataset store

6a6fed0

chore: store embeddings directly

2fc27b7

chore: Load class from store directly

33c8652

Ramlaoui wants to merge 4 commits into main from feat/dataset-store
