Add multithreading to MolFromSDFTransformer [Issue 467] #526

Merged
FloudMe77 merged 3 commits into master from add-multithreading-to-MolFromSDFTransformer
Mar 22, 2026

@FloudMe77 commented Mar 10, 2026

This PR adds parallel SDF reading support in MolFromSDFTransformer when n_jobs > 1, using RDKit's MultithreadedSDMolSupplier (requires RDKit >= 2025.09.1).

Molecules loaded in parallel are sorted by record_id to preserve the original file order, and None results are filtered out to guard against nondeterministic duplicates from the multithreaded supplier.
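
The ordering step can be illustrated without RDKit. The following is a minimal sketch, not the PR's actual code: `restore_file_order` is a hypothetical helper, and the `(record_id, molecule)` tuples stand in for whatever the multithreaded supplier yields. It only shows the idea of dropping `None` entries and sorting the rest by record ID.

```python
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def restore_file_order(records: Iterable[Tuple[int, Optional[T]]]) -> List[T]:
    """Drop failed (None) records, then sort the rest by record ID.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not part of the PR.
    """
    valid = [(rec_id, mol) for rec_id, mol in records if mol is not None]
    valid.sort(key=lambda pair: pair[0])
    return [mol for _, mol in valid]


# Simulated out-of-order output from worker threads: (record_id, molecule).
# Strings stand in for RDKit Mol objects.
shuffled = [(3, "CCO"), (1, "C"), (2, None), (4, "CCC")]
ordered = restore_file_order(shuffled)
```

Sorting on the record ID alone, rather than on the whole tuple, avoids comparing molecule objects that may not define an ordering.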

When the installed RDKit version is too old or raw SDF text is passed instead of a file path, a warning is issued and loading falls back to sequential SDMolSupplier.
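
The fallback rule described above might look roughly like the sketch below. The names `parse_version` and `use_parallel_reader` are hypothetical (the PR's own utility is `_get_rdkit_version()`, whose implementation is not shown here), and the version string is passed in as an argument so the example runs without RDKit installed.

```python
import re
import warnings
from typing import Tuple

# Minimum version stated in the PR description: RDKit >= 2025.09.1.
MIN_PARALLEL_VERSION = (2025, 9, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2025.09.1' into (2025, 9, 1), keeping leading digits of each part."""
    parts = []
    for chunk in version.split("."):
        match = re.match(r"\d+", chunk)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def use_parallel_reader(n_jobs: int, rdkit_version: str, is_file_path: bool) -> bool:
    """Decide between parallel and sequential loading (illustrative only)."""
    if n_jobs <= 1:
        return False  # parallel reading applies only when n_jobs > 1
    if parse_version(rdkit_version) < MIN_PARALLEL_VERSION:
        warnings.warn("RDKit is too old for parallel SDF reading; loading sequentially.")
        return False
    if not is_file_path:
        warnings.warn("Raw SDF text cannot be read in parallel; loading sequentially.")
        return False
    return True
```

Comparing integer tuples instead of raw strings avoids lexicographic surprises such as `"2025.9.1" > "2025.10.1"`.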

Additional changes:

  • explicit RDKit version check via _get_rdkit_version() utility,
  • parallel logic extracted into _read_sdf_file_parallel() for readability,
  • tests covering parallel loading, order preservation, and version fallback.

#467

@FloudMe77 requested a review from j-adamczyk March 22, 2026 13:19
@FloudMe77 merged commit d6a6afc into master Mar 22, 2026
13 checks passed
@FloudMe77 deleted the add-multithreading-to-MolFromSDFTransformer branch March 22, 2026 14:51