Expand random_state type hints to accept RandomState #527

Merged
j-adamczyk merged 7 commits into MLCIL:master from LiudengZhang:feat/random-state-typing on Mar 24, 2026

Conversation

@LiudengZhang
Contributor

Summary

  • Expanded random_state parameter type annotations from int | None to int | np.random.RandomState | np.random.Generator | None across 5 files, aligning with scikit-learn conventions
  • Skipped conformer_generator.py (RDKit only accepts int) and maxmin_split.py (same RDKit limitation)
  • randomized_scaffold_split.py already had the broad type and was left unchanged

Files changed

  • skfp/bases/base_fp_transformer.py
  • skfp/bases/base_substructure_fp.py
  • skfp/fingerprints/e3fp_fp.py
  • skfp/fingerprints/map.py
  • skfp/model_selection/hyperparam_search/randomized_search.py

Test plan

  • pytest tests/ -k "E3FP or MAP" — all 14 selected tests pass
  • No functional changes, type hints only

Fixes #523

@j-adamczyk
Member

Thanks for the contribution.

I think we also need tests that cover those, checking if RandomState and Generator work properly for those classes that have random_state.

Also, on code style: please use np.random.RandomState rather than importing RandomState directly; I think that's more readable. We try to use the np. prefix everywhere NumPy is used.

@LiudengZhang
Contributor Author

Updated to use np.random. prefix and added np.random.Generator to the parameter constraints alongside tests for RandomState and Generator. Thanks for the feedback!

Member

@j-adamczyk j-adamczyk left a comment

Please also add tests for other classes, i.e., randomized search and randomized scaffold split. You can also check test coverage for those files, see the Makefile for example command.

  "batch_size": [Integral, None],
  "verbose": ["verbose", dict],
- "random_state": ["random_state"],
+ "random_state": ["random_state", np.random.Generator],
Member

Now that I go through scikit-learn validation (https://github.com/scikit-learn/scikit-learn/blob/7aae3427d8b6b7ee6aa150ac355262dd0b0e793f/sklearn/utils/_param_validation.py#L557), I realized that we don't need to support np.random.Generator, just np.random.RandomState. So we can simplify this, e.g. keep only "random_state" here. Sorry for not noticing that earlier.
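
To illustrate the point about validation, here is a minimal NumPy-only sketch of the kind of normalization scikit-learn applies to random_state. The helper name `normalize_random_state` is hypothetical and is not part of scikit-learn or this repository; it only mimics the behavior discussed above, accepting an int, a RandomState instance, or None and rejecting everything else, including np.random.Generator.

```python
import numbers

import numpy as np


def normalize_random_state(seed):
    """Hypothetical helper: turn an int, RandomState, or None into a RandomState."""
    if seed is None:
        # This sketch creates a fresh, entropy-seeded instance; scikit-learn's
        # own check_random_state reuses NumPy's global RandomState instead.
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # Already the right type: pass it through unchanged.
        return seed
    # np.random.Generator (and anything else) ends up here.
    raise ValueError(f"{seed!r} cannot be used as random_state")


rng = np.random.RandomState(0)
assert normalize_random_state(rng) is rng
assert isinstance(normalize_random_state(0), np.random.RandomState)
assert isinstance(normalize_random_state(None), np.random.RandomState)

try:
    normalize_random_state(np.random.default_rng(0))
except ValueError:
    generator_rejected = True
else:
    generator_rejected = False
```

Under this kind of check a Generator never reaches the estimator, so advertising it in the type hints or parameter constraints would promise support that the validation layer does not provide.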

@LiudengZhang
Contributor Author

Good catch — removed Generator from the imports and type hints in randomized_scaffold_split.py. The _parameter_constraints already correctly use "random_state" (which maps to int/RandomState/None in sklearn's validation). Tests for all three classes (MAP fingerprint, randomized search, and randomized scaffold split) pass with int, RandomState, and None.
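
As a rough sketch of what the resulting annotation looks like in practice, the snippet below defines the int / RandomState / None type on a toy class. The names `RandomStateLike` and `ToyRandomTransformer` are hypothetical and exist only for illustration; the real classes in skfp are not reproduced here.

```python
from typing import Optional, Union

import numpy as np

# Equivalent to `int | np.random.RandomState | None` on Python 3.10+.
RandomStateLike = Optional[Union[int, np.random.RandomState]]


class ToyRandomTransformer:
    """Hypothetical stand-in for a class that exposes random_state."""

    def __init__(self, random_state: RandomStateLike = None):
        self.random_state = random_state

    def sample(self, n: int) -> np.ndarray:
        # Reuse a RandomState if one was given; otherwise build one
        # from the int seed (or from OS entropy when it is None).
        rs = self.random_state
        if isinstance(rs, np.random.RandomState):
            rng = rs
        else:
            rng = np.random.RandomState(rs)
        return rng.randint(0, 100, size=n)
```

With this annotation, `ToyRandomTransformer(0)` and `ToyRandomTransformer(np.random.RandomState(0))` both type-check and draw the same first samples, which is the behavior the parametrized tests in this PR exercise for the real classes.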

Member

@j-adamczyk j-adamczyk left a comment

One last change: move the tests for each class/function next to its other tests. We want to group tests by major logical functionality, rather than by a cross-cutting code feature like random state. After that, the PR will be ready to merge.

Comment on lines +49 to +71
@pytest.mark.parametrize(
    "random_state",
    [42, np.random.RandomState(42), None],
    ids=["int", "RandomState", "None"],
)
def test_randomized_scaffold_split_random_state_types(random_state):
    """randomized_scaffold_train_test_split should accept int, RandomState, or None."""
    smiles = [
        "C1CCCC(C2CC2)CC1",
        "c1n[nH]cc1C1CCCCCC1",
        "c1n[nH]cc1CC1CCCCCC1",
        "C1CCCC(CC2CCOCC2)CC1",
        "c1ccc2nc(OC3CCC3)ccc2c1",
        "O=C(CCc1cscn1)NC1CCNCC1",
        "c1ccc2nc(OC3CCOC3)ccc2c1",
        "c1ccc2nc(NC3CCOCC3)ccc2c1",
        "c1ccc2nc(N3CCCOCC3)ccc2c1",
        "c1ccc2nc(N3CCn4ccnc4C3)ccc2c1",
    ]
    train, test = randomized_scaffold_train_test_split(
        smiles, random_state=random_state
    )
    assert len(train) + len(test) == len(smiles)
Member

Move to randomized_scaffold_train_test_split tests

Comment on lines +30 to +46
def test_randomized_search_random_state_types(smallest_mols_list, random_state):
    """FingerprintEstimatorRandomizedSearch should accept int, RandomState, or None."""
    num_mols = len(smallest_mols_list)
    y = np.concatenate([np.ones(num_mols // 2), np.zeros(num_mols - num_mols // 2)])

    fp = AtomPairFingerprint()
    fp_params = {"max_distance": list(range(2, 6))}
    estimator_cv = GridSearchCV(
        estimator=DummyClassifier(strategy="constant", constant=1),
        param_grid={"constant": [0, 1]},
        scoring="accuracy",
    )
    fp_cv = FingerprintEstimatorRandomizedSearch(
        fp, fp_params, estimator_cv, n_iter=2, random_state=random_state
    )
    fp_cv.fit(smallest_mols_list, y)
    assert len(fp_cv.cv_results_) == 2
Member

Move to FingerprintEstimatorRandomizedSearch tests

Comment on lines +18 to +22
def test_map_fp_random_state_types(smallest_smiles_list, random_state):
    """MAPFingerprint should accept int, RandomState, or None."""
    fp = MAPFingerprint(random_state=random_state, n_jobs=-1)
    X = fp.transform(smallest_smiles_list)
    assert X.shape[0] == len(smallest_smiles_list)
Member

Move to MAPFingerprint tests

Collaborator

Please also add tests for the other fingerprints that use random state.

@LiudengZhang
Contributor Author

Done — moved each test next to its class/function tests: test_map_fp_random_state_types into tests/fingerprints/map.py, test_randomized_search_random_state_types into tests/model_selection/hyperparam_search/randomized_selection.py, and test_randomized_scaffold_split_random_state_types into tests/model_selection/splitters/randomized_scaffold_split.py. Deleted the standalone test_random_state_typing.py. All 9 tests pass.

j-adamczyk previously approved these changes on Mar 16, 2026
Collaborator

@mjste mjste left a comment

Please add tests for fingerprints other than MAP that utilize randomness.

@LiudengZhang
Contributor Author

Added random_state type tests for E3FPFingerprint in 99668fc. That should cover all fingerprints that use randomness (MAP and E3FP).

@LiudengZhang
Contributor Author

Added E3FP random_state type tests in 99668fc. MAP and E3FP are the only two fingerprints that accept random_state, so that should cover all of them. Let me know if I missed any!

Member

@j-adamczyk j-adamczyk left a comment

  1. Please use random seed 0 in tests, we use that convention everywhere
  2. There is a merge conflict that needs to be resolved

Once those are done, I want to merge this PR.

@j-adamczyk j-adamczyk changed the title from "Expand random_state type hints to accept RandomState and Generator" to "Expand random_state type hints to accept RandomState" on Mar 22, 2026
Member

@j-adamczyk j-adamczyk left a comment

One more thing: docstrings should state the type as "int, RandomState instance or None" to follow the scikit-learn convention.
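
For reference, a numpydoc-style docstring following that wording might look like the sketch below. The function `shuffle_indices` is hypothetical and used only to show the parameter entry; the description text is modeled loosely on scikit-learn's usual phrasing rather than copied from this repository.

```python
import numpy as np


def shuffle_indices(n: int, random_state=None) -> np.ndarray:
    """
    Return a random permutation of ``range(n)``.

    Parameters
    ----------
    n : int
        Number of indices to permute.

    random_state : int, RandomState instance or None, default=None
        Controls the randomness of the permutation. Pass an int for
        reproducible output across multiple function calls.
    """
    # Reuse a RandomState if one was given; otherwise seed a new one
    # from the int (or from OS entropy when random_state is None).
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    return rng.permutation(n)
```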

Align random_state parameter type annotations with scikit-learn conventions
by accepting np.random.RandomState and np.random.Generator in addition to
int and None.

Fixes MLCIL#523

Per review: sklearn validation only supports RandomState, not Generator.
Reverted _parameter_constraints to ["random_state"] (no Generator).
Type hints now use int | RandomState | None (no Generator).
Added parametrized tests for FingerprintEstimatorRandomizedSearch and
randomized_scaffold_train_test_split.

sklearn's random_state constraint only accepts int, RandomState, or
None; Generator is not supported. Remove it from imports and type
hints in randomized_scaffold_split.py.

Group tests by logical functionality as requested. Moved from
standalone test_random_state_typing.py into map.py,
randomized_selection.py, and randomized_scaffold_split.py.

Use random seed 0 (repo convention) in all random_state type
tests. Update docstrings to follow scikit-learn convention:
"int, RandomState instance or None".

@LiudengZhang LiudengZhang force-pushed the feat/random-state-typing branch from 99668fc to 069beea on March 23, 2026 18:48
@j-adamczyk
Member

@LiudengZhang we saw that fixes have been pushed and tests passed, so I'm merging this. Thanks for the contribution!

@j-adamczyk j-adamczyk merged commit 0aa3f28 into MLCIL:master Mar 24, 2026
13 checks passed

Development

Successfully merging this pull request may close these issues.

More precise random_state typing

3 participants