Expand random_state type hints to accept RandomState#527
Expand random_state type hints to accept RandomState#527j-adamczyk merged 7 commits intoMLCIL:masterfrom
Conversation
|
Thanks for the contribution. I think we also need tests that cover those, checking if Also, code style, please use |
|
Updated to use |
j-adamczyk
left a comment
There was a problem hiding this comment.
Please also add tests for other classes, i.e., randomized search and randomized scaffold split. You can also check test coverage for those files, see the Makefile for example command.
skfp/bases/base_fp_transformer.py
Outdated
| "batch_size": [Integral, None], | ||
| "verbose": ["verbose", dict], | ||
| "random_state": ["random_state"], | ||
| "random_state": ["random_state", np.random.Generator], |
There was a problem hiding this comment.
Now that I go through scikit-learn validation (https://github.com/scikit-learn/scikit-learn/blob/7aae3427d8b6b7ee6aa150ac355262dd0b0e793f/sklearn/utils/_param_validation.py#L557), I realized that we don't need to support np.random.Generator, just np.random.RandomState. So we can simplify this, e.g. keep only "random_state" here. Sorry for not noticing that earlier.
|
Good catch — removed |
j-adamczyk
left a comment
There was a problem hiding this comment.
One last change - move tests for each class/function next to its other functions. We want to group tests by major logical functionality, rather than by their code functionality like random state. After that, PR will be ready to merge
| @pytest.mark.parametrize( | ||
| "random_state", | ||
| [42, np.random.RandomState(42), None], | ||
| ids=["int", "RandomState", "None"], | ||
| ) | ||
| def test_randomized_scaffold_split_random_state_types(random_state): | ||
| """randomized_scaffold_train_test_split should accept int, RandomState, or None.""" | ||
| smiles = [ | ||
| "C1CCCC(C2CC2)CC1", | ||
| "c1n[nH]cc1C1CCCCCC1", | ||
| "c1n[nH]cc1CC1CCCCCC1", | ||
| "C1CCCC(CC2CCOCC2)CC1", | ||
| "c1ccc2nc(OC3CCC3)ccc2c1", | ||
| "O=C(CCc1cscn1)NC1CCNCC1", | ||
| "c1ccc2nc(OC3CCOC3)ccc2c1", | ||
| "c1ccc2nc(NC3CCOCC3)ccc2c1", | ||
| "c1ccc2nc(N3CCCOCC3)ccc2c1", | ||
| "c1ccc2nc(N3CCn4ccnc4C3)ccc2c1", | ||
| ] | ||
| train, test = randomized_scaffold_train_test_split( | ||
| smiles, random_state=random_state | ||
| ) | ||
| assert len(train) + len(test) == len(smiles) |
There was a problem hiding this comment.
Move to randomized_scaffold_train_test_split tests
| def test_randomized_search_random_state_types(smallest_mols_list, random_state): | ||
| """FingerprintEstimatorRandomizedSearch should accept int, RandomState, or None.""" | ||
| num_mols = len(smallest_mols_list) | ||
| y = np.concatenate([np.ones(num_mols // 2), np.zeros(num_mols - num_mols // 2)]) | ||
|
|
||
| fp = AtomPairFingerprint() | ||
| fp_params = {"max_distance": list(range(2, 6))} | ||
| estimator_cv = GridSearchCV( | ||
| estimator=DummyClassifier(strategy="constant", constant=1), | ||
| param_grid={"constant": [0, 1]}, | ||
| scoring="accuracy", | ||
| ) | ||
| fp_cv = FingerprintEstimatorRandomizedSearch( | ||
| fp, fp_params, estimator_cv, n_iter=2, random_state=random_state | ||
| ) | ||
| fp_cv.fit(smallest_mols_list, y) | ||
| assert len(fp_cv.cv_results_) == 2 |
There was a problem hiding this comment.
Move to FingerprintEstimatorRandomizedSearch tests
| def test_map_fp_random_state_types(smallest_smiles_list, random_state): | ||
| """MAPFingerprint should accept int, RandomState, or None.""" | ||
| fp = MAPFingerprint(random_state=random_state, n_jobs=-1) | ||
| X = fp.transform(smallest_smiles_list) | ||
| assert X.shape[0] == len(smallest_smiles_list) |
There was a problem hiding this comment.
Move to MAPFingerprint tests
There was a problem hiding this comment.
Please make the tests for other fingerprints which use random state too.
|
Done — moved each test next to its class/function tests: |
mjste
left a comment
There was a problem hiding this comment.
Please add tests for fingerprints other than map that utilize randomness.
|
Added random_state type tests for E3FPFingerprint in 99668fc. That should cover all fingerprints that use randomness (MAP and E3FP). |
|
Added E3FP random_state type tests in 99668fc. MAP and E3FP are the only two fingerprints that accept |
j-adamczyk
left a comment
There was a problem hiding this comment.
- Please use random seed 0 in tests, we use that convention everywhere
- There is a merge conflict that needs to be resolved
After those, I want to merge this PR
j-adamczyk
left a comment
There was a problem hiding this comment.
One more thing - docstrings should state int, RandomState instance or None to follow scikit-learn
Align random_state parameter type annotations with scikit-learn conventions by accepting np.random.RandomState and np.random.Generator in addition to int and None. Fixes MLCIL#523
Per review: sklearn validation only supports RandomState, not Generator. Reverted _parameter_constraints to ["random_state"] (no Generator). Type hints now use int | RandomState | None (no Generator). Added parametrized tests for FingerprintEstimatorRandomizedSearch and randomized_scaffold_train_test_split.
sklearn's random_state constraint only accepts int, RandomState, or None — Generator is not supported. Remove it from imports and type hints in randomized_scaffold_split.py.
Group tests by logical functionality as requested. Moved from standalone test_random_state_typing.py into map.py, randomized_selection.py, and randomized_scaffold_split.py.
Use random seed 0 (repo convention) in all random_state type tests. Update docstrings to follow scikit-learn convention: "int, RandomState instance or None".
99668fc to
069beea
Compare
|
@LiudengZhang we saw that fixes have been pushed and tests passed, so I'm merging this. Thanks for the contribution! |
Summary
random_stateparameter type annotations fromint | Nonetoint | np.random.RandomState | np.random.Generator | Noneacross 5 files, aligning with scikit-learn conventionsconformer_generator.py(RDKit only acceptsint) andmaxmin_split.py(same RDKit limitation)randomized_scaffold_split.pyalready had the broad type and was left unchangedFiles changed
skfp/bases/base_fp_transformer.pyskfp/bases/base_substructure_fp.pyskfp/fingerprints/e3fp_fp.pyskfp/fingerprints/map.pyskfp/model_selection/hyperparam_search/randomized_search.pyTest plan
pytest tests/ -k "E3FP or MAP"— all 14 selected tests passFixes #523