[python-package] removed _json_default_with_numpy private function by daguirre11 · Pull Request #7145 · lightgbm-org/LightGBM

daguirre11 · 2026-01-31T17:23:48Z

Hi 👋 ,

I believe this private function should be removed instead of tested.

Two functions invoke _json_default_with_numpy: model_to_string here and dump_model here.
Each of these functions uses json.dumps(self.pandas_categorical, default=_json_default_with_numpy).
self.pandas_categorical only has a value other than None if the X data argument is given as a pandas DataFrame here.
The np.bool_, np.floating, and np.integer types in the isinstance() condition of _json_default_with_numpy here can all be converted to their corresponding Python types, resulting in self.pandas_categorical = None -> self.pandas_categorical = {}. This is possible because pandas automatically converts NumPy scalars to pandas dtypes during DataFrame construction.
Lastly, the next condition in _json_default_with_numpy, for np.ndarray here, is not reachable because np.ndarray is not an allowed dtype for a pandas DataFrame column under the existing checks.
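A minimal sketch of the conversion described above, assuming the category levels are collected with something like list(col.cat.categories). That collection step and the names category_levels and element_types are illustrative assumptions, not code taken from LightGBM.

```python
import json

import numpy as np
import pandas as pd

# Categorical columns built from NumPy-typed data.
df = pd.DataFrame(
    {
        "int_cat": pd.Series(np.array([3, 1, 2, 1], dtype=np.int64)).astype("category"),
        "float_cat": pd.Series(np.array([2.5, 1.5, 2.5], dtype=np.float64)).astype("category"),
    }
)

# Assumption: category levels are gathered by iterating over the categories Index.
# Iterating a NumPy-backed pandas Index yields plain Python scalars.
category_levels = [list(df[col].cat.categories) for col in df.columns]
element_types = {type(value).__name__ for levels in category_levels for value in levels}

# No default= hook is needed because every element is a built-in int or float.
serialized = json.dumps(category_levels)
print(element_types, serialized)
```

If the levels were stored as NumPy scalars instead, json.dumps would need a default= hook for np.integer and np.bool_ values, which is the case the private function was written for.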

Data used for the check, and the resulting error:

    X = pd.DataFrame({
        'np_bool_col': [np.bool_(True), np.bool_(False), np.bool_(True)],
        'regular_col': [np.uint8(1), np.uint16(2), np.uint8(3)],
        'np_float_col': [np.float64(1.23), np.float64(4.56), np.float64(7.89)],
        'np_array_col': [
            np.array([1, 2, 3]),
            np.array([4, 5, 6]),
            np.array([7, 8, 9]),
        ],
    })

     def _check_for_bad_pandas_dtypes(pandas_dtypes_series: pd_Series) -> None:
        bad_pandas_dtypes = [
            f"{column_name}: {pandas_dtype}"
            for column_name, pandas_dtype in pandas_dtypes_series.items()
            if not _is_allowed_numpy_dtype(pandas_dtype.type)
        ]
        if bad_pandas_dtypes:
>           raise ValueError(
                f"pandas dtypes must be int, float or bool.\nFields with bad pandas dtypes: {', '.join(bad_pandas_dtypes)}"
            )
E           ValueError: pandas dtypes must be int, float or bool.
E           Fields with bad pandas dtypes: np_array_col: object

.venv/lib/python3.14/site-packages/lightgbm/basic.py:791: ValueError
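The error above follows from how pandas stores a column of arrays: it becomes an object-dtype column. The sketch below shows this with a hypothetical helper, find_bad_dtypes, that only approximates the idea of the check in the traceback. It is not LightGBM's _check_for_bad_pandas_dtypes, and the rule it applies (NumPy bool, int, uint, or float kinds only) is an assumption.

```python
import numpy as np
import pandas as pd


def find_bad_dtypes(df: pd.DataFrame) -> list:
    """Hypothetical stand-in: list columns whose dtype is not a NumPy bool/int/float."""
    bad = []
    for column_name, dtype in df.dtypes.items():
        # kind codes: b = bool, i = signed int, u = unsigned int, f = float
        is_allowed = isinstance(dtype, np.dtype) and dtype.kind in "biuf"
        if not is_allowed:
            bad.append(f"{column_name}: {dtype}")
    return bad


X = pd.DataFrame(
    {
        "float_col": np.array([1.23, 4.56, 7.89], dtype=np.float64),
        "array_col": [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])],
    }
)

# Each cell of array_col holds an ndarray, so pandas falls back to object dtype.
bad = find_bad_dtypes(X)
print(bad)
```

Because the column is rejected before any categorical handling happens, an np.ndarray value never gets as far as JSON serialization on this path.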

I also checked _json_default_with_numpy by manually passing a value for self.pandas_categorical that actually invokes the function.

    categorical_json = json.dumps(
        {
            "feature_index": 0,
            "test_np_bool": np.bool_(True),
            "test_np_int64": np.int64(42),
            "test_np_array": np.array([1, 2, 3]),
        },
        default=_json_default_with_numpy,
    )

which results in the following pandas_categorical key-value pair in the model dump JSON:
pandas_categorical:{"feature_index": 0, "test_np_bool": true, "test_np_int64": 42, "test_np_array": [1, 2, 3]}

However, as explained above, it is not possible for self.pandas_categorical to have a value like this.
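For readers unfamiliar with the default= hook of json.dumps, here is a rough sketch of what such a fallback does. The function json_default_sketch is a hypothetical reimplementation written for illustration, based only on the types named in this description. It is not the code being removed.

```python
import json

import numpy as np


def json_default_sketch(obj):
    """Hypothetical fallback: convert NumPy values json.dumps cannot handle itself."""
    if isinstance(obj, (np.integer, np.floating, np.bool_)):
        return obj.item()  # NumPy scalar -> built-in int, float, or bool
    if isinstance(obj, np.ndarray):
        return obj.tolist()  # array -> nested list of built-in values
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {
    "feature_index": 0,
    "test_np_bool": np.bool_(True),
    "test_np_int64": np.int64(42),
    "test_np_array": np.array([1, 2, 3]),
}

# Without a hook, the first NumPy value raises TypeError.
try:
    json.dumps(payload)
    raised_without_hook = False
except TypeError:
    raised_without_hook = True

# With the hook, json.dumps calls it once per object it cannot serialize.
with_hook = json.dumps(payload, default=json_default_sketch)
print(raised_without_hook, with_hook)
```

The hook only runs for objects the encoder cannot serialize natively, so it is never called when every value is already a built-in Python type.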

If I am wrong please explain to me what I am missing 😃

jameslamb

Thank you very much for the investigation!

I'll need to take a little time to read through what you've shared here. The expectations around Dataset.pandas_categorical are a little unclear in the codebase, I'll try to improve that.

I can share that I looked through the git blame tonight and it seems this _json_default_with_numpy() function has been in lightgbm for 9 years (#247), and its addition didn't generate any discussion about why it was necessary. @wxchan added this but isn't active in LightGBM or on GitHub any more, so I don't think they'll be able to help us understand it.

I'll look at this shortly. Two other notes while I do that:

please do update your git config so your commits will be tied to your GitHub account (#7143 (comment))
in the future, share code links as raw links instead of wrapped in markdown like [here](link), so they'll be rendered directly in the GitHub UI like this:

https://github.com/microsoft/LightGBM/blob/74fa3863461854dee80722d4c1ccc4db696801aa/python-package/lightgbm/basic.py#L530-L537

daguirre11 · 2026-02-17T16:46:24Z

@jameslamb is there anything else that needs to be done? I updated my config as well as added my local mac machine emails to my github profile.

jameslamb

Thanks so much for investigating, sorry it took a while for me to review!

I tested some combinations tonight and I agree with you! The category values get converted to base types like float and int by the time they are written to Dataset.pandas_categorical, and therefore don't cause any problems for JSON serialization.

And I don't think an np.ndarray could ever reach this code.

I've pushed a unit test that confirms this. I'd like to see how that goes in CI here (especially the job that covers old numpy and pandas versions).

If everything passes, I'd be happy to merge this 😁

jameslamb · 2026-03-14T05:48:54Z

> is there anything else that needs to be done? I updated my config as well as added my local mac machine emails to my github profile.

Sorry, forgot to answer this question. Commits look great now, thanks for fixing that.

jameslamb · 2026-03-14T05:52:03Z

tests/python_package_test/test_basic.py

+    )
+
+    # confirm that the array dtypes also become the category dtypes
+    assert df["np_float"].dtype.categories.dtype == np.float32


This is failing in a few CI jobs (notably Windows jobs on AppVeyor):

FAILED tests/python_package_test/test_basic.py::test_pandas_categorical_json_serialization_works - AssertionError: assert dtype('float64') == <class 'numpy.float32'>

(build link)

It might be slightly too strict. @daguirre11 if you figure out a better pattern for this test (or notice any other issues I've introduced) please feel free to push updates here. Otherwise, I'll look at this again some time in the next few days.
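One possible looser pattern, sketched under the assumption that the test only needs to confirm the categories are of some floating-point dtype rather than float32 exactly. The variable names are illustrative and this is not the test that was pushed.

```python
import numpy as np
import pandas as pd

# Build a categorical column from float32 data, as in the failing assertion.
values = np.array([1.5, 2.5, 1.5], dtype=np.float32)
series = pd.Series(values).astype("category")

# The categories Index may be float32 or float64 depending on the environment,
# so check the dtype family instead of one exact width.
categories_dtype = series.dtype.categories.dtype
is_floating = np.issubdtype(categories_dtype, np.floating)
print(categories_dtype, is_floating)
```

This trades strictness for portability: it would no longer catch an unexpected widening from float32 to float64, which may or may not matter for what the test is meant to protect.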

removed _json_default_with_numpy

342566c

daguirre11 requested review from StrikerRUS, borchero, guolinke, jameslamb, jmoralez and shiyu1994 as code owners January 31, 2026 17:23

jameslamb added the maintenance label Feb 1, 2026

jameslamb mentioned this pull request Feb 1, 2026: [python-package] expand test coverage #7031 (Open, 31 tasks)

jameslamb requested changes Feb 1, 2026

daguirre11 closed this Feb 1, 2026

daguirre11 reopened this Feb 1, 2026

Merge branch 'master' into basic-json-default-with-numpy-test

a4e05c4

daguirre11 requested a review from jameslamb February 17, 2026 16:46

daguirre11 and others added 6 commits February 22, 2026 16:01

Merge branch 'master' into basic-json-default-with-numpy-test

ccf9e73

Merge branch 'master' into basic-json-default-with-numpy-test

f0e1217

Merge branch 'master' into basic-json-default-with-numpy-test

b48f2df

Merge branch 'master' into basic-json-default-with-numpy-test

429a26c

Merge branch 'master' into basic-json-default-with-numpy-test

044858c

add test

f5f5fed

jameslamb approved these changes Mar 14, 2026

jameslamb reviewed Mar 14, 2026

Merge branch 'master' into basic-json-default-with-numpy-test

946d878

daguirre11 wants to merge 9 commits into lightgbm-org:master from daguirre11:basic-json-default-with-numpy-test

2 participants
