Fix DiscreteDistributionLabeled: labels parameter and from_dataset initialization by Copilot · Pull Request #1717 · econ-ark/HARK

Copilot · 2026-02-05T00:29:43Z

DiscreteDistributionLabeled had two design issues: the documented `labels` parameter of expected() was forwarded to the user function, causing a TypeError, and from_dataset() created incomplete objects that lacked essential attributes.

Changes

Fix `labels` parameter in `expected()`

- Extract `labels` from kwargs as a reserved parameter instead of passing it to the user function
- Default to labels=True for dict-based indexing; use labels=False for raw array indexing
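
The reserved-keyword pattern described above can be sketched as follows. This is a minimal illustration, not HARK's actual code: `expected_sketch` and its arguments are hypothetical names, and it assumes atoms are stored as an array of shape (n_vars, n_atoms).

```python
import numpy as np


def expected_sketch(func, atoms, pmv, var_names, **kwargs):
    """Hypothetical stand-in for a labeled expected(); not HARK's implementation."""
    # Pop `labels` so it is never forwarded to the user function.
    labels = kwargs.pop("labels", True)
    values = []
    for i in range(atoms.shape[-1]):
        column = atoms[:, i]
        # labels=True: pass a dict keyed by variable name; otherwise the raw column.
        point = dict(zip(var_names, column)) if labels else column
        values.append(func(point, **kwargs))
    # Probability-weighted sum over atoms.
    return float(np.dot(np.asarray(values, dtype=float), pmv))


atoms = np.array([[-1.0, 0.0, 2.0]])  # one variable, three atoms
pmv = np.array([0.25, 0.5, 0.25])

by_name = expected_sketch(lambda x: x["gamma"], atoms, pmv, ["gamma"], labels=True)
by_index = expected_sketch(lambda x: x[0], atoms, pmv, ["gamma"], labels=False)
```

Because `labels` is removed from `kwargs` before the user function is called, a one-argument lambda no longer receives an unexpected keyword.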

Fix `from_dataset()` to create valid distributions

- Filter variables to only include those with "atom" dimension
- Extract and stack atoms when shapes are compatible
- For incompatible shapes (mixed dimension structures), use valid placeholder based on pmv length
- Initialize pmv, seed, limit, _rng properly
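
The filter-and-stack step can be sketched with plain numpy. This is an assumption-laden illustration: `stack_atom_variables` and the `name -> (dims, array)` mapping are hypothetical stand-ins for an xarray Dataset, and the fallback for incompatible shapes is deliberately left out (the sketch just returns `None` for the atoms).

```python
import numpy as np


def stack_atom_variables(variables):
    """variables: hypothetical mapping name -> (dims tuple, ndarray)."""
    # Keep only variables that carry the "atom" dimension.
    with_atom = {k: arr for k, (dims, arr) in variables.items() if "atom" in dims}
    shapes = {arr.shape for arr in with_atom.values()}
    if len(shapes) != 1:
        return list(with_atom), None  # shapes differ, or nothing to stack
    # Same shapes: stack along a new leading "variable" axis.
    return list(with_atom), np.atleast_2d(np.stack(list(with_atom.values()), axis=0))


variables = {
    "x": (("atom",), np.array([1.0, 2.0, 3.0])),
    "y": (("atom",), np.array([4.0, 5.0, 6.0])),
    "grid": (("grid",), np.array([0.0, 1.0])),  # no "atom" dim, so it is skipped
}
names, atoms = stack_atom_variables(variables)
```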

Improve `expected()` dispatch logic

- Use xarray operations when dataset has dimensions beyond "atom"
- Rename internal variable to requires_xarray_ops for clarity
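
One way to read the dispatch rule is as a check for any dimension other than "atom". The helper below is a hypothetical sketch of that check only; the function name, its input (a mapping from variable name to dimension names), and the exact condition are assumptions, and the real dispatch may consider other inputs as well.

```python
def needs_xarray_ops(variable_dims):
    """variable_dims: hypothetical mapping name -> tuple of dimension names."""
    # True as soon as any variable carries a dimension besides "atom".
    return any(dim != "atom" for dims in variable_dims.values() for dim in dims)


flat = needs_xarray_ops({"x": ("atom",), "y": ("atom",)})
gridded = needs_xarray_ops({"x": ("grid", "atom")})
```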

Example

from HARK.distributions import expected, Normal, DiscreteDistributionLabeled

gamma = DiscreteDistributionLabeled.from_unlabeled(
    Normal(mu=0, sigma=1).discretize(N=7), var_names=["gamma"]
)

# Previously: TypeError: <lambda>() got an unexpected keyword argument 'labels'
# Now works:
expected(func=lambda x: x["gamma"], dist=gamma, labels=True)

# Raw array indexing
expected(func=lambda x: x[0], dist=gamma, labels=False)

Original prompt

improve design of DiscreteDistributionLabeled



…nd documentation
- Added warnings for random seed generation in DiscreteFrozenDistribution.
- Improved the __repr__ method in DiscreteDistribution for better output.
- Refined error messages in from_dataset methods of DiscreteDistributionLabeled.
- Updated tests to validate new functionality and ensure accuracy in expectations.

Copilot

Pull request overview

This PR fixes DiscreteDistributionLabeled expectation behavior and improves construction from xarray inputs so labeled discrete distributions can be safely created and used with expected()/draw().

Changes:

- Treat labels as a reserved kwarg in DiscreteDistributionLabeled.expected() and improve dispatch between numpy vs xarray-based expectation paths.
- Rework DiscreteDistributionLabeled.from_dataset() to initialize required distribution attributes (pmv, atoms, limit, RNG/seed).
- Extend distribution tests to cover labels behavior, from_dataset() initialization, and multidimensional xarray expectation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `HARK/distributions/discrete.py` | Updates `DiscreteDistributionLabeled.from_dataset()` and `expected()` dispatch; adds xarray-weighted expectation helper. |
| `HARK/distributions/utils.py` | Tweaks `calc_expectation()` for labeled distributions to ignore non-atom variables. |
| `tests/test_distribution.py` | Adds tests for `labels` kwarg, `from_dataset()` correctness, and multidimensional expectation. |
| `tests/ConsumptionSaving/test_ConsAggShockModel.py` | Updates a numeric assertion (slope) to reflect changed behavior. |


Copilot · 2026-02-26T18:14:58Z

HARK/distributions/discrete.py

        ldd.probability = pmf

+        # Extract pmv from probability DataArray
+        ldd.pmv = np.asarray(pmf.values)
+


from_dataset() documents that pmf must have dimension 'atom', but it only validates pmf’s type. Please also validate that 'atom' is in pmf.dims (and ideally pmf.ndim==1), and that pmf.sizes['atom'] matches the dataset’s 'atom' length when present, to avoid constructing inconsistent distributions.
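
The requested checks could look roughly like the sketch below. It works on plain tuples and dicts so it stays self-contained; `check_pmf_dims` is a hypothetical helper, and with xarray the inputs would presumably come from `pmf.dims`, `pmf.shape`, and `dict(dataset.sizes)`.

```python
def check_pmf_dims(pmf_dims, pmf_shape, dataset_sizes):
    """Hypothetical validator; returns the number of atoms if pmf is consistent."""
    if "atom" not in pmf_dims:
        raise ValueError(f"pmf must have an 'atom' dimension; got dims {pmf_dims}.")
    if len(pmf_dims) != 1:
        raise ValueError(f"pmf must be one-dimensional; got dims {pmf_dims}.")
    n_atoms = pmf_shape[0]
    expected = dataset_sizes.get("atom")
    # Only compare lengths when the dataset actually has an 'atom' dimension.
    if expected is not None and expected != n_atoms:
        raise ValueError(
            f"pmf has {n_atoms} atoms but the dataset has {expected}."
        )
    return n_atoms


n_atoms = check_pmf_dims(("atom",), (3,), {"atom": 3})
```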

Copilot · 2026-02-26T18:14:59Z

HARK/distributions/discrete.py

+                var_arrays = [ldd.dataset[var].values for var in vars_with_atom]
+                # Check if all arrays have the same shape (required for np.stack)
+                shapes = [arr.shape for arr in var_arrays]
+                if len(set(shapes)) == 1:
+                    ldd.atoms = np.atleast_2d(np.stack(var_arrays, axis=0))
+                else:


from_dataset() uses .values directly when stacking variables into atoms, but DiscreteDistribution assumes the last axis of atoms corresponds to 'atom'. If a DataArray’s dims are ordered like ('atom','grid') instead of ('grid','atom'), this will build atoms with the wrong axis semantics and break expected()/draw(). Consider transposing each variable so 'atom' is last before extracting values.
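
The axis-ordering concern can be illustrated with numpy alone. In this sketch, `atom_last` is a hypothetical helper that takes the dimension names alongside the raw values; with xarray the same effect is usually obtained with `da.transpose(..., "atom")` before reading `.values`.

```python
import numpy as np


def atom_last(dims, values):
    """Move the 'atom' axis to the end; dims is a tuple of dimension names."""
    axis = dims.index("atom")
    new_dims = tuple(d for d in dims if d != "atom") + ("atom",)
    return new_dims, np.moveaxis(values, axis, -1)


# Hypothetical variable with dims ("atom", "grid"): 3 atoms, 2 grid points.
values = np.arange(6).reshape(3, 2)
new_dims, moved = atom_last(("atom", "grid"), values)
```

After the move, the last axis indexes atoms regardless of how the original dimensions were ordered.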

Copilot · 2026-02-26T18:14:59Z

HARK/distributions/discrete.py

+                    raise ValueError(
+                        f"from_dataset(): variables with 'atom' dimension have "
+                        f"incompatible shapes ({dict(zip(vars_with_atom, shapes))}). "
+                        f"Cannot construct a valid distribution with mixed-shape "
+                        f"atoms. Ensure all variables have the same shape."


The PR description mentions using a placeholder strategy for incompatible atom shapes, but from_dataset() currently raises ValueError when shapes differ. Either implement the placeholder behavior or update the PR description/docstring to reflect that mixed-shape atom variables are unsupported.

Copilot · 2026-02-26T18:14:59Z

HARK/distributions/discrete.py

+        if isinstance(f_query, xr.Dataset):
+            return f_query.weighted(self.probability).mean("atom")
+        elif isinstance(f_query, xr.DataArray):
+            return f_query.weighted(self.probability).mean("atom")
+        elif isinstance(f_query, dict):
+            ds = xr.Dataset(f_query)


_weighted_mean_of() will raise for xr.DataArray outputs that don’t have an 'atom' dimension (e.g., if the user function already reduced over 'atom'). Consider checking whether 'atom' is in f_query.dims before calling .mean('atom'), and either return f_query unchanged or raise a clearer error about required output dims.

Suggested change

-        if isinstance(f_query, xr.Dataset):
-            return f_query.weighted(self.probability).mean("atom")
-        elif isinstance(f_query, xr.DataArray):
-            return f_query.weighted(self.probability).mean("atom")
-        elif isinstance(f_query, dict):
-            ds = xr.Dataset(f_query)
+        if isinstance(f_query, xr.Dataset):
+            if "atom" not in f_query.dims:
+                raise ValueError(
+                    "expected() function must return an xr.Dataset with an "
+                    "'atom' dimension to compute a probability-weighted mean; "
+                    f"got dimensions {tuple(f_query.dims)} instead."
+                )
+            return f_query.weighted(self.probability).mean("atom")
+        elif isinstance(f_query, xr.DataArray):
+            if "atom" not in f_query.dims:
+                raise ValueError(
+                    "expected() function must return an xr.DataArray with an "
+                    "'atom' dimension to compute a probability-weighted mean; "
+                    f"got dimensions {tuple(f_query.dims)} instead."
+                )
+            return f_query.weighted(self.probability).mean("atom")
+        elif isinstance(f_query, dict):
+            ds = xr.Dataset(f_query)
+            if "atom" not in ds.dims:
+                raise ValueError(
+                    "expected() function must return data with an 'atom' "
+                    "dimension to compute a probability-weighted mean; "
+                    f"got dimensions {tuple(ds.dims)} instead."
+                )
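
For readers without xarray at hand, the guarded weighted mean can be mimicked in numpy. This is only an analogue under stated assumptions: `weighted_mean_over_atoms` is a hypothetical name, dimension names are passed explicitly, and `np.average` stands in for `.weighted(...).mean("atom")`.

```python
import numpy as np


def weighted_mean_over_atoms(dims, values, pmv):
    """Hypothetical numpy analogue of a probability-weighted mean over 'atom'."""
    if "atom" not in dims:
        raise ValueError(
            f"need an 'atom' dimension to average over; got dimensions {dims}."
        )
    # Average along the 'atom' axis using the probabilities as weights.
    return np.average(values, axis=dims.index("atom"), weights=pmv)


values = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])  # dims ("grid", "atom")
pmv = np.array([0.25, 0.5, 0.25])
means = weighted_mean_over_atoms(("grid", "atom"), values, pmv)
```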

Copilot · 2026-02-26T18:15:00Z

HARK/distributions/utils.py

+                key: float(dstn.variables[key][i])
+                for key in dstn.variables.keys()
+                if "atom" in dstn.dataset[key].dims
            }
            f_query.append(func(temp_dict, *args, **kwargs))


In the labeled-dstn branch, calc_expectation() indexes with dstn.variables[key][i] and coerces to float. This breaks or mis-indexes when a variable has dims beyond 'atom' (e.g. ('grid','atom')) because [i] selects along the first dimension and float() fails for non-scalars. Use explicit xarray indexing along the atom dimension (e.g., isel(atom=i)) and avoid forcing float so functions can receive arrays when appropriate.
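
The mis-indexing is easy to reproduce with a small array. The sketch below uses numpy only and a made-up variable with dims ("grid", "atom"); in xarray, selecting along the named dimension with `isel(atom=i)` corresponds to the second form.

```python
import numpy as np

# Hypothetical variable with dims ("grid", "atom"): 2 grid points, 3 atoms.
values = np.array([[1.0, 2.0, 3.0],
                   [10.0, 20.0, 30.0]])

i = 1
by_first_axis = values[i]      # picks grid row 1, not atom 1
by_atom_axis = values[..., i]  # picks atom 1 across the whole grid
```

Neither result is a scalar here, which is why coercing with `float()` fails for variables that have dimensions beyond "atom".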

Copilot · 2026-02-26T18:15:00Z

tests/test_distribution.py

+        # expected() with kwargs returns an xr.Dataset; extract scalar
+        if hasattr(expected_result, "to_array"):
+            expected_result = float(expected_result.to_array().values.item())


This test only unwraps xr.Dataset results, but expected(my_func, F, z=3.0) returns an xr.DataArray in the kwargs/xarray path (not a Dataset). If xarray’s DataArray-to-float conversion is not supported in the test environment, assertAlmostEqual will fail. Consider also unwrapping 0-d DataArrays (e.g., via .item()) before comparing.

Suggested change

-        # expected() with kwargs returns an xr.Dataset; extract scalar
-        if hasattr(expected_result, "to_array"):
-            expected_result = float(expected_result.to_array().values.item())
+        # expected() with kwargs may return an xarray Dataset or DataArray; extract scalar
+        if isinstance(expected_result, xr.Dataset):
+            expected_result = float(expected_result.to_array().values.item())
+        elif isinstance(expected_result, xr.DataArray):
+            expected_result = float(expected_result.values.item())

Initial plan

c892c94

Copilot AI assigned Copilot and alanlujan91 Feb 5, 2026

Copilot started work on behalf of alanlujan91 February 5, 2026 00:30

Copilot AI and others added 2 commits February 5, 2026 00:36

Fix DiscreteDistributionLabeled.expected() labels parameter bug (#1487)…

6f79d83

… and improve consistency Co-authored-by: alanlujan91 <5382704+alanlujan91@users.noreply.github.com>

Improve docstring for labels parameter in expected() method

914a4cc

Co-authored-by: alanlujan91 <5382704+alanlujan91@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Improve design of DiscreteDistributionLabeled~~ Fix DiscreteDistributionLabeled.expected() labels parameter and improve consistency Feb 5, 2026

Copilot AI requested a review from alanlujan91 February 5, 2026 00:42

Copilot finished work on behalf of alanlujan91 February 5, 2026 00:42

Copilot started work on behalf of alanlujan91 February 5, 2026 00:49

Copilot AI and others added 2 commits February 5, 2026 00:57

Fix from_dataset to set atoms for valid distributions with multi-dim …

1eb5915

…support Co-authored-by: alanlujan91 <5382704+alanlujan91@users.noreply.github.com>

Address code review: rename variable and use named constants in test

f395114

Co-authored-by: alanlujan91 <5382704+alanlujan91@users.noreply.github.com>

Copilot AI changed the title ~~Fix DiscreteDistributionLabeled.expected() labels parameter and improve consistency~~ Fix DiscreteDistributionLabeled: labels parameter and from_dataset initialization Feb 5, 2026

Copilot finished work on behalf of alanlujan91 February 5, 2026 01:00

Copilot started work on behalf of alanlujan91 February 5, 2026 01:03

Properly fix from_dataset: filter by atom dim and use placeholder for…

f245038

… incompatible shapes Co-authored-by: alanlujan91 <5382704+alanlujan91@users.noreply.github.com>

Copilot finished work on behalf of alanlujan91 February 5, 2026 01:11

alanlujan91 added 2 commits February 26, 2026 11:21

Merge remote-tracking branch 'upstream/main' into copilot/improve-dis…

819916a

…cretedistributionlabeled-design

alanlujan91 marked this pull request as ready for review February 26, 2026 18:09

Copilot AI review requested due to automatic review settings February 26, 2026 18:09

Copilot started reviewing on behalf of alanlujan91 February 26, 2026 18:10

Copilot AI reviewed Feb 26, 2026

Copilot wants to merge 8 commits into main from copilot/improve-discretedistributionlabeled-design
