
[WIP] Abstract Variables/Symbols #1308

Open
alanlujan91 wants to merge 10 commits into econ-ark:main from alanlujan91:xr_states

Conversation

@alanlujan91
Member

Please ensure your pull request adheres to the following guidelines:

  • Tests for new functionality/models or Tests to reproduce the bug-fix in code.
  • Updated documentation of features that add new functionality.
  • Update CHANGELOG.md with major/minor changes.

@alanlujan91 alanlujan91 requested review from mnwhite and sbenthall July 19, 2023 18:45
@codecov

codecov bot commented Jul 19, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.40% ⚠️

Comparison is base (37134b9) 72.55% compared to head (f41c944) 72.16%.
Report is 30 commits behind head on master.

❗ Current head f41c944 differs from pull request most recent head e4d2850. Consider uploading reports for the commit e4d2850 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1308      +/-   ##
==========================================
- Coverage   72.55%   72.16%   -0.40%     
==========================================
  Files          78       79       +1     
  Lines       13009    13080      +71     
==========================================
  Hits         9439     9439              
- Misses       3570     3641      +71     
Files Changed Coverage Δ
HARK/variables.py 0.00% <0.00%> (ø)

... and 3 files with indirect coverage changes


@alanlujan91
Member Author

Having trouble setting tests up but this works locally.

@alanlujan91
Member Author

@ingydotnet here I am starting the work of building the abstract objects

@sbenthall
Contributor

Test failures may be due to Python versions (3.8, 3.9, etc.). A lot of these more advanced language features are relatively recent.

name: m
short_name: money
long_name: market resources
latex_repr: \mNrm
Contributor

I know how you're imagining using the LaTeX representation.
But my preference would be to not include it in the PR unless you have a demonstration of how it works ready.
Four different ways to name something for a quick demo seems like a lot.

Member Author

For now these are filler; I want to throw any non-required keys into an attributes dictionary.

- !Action
  name: c
  short_name: consumption
  long_name: consumption
Contributor

Are these fields optional? Can one spill over to the others as a default?
Basically, how can we make these config files lighter weight?

Member Author

These are optional; only name is required.
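
As one way to keep the YAML light, the optional names could fall back on each other when omitted. The sketch below is a plain dataclass, not the PR's `Variable` class, and the fallback order (`short_name` defaults to `name`, `long_name` defaults to `short_name`) is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    """Hypothetical stand-in for the PR's Variable; only `name` is required."""

    name: str
    short_name: Optional[str] = None
    long_name: Optional[str] = None

    def __post_init__(self):
        # Assumed fallback chain: short_name -> name, long_name -> short_name.
        if self.short_name is None:
            self.short_name = self.name
        if self.long_name is None:
            self.long_name = self.short_name


c = Variable(name="c")
m = Variable(name="m", long_name="market resources")
```

With this, a config entry can carry just `name: c` and still end up with usable labels.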

long_name: market resources
latex_repr: \mNrm
- !State
name: &name stigma
Contributor

I don't understand what the ampersands are doing here.
Maybe document that?

Member Author

The ampersands mark YAML anchors, which are reused elsewhere as aliases, since one of the states is also a control and a post-state. I will document this more as I make more progress.
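
For readers unfamiliar with the syntax: in YAML, `&label` attaches an anchor to a node and `*label` is an alias that reuses it. A minimal sketch with PyYAML, using a made-up document rather than the PR's fixture:

```python
import yaml

# Hypothetical document: the state `stigma` is anchored once and reused as an action.
doc = """
states:
  - &stigma
    name: stigma
actions:
  - *stigma
"""

data = yaml.safe_load(doc)
state = data["states"][0]
action = data["actions"][0]
```

Both entries then hold the same mapping, so the variable only has to be described once.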

post_states: !PostStateSpace
  variables:
    - !PostState
      name: a
Contributor

Looks like you've repeated the post_states block twice in this file?

Not sure how @mnwhite feels, but maybe we don't need to draw a firm distinction between states and post states like this.

Or the labels could be inside the variable, not part of the document structure.

Compare:

var_type_1:
   variables:
       - !VarTypeClass1
           details

var_type_2:
   variables:
       - !VarTypeClass2
           details

with

variables:
    - !VarTypeClass1
       details
    - !VarTypeClass2   
       details

The latter carries the same information in fewer lines.
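
If the labels live on the variables themselves, any per-type grouping can still be recovered in code. A rough sketch with hypothetical stand-in classes (not the PR's):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Variable:
    name: str


class State(Variable):
    pass


class PostState(Variable):
    pass


# A flat list, as in the second layout above.
variables = [State("m"), PostState("a"), State("stigma")]

# Rebuild the per-type grouping from the class of each entry.
by_type = defaultdict(list)
for var in variables:
    by_type[type(var).__name__].append(var.name)
```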

self.shocks = self.variables


def make_state_array(
Contributor

Is this method duplicated?

import yaml

import HARK.abstract.variables

Contributor

A test case showing how this data structure could be used in practice would go a long way.

@sbenthall
Contributor

I think this is a cool demonstration of how PyYAML can be leveraged to make model configuration files in YAML without a custom parser.

The tricky part, as we know, is function definitions.

@alanlujan91
Member Author

> Test failures may be due to Python versions (3.8, 3.9, etc.). A lot of these more advanced language features are relatively recent.

Yep, this might have to be a feature for the future; I'm just getting some initial work done on it.

@alanlujan91
Member Author

> I think this is a cool demonstration of how PyYAML can be leveraged to make model configuration files in YAML without a custom parser.
>
> The tricky part, as we know, is function definitions.

Yes! I was just talking to one of the creators of YAML, who was telling me about this: https://github.com/yaml/yamlscript

@sbenthall
Contributor

Mind blown emoji.

One more thing: I'd be keen to see how you would initialize a distribution for a shock directly from YAML.
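
One possible approach, sketched under assumptions: register a constructor for a custom tag on a `SafeLoader` subclass. The `!Lognormal` tag, the `Lognormal` class, and the field names are all hypothetical; HARK's own distribution classes are not used here.

```python
from dataclasses import dataclass

import yaml


@dataclass(frozen=True)
class Lognormal:
    """Hypothetical distribution spec; stands in for a real distribution class."""

    mu: float = 0.0
    sigma: float = 1.0


class ModelLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the custom tag stays out of the global loaders."""


def construct_lognormal(loader, node):
    # Turn the tagged YAML mapping into keyword arguments for the class.
    kwargs = loader.construct_mapping(node, deep=True)
    return Lognormal(**kwargs)


ModelLoader.add_constructor("!Lognormal", construct_lognormal)

doc = """
shocks:
  - name: theta
    distribution: !Lognormal
      mu: 0.0
      sigma: 0.1
"""

config = yaml.load(doc, Loader=ModelLoader)
theta_dist = config["shocks"][0]["distribution"]
```

The shock entry itself stays a plain mapping here; only the distribution is built from a tag.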



@dataclass
class Auxiliary(Variable):
Contributor

I agree with Chris that 'auxiliary' is more like a macro.
I wouldn't use 'auxiliary' here, though it makes sense to have this sort of object.



@dataclass
class Variable(YAMLObject):
Contributor

Is there a way to initialize Variables without parsing them from a YAML file?
That is, is there a pure-Python way to create variables?

I'm a little wary of tying the model objects too tightly to the serial format because it can make it tricky to interoperate with other Python libraries.
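
A minimal sketch of one way to keep the two concerns apart: a plain dataclass that can be built directly in Python, plus a small constructor that accepts any mapping, whatever serializer produced it. The class and the `from_mapping` helper are hypothetical, not part of the PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping


@dataclass
class Variable:
    """Hypothetical stand-in: a plain dataclass with no YAML base class."""

    name: str
    attrs: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_mapping(cls, mapping: Mapping[str, Any]) -> "Variable":
        # `name` is required; every other key is swept into `attrs`.
        extras = {k: v for k, v in mapping.items() if k != "name"}
        return cls(name=mapping["name"], attrs=extras)


# Pure-Python construction, no file involved.
m = Variable("m", attrs={"long_name": "market resources"})

# The same object built from a mapping, e.g. the result of yaml.safe_load or json.load.
m_loaded = Variable.from_mapping({"name": "m", "long_name": "market resources"})
```

Under this layout the YAML layer would only need to produce dictionaries, and other libraries could construct `Variable` objects without touching PyYAML.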

@alanlujan91 alanlujan91 requested a review from MridulS August 2, 2023 17:44
@alanlujan91
Member Author

@MridulS thank you for fixing the checks!

Might you be able to help me with the failing test? I'm not sure why !Parameters is not being recognized.
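
One possible cause, not confirmed in this thread: `yaml.YAMLObject` subclasses do not register their tag with `SafeLoader` by default, so `yaml.safe_load` rejects the tag unless the class sets `yaml_loader = yaml.SafeLoader`. A minimal sketch with a hypothetical `Parameters` class and made-up values:

```python
import yaml


class Parameters(yaml.YAMLObject):
    yaml_tag = "!Parameters"
    # Register the tag with the loader that yaml.safe_load uses.
    yaml_loader = yaml.SafeLoader


doc = """
parameters: !Parameters
  CRRA: 2.0
  DiscFac: 0.96
"""

data = yaml.safe_load(doc)
params = data["parameters"]
```

Objects loaded this way are populated through `__dict__` rather than `__init__`, so a dataclass `__post_init__` would not run on load.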

Co-authored-by: Mridul Seth <mail@mriduls.com>
Contributor

Copilot AI left a comment

Pull request overview

Introduces a new “abstract variables/symbols” layer with PyYAML-backed serialization to describe model variables/spaces via YAML fixtures.

Changes:

  • Add HARK/abstract/variables.py defining Variable/State/Action/Shock and corresponding “space” containers, plus xarray helpers.
  • Add YAML fixtures and a basic test module that loads them via yaml.safe_load.
  • Add pyyaml dependency and narrow CI to only run on Python 3.10.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.

Summary per file:

File | Description
requirements/base.txt | Adds PyYAML dependency for YAML tag loading/parsing.
HARK/abstract/variables.py | Implements variable/space dataclasses, PyYAML tag classes, and xarray dataset builders.
HARK/abstract/tests/test_variables.py | Adds tests intended to validate YAML loading of tagged objects.
HARK/abstract/tests/consindshk.yml | Adds a YAML fixture for states/actions/post-states.
HARK/abstract/tests/consindshk_full.yml | Adds a YAML fixture including parameters in addition to spaces.
.github/workflows/hark.yml | Reduces the test matrix to only Python 3.10.


Comment on lines +28 to +29
for key in ["long_name", "short_name", "latex_repr"]:
    self.attrs.setdefault(key, None)
Copilot AI Jan 28, 2026

Variable.__post_init__ currently only sets default keys in attrs, but it never copies short_name/long_name/latex_repr field values into attrs. When loading from YAML (which sets these as top-level fields), self.attrs stays empty and downstream code (e.g., State.assign_values passes self.attrs) loses this metadata. Consider syncing non-None field values into attrs (or removing the duplicate fields and storing metadata only in attrs).

Suggested change
- for key in ["long_name", "short_name", "latex_repr"]:
-     self.attrs.setdefault(key, None)
+ # Synchronize metadata between top-level fields and attrs.
+ for key in ["long_name", "short_name", "latex_repr"]:
+     field_value = getattr(self, key)
+     # If the dataclass field is set, it is the source of truth.
+     if field_value is not None:
+         self.attrs[key] = field_value
+     else:
+         # Otherwise, if attrs already has a non-None value, mirror it back.
+         if key in self.attrs and self.attrs[key] is not None:
+             setattr(self, key, self.attrs[key])
+         else:
+             # Ensure the key exists in attrs for downstream consumers.
+             self.attrs.setdefault(key, None)

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +137
def assign_values(self, values):
    return make_state_array(values, self.name, self.attrs)

def discretize(self, min, max, N, method):
    # linear for now
    self.assign_values(np.linspace(min, max, N))

Copilot AI Jan 28, 2026

State.assign_values returns a new Dataset but does not store it on the instance; State.discretize calls self.assign_values(...) and ignores the return value. As written, discretize has no effect on the State. Either have assign_values mutate self.array/self.domain etc., or have discretize return the created Dataset and update callers accordingly.
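
A minimal sketch in which `assign_values` stores the grid on the instance and `discretize` returns it. To stay dependency-free it uses a plain list in place of the xarray Dataset; the `values` attribute and the `lo`/`hi`/`n` parameter names are assumptions, not the PR's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class State:
    """Hypothetical stand-in for the PR's State class."""

    name: str
    values: List[float] = field(default_factory=list)  # assumed storage attribute

    def assign_values(self, values):
        # Keep the grid on the instance so later calls can see it.
        self.values = list(values)
        return self.values

    def discretize(self, lo, hi, n):
        # Evenly spaced grid including both endpoints (linear only, as in the PR).
        if n < 2:
            raise ValueError("n must be at least 2")
        step = (hi - lo) / (n - 1)
        return self.assign_values([lo + i * step for i in range(n)])


m = State("m")
grid = m.discretize(0.0, 1.0, 5)
```

Naming the bounds `lo` and `hi` also avoids shadowing the built-in `min` and `max`.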

Comment on lines +263 to +270
if isinstance(values, list):
    values_len = len(values)
elif isinstance(values, np.ndarray):
    values_len = values.shape[0]

# Use default names and attrs only when they are not provided
names = names or [f"state{rng.integers(0, 100)}" for _ in range(values_len)]
attrs = attrs or [{}] * values_len
Copilot AI Jan 28, 2026

make_states_array leaves values_len undefined when values is neither list nor np.ndarray, which will raise an UnboundLocalError later. Add an explicit else that raises a clear TypeError/ValueError, or normalize inputs up front.
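
A sketch of the explicit-else variant, written as a standalone helper. The name `values_length` is hypothetical, and the check on a `shape` attribute is an assumption that replaces the `np.ndarray` test so the example needs no imports.

```python
def values_length(values):
    """Return the number of leading entries in `values`, or raise TypeError."""
    if isinstance(values, (list, tuple)):
        return len(values)
    # Array-like objects (e.g. NumPy arrays) expose a non-empty `shape` tuple.
    shape = getattr(values, "shape", None)
    if shape:
        return shape[0]
    raise TypeError(
        "values must be a list, tuple, or array-like with a shape, "
        f"got {type(values).__name__}"
    )


n_states = values_length([0.0, 0.5, 1.0])
```

An unsupported input then fails right away with a readable message instead of a later `UnboundLocalError`.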

    for value, name, attr in zip(values, names, attrs)
]

return xr.merge([states])
Copilot AI Jan 28, 2026

return xr.merge([states]) passes a nested list (states is already a list of Datasets), which will error or produce an unexpected merge. It should merge the list directly (and likely states should be the list of datasets).

Suggested change
- return xr.merge([states])
+ return xr.merge(states)

Comment on lines +55 to +56
variables: list[Variable]
yaml_tag: str = field(default="!VariableSpace", kw_only=True)
Copilot AI Jan 28, 2026

This module uses PEP 585 built-in generics like list[Variable] / list[str], which are not valid on Python 3.8 without from __future__ import annotations. The project declares requires-python = ">=3.8" in pyproject.toml, so this will break on supported versions unless you either (a) add the future import, (b) switch to typing.List[...], or (c) bump the minimum Python version everywhere (metadata + CI).
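
A sketch of option (b), using `typing.List` so the annotation also evaluates on Python 3.8. The classes are simplified stand-ins for the PR's. Note that `field(kw_only=True)`, used in the quoted lines, was itself added in Python 3.10, so the sketch avoids it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    name: str


@dataclass
class VariableSpace:
    # typing.List works on Python 3.8, where list[Variable] would fail at class creation.
    variables: List[Variable] = field(default_factory=list)
    yaml_tag: str = "!VariableSpace"


space = VariableSpace(variables=[Variable("m"), Variable("a")])
```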

Comment on lines +15 to +19
        data = yaml.safe_load(f)

def test_full(self):
    with open(self.path + "consindshk_full.yml") as f:
        data = yaml.safe_load(f)
Copilot AI Jan 28, 2026

Variable data is not used.

Suggested change
-         data = yaml.safe_load(f)
-
- def test_full(self):
-     with open(self.path + "consindshk_full.yml") as f:
-         data = yaml.safe_load(f)
+         data = yaml.safe_load(f)
+         self.assertIsNotNone(data)
+
+ def test_full(self):
+     with open(self.path + "consindshk_full.yml") as f:
+         data = yaml.safe_load(f)
+         self.assertIsNotNone(data)

@@ -0,0 +1,19 @@
import unittest

import numpy as np
Copilot AI Jan 28, 2026

Import of 'np' is not used.

Suggested change
- import numpy as np

import numpy as np
import yaml

import HARK.abstract.variables
Copilot AI Jan 28, 2026

Import of 'HARK' is not used.

numba>=0.56
numpy>=1.23
pandas>=1.5
pyyaml
Copilot AI Jan 28, 2026

The newly added dependency pyyaml is unpinned, so each install may fetch a different upstream package version, which increases supply-chain attack risk if the PyPI project is ever compromised. Because this code runs as part of your application, a malicious release of pyyaml could execute arbitrary code with your app’s privileges and access to secrets. Pin pyyaml to a specific, vetted version (or manage it via a lockfile/constraints file) and update it deliberately after review rather than tracking the moving latest release.

Labels

None yet

Projects

Status: Stale

4 participants