[WIP] Abstract Variables/Symbols #1308
Conversation
Codecov Report: patch coverage has no change; project coverage changed by -0.40%.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #1308      +/-   ##
==========================================
- Coverage   72.55%   72.16%   -0.40%
==========================================
  Files          78       79       +1
  Lines       13009    13080      +71
==========================================
  Hits         9439     9439
- Misses       3570     3641      +71
```

☔ View full report in Codecov by Sentry.
Having trouble setting up tests, but this works locally.

@ingydotnet here I am starting the work of building the abstract objects.
Test failures may be due to Python versions (3.8, 3.9, etc.). A lot of these more advanced language features are relatively recent. |
```yaml
name: m
short_name: money
long_name: market resources
latex_repr: \mNrm
```
I know how you're imagining using the LaTeX representation, but my preference would be to not include it in the PR unless you have some demonstration of how it works ready. Four different ways to name something for a quick demo seems like a lot.
For now these are filler; I want to throw any non-required keys into an attributes dictionary.
```yaml
- !Action
  name: c
  short_name: consumption
  long_name: consumption
```
Are these fields optional? Can one spill over to the others as a default?
Basically, how can we make these config files lighter weight?
These are optional; the only required field is name.
```yaml
  long_name: market resources
  latex_repr: \mNrm
- !State
  name: &name stigma
```
I don't understand what the ampersands are doing here.
Maybe document that?
The ampersands define YAML anchors, which can be referenced elsewhere via `*` aliases; one of the states is also a control and a post-state. I will document this more as I make more progress.
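To make the anchor/alias mechanics concrete, here is a minimal sketch (the field layout is illustrative, not the PR's actual fixture): `&` defines an anchor on a node and `*` reuses it elsewhere in the same document.

```python
import yaml

# Illustrative snippet, not the PR's fixture: &m defines an anchor on the
# scalar "stigma", and *m reuses that same node later in the document.
doc = """
states:
  - name: &m stigma
controls:
  - name: *m
"""
data = yaml.safe_load(doc)

# Both entries resolve to the same value after loading.
assert data["states"][0]["name"] == "stigma"
assert data["controls"][0]["name"] == "stigma"
```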
```yaml
post_states: !PostStateSpace
  variables:
    - !PostState
      name: a
```
Looks like you've repeated the post_states block twice in this file?
Not sure how @mnwhite feels, but maybe we don't need to draw a firm distinction between states and post states like this.
Or the labels could be inside the variable, not part of the document structure.
Compare:

```yaml
var_type_1:
  variables:
    - !VarTypeClass1
      details
var_type_2:
  variables:
    - !VarTypeClass2
      details
```

with

```yaml
variables:
  - !VarTypeClass1
    details
  - !VarTypeClass2
    details
```

The latter carries the same information in fewer lines.
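The flat layout should work with PyYAML's tag dispatch, since each `!Tag` determines the Python class independently of where the node sits in the document. A sketch under made-up class names (not the PR's actual classes):

```python
from dataclasses import dataclass

import yaml

# Sketch with illustrative classes: yaml_tag lets differently-tagged
# entries share one flat `variables` list yet load as distinct types.
@dataclass
class State(yaml.YAMLObject):
    yaml_loader = yaml.SafeLoader  # allow safe_load to construct this tag
    yaml_tag = "!State"
    name: str = ""

@dataclass
class Action(yaml.YAMLObject):
    yaml_loader = yaml.SafeLoader
    yaml_tag = "!Action"
    name: str = ""

doc = """
variables:
  - !State
    name: m
  - !Action
    name: c
"""
data = yaml.safe_load(doc)
assert isinstance(data["variables"][0], State)
assert isinstance(data["variables"][1], Action)
assert data["variables"][1].name == "c"
```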
```python
self.shocks = self.variables
```

```python
def make_state_array(
```
Is this method duplicated?
```python
import yaml

import HARK.abstract.variables
```
A test case showing how this data structure could be used in practice would go a long way.
I think this is a cool demonstration of how PyYAML can be leveraged to make model configuration files in YAML without a custom parser. The tricky part, as we know, is function definitions.

Yep, this might have to be a feature for the future; just getting some initial work done on it.

Yes! I was just talking to one of the creators of YAML, who was telling me about this: https://github.com/yaml/yamlscript

Mind blown. One more thing: I'd be keen to see how you would initialize a distribution for a shock directly from YAML.
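One possible answer, sketched with a made-up `!Normal` tag (none of this is in the PR): register a custom constructor that turns the tagged mapping directly into a sampler.

```python
import numpy as np
import yaml

# Hypothetical !Normal tag: the constructor reads the mapping's fields
# and returns a closure that draws from the corresponding distribution.
def normal_constructor(loader, node):
    params = loader.construct_mapping(node)
    rng = np.random.default_rng(params.get("seed"))
    return lambda n: rng.normal(params["mu"], params["sigma"], size=n)

yaml.SafeLoader.add_constructor("!Normal", normal_constructor)

doc = """
shocks:
  income: !Normal {mu: 1.0, sigma: 0.1, seed: 0}
"""
data = yaml.safe_load(doc)
draws = data["shocks"]["income"](5)
assert draws.shape == (5,)
```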
```python
@dataclass
class Auxiliary(Variable):
```
I agree with Chris that 'auxiliary' is more like a macro.
I wouldn't use 'auxiliary' here, though it makes sense to have this sort of object.
```python
@dataclass
class Variable(YAMLObject):
```
Is there a way to initialize Variables without parsing them from a YAML file? I.e., a pure-Python way to create variables?
I'm a little wary of tying the model objects too tightly to the serialization format, because it can make it tricky to interoperate with other Python libraries.
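Since the PR's classes are dataclasses, they can presumably also be constructed directly in Python. A sketch with a simplified stand-in for `Variable` (field names follow the YAML fixtures, not necessarily the actual class definition):

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified stand-in, not the PR's actual class definition.
@dataclass
class Variable:
    name: str
    short_name: Optional[str] = None
    long_name: Optional[str] = None
    latex_repr: Optional[str] = None
    attrs: dict = field(default_factory=dict)

# Pure-Python construction, no YAML or parser involved.
m = Variable(name="m", long_name="market resources")
assert m.name == "m"
assert m.long_name == "market resources"
assert m.attrs == {}
```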
@MridulS thank you for fixing the checks! Might you be able to help me with the failing test? I'm not sure why.
Co-authored-by: Mridul Seth <mail@mriduls.com>
Pull request overview
Introduces a new “abstract variables/symbols” layer with PyYAML-backed serialization to describe model variables/spaces via YAML fixtures.
Changes:
- Add HARK/abstract/variables.py defining Variable/State/Action/Shock and corresponding "space" containers, plus xarray helpers.
- Add YAML fixtures and a basic test module that loads them via yaml.safe_load.
- Add pyyaml dependency and narrow CI to only run on Python 3.10.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| requirements/base.txt | Adds PyYAML dependency for YAML tag loading/parsing. |
| HARK/abstract/variables.py | Implements variable/space dataclasses, PyYAML tag classes, and xarray dataset builders. |
| HARK/abstract/tests/test_variables.py | Adds tests intended to validate YAML loading of tagged objects. |
| HARK/abstract/tests/consindshk.yml | Adds a YAML fixture for states/actions/post-states. |
| HARK/abstract/tests/consindshk_full.yml | Adds a YAML fixture including parameters in addition to spaces. |
| .github/workflows/hark.yml | Reduces the test matrix to only Python 3.10. |
```python
for key in ["long_name", "short_name", "latex_repr"]:
    self.attrs.setdefault(key, None)
```
Variable.__post_init__ currently only sets default keys in attrs, but it never copies short_name/long_name/latex_repr field values into attrs. When loading from YAML (which sets these as top-level fields), self.attrs stays empty and downstream code (e.g., State.assign_values passes self.attrs) loses this metadata. Consider syncing non-None field values into attrs (or removing the duplicate fields and storing metadata only in attrs).
Suggested change:

```python
# Synchronize metadata between top-level fields and attrs.
for key in ["long_name", "short_name", "latex_repr"]:
    field_value = getattr(self, key)
    # If the dataclass field is set, it is the source of truth.
    if field_value is not None:
        self.attrs[key] = field_value
    # Otherwise, if attrs already has a non-None value, mirror it back.
    elif key in self.attrs and self.attrs[key] is not None:
        setattr(self, key, self.attrs[key])
    else:
        # Ensure the key exists in attrs for downstream consumers.
        self.attrs.setdefault(key, None)
```
```python
def assign_values(self, values):
    return make_state_array(values, self.name, self.attrs)

def discretize(self, min, max, N, method):
    # linear for now
    self.assign_values(np.linspace(min, max, N))
```
State.assign_values returns a new Dataset but does not store it on the instance; State.discretize calls self.assign_values(...) and ignores the return value. As written, discretize has no effect on the State. Either have assign_values mutate self.array/self.domain etc., or have discretize return the created Dataset and update callers accordingly.
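One way to resolve this, sketched on a simplified class (not the PR's actual code): have `assign_values` store the result on the instance and still return it, so `discretize` becomes effective. The parameter names `vmin`/`vmax` are renamed here to avoid shadowing the built-ins `min`/`max`.

```python
import numpy as np

# Simplified sketch: assign_values mutates self and returns the result,
# so discretize both updates the State and hands back the grid.
class State:
    def __init__(self, name):
        self.name = name
        self.array = None

    def assign_values(self, values):
        self.array = np.asarray(values)
        return self.array

    def discretize(self, vmin, vmax, N, method="linear"):
        # linear for now; `method` is kept for future grid types
        return self.assign_values(np.linspace(vmin, vmax, N))

s = State("m")
grid = s.discretize(0.0, 1.0, 5)
assert s.array is not None and s.array.shape == (5,)
assert grid[0] == 0.0 and grid[-1] == 1.0
```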
```python
if isinstance(values, list):
    values_len = len(values)
elif isinstance(values, np.ndarray):
    values_len = values.shape[0]

# Use default names and attrs only when they are not provided
names = names or [f"state{rng.integers(0, 100)}" for _ in range(values_len)]
attrs = attrs or [{}] * values_len
```
make_states_array leaves values_len undefined when values is neither list nor np.ndarray, which will raise an UnboundLocalError later. Add an explicit else that raises a clear TypeError/ValueError, or normalize inputs up front.
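A sketch of the suggested guard (the helper name is illustrative): normalize the length computation up front and raise a clear error for unsupported types.

```python
import numpy as np

# Illustrative helper: fail loudly on unsupported input types instead of
# leaving values_len undefined and hitting UnboundLocalError later.
def values_length(values):
    if isinstance(values, list):
        return len(values)
    elif isinstance(values, np.ndarray):
        return values.shape[0]
    else:
        raise TypeError(
            f"values must be a list or np.ndarray, got {type(values).__name__}"
        )

assert values_length([1, 2, 3]) == 3
assert values_length(np.zeros((4, 2))) == 4
```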
```python
    for value, name, attr in zip(values, names, attrs)
]

return xr.merge([states])
```
return xr.merge([states]) passes a nested list (states is already a list of Datasets), which will error or produce an unexpected merge. It should merge the list directly (and likely states should be the list of datasets).
Suggested change:

```python
return xr.merge(states)
```
```python
variables: list[Variable]
yaml_tag: str = field(default="!VariableSpace", kw_only=True)
```
This module uses PEP 585 built-in generics like list[Variable] / list[str], which are not valid on Python 3.8 without from __future__ import annotations. The project declares requires-python = ">=3.8" in pyproject.toml, so this will break on supported versions unless you either (a) add the future import, (b) switch to typing.List[...], or (c) bump the minimum Python version everywhere (metadata + CI).
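A sketch of options (a) and (b) on simplified classes: the future import makes all annotations lazy strings, so `list[Variable]` parses on 3.8/3.9, while `typing.List` works even without it. Note also that `field(kw_only=True)` is itself a Python 3.10 addition, so option (c) may be needed regardless.

```python
from __future__ import annotations  # option (a): annotations become lazy strings

from dataclasses import dataclass
from typing import List  # option (b): typing.List is valid back to 3.8

@dataclass
class Variable:
    name: str

@dataclass
class VariableSpace:
    variables: List[Variable]  # option (b): no extra machinery needed
    yaml_tag: str = "!VariableSpace"

@dataclass
class StateSpace:
    variables: list[Variable]  # option (a): needs the future import on 3.8/3.9
    yaml_tag: str = "!StateSpace"

vs = VariableSpace(variables=[Variable("m")])
ss = StateSpace(variables=[Variable("a")])
assert vs.yaml_tag == "!VariableSpace"
assert ss.variables[0].name == "a"
```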
```python
        data = yaml.safe_load(f)

    def test_full(self):
        with open(self.path + "consindshk_full.yml") as f:
            data = yaml.safe_load(f)
```

Variable data is not used.

Suggested change:

```python
        data = yaml.safe_load(f)
        self.assertIsNotNone(data)

    def test_full(self):
        with open(self.path + "consindshk_full.yml") as f:
            data = yaml.safe_load(f)
            self.assertIsNotNone(data)
```
```python
import unittest

import numpy as np
```

Import of 'np' is not used.

Suggested change: remove the `import numpy as np` line.
```python
import numpy as np
import yaml

import HARK.abstract.variables
```
Import of 'HARK' is not used.
```text
numba>=0.56
numpy>=1.23
pandas>=1.5
pyyaml
```
The newly added dependency pyyaml is unpinned, so each install may fetch a different upstream package version, which increases supply-chain attack risk if the PyPI project is ever compromised. Because this code runs as part of your application, a malicious release of pyyaml could execute arbitrary code with your app’s privileges and access to secrets. Pin pyyaml to a specific, vetted version (or manage it via a lockfile/constraints file) and update it deliberately after review rather than tracking the moving latest release.
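A sketch of the pinned alternative for requirements/base.txt (the version numbers here are illustrative; pick whatever release you actually vet):

```text
numba>=0.56
numpy>=1.23
pandas>=1.5
pyyaml>=6.0,<7.0  # or an exact pin (e.g. pyyaml==6.0.1) via a constraints/lock file
```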