---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(reticulate)
use_condaenv(condaenv="paramio_test", conda = "/home/matbreotten/Downloads/ENTER/bin/python")
```
<!--<img src="man/figures/logo.png" align="right" height=140/> -->
# 🎛️ Paramio
<!-- badges: start -->
[](https://github.com/matbmeijer/paramio/actions/workflows/tests.yaml)
[](https://opensource.org/licenses/MIT)
<!-- badges: end -->
**Paramio** is a lightweight package with a simple objective:
> **Define project parameters only once**
It is common for batch workflows to depend on parameter or config files whose values change dynamically. For example, input and output paths might vary depending on the environment in which the job is executed (e.g. `dev` vs `pred` vs `prod`).
Additionally, complex workflows often combine several parameter files, dictionaries, and plain Python objects, all of which contain parameters that need to be updated dynamically. **Paramio's** objective is to offer a simple solution for these recurring situations in workflow projects.
## Features
1. **Centralize** the **definition** of **dynamic parameters** in a single object.
2. **Recursively update** dynamic parameters defined in the `f-string` format `"{__dynamic__parameter__}"`.
3. Support for **any** kind of common **Python object** (`dict`, `list`, `tuple` & `str`).
4. Ignores other objects that cannot be updated (e.g. `numpy` arrays).
5. Contrary to standard `f-string`-style formatting with `str.format()`, it does not raise a `KeyError` if a dynamic parameter is not defined. This is especially useful if some dynamic parameters can only be defined later during execution (for example, if they depend on the results of a task that has already run).
6. Paramio is a lightweight library with **no dependencies**, intended to keep a project's dependency list lean.
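The behaviour described in features 2 to 5 can be sketched with the standard library alone. The following is a minimal illustration and not Paramio's actual implementation: `substitute` and `_KeepMissing` are hypothetical names introduced here, and the sketch assumes placeholders are plain `{name}` fields without format specifications or attribute access.

```python
class _KeepMissing(dict):
    """Hypothetical helper: return the placeholder itself for unknown names."""

    def __missing__(self, key):
        return "{" + key + "}"


def substitute(obj, **params):
    """Recursively fill `{name}` placeholders in str, dict, list and tuple objects.

    Unknown placeholders are left in place and other object types are
    returned unchanged.
    """
    if isinstance(obj, str):
        return obj.format_map(_KeepMissing(params))
    if isinstance(obj, dict):
        return {key: substitute(value, **params) for key, value in obj.items()}
    if isinstance(obj, list):
        return [substitute(item, **params) for item in obj]
    if isinstance(obj, tuple):
        return tuple(substitute(item, **params) for item in obj)
    return obj  # e.g. numbers, None, arrays: nothing to update


config = {"path": "{bucket}/{experiment}.parquet", "tags": ["{env}", 3]}
result = substitute(config, bucket="dwh", env="dev")
print(result)
# {'path': 'dwh/{experiment}.parquet', 'tags': ['dev', 3]}
```

The `__missing__` hook is what keeps `{experiment}` intact instead of raising a `KeyError`, so the same object can be processed again once the missing value is known.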
## Installation
You can install **Paramio** directly from GitHub with the following `pip` command:
``` bash
pip install git+https://github.com/matbmeijer/paramio.git
```
## Example
#### Example parameter file
Let's walk through a basic example of how to use **Paramio**. Imagine a `parameters.yaml` file containing all the project parameters, as shown below. The file could be in any common config format (e.g. YAML, TOML, JSON); the aim is only to illustrate a realistic use case. The important aspect is that the **dynamic variables** are defined with `f-string` formatting syntax:
```{yaml dict}
project_parameters:
  env: "{env}"
  s3_bucket: "{bucket}"
  group:
    task:
      path: "{bucket}/{group}/{task}/{experiment}.snappy.parquet"
```
Inspecting the `parameters.yaml` file, we find the following dynamic variables:
- `"{env}"`
- `"{bucket}"`
- `"{group}"`
- `"{task}"`
- `"{experiment}"`
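Such a list of dynamic variables does not have to be compiled by hand. As a hedged sketch that relies only on the standard library's `string.Formatter` (the helper `find_placeholders` is a hypothetical name and not part of Paramio), the placeholders of a nested parameter object could be collected like this:

```python
from string import Formatter


def find_placeholders(obj):
    """Collect the `{name}` placeholders found in nested dicts, lists, tuples and strings."""
    if isinstance(obj, str):
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
        return {name for _, name, _, _ in Formatter().parse(obj) if name}
    if isinstance(obj, dict):
        children = obj.values()
    elif isinstance(obj, (list, tuple)):
        children = obj
    else:
        return set()  # other types carry no placeholders
    found = set()
    for child in children:
        found |= find_placeholders(child)
    return found


parameters = {
    "env": "{env}",
    "s3_bucket": "{bucket}",
    "group": {"task": "{bucket}/{group}/{task}/{experiment}.snappy.parquet"},
}
placeholders = sorted(find_placeholders(parameters))
print(placeholders)
# ['bucket', 'env', 'experiment', 'group', 'task']
```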
#### Load parameter file
We now load the `parameters.yaml` file with the PyYAML library so that the parameters are available as a Python dictionary (`dict`). Again, the file format does not matter; this only illustrates a common way of loading project parameter files:
```{python load_yaml, eval=FALSE, echo=TRUE}
# Dependencies to load the yaml file from the project package
import yaml
import pkg_resources

# Imaginary loading method
resource_dir = pkg_resources.resource_filename("resources", "data_preparation")
yaml_parameters_path = f"{resource_dir}/parameters.yaml"
with open(yaml_parameters_path) as stream:
    parameters_file = yaml.safe_load(stream)
```
```{python really_load_dict, eval=TRUE, echo=FALSE}
parameters_file = {
    "env": "{env}",
    "s3_bucket": "{bucket}",
    "group": {
        "task": "{bucket}/{group}/{task}/{experiment}.snappy.parquet"
    },
}
```
Having loaded the YAML file as a dictionary, let's look at its structure:
```{python print_dict}
print(parameters_file)
```
#### Apply **Paramio**
Now it's time to apply **Paramio**, which will update all the parameters in the `parameters_file` object recursively. Notice that the variable `{experiment}` is not set yet. Contrary to standard `f-string`-style formatting with `str.format()`, **Paramio** does not throw a `KeyError` when the `Paramio().parameterize()` method is applied:
```{python example}
from paramio import Paramio
# Set parameters once
project_parameters = Paramio(
    env="dev",
    bucket="enterprise_dwh_global",
    group="extract",
    task="read_origins"
)
# Parameterize the parameters dictionary
updated_parameters_file = project_parameters.parameterize(parameters_file)
# Notice how experiment, which is not defined in Paramio, stays the same
updated_parameters_file
```
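For comparison, this is what plain `str.format` does with the same path template when `experiment` is missing. The snippet uses only built-in Python and is meant as an illustration of the `KeyError` mentioned above, not of Paramio itself:

```python
template = "{bucket}/{group}/{task}/{experiment}.snappy.parquet"

missing = None
try:
    # `experiment` is deliberately not supplied
    template.format(
        bucket="enterprise_dwh_global",
        group="extract",
        task="read_origins",
    )
except KeyError as error:
    # The first undefined field name is carried in the exception arguments
    missing = error.args[0]

print(f"Missing parameter: {missing}")
# Missing parameter: experiment
```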
#### Update **Paramio** parameters
Imagine the `{experiment}` parameter depends on results produced at runtime and only becomes available during the process. New parameters can be added (or deleted) later. Let's add the `experiment` parameter and see how the new parameter dictionary `parameters_file_v2` changes:
```{python example2}
# Add the experiment parameter
project_parameters.add(experiment="1234")
# Parameterize the parameters dictionary again
parameters_file_v2 = project_parameters.parameterize(parameters_file)
# Notice how experiment is now defined
parameters_file_v2
```
## Code of Conduct
Please note that the Paramio project is released with a
[Contributor Code of Conduct](https://github.com/matbmeijer/paramio/blob/main/CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.
## License
[MIT © Matthias Brenninkmeijer](https://github.com/matbmeijer/paramio/blob/main/LICENSE)