pipaux manages the auxiliary data used in the PIP workflow. It supports
two main actions: [1] It efficiently updates the auxiliary data on the
Y drive, while making sure it is in sync with the raw data on GitHub
and that the right dependencies are processed. It also logs the updates,
so that users can inspect timing, success, and error information. [2]
It tracks changes across and within releases.
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("PIP-Technical-Team/pipaux")
# library(pipaux)
devtools::load_all()
Even though pipaux has many functions, one for each auxiliary data
measure, most of its features can be executed through the following key
functions:
- update_aux_measures(): to update auxiliary data measures, either one at a time or all together, in the right order of dependencies.
  - Note: use pipload::load_aux_data() to load the auxiliary data into memory. This function in {pipload} is a wrapper around the measure-specific loading functions (e.g., aux_cpi("load")).
- compare_aux_releases(): to compare auxiliary data across releases, and detect new rows, removed rows, or changed values. Accepts one or more measures.
- compare_aux_vintages(): to compare auxiliary data within the same release, and detect version changes. Accepts one or more measures.
Before using any update or comparison functions, you must initialize the working release:
pipfun::setup_working_release(
  release = "20260202", # for example
  identity = "TEST"
)
This populates the internal .pipaux environment with:
- Release date (e.g., YYYYMMDD)
- Identity (e.g., TEST, PROD)
- Y-drive data path
- Y-drive metadata path
- Stamp aliases for versioned artifacts
You can inspect the active release:
get_from_auxenv("wrk_release")
get_from_auxenv("aux_data_path")
Although pipaux contains one aux_* function per measure, most
workflows rely on these main functions:
1. Load auxiliary data (read-only)
pipload::load_aux_data(measure = "gdp")
Loads previously saved auxiliary data from the release-specific folder on the Y drive. This does not trigger an update.
2. Update one measure
aux_fun(measure = "gdp", owner = "YourGHUser")
This:
- Checks whether the GitHub release branch is up to date
- Detects raw data SHA changes
- Detects formatter function code changes
- Resolves and updates dependencies automatically
- Saves new versions only if needed
3. Update multiple measures in dependency-aware order
update_aux_measures(
  measures = c("cpi", "metaregion"),
  owner = "YourGHUser",
  log = TRUE,
  log_save = FALSE,        # Optional: save log to disk
  halt_on_dep_fail = FALSE # Continue updates even if a dependency fails
)
This:
- Processes measures in dependency order
- Avoids unnecessary work
- Optionally saves a structured update log
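As an illustration of what "dependency order" means here, the sketch below orders measures so that each one comes after the measures it depends on. It is only a base-R sketch under stated assumptions: the dependency map and the helper order_by_deps() are hypothetical and are not part of pipaux, whose actual dependency graph is not described in this document.

```r
# Hypothetical dependency map (made up for illustration):
# each measure lists the measures that must be updated before it.
deps <- list(
  measure_c = c("measure_a", "measure_b"),
  measure_b = "measure_a",
  measure_a = character(0)
)

# Depth-first topological ordering; stops on circular dependencies.
order_by_deps <- function(deps) {
  ordered  <- character(0) # measures already placed in the final order
  visiting <- character(0) # measures on the current recursion path

  visit <- function(m) {
    if (m %in% ordered) return(invisible(NULL))
    if (m %in% visiting) stop("Circular dependency involving: ", m)
    visiting <<- c(visiting, m)
    for (d in deps[[m]]) visit(d) # place dependencies first
    visiting <<- setdiff(visiting, m)
    ordered <<- c(ordered, m)
    invisible(NULL)
  }

  for (m in names(deps)) visit(m)
  ordered
}

update_order <- order_by_deps(deps)
print(update_order)
```

In this toy map, measure_a is placed first because both other measures need it, even though it is listed last.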
4. Compare auxiliary data across releases
compare_aux_releases(old_release = "YYYYMMDD_ID")
These functions identify:
- Added rows
- Removed rows
- Changed values
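The three categories above can be illustrated on two small tables. This is a base-R sketch, not the implementation used by compare_aux_releases(): the helper compare_tables(), the key columns, and the toy data are all hypothetical, and the sketch assumes both tables share the same columns.

```r
# Hypothetical helper: classify rows as added, removed, or changed,
# matching rows on the given key columns.
compare_tables <- function(old, new, keys) {
  # Build one identifier per row from the key columns
  old_id <- do.call(paste, c(old[keys], sep = "|"))
  new_id <- do.call(paste, c(new[keys], sep = "|"))

  added   <- new[!new_id %in% old_id, , drop = FALSE]
  removed <- old[!old_id %in% new_id, , drop = FALSE]

  # Rows present in both tables, with old and new values side by side
  both <- merge(old, new, by = keys, suffixes = c("_old", "_new"))
  is_changed <- rep(FALSE, nrow(both))
  for (v in setdiff(names(old), keys)) {
    o <- both[[paste0(v, "_old")]]
    n <- both[[paste0(v, "_new")]]
    differs <- (o != n) | (is.na(o) != is.na(n))
    differs[is.na(differs)] <- FALSE # NA in both tables counts as unchanged
    is_changed <- is_changed | differs
  }

  list(added = added, removed = removed,
       changed = both[is_changed, , drop = FALSE])
}

# Toy data, invented for this example
old <- data.frame(country = c("AAA", "BBB", "CCC"), year = 2020,
                  value = c(1, 2, 3), stringsAsFactors = FALSE)
new <- data.frame(country = c("AAA", "BBB", "DDD"), year = 2020,
                  value = c(1, 2.5, 4), stringsAsFactors = FALSE)

res <- compare_tables(old, new, keys = c("country", "year"))
print(res)
```

Here DDD is reported as added, CCC as removed, and BBB as changed, while AAA is left out because nothing about it differs.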
pipaux avoids unnecessary recomputation using multiple checks:
- GitHub release branch vs DEV
- Raw file SHA changes
- aux_* function code hash changes
- Dependency cascade detection
- Presence and integrity of Y-drive sidecar metadata
If none of these signals change, the measure is not republished.
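The rule "republish only if some signal changed" can be sketched as a comparison between the signals stored at the last save and the signals observed now. The sketch is an assumption-laden illustration: needs_update() and the field names (raw_sha, code_hash, deps_stamp) are hypothetical and do not reflect pipaux's actual sidecar metadata structure.

```r
# Hypothetical helper: compare stored vs current signals and report
# whether an update is needed and which signals triggered it.
needs_update <- function(stored, current) {
  signals <- union(names(stored), names(current))
  changed <- vapply(
    signals,
    function(s) !identical(stored[[s]], current[[s]]),
    logical(1)
  )
  list(update = any(changed), changed = names(changed)[changed])
}

# Invented values standing in for what the last save recorded
stored <- list(raw_sha = "abc123", code_hash = "f00", deps_stamp = "v1")

# Nothing changed: no republish
same <- needs_update(stored, stored)

# The raw file changed upstream: republish
current <- list(raw_sha = "def456", code_hash = "f00", deps_stamp = "v1")
diff <- needs_update(stored, current)

# No stored metadata at all (e.g., a missing sidecar): every signal counts as changed
fresh <- needs_update(list(), current)

print(diff)
```

Treating missing metadata as "everything changed" is one possible design choice: it errs toward republishing rather than trusting a file whose provenance cannot be checked.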
Versioning is handled via the stamp framework:
pipaux_set_versioning("content")   # default
pipaux_set_versioning("timestamp") # force new version every run
pipaux_set_versioning("off")       # overwrite without versioning
Every update call generates structured log entries including:
- Measure name
- Timestamp
- Success / failure
- GitHub SHAs
- Error stack traces (if any)
If log_save = TRUE, call pipfun::log_load() to read the saved log.
If the log was not saved, the most recent in-memory log can be accessed via
aux_log_last().
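To make the idea of a structured log entry concrete, the sketch below wraps a task in tryCatch() and records one row per run. It is a minimal base-R illustration, not pipaux's logging code: run_logged() is a hypothetical helper, and the entry covers only measure, timestamp, success, and error message (no GitHub SHAs or stack traces).

```r
# Hypothetical helper: run a task and return a one-row log entry,
# capturing any error instead of stopping.
run_logged <- function(measure, fun) {
  started <- Sys.time()
  result <- tryCatch(
    {
      fun()
      list(success = TRUE, error = NA_character_)
    },
    error = function(e) list(success = FALSE, error = conditionMessage(e))
  )
  data.frame(
    measure   = measure,
    timestamp = format(started, "%Y-%m-%d %H:%M:%S"),
    success   = result$success,
    error     = result$error,
    stringsAsFactors = FALSE
  )
}

# Two invented tasks: one succeeds, one fails
update_log <- rbind(
  run_logged("measure_a", function() invisible(TRUE)),
  run_logged("measure_b", function() stop("raw file not found"))
)
print(update_log)
```

Because failures are captured as rows rather than raised, a batch of updates can continue after one measure fails, and the log can be inspected afterwards.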
# 1. Initialize release
pipfun::setup_working_release()
# 2. Update selected measures
update_aux_measures(
  measures = c("cpi", "gdp"),
  owner = "YourGHUser",
  log_save = TRUE
)
# 3. Load updated data
gdp <- pipload::load_aux_data("gdp")
# 4. Compare to previous release
compare_aux_releases(old_release = "YYYYMMDD_TEST")
pipaux is intended for:
- PIP Technical Team workflows
- Controlled release environments (TEST / PROD)
- Server environments with Y-drive access
- Users with valid GitHub PAT credentials
It is not intended as a general-purpose data pipeline framework.
For deeper technical details (dependency graph logic, sidecar metadata
structure, SHA semantics, logging internals, development diagnostics),
see the project technical guide in dev/_project_notes
pipaux ensures that auxiliary data in the PIP workflow are:
- Reproducible
- Versioned
- Dependency-aware
- Provenance-tracked
- Efficiently updated
- Fully auditable
It is the orchestration layer between GitHub auxiliary repositories and the PIP Y-drive release system.