pipaux

pipaux manages the auxiliary data used in the PIP workflow. It supports two main actions: [1] It efficiently updates the auxiliary data in the Y drive, keeping it in sync with the raw data on GitHub and ensuring that the right dependencies are processed. It also logs each update, so that users can inspect timing, success, and error information. [2] It tracks changes across and within releases.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("PIP-Technical-Team/pipaux")
library(pipaux)

# When working from a local clone of the repository, load the source instead:
# devtools::load_all()

Even though pipaux has many functions, one per auxiliary data measure, most of its features can be executed through the following key functions:

  • update_aux_measures(): to update auxiliary data measures, either one at a time or all together, in the right order of dependencies.
    • Note: use pipload::load_aux_data() to load the auxiliary data into memory. This function in {pipload} is a wrapper around the measure-specific loading functions (e.g., aux_cpi("load")).
  • compare_aux_releases(): to compare auxiliary data across releases, and detect new rows, removed rows, or changed values. Accepts one or more measures.
  • compare_aux_vintages(): to compare auxiliary data within the same release, and detect version changes. Accepts one or more measures.

Mandatory Setup (Working Release)

Before using any update or comparison functions, you must initialize the working release:

pipfun::setup_working_release(
  release  = "20260202", #for example
  identity = "TEST"
)

This populates the internal .pipaux environment with:

  • Release date (e.g. YYYYMMDD)

  • Identity (e.g., TEST, PROD)

  • Y-drive data path

  • Y-drive metadata path

  • Stamp aliases for versioned artifacts

You can inspect the active release:

get_from_auxenv("wrk_release")
get_from_auxenv("aux_data_path")
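
The idea of a package-level environment that holds release settings can be sketched in base R. This is only an illustration of the pattern: .demo_env, set_in_demo_env(), and get_from_demo_env() are hypothetical names, the stored value is made up, and nothing here reproduces how pipaux or pipfun actually populate .pipaux.

```r
# Hypothetical stand-in for a package-level environment (not pipaux's own).
.demo_env <- new.env(parent = emptyenv())

# Store a named value in the environment.
set_in_demo_env <- function(key, value) {
  assign(key, value, envir = .demo_env)
  invisible(value)
}

# Retrieve a value, failing loudly if it was never set.
get_from_demo_env <- function(key) {
  if (!exists(key, envir = .demo_env, inherits = FALSE)) {
    stop("'", key, "' is not set; initialize the working release first.")
  }
  get(key, envir = .demo_env, inherits = FALSE)
}

# Made-up value, for illustration only.
set_in_demo_env("wrk_release", "20260202_TEST")
get_from_demo_env("wrk_release")
```

Failing with an explicit message when a key is missing mirrors the requirement above that the working release be initialized before any update or comparison.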

Core concepts

Although pipaux contains one aux_* function per measure, most workflows rely on these main functions:

1. Load auxiliary data (read-only)

pipload::load_aux_data(measure = "gdp")

Loads previously saved auxiliary data from the release-specific folder on the Y drive. This does not trigger an update.

2. Update one measure

aux_fun(measure = "gdp", owner = "YourGHUser")

This:

  • Checks whether the GitHub release branch is up to date

  • Detects raw data SHA changes

  • Detects formatter function code changes

  • Resolves and updates dependencies automatically

  • Saves new versions only if needed

3. Update multiple measures in dependency-aware order

update_aux_measures(
  measures = c("cpi", "metaregion"),
  owner    = "YourGHUser",
  log      = TRUE,  
  log_save = FALSE,        # Optional: save log to disk
  halt_on_dep_fail = FALSE # Continue updates even if a dependency fails
)

This:

  • Processes measures in dependency order

  • Avoids unnecessary work

  • Optionally saves a structured update log
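
Processing measures "in dependency order" amounts to a topological sort of the dependency graph. The sketch below shows one way to do that in base R. The dependency map and the function dependency_order() are hypothetical and the measure names are placeholders; pipaux's real dependency graph and resolution logic may differ.

```r
# Hypothetical dependency map: each measure lists the measures it depends on.
deps <- list(
  base    = character(0),
  derived = "base",
  summary = c("base", "derived")
)

# Depth-first topological sort: dependencies come before their dependents.
dependency_order <- function(measures, deps) {
  ordered  <- character(0)  # measures already placed in the result
  visiting <- character(0)  # measures on the current path (cycle detection)

  visit <- function(m) {
    if (m %in% ordered) return(invisible(NULL))
    if (m %in% visiting) stop("Circular dependency involving '", m, "'")
    visiting <<- c(visiting, m)
    for (d in deps[[m]]) visit(d)
    visiting <<- setdiff(visiting, m)
    ordered <<- c(ordered, m)
    invisible(NULL)
  }

  for (m in measures) visit(m)
  ordered
}

# Requesting "summary" pulls in its dependencies first.
dependency_order(c("summary", "derived"), deps)
```

Because each measure is appended only after all of its dependencies, a measure requested several times (directly or as a dependency) is still processed once.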

4. Compare auxiliary data across releases

compare_aux_releases(old_release = "YYYYMMDD_ID")

This identifies:

  • Added rows
  • Removed rows
  • Changed values
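
To make the three categories concrete, here is a minimal base R sketch that classifies rows of two toy tables as added, removed, or changed. compare_tables() and the data are hypothetical, and the sketch assumes a single value column and known key columns; it is not the implementation of compare_aux_releases().

```r
# Toy data: two versions of the same table, keyed by country and year.
old <- data.frame(country = c("AAA", "BBB", "CCC"),
                  year    = c(2020, 2020, 2020),
                  value   = c(1.0, 2.0, 3.0),
                  stringsAsFactors = FALSE)
new <- data.frame(country = c("AAA", "BBB", "DDD"),
                  year    = c(2020, 2020, 2020),
                  value   = c(1.0, 2.5, 4.0),
                  stringsAsFactors = FALSE)

compare_tables <- function(old, new, keys, value) {
  # Marker columns record which side each joined row came from.
  old$in_old <- TRUE
  new$in_new <- TRUE
  both <- merge(old, new, by = keys, all = TRUE, suffixes = c("_old", "_new"))

  in_old <- !is.na(both$in_old)
  in_new <- !is.na(both$in_new)
  v_old  <- both[[paste0(value, "_old")]]
  v_new  <- both[[paste0(value, "_new")]]

  # A value differs if exactly one side is NA or both are present and unequal.
  differs <- xor(is.na(v_old), is.na(v_new)) |
    (!is.na(v_old) & !is.na(v_new) & v_old != v_new)
  changed <- in_old & in_new & differs

  list(
    added   = both[in_new & !in_old, keys, drop = FALSE],
    removed = both[in_old & !in_new, keys, drop = FALSE],
    changed = both[changed,
                   c(keys, paste0(value, "_old"), paste0(value, "_new")),
                   drop = FALSE]
  )
}

res <- compare_tables(old, new, keys = c("country", "year"), value = "value")
res$added    # rows present only in the new table
res$removed  # rows present only in the old table
res$changed  # rows present in both tables with a different value
```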

How Updates Decide Whether to Run

pipaux avoids unnecessary recomputation using multiple checks:

  • GitHub release branch vs DEV

  • Raw file SHA changes

  • aux_* function code hash changes

  • Dependency cascade detection

  • Presence and integrity of Y-drive sidecar metadata

If none of these signals change, the measure is not republished.
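
The decision rule can be pictured as a comparison between the signals recorded at the last successful update and the signals observed now. The following sketch assumes a simple list of three signals; update_reasons(), the field names, and the values are all hypothetical, and the actual sidecar metadata structure is described in the project technical guide rather than here.

```r
# Return the names of the signals that changed since the last update.
# An empty result means the measure can be skipped.
update_reasons <- function(stored, current,
                           signals = c("raw_sha", "code_hash", "deps_hash")) {
  # No usable metadata from a previous run: always update.
  if (is.null(stored)) return("missing_metadata")
  changed <- vapply(
    signals,
    function(s) !identical(stored[[s]], current[[s]]),
    logical(1)
  )
  signals[changed]
}

# Made-up signal values, for illustration only.
stored  <- list(raw_sha = "abc123", code_hash = "f00", deps_hash = "d1")
current <- list(raw_sha = "def456", code_hash = "f00", deps_hash = "d1")

update_reasons(stored, current)  # only the raw data signal differs
update_reasons(stored, stored)   # nothing changed
update_reasons(NULL, current)    # no metadata available
```

Returning the reasons rather than a bare TRUE/FALSE makes it easy to record in a log why a measure was rerun.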

Versioning is handled via the stamp framework:

pipaux_set_versioning("content")    # default
pipaux_set_versioning("timestamp")  # force new version every run
pipaux_set_versioning("off")        # overwrite without versioning
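
One way to read the three modes is sketched below. It assumes that "content" means a new version is created only when the data content changes, which is an interpretation of the mode name and not something confirmed by the stamp framework. content_hash() and should_create_version() are hypothetical helpers built on base R and tools::md5sum().

```r
# Hash a data frame by writing it to a temporary CSV and hashing the file.
content_hash <- function(df) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))
  write.csv(df, path, row.names = FALSE)
  unname(tools::md5sum(path))
}

# Decide whether saving should create a new version under each mode.
should_create_version <- function(mode, old_hash, new_hash) {
  switch(mode,
    content   = !identical(old_hash, new_hash),  # only when content changed
    timestamp = TRUE,                            # on every run
    off       = FALSE,                           # overwrite, no new version
    stop("Unknown versioning mode: ", mode)
  )
}

h_old  <- content_hash(data.frame(x = 1:3))
h_same <- content_hash(data.frame(x = 1:3))
h_new  <- content_hash(data.frame(x = 4:6))

should_create_version("content", h_old, h_same)    # identical content
should_create_version("content", h_old, h_new)     # content changed
should_create_version("timestamp", h_old, h_same)  # always a new version
```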

Logging

Every update call generates structured log entries including:

  • Measure name
  • Timestamp
  • Success / failure
  • GitHub SHAs
  • Error stack traces (if any)

If log_save = TRUE, the log is saved to disk and can be loaded with pipfun::log_load().

If not saved, the most recent in-memory log can be accessed via aux_log_last().
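
As an illustration of what a structured log entry might look like, the sketch below wraps a step in tryCatch() and records the outcome as one row of a data frame. log_step() and the column names are hypothetical, GitHub SHAs and full stack traces are omitted for brevity, and the real log format produced by pipaux may differ.

```r
# Run one update step and return a one-row structured log entry for it.
log_step <- function(measure, step) {
  started   <- Sys.time()
  error_msg <- NA_character_
  success <- tryCatch({
    step()
    TRUE
  }, error = function(e) {
    # Keep the error message so the failure can be inspected later.
    error_msg <<- conditionMessage(e)
    FALSE
  })
  data.frame(
    measure   = measure,
    timestamp = started,
    success   = success,
    error     = error_msg,
    stringsAsFactors = FALSE
  )
}

# Two made-up steps: one succeeds, one fails.
update_log <- rbind(
  log_step("ok_measure",  function() invisible(NULL)),
  log_step("bad_measure", function() stop("raw file not found"))
)
update_log[, c("measure", "success", "error")]
```

Capturing the error instead of letting it propagate is what allows a batch of updates to continue and be reviewed afterwards.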

Typical Workflow

# 1. Initialize release
pipfun::setup_working_release()

# 2. Update selected measures
update_aux_measures(
  measures = c("cpi", "gdp"),
  owner = "YourGHUser",
  log_save = TRUE
)

# 3. Load updated data
gdp <- pipload::load_aux_data("gdp")

# 4. Compare to previous release
compare_aux_releases(old_release = "YYYYMMDD_TEST")

Scope

pipaux is intended for:

  • PIP Technical Team workflows
  • Controlled release environments (TEST / PROD)
  • Server environments with Y-drive access
  • Users with valid GitHub PAT credentials

It is not intended as a general-purpose data pipeline framework.

Further Documentation

For deeper technical details (dependency graph logic, sidecar metadata structure, SHA semantics, logging internals, development diagnostics), see the project technical guide in dev/_project_notes.

Summary

pipaux ensures that auxiliary data in the PIP workflow are:

  • Reproducible
  • Versioned
  • Dependency-aware
  • Provenance-tracked
  • Efficiently updated
  • Fully auditable

It is the orchestration layer between GitHub auxiliary repositories and the PIP Y-drive release system.
