UI-Research · wcurrangroome · Feb 2, 2026 · Feb 1, 2026 · Feb 2, 2026
diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml
@@ -20,6 +20,7 @@ jobs:
       group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
     env:
       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+      IPUMS_API_KEY: ${{ secrets.IPUMS_API_KEY }}
     permissions:
       contents: write
     steps:

diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: crosswalk
 Type: Package
-Title: Simple interface to inter-temporal and inter-geography crosswalks 
+Title: Streamlining Inter-Temporal and Inter-Geography Crosswalking
 Version: 0.0.0.9001
-Description: An R package providing a simple interface to access geographic crosswalks.
+Description: An R package providing a simple interface to access and apply crosswalks.
 License: MIT + file LICENSE
 Authors@R: 
     person(given = "Will", family = "Curran-Groome", email = "wcurrangroome@urban.org", role = c("aut", "cre"))

diff --git a/README.Rmd b/README.Rmd
@@ -18,11 +18,12 @@ devtools::load_all()
 
 # crosswalk
 
-An R interface to inter-geography and inter-temporal crosswalks.
+An R package providing a simple interface to access and apply crosswalks.
 
 ## Overview
 
-This package provides a consistent API and standardized versions of crosswalks to enable consistent approaches that work across different geography and year combinations. The package also facilitates
+This package provides a consistent API and standardized versions of crosswalks to enable consistent approaches 
+that work across different geography and year combinations. The package also facilitates
 interpolation--that is, adjusting source geography/year values by their crosswalk weights and translating
 these values to the desired target geography/year--including diagnostics of the joins between source data
 and crosswalks.
@@ -37,9 +38,9 @@ The package sources crosswalks from:
 
 -   **Programmatic access**: No more manual downloads from web interfaces
 -   **Standardized output**: Consistent column names across all crosswalk sources
--   **Metadata tracking**: Full provenance stored as attributes
--   **Multi-step handling**: Automatic chaining when both geography and year change
--   **Local caching**: Reproducible workflows with cached crosswalks
+-   **Metadata tracking**: Full provenance of crosswalks stored as attributes
+-   **Crosswalk chaining**: Automatic chaining when multiple crosswalks are required
+-   **Local caching**: Faster, reproducible workflows with locally cached crosswalks
 
 ## Installation
 
@@ -151,7 +152,8 @@ combined_data %>%
 
 ## Core Functions
 
-The package has two main functions:
+The package has two main functions, though you can also specify the needed crosswalk(s)
+directly in `crosswalk_data()` and omit the intermediate `get_crosswalk()` call.
 
 | Function | Purpose |
 |--------------------------------------|----------------------------------|
@@ -168,8 +170,7 @@ result <- get_crosswalk(
   target_geography = "zcta",
   source_year = 2010,
   target_year = 2020,
-  weight = "population"
-)
+  weight = "population")
 
 names(result)
 #> [1] "crosswalks" "plan" "message"
@@ -192,23 +193,23 @@ The list contains three elements:
 result <- get_crosswalk(
   source_geography = "tract",
   target_geography = "zcta",
-  weight = "population"
-)
+  weight = "population")
 # result$crosswalks$step_1 contains one crosswalk
 
 # Same geography, different year (NHGIS)
 result <- get_crosswalk(
   source_geography = "tract",
   target_geography = "tract",
   source_year = 2010,
-  target_year = 2020
-)
+  target_year = 2020)
 # result$crosswalks$step_1 contains one crosswalk
 ```
 
-**Multi-step crosswalks** (different geography AND different year):
+**Multi-step crosswalks** (when a single, direct crosswalk is not available):
 
-When both geography and year change, no single crosswalk source provides this directly. The package automatically plans and fetches a two-step chain:
+Some source year/geography -> target year/geography specifications do not have a single, direct crosswalk.
+In such cases, two or more crosswalks are needed. The package automatically plans and fetches the
+required crosswalks:
 
 1.  **Step 1 (NHGIS)**: Change year, keep geography constant
 2.  **Step 2 (Geocorr)**: Change geography at target year
@@ -219,8 +220,7 @@ result <- get_crosswalk(
   target_geography = "zcta",
   source_year = 2010,
   target_year = 2020,
-  weight = "population"
-)
+  weight = "population")
 
 # Two crosswalks are returned
 names(result$crosswalks)
@@ -241,7 +241,8 @@ Each crosswalk contains standardized columns:
 | `allocation_factor_source_to_target` | Weight for interpolating values |
 | `weighting_factor` | What attribute was used (population, housing, land) |
 
-Additional columns may include `source_year`, `target_year`, `population_2020`, `housing_2020`, and `land_area_sqmi` depending on the source.
+Additional columns may include `source_year`, `target_year`, `population_2020`, `housing_2020`, 
+and `land_area_sqmi` depending on the source of the crosswalk.
 
 ### Accessing Metadata
 
@@ -257,6 +258,8 @@ names(metadata)
 ## Using `crosswalk_data()` to Interpolate Data
 
 `crosswalk_data()` applies crosswalk weights to transform your data. It automatically handles multi-step crosswalks.
+To skip the separate `get_crosswalk()` call, pass the needed crosswalk parameters directly
+to `crosswalk_data()`, which forwards them to `get_crosswalk()` behind the scenes.
 
 ### Column Naming Convention
 
@@ -270,7 +273,6 @@ The function auto-detects columns based on prefixes:
 You can also specify columns explicitly via `count_columns` and `non_count_columns`. 
 All non-count variables are interpolated using weighted means, weighting by the allocation factor from the crosswalk.
 
-
 ## Supported Geography and Year Combinations
 
 ### Inter-Geography Crosswalks (Geocorr)
@@ -300,11 +302,14 @@ NHGIS provides cross-decade crosswalks with the following structure:
 **Notes:**
 - Within-decade crosswalks (e.g., 2010→2014) are not available from NHGIS
 - Block→ZCTA, Block→PUMA, etc. are only available for decennial years (1990, 2000, 2010, 2020)
-- The package automatically uses direct NHGIS crosswalks when available (e.g., `get_crosswalk(source_geography = "block", target_geography = "zcta", source_year = 2010, target_year = 2020)` returns a single-step NHGIS crosswalk)
+- The package automatically uses direct NHGIS crosswalks when available (e.g., 
+`get_crosswalk(source_geography = "block", target_geography = "zcta", source_year = 2010, target_year = 2020)` 
+returns a single-step NHGIS crosswalk)
 
 ### 2020→2022 Crosswalks (CTData)
 
-For 2020 to 2022 transformations, the package uses CT Data Collaborative crosswalks for Connecticut (where planning regions replaced counties) and identity mappings for other states (where no changes occurred).
+For 2020 to 2022 transformations, the package uses CT Data Collaborative crosswalks for Connecticut 
+(where planning regions replaced counties) and identity mappings for other states (where no changes occurred).
 
 ## API Keys
 
@@ -336,4 +341,10 @@ The intellectual credit for the underlying crosswalks belongs to the original de
 
 **For Geocorr**, a suggested citation:
 
-> Missouri Census Data Center, University of Missouri. (2022). Geocorr 2022: Geographic Correspondence Engine. Retrieved from: https://mcdc.missouri.edu/applications/geocorr2022.html
+> Missouri Census Data Center, University of Missouri. (2022). Geocorr 2022: Geographic Correspondence Engine. Retrieved from: https://mcdc.missouri.edu/applications/geocorr2022.html
+
+**For CTData**, a suggested citation (adjust for alternate source geography):
+
+> CT Data Collaborative. (2023). 2022 Census Tract Crosswalk. Retrieved from: https://github.com/CT-Data-Collaborative/2022-tract-crosswalk
+
+**For this package**, refer here: https://ui-research.github.io/crosswalk/authors.html#citation
diff --git a/renv/activate.R b/renv/activate.R
@@ -3,7 +3,6 @@ local({
 
   # the requested version of renv
   version <- "1.1.7"
-  attr(version, "md5") <- "dd5d60f155dadff4c88c2fc6680504b4"
   attr(version, "sha") <- NULL
 
   # the project directory
@@ -169,16 +168,6 @@ local({
     if (quiet)
       return(invisible())
 
-    # also check for config environment variables that should suppress messages
-    # https://github.com/rstudio/renv/issues/2214
-    enabled <- Sys.getenv("RENV_CONFIG_STARTUP_QUIET", unset = NA)
-    if (!is.na(enabled) && tolower(enabled) %in% c("true", "1"))
-      return(invisible())
-
-    enabled <- Sys.getenv("RENV_CONFIG_SYNCHRONIZED_CHECK", unset = NA)
-    if (!is.na(enabled) && tolower(enabled) %in% c("false", "0"))
-      return(invisible())
-
     msg <- sprintf(fmt, ...)
     cat(msg, file = stdout(), sep = if (appendLF) "\n" else "")
 
@@ -226,16 +215,6 @@ local({
     section <- header(sprintf("Bootstrapping renv %s", friendly))
     catf(section)
 
-    # try to install renv from cache
-    md5 <- attr(version, "md5", exact = TRUE)
-    if (length(md5)) {
-      pkgpath <- renv_bootstrap_find(version)
-      if (length(pkgpath) && file.exists(pkgpath)) {
-        file.copy(pkgpath, library, recursive = TRUE)
-        return(invisible())
-      }
-    }
-
     # attempt to download renv
     catf("- Downloading renv ... ", appendLF = FALSE)
     withCallingHandlers(
@@ -261,6 +240,7 @@ local({
 
     # add empty line to break up bootstrapping from normal output
     catf("")
+
     return(invisible())
   }
 
@@ -277,20 +257,12 @@ local({
     repos <- Sys.getenv("RENV_CONFIG_REPOS_OVERRIDE", unset = NA)
     if (!is.na(repos)) {
 
-      # split on ';' if present
-      parts <- strsplit(repos, ";", fixed = TRUE)[[1L]]
-
-      # split into named repositories if present
-      idx <- regexpr("=", parts, fixed = TRUE)
-      keys <- substring(parts, 1L, idx - 1L)
-      vals <- substring(parts, idx + 1L)
-      names(vals) <- keys
+      # check for RSPM; if set, use a fallback repository for renv
+      rspm <- Sys.getenv("RSPM", unset = NA)
+      if (identical(rspm, repos))
+        repos <- c(RSPM = rspm, CRAN = cran)
 
-      # if we have a single unnamed repository, call it CRAN
-      if (length(vals) == 1L && identical(keys, ""))
-        names(vals) <- "CRAN"
-
-      return(vals)
+      return(repos)
 
     }
 
@@ -539,51 +511,6 @@ local({
 
   }
 
-  renv_bootstrap_find <- function(version) {
-
-    path <- renv_bootstrap_find_cache(version)
-    if (length(path) && file.exists(path)) {
-      catf("- Using renv %s from global package cache", version)
-      return(path)
-    }
-
-  }
-
-  renv_bootstrap_find_cache <- function(version) {
-
-    md5 <- attr(version, "md5", exact = TRUE)
-    if (is.null(md5))
-      return()
-
-    # infer path to renv cache
-    cache <- Sys.getenv("RENV_PATHS_CACHE", unset = "")
-    if (!nzchar(cache)) {
-      root <- Sys.getenv("RENV_PATHS_ROOT", unset = NA)
-      if (!is.na(root))
-        cache <- file.path(root, "cache")
-    }
-
-    if (!nzchar(cache)) {
-      tools <- asNamespace("tools")
-      if (is.function(tools$R_user_dir)) {
-        root <- tools$R_user_dir("renv", "cache")
-        cache <- file.path(root, "cache")
-      }
-    }
-
-    # start completing path to cache
-    file.path(
-      cache,
-      renv_bootstrap_cache_version(),
-      renv_bootstrap_platform_prefix(),
-      "renv",
-      version,
-      md5,
-      "renv"
-    )
-
-  }
-
   renv_bootstrap_download_tarball <- function(version) {
 
     # if the user has provided the path to a tarball via
@@ -1052,7 +979,7 @@ local({
 
   renv_bootstrap_validate_version_release <- function(version, description) {
     expected <- description[["Version"]]
-    is.character(expected) && identical(c(expected), c(version))
+    is.character(expected) && identical(expected, version)
   }
 
   renv_bootstrap_hash_text <- function(text) {
@@ -1254,18 +1181,6 @@ local({
 
   }
 
-  renv_bootstrap_cache_version <- function() {
-    # NOTE: users should normally not override the cache version;
-    # this is provided just to make testing easier
-    Sys.getenv("RENV_CACHE_VERSION", unset = "v5")
-  }
-
-  renv_bootstrap_cache_version_previous <- function() {
-    version <- renv_bootstrap_cache_version()
-    number <- as.integer(substring(version, 2L))
-    paste("v", number - 1L, sep = "")
-  }
-
   renv_json_read <- function(file = NULL, text = NULL) {
 
     jlerr <- NULL

diff --git a/vignettes/standardizing-longitudinal-data.Rmd b/vignettes/standardizing-longitudinal-data.Rmd
@@ -8,12 +8,15 @@ vignette: >
 ---
 
 ```{r, include = FALSE}
+# Only evaluate chunks if IPUMS API key is available
+has_api_key <- nchar(Sys.getenv("IPUMS_API_KEY")) > 10
+
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>",
   message = FALSE,
   echo = TRUE,
-  eval = TRUE)
+  eval = has_api_key)
 ```
 
 ## Overview
@@ -75,13 +78,11 @@ glimpse(hmda_data[["2018"]])
 
 ## Step 2: Prepare Data for Crosswalking
 
-The HMDA data includes a `tractid` column that contains the 11-digit tract GEOID.
-Let's prepare a subset of variables for crosswalking. We'll focus on a subset of variables
+We'll focus on a subset of variables for crosswalking
 (total applications by race/ethnicity and median loan amounts). We could explicitly pass the 
 variables we want to crosswalk to the appropriate parameter (`count_columns` or `non_count_columns`), 
 but it's easy (and nice practice) to prefix these variables with their unit types ("count" and "median", 
-respectively), and `crosswalk_data()` will crosswalk each appropriately by default since they have these
-standard unit prefixes in their names.
+respectively), and `crosswalk_data()` will crosswalk each appropriately by default.
 
 ```{r prepare-data, echo = FALSE}
 prepare_hmda <- function(data) {
@@ -121,7 +122,7 @@ tract_crosswalk$message
 ## Step 4: Apply the Crosswalk to 2018-2021 Data
 
 Now we apply the crosswalk to the four years of data that use 2010 tract definitions.
-We can see in the console-printed output that relatively small, though not insignificant, 
+We can see that relatively small, though not insignificant, 
 fractions of records in our source data do not join to our crosswalk. When this occurs, source
 data is effectively lost because it has no associated target geography nor allocation factor
 assigned to it.
@@ -171,8 +172,8 @@ hmda_crosswalked |>
 
 ## Result: A Panel Dataset in 2020 Tract Definitions
 We now have a single dataframe with all six years of HMDA data standardized to 2020
-tract definitions. Due to changes in tract geographies between decades, we were unable
-to accurately compare neighborhood changes over time. 
+tract definitions. Due to changes in tract geographies between decades, we were previously
+unable to accurately compare neighborhood changes over time. 
 
 Now, we have apples-to-apples measurements for tracts from 2018 through 2023.