Skip to content

21 migrate throws error when time values have overlapping characters#22

Merged
mthomas-ketchbrook merged 16 commits intodevfrom
21-migrate-throws-error-when-time-values-have-overlapping-characters
Jan 11, 2026
Merged

21 migrate throws error when time values have overlapping characters#22
mthomas-ketchbrook merged 16 commits intodevfrom
21-migrate-throws-error-when-time-values-have-overlapping-characters

Conversation

@dmhhughes
Copy link
Collaborator

Overview

When one distinct value for the time argument in migrate() contains the other distinct value, the column name replacement logic substitutes values for both column names.

migrate() performs the substitution for the column names after using tidyr::pivot_wider() to create two columns for the data from the time argument. Column names are then renamed using gsub logic. To remedy the issue, stricter pattern matching has been enforced within the gsub logic.

Additionally, a warning has been created when a user provides a character data type column to the time argument. In these situations, migrate() recommends that the user adjust the column to an ordered factor. This recommendation comes as character data types are the most prone to unexpected sorting behavior as demonstrated in the following example:

column_names <- c("pre_treatment", "post_treatment")

sort(column_names)
#> [1] "post_treatment" "pre_treatment" 

Details

A simplified example of the current and revised gsub logic is provided below, using M1 and M12 as the two distinct values in the column supplied to the time argument.

Current Approach

time <- rep(c("M1", "M12"), 3)

times <- time |>
    unique() |>
    sort()

times
#> [1] "M1"  "M12"
gsub(
    pattern = as.character(times[1]),
    replacement = "start",
    x = times
)
#> [1] "start"  "start2"

Updated Approach

The revised gsub logic anchors the pattern to the end of the string. As a result, even if one distinct value is contained within the other, the anchored pattern enforces a full match to substitute.

# `paste0()` creates the anchor pattern
gsub(
    pattern = paste0(as.character(times[1]), "$"),
    replacement = "start",
    x = times
)
#> [1] "start" "M12"  

Testing

Two new tests have been created within tests/testthat/test-migrate.R. The first evaluates whether migrate() appropriately renames character data type columns supplied to the time argument. The second evaluate whether a warning is thrown if a character data type column is supplied to the time argument.

devtools::test() and devtools::check() were both executed and reported zero errors.

Documentation

The README has been updated to note that migrate() will accept various data types for the time argument, which was previously unclear. The updates also specify that a warning will be throw if a character data type is provided.

@dmhhughes dmhhughes requested a review from IvanM26 January 6, 2026 19:30
Copy link
Contributor

@mthomas-ketchbrook mthomas-ketchbrook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work on this, thank you!

@mthomas-ketchbrook mthomas-ketchbrook merged commit 5ce8413 into dev Jan 11, 2026
5 checks passed
@mthomas-ketchbrook mthomas-ketchbrook deleted the 21-migrate-throws-error-when-time-values-have-overlapping-characters branch January 11, 2026 04:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments