21 migrate throws error when time values have overlapping characters #22
Merged
mthomas-ketchbrook merged 16 commits into dev on Jan 11, 2026
Mitigating scenarios where column names contain one another (e.g., M1 and M12)
with prior test
to align with our approach in {ffiec}
"quarter_3" and "quarter_4" doesn't address the overlapping characters issue described in #21
mthomas-ketchbrook
approved these changes
Jan 11, 2026
Contributor
mthomas-ketchbrook
left a comment
Excellent work on this, thank you!
Overview
When one distinct value for the `time` argument in `migrate()` contains the other distinct value, the column name replacement logic substitutes values for both column names. `migrate()` performs the substitution for the column names after using `tidyr::pivot_wider()` to create two columns for the data from the `time` argument. Column names are then renamed using `gsub` logic. To remedy the issue, stricter pattern matching has been enforced within the `gsub` logic.

Additionally, a warning has been created when a user provides a `character` data type column to the `time` argument. In these situations, `migrate()` recommends that the user adjust the column to an ordered `factor`. This recommendation comes as `character` data types are the most prone to unexpected sorting behavior as demonstrated in the following example:

Details
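As a quick illustration of the sorting concern raised in the Overview, the sketch below compares how a `character` vector and an ordered `factor` sort. The labels `M1`, `M2`, and `M12` are hypothetical and are not taken from the package's data or tests.

```r
# Hypothetical time labels, chosen only for illustration
times <- c("M1", "M2", "M12")

# Character vectors sort lexicographically, so "M12" lands before "M2"
sorted_chr <- sort(times)
print(sorted_chr)

# An ordered factor carries an explicit level order instead
times_fct <- factor(times, levels = c("M1", "M2", "M12"), ordered = TRUE)
sorted_fct <- as.character(sort(times_fct))
print(sorted_fct)
```

With the ordered `factor`, the intended chronological order is stated once in `levels` and no longer depends on how the labels happen to be spelled.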
A simplified example of the current and revised `gsub` logic is provided below, using `M1` and `M12` as the two distinct values in the column supplied to the `time` argument.

Current Approach
    # The two distinct values described above
    times <- c("M1", "M12")

    gsub(
      pattern = as.character(times[1]),
      replacement = "start",
      x = times
    )
    #> [1] "start" "start2"

Updated Approach
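For comparison, an end-anchored version of the same call might look like the sketch below. This is not the merged code. It assumes the anchor is added by appending `$` to the pattern with `paste0()`, and the variable names `unanchored` and `anchored` are introduced here only for illustration.

```r
# Two distinct time values where one contains the other
times <- c("M1", "M12")

# Unanchored pattern: "M1" also matches the leading part of "M12"
unanchored <- gsub(
  pattern = as.character(times[1]),
  replacement = "start",
  x = times
)

# End-anchored pattern (assumed form): "M1$" matches only when
# the string ends immediately after "M1"
anchored <- gsub(
  pattern = paste0(as.character(times[1]), "$"),
  replacement = "start",
  x = times
)

print(unanchored)
print(anchored)
```

In this sketch the unanchored call rewrites both elements, while the anchored call leaves `M12` untouched. Values containing regex metacharacters would need escaping, which is outside the scope of this sketch.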
The revised `gsub` logic anchors the pattern to the end of the string. As a result, even if one distinct value is contained within the other, the anchored pattern requires a full match before substituting.

Testing
Two new tests have been created within `tests/testthat/test-migrate.R`. The first evaluates whether `migrate()` appropriately renames `character` data type columns supplied to the `time` argument. The second evaluates whether a warning is thrown if a `character` data type column is supplied to the `time` argument. `devtools::test()` and `devtools::check()` were both executed and reported zero errors.

Documentation
The `README` has been updated to note that `migrate()` will accept various data types for the `time` argument, which was previously unclear. The updates also specify that a warning will be thrown if a `character` data type is provided.