Final CRAN prep #12

rafdoodle · 2026-01-19T02:28:49Z

Hello Doug,

Hope all is well. After spending the last few days with Claude improving test coverage and documentation, I have prepared one final pull request for you to review chmsflow before we submit it to CRAN for the first time.

Your main tasks are to:

Check if information in LICENSE and DESCRIPTION are correct.
Check if PR Medications updates + new variable names #10 is necessary for CRAN submission and if it needs to be dealt with first.
Check if the is_taking_drug_class function is needed for the package, as its use in the cycles1to2_* medication functions was removed during function vectorization.
Check if package looks good and is working on your end (perhaps more in detail if you have the time).

Upon your approval, I will merge all these changes to main, from which I will submit the tarball (.tar.gz) to CRAN. Then, upon CRAN approval, I will create a release (branch) on GitHub for the packages's first version (0.1.0).

From my end already, devtools::check() passes with no errors, warnings, and notes, and I already confirmed with Claude that chmsflow meets the CRAN guidelines outlined for source packages.

Please let me know what you think.

Sincerely,
Rafidul

A guide for refactoring CHMSFLOW functions to support vector operations using tidyverse patterns.

…erences

Improves code maintainability and aligns with tidyverse guidelines.

- Establish measure-specific missing data precedence logic: * For demographics-based measures: tagged_na("a") takes precedence * Rationale: If core demographics are "not applicable", entire measure invalid * Mixed codes (6+7): Result is tagged_na("a") not tagged_na("b") - Update precedence order in both alcohol functions: * Check "not applicable" (6) before "missing" (7,8,9) * Add detailed comments explaining clinical rationale * Document that precedence logic should be measure-specific - Provide template guidance for other derived measures: * Demographics-based: tagged_na("a") precedence * Symptom/behavior-based: tagged_na("b") may take precedence * Decision depends on clinical logic and survey design intent

…entation, tests, and variable-details as a result

…to2_* functions

Vectorization of CHMS functions

…ariables; improved readability of main README and meds qmd

Comprehensive catalog of 13 CHMS databases following Dublin Core standards: Fully validated databases (Cycles 1-6): - Cycle 1 (2007): Id=10263, 15 sites, 5,604 participants, ages 6-79 - Cycle 2 (2009-2011): Id=10264, 18 sites, 6,395 participants, ages 3-79 - Cycle 3 (2012-2013): Id=136652, ~5,500 participants, ages 3-79 - Cycle 4 (2014-2015): Id=148760, 16 sites, 5,794 participants, ages 3-79 * First cycle with Hepatitis C RNA testing - Cycle 5 (2016-2017): Id=251160, ~5,700 participants, ages 3-79 - Cycle 6 (2018-2019): Id=1195092, ages 3-79 * Data often combined with Cycle 5 for analysis Partially validated (Cycle 7): - Collection: 2020-2021 - Survey ID marked with TODO for verification Medication files (cycle1_meds through cycle6_meds): - Reference parent cycle survey IDs - Available through RDC Validation process: - URLs verified against Statistics Canada IMDB pages - Precise collection dates confirmed from official documentation - Sample sizes validated from data user guides - Access restrictions updated: RDC only (no PUMF available) Dublin Core fields included: - title, description, creator, publisher - subject, date, type, language - identifier (DOI, catalogue number, SDDS) - coverage (spatial, temporal, population) Future work: - Validate Cycle 7 survey ID and collection details - Add final sample sizes when available - Verify data user guide URLs for newer cycles

Schema documentation for recodeflow metadata structure applied to CHMS: Schema files (inst/metadata/schemas/chms/): - variables.yaml: Field definitions for variables.csv (variable, label, variableType, databaseStart, variableStart, etc.) - variable_details.yaml: Field definitions for variable-details.csv (variable, recodes, categories, typeStart, typeEnd, etc.) - chms_database_config.yaml: CHMS-specific database configuration (valid cycles, selection strategies, CHMS observations) Documentation (inst/metadata/README.md): - Distinguishes recodeflow conventions vs CHMS-specific patterns - Explains variableStart format patterns: * Bracket format: [varname] for consistent names * Cycle-prefixed: cycle1::varname for cycle-specific names * Mixed format: cycle1::var1, [var2] (override + default pattern) * DerivedVar: DerivedVar::[var1, var2] for calculated variables - Documents range notation for categorical recodes: * Integer ranges: [7,9] includes 7,8,9 * Continuous ranges: [18.5,25) for BMI categories * Special values: 'else' as catch-all - CHMS observations: Cycle 1 often used different variable names than Cycles 2-6 (handled via mixed format convention) Purpose: These schemas document the metadata structure that MockData functions parse and validate. They are independently useful for understanding CHMS metadata conventions and serve as reference documentation for anyone working with variables.csv and variable-details.csv files.

… checks

…cles 1-6, as well as targeted sample size for cycle 7; corrected url for cycle3_meds and removed broken user guide links for cycles 3-6

Add Dublin Core metadata for CHMS database cycles

…ed mentions of validate-metadata.R in inst/metadata README

Add CHMS metadata schemas for variables and variable_details

README, explaining dependency restoration and local install process with renv and devtools. renv remains ignored from package build, as per typical CRAN approach.

CRAN prep: Add renv, update package metadata, and debug vignettes

…k-data-review branch

…R error persists

…most 100% test coverage; fixed NA handling of medication functions

DougManuel · 2026-01-31T16:31:09Z

Review summary

Nice work on test coverage, NA handling, and documentation improvements. devtools::check() passing clean is a solid milestone.

I've created separate issues for the items that need discussion or are CRAN-blocking:

P0 (CRAN-blocking)

anymed2 and diab_drug2 have self-referential metadata — breaks rec_with_table() #13 — anymed2/diab_drug2 self-referential metadata breaks rec_with_table()
Role column still present in variables.csv #14 — Role column still present in variables.csv

P1 (should fix before CRAN)

No automated workflow for medication variables in cycles 3-6 (long format) #15 — No automated workflow for cycles 3-6 medication variables
Standardize variable naming conventions before CRAN #16 — Standardize variable naming conventions before CRAN
Function naming uses 8+ verb prefixes with no documented taxonomy #17 — Function naming uses 8+ verb prefixes with no documented taxonomy

Additional items (smaller, can address in this PR)

Description column empty: The description column in variables.csv exists but is blank for every variable. Worth populating with brief plain-language descriptions.

Parameter case mixing: Function parameters use inconsistent case — ANYMED2 (uppercase) in determine_hypertension() but diab_drug2 (lowercase) in determine_inclusive_diabetes(). Should pick a convention and apply consistently.

Typo in variable-details.csv: dummyVariable field for anymed2 reads amymed2_cat2_3 — missing the n. Should be anymed2_cat2_3.

Re: your checklist questions

LICENSE and DESCRIPTION: Will review separately.
PR Medications updates + new variable names #10: Yes, the naming conventions in PR Medications updates + new variable names #10 should be resolved before CRAN (see Standardize variable naming conventions before CRAN #16, Function naming uses 8+ verb prefixes with no documented taxonomy #17). Renaming after publication requires deprecation cycles.
is_taking_drug_class: Will review whether it's still needed given the cycles1to2_* refactoring.

DougManuel · 2026-01-31T16:42:29Z

Follow-up: Full review and CRAN readiness assessment

Answers to your checklist

LICENSE and DESCRIPTION

LICENSE lists only Rafidul Islam as copyright holder, but DESCRIPTION has three authors with aut role (Rafidul, Doug, Therese). Let's confirm whether all three should be copyright holders.
recodeflow is in Suggests — need to confirm it's on CRAN. If not, the submission will be rejected.
Everything else looks good (version 0.1.0, title, citation, URLs, dependencies).

PR #10
PR #10 doesn't need to be merged first. It's mostly the AI.md style guide (discussion only, no code implementing the proposals) plus minor CSV changes that conflict with your more extensive PR #12 work. The AI.md can be added after this PR merges. We can close PR #10.

is_taking_drug_class
Keep it. The cycles1to2_* functions no longer call it (they use is_* helpers directly), but it's a useful public API for advanced users defining custom drug classes. It's exported, documented, and tested — fine to ship.

CRAN-blocking decision

All five issues (#13–#17) need to be addressed before CRAN submission. The naming issues (#16, #17) would require deprecation cycles if changed after publication, so better to get them right now.

Must fix before CRAN:

anymed2 and diab_drug2 have self-referential metadata — breaks rec_with_table() #13 — anymed2/diab_drug2 self-referential metadata
Role column still present in variables.csv #14 — Role column in variables.csv
No automated workflow for medication variables in cycles 3-6 (long format) #15 — Cycles 3-6 medication workflow automation
Standardize variable naming conventions before CRAN #16 — Variable naming conventions (snake_case, consistent suffixes)
Function naming uses 8+ verb prefixes with no documented taxonomy #17 — Function naming taxonomy (reduce verb prefixes, document conventions)

Plus the smaller items from my previous comment (empty descriptions, parameter case, amymed2 typo).

Suggested sequence

Agree on naming conventions (Standardize variable naming conventions before CRAN #16, Function naming uses 8+ verb prefixes with no documented taxonomy #17) — this drives everything else
Rename variables and functions across codebase
Fix anymed2/diab_drug2 architecture (anymed2 and diab_drug2 have self-referential metadata — breaks rec_with_table() #13)
Add cycles 3-6 wrapper (No automated workflow for medication variables in cycles 3-6 (long format) #15)
Clean up metadata (Role column still present in variables.csv #14, descriptions, typo)
Update vignette, tests, and documentation to match

Let's schedule a discussion to align on the naming conventions — that's the key decision that unblocks the rest.

DougManuel and others added 30 commits September 18, 2025 14:30

Introduce vectorized tidyverse refactoring guide

021fef5

A guide for refactoring CHMSFLOW functions to support vector operations using tidyverse patterns.

Vectorized all chmsflow functions

8779a49

Style code (GHA)

d7895f6

Updates to function documentation with clinical context and cross-ref…

66565da

…erences

Style code (GHA)

0a14727

Merge branch 'main' into vector-tidyverse

ad97ec5

Standardizes case_when default handling

a8f82c5

Improves code maintainability and aligns with tidyverse guidelines.

Added missing data to test data, tests, and documentation examples.

2a6c8c6

Styling updates

246de25

Minor update to .Rbuildignore

e06c21f

Drafted specific coded missing data handling for all functions

38cc924

Finalized missing data code handling for all functions; updated docum…

895dca4

…entation, tests, and variable-details as a result

Rectified CMD failures by trimming example lengths and fixing cycles1…

f6e79e9

…to2_* functions

Merge pull request #4 from Big-Life-Lab/vector-tidyverse

682022c

Vectorization of CHMS functions

Added dummy data for all six cycles and ensured it included all raw v…

b7dd25d

…ariables; improved readability of main README and meds qmd

Style code (GHA)

f6b39ab

Delete man/calculate_Hhld_Income.Rd

f483416

Delete man/determine_CVD_Family_History.Rd

f00f051

Delete man/determine_CVD_Personal_History.Rd

039a9f7

Fixing faulty Rd files and running GH actions on dev branch for now

cfd17b2

Adding all data to pkgdown

3598240

Clean final commit for CRAN prep fixes

32135cb

Style code (GHA)

5993482

Set eval = FALSE for chunks involving rec_with_table() to shorten CMD…

bc7c8ce

… checks

Added correct sample size, # of collection sites, and provinces to cy…

3e6f8b0

…cles 1-6, as well as targeted sample size for cycle 7; corrected url for cycle3_meds and removed broken user guide links for cycles 3-6

Added proper link to cycle 7

4dc6ce4

Merge pull request #9 from Big-Life-Lab/database-metadata

22c3132

Add Dublin Core metadata for CHMS database cycles

rafdoodle and others added 10 commits October 24, 2025 21:09

Obtained documentation folder contents from cchsflow-dev and eliminat…

720c807

…ed mentions of validate-metadata.R in inst/metadata README

Merge pull request #8 from Big-Life-Lab/metadata-schemas

db52e31

Add CHMS metadata schemas for variables and variable_details

Updates env setup for package reproducibility

f1bd365

README, explaining dependency restoration and local install process with renv and devtools. renv remains ignored from package build, as per typical CRAN approach.

Merge pull request #6 from Big-Life-Lab/cran-prep-fixes

8370646

CRAN prep: Add renv, update package metadata, and debug vignettes

Bringing corrected dummy data and worksheets from recently pruned moc…

81f8ac5

…k-data-review branch

Edited workflows

8f872ee

Updated lock file to ensure devtools::check() works, but quarto TMPDI…

f6888c4

…R error persists

Added more edge-case tests, restructured test files, and jumped to al…

c3fc17a

…most 100% test coverage; fixed NA handling of medication functions

Cleaned documentation and test grouping structure

88e6199

Updated LICENSE

2878cd7

rafdoodle requested a review from DougManuel January 19, 2026 02:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Final CRAN prep #12

Final CRAN prep #12

Uh oh!

rafdoodle commented Jan 19, 2026 •

edited

Loading

Uh oh!

DougManuel commented Jan 31, 2026

Uh oh!

DougManuel commented Jan 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Final CRAN prep #12

Are you sure you want to change the base?

Final CRAN prep #12

Uh oh!

Conversation

rafdoodle commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DougManuel commented Jan 31, 2026

Review summary

Additional items (smaller, can address in this PR)

Re: your checklist questions

Uh oh!

DougManuel commented Jan 31, 2026

Follow-up: Full review and CRAN readiness assessment

Answers to your checklist

CRAN-blocking decision

Suggested sequence

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rafdoodle commented Jan 19, 2026 •

edited

Loading