Skip to content

Conversation

@rafdoodle
Copy link
Collaborator

@rafdoodle rafdoodle commented Jan 19, 2026

Hello Doug,

Hope all is well. After spending the last few days with Claude improving test coverage and documentation, I have prepared one final pull request for you to review chmsflow before we submit it to CRAN for the first time.

Your main tasks are to:

  • Check if information in LICENSE and DESCRIPTION are correct.
  • Check if PR Medications updates + new variable names #10 is necessary for CRAN submission and if it needs to be dealt with first.
  • Check if the is_taking_drug_class function is needed for the package, as its use in the cycles1to2_* medication functions was removed during function vectorization.
  • Check if package looks good and is working on your end (perhaps more in detail if you have the time).

Upon your approval, I will merge all these changes to main, from which I will submit the tarball (.tar.gz) to CRAN. Then, upon CRAN approval, I will create a release (branch) on GitHub for the packages's first version (0.1.0).

From my end already, devtools::check() passes with no errors, warnings, and notes, and I already confirmed with Claude that chmsflow meets the CRAN guidelines outlined for source packages.

Please let me know what you think.

Sincerely,
Rafidul

DougManuel and others added 30 commits September 18, 2025 14:30
A guide for refactoring CHMSFLOW functions to support vector operations using tidyverse patterns.
Improves code maintainability and aligns with tidyverse guidelines.
 - Establish measure-specific missing data precedence logic:
    * For demographics-based measures: tagged_na("a") takes precedence
    * Rationale: If core demographics are "not applicable", entire measure invalid
    * Mixed codes (6+7): Result is tagged_na("a") not tagged_na("b")

  - Update precedence order in both alcohol functions:
    * Check "not applicable" (6) before "missing" (7,8,9)
    * Add detailed comments explaining clinical rationale
    * Document that precedence logic should be measure-specific

  - Provide template guidance for other derived measures:
    * Demographics-based: tagged_na("a") precedence
    * Symptom/behavior-based: tagged_na("b") may take precedence
    * Decision depends on clinical logic and survey design intent
…entation, tests, and variable-details as a result
…ariables; improved readability of main README and meds qmd
Comprehensive catalog of 13 CHMS databases following Dublin Core standards:

Fully validated databases (Cycles 1-6):
- Cycle 1 (2007): Id=10263, 15 sites, 5,604 participants, ages 6-79
- Cycle 2 (2009-2011): Id=10264, 18 sites, 6,395 participants, ages 3-79
- Cycle 3 (2012-2013): Id=136652, ~5,500 participants, ages 3-79
- Cycle 4 (2014-2015): Id=148760, 16 sites, 5,794 participants, ages 3-79
  * First cycle with Hepatitis C RNA testing
- Cycle 5 (2016-2017): Id=251160, ~5,700 participants, ages 3-79
- Cycle 6 (2018-2019): Id=1195092, ages 3-79
  * Data often combined with Cycle 5 for analysis

Partially validated (Cycle 7):
- Collection: 2020-2021
- Survey ID marked with TODO for verification

Medication files (cycle1_meds through cycle6_meds):
- Reference parent cycle survey IDs
- Available through RDC

Validation process:
- URLs verified against Statistics Canada IMDB pages
- Precise collection dates confirmed from official documentation
- Sample sizes validated from data user guides
- Access restrictions updated: RDC only (no PUMF available)

Dublin Core fields included:
- title, description, creator, publisher
- subject, date, type, language
- identifier (DOI, catalogue number, SDDS)
- coverage (spatial, temporal, population)

Future work:
- Validate Cycle 7 survey ID and collection details
- Add final sample sizes when available
- Verify data user guide URLs for newer cycles
Schema documentation for recodeflow metadata structure applied to CHMS:

Schema files (inst/metadata/schemas/chms/):
- variables.yaml: Field definitions for variables.csv
  (variable, label, variableType, databaseStart, variableStart, etc.)
- variable_details.yaml: Field definitions for variable-details.csv
  (variable, recodes, categories, typeStart, typeEnd, etc.)
- chms_database_config.yaml: CHMS-specific database configuration
  (valid cycles, selection strategies, CHMS observations)

Documentation (inst/metadata/README.md):
- Distinguishes recodeflow conventions vs CHMS-specific patterns
- Explains variableStart format patterns:
  * Bracket format: [varname] for consistent names
  * Cycle-prefixed: cycle1::varname for cycle-specific names
  * Mixed format: cycle1::var1, [var2] (override + default pattern)
  * DerivedVar: DerivedVar::[var1, var2] for calculated variables
- Documents range notation for categorical recodes:
  * Integer ranges: [7,9] includes 7,8,9
  * Continuous ranges: [18.5,25) for BMI categories
  * Special values: 'else' as catch-all
- CHMS observations: Cycle 1 often used different variable names
  than Cycles 2-6 (handled via mixed format convention)

Purpose:
These schemas document the metadata structure that MockData functions
parse and validate. They are independently useful for understanding
CHMS metadata conventions and serve as reference documentation for
anyone working with variables.csv and variable-details.csv files.
…cles 1-6, as well as targeted sample size for cycle 7; corrected url for cycle3_meds and removed broken user guide links for cycles 3-6
Add Dublin Core metadata for CHMS database cycles
rafdoodle and others added 10 commits October 24, 2025 21:09
…ed mentions of validate-metadata.R in inst/metadata README
Add CHMS metadata schemas for variables and variable_details
README, explaining dependency restoration and local install process with renv and devtools.

renv remains ignored from package build, as per typical CRAN approach.
CRAN prep: Add renv, update package metadata, and debug vignettes
…most 100% test coverage; fixed NA handling of medication functions
@rafdoodle rafdoodle requested a review from DougManuel January 19, 2026 02:28
@DougManuel
Copy link
Contributor

Review summary

Nice work on test coverage, NA handling, and documentation improvements. devtools::check() passing clean is a solid milestone.

I've created separate issues for the items that need discussion or are CRAN-blocking:

P0 (CRAN-blocking)

P1 (should fix before CRAN)

Additional items (smaller, can address in this PR)

Description column empty: The description column in variables.csv exists but is blank for every variable. Worth populating with brief plain-language descriptions.

Parameter case mixing: Function parameters use inconsistent case — ANYMED2 (uppercase) in determine_hypertension() but diab_drug2 (lowercase) in determine_inclusive_diabetes(). Should pick a convention and apply consistently.

Typo in variable-details.csv: dummyVariable field for anymed2 reads amymed2_cat2_3 — missing the n. Should be anymed2_cat2_3.

Re: your checklist questions

@DougManuel
Copy link
Contributor

Follow-up: Full review and CRAN readiness assessment

Answers to your checklist

LICENSE and DESCRIPTION

  • LICENSE lists only Rafidul Islam as copyright holder, but DESCRIPTION has three authors with aut role (Rafidul, Doug, Therese). Let's confirm whether all three should be copyright holders.
  • recodeflow is in Suggests — need to confirm it's on CRAN. If not, the submission will be rejected.
  • Everything else looks good (version 0.1.0, title, citation, URLs, dependencies).

PR #10
PR #10 doesn't need to be merged first. It's mostly the AI.md style guide (discussion only, no code implementing the proposals) plus minor CSV changes that conflict with your more extensive PR #12 work. The AI.md can be added after this PR merges. We can close PR #10.

is_taking_drug_class
Keep it. The cycles1to2_* functions no longer call it (they use is_* helpers directly), but it's a useful public API for advanced users defining custom drug classes. It's exported, documented, and tested — fine to ship.

CRAN-blocking decision

All five issues (#13#17) need to be addressed before CRAN submission. The naming issues (#16, #17) would require deprecation cycles if changed after publication, so better to get them right now.

Must fix before CRAN:

Plus the smaller items from my previous comment (empty descriptions, parameter case, amymed2 typo).

Suggested sequence

  1. Agree on naming conventions (Standardize variable naming conventions before CRAN #16, Function naming uses 8+ verb prefixes with no documented taxonomy #17) — this drives everything else
  2. Rename variables and functions across codebase
  3. Fix anymed2/diab_drug2 architecture (anymed2 and diab_drug2 have self-referential metadata — breaks rec_with_table() #13)
  4. Add cycles 3-6 wrapper (No automated workflow for medication variables in cycles 3-6 (long format) #15)
  5. Clean up metadata (Role column still present in variables.csv #14, descriptions, typo)
  6. Update vignette, tests, and documentation to match

Let's schedule a discussion to align on the naming conventions — that's the key decision that unblocks the rest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants