Final CRAN prep #12
base: main
A guide for refactoring CHMSFLOW functions to support vector operations using tidyverse patterns.
Improves code maintainability and aligns with tidyverse guidelines.
- Establish measure-specific missing data precedence logic:
* For demographics-based measures: tagged_na("a") takes precedence
* Rationale: If core demographics are "not applicable", entire measure invalid
* Mixed codes (6+7): Result is tagged_na("a") not tagged_na("b")
- Update precedence order in both alcohol functions:
* Check "not applicable" (6) before "missing" (7,8,9)
* Add detailed comments explaining clinical rationale
* Document that precedence logic should be measure-specific
- Provide template guidance for other derived measures:
* Demographics-based: tagged_na("a") precedence
* Symptom/behavior-based: tagged_na("b") may take precedence
* Decision depends on clinical logic and survey design intent
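The precedence rule for demographics-based measures can be sketched as follows. This is a minimal illustration, not the package's actual code: `derive_measure()` and its summing step are hypothetical, and the code layout (6 = "not applicable", 7/8/9 = "missing") is assumed from the commit message above. Only `haven::tagged_na()` and `haven::na_tag()` are real library calls.

```r
library(haven)

# Hypothetical derived measure built from two survey-coded inputs.
# Assumed codes: 6 = "not applicable", 7/8/9 = "missing", anything else = valid.
derive_measure <- function(x, y) {
  not_applicable <- x %in% 6 | y %in% 6
  missing <- x %in% c(7, 8, 9) | y %in% c(7, 8, 9)

  # Placeholder derivation for valid rows (an assumption for illustration).
  out <- as.numeric(x) + as.numeric(y)

  # Assign "missing" first, then "not applicable", so that a mixed row
  # (for example 6 with 7) ends up as tagged_na("a") rather than tagged_na("b").
  out[missing] <- tagged_na("b")
  out[not_applicable] <- tagged_na("a")
  out
}

result <- derive_measure(c(1, 6, 7, 6), c(2, 1, 2, 7))
na_tag(result)
```

For a symptom- or behaviour-based measure where "missing" should win, the two assignments would simply be swapped, which is why the ordering is worth documenting per measure.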
…entation, tests, and variable-details as a result
Vectorization of CHMS functions
…ariables; improved readability of main README and meds qmd
Comprehensive catalog of 13 CHMS databases following Dublin Core standards:
Fully validated databases (Cycles 1-6):
- Cycle 1 (2007): Id=10263, 15 sites, 5,604 participants, ages 6-79
- Cycle 2 (2009-2011): Id=10264, 18 sites, 6,395 participants, ages 3-79
- Cycle 3 (2012-2013): Id=136652, ~5,500 participants, ages 3-79
- Cycle 4 (2014-2015): Id=148760, 16 sites, 5,794 participants, ages 3-79
* First cycle with Hepatitis C RNA testing
- Cycle 5 (2016-2017): Id=251160, ~5,700 participants, ages 3-79
- Cycle 6 (2018-2019): Id=1195092, ages 3-79
* Data often combined with Cycle 5 for analysis
Partially validated (Cycle 7):
- Collection: 2020-2021
- Survey ID marked with TODO for verification
Medication files (cycle1_meds through cycle6_meds):
- Reference parent cycle survey IDs
- Available through RDC
Validation process:
- URLs verified against Statistics Canada IMDB pages
- Precise collection dates confirmed from official documentation
- Sample sizes validated from data user guides
- Access restrictions updated: RDC only (no PUMF available)
Dublin Core fields included:
- title, description, creator, publisher
- subject, date, type, language
- identifier (DOI, catalogue number, SDDS)
- coverage (spatial, temporal, population)
Future work:
- Validate Cycle 7 survey ID and collection details
- Add final sample sizes when available
- Verify data user guide URLs for newer cycles
Schema documentation for recodeflow metadata structure applied to CHMS:
Schema files (inst/metadata/schemas/chms/):
- variables.yaml: Field definitions for variables.csv (variable, label, variableType, databaseStart, variableStart, etc.)
- variable_details.yaml: Field definitions for variable-details.csv (variable, recodes, categories, typeStart, typeEnd, etc.)
- chms_database_config.yaml: CHMS-specific database configuration (valid cycles, selection strategies, CHMS observations)
Documentation (inst/metadata/README.md):
- Distinguishes recodeflow conventions vs CHMS-specific patterns
- Explains variableStart format patterns:
* Bracket format: [varname] for consistent names
* Cycle-prefixed: cycle1::varname for cycle-specific names
* Mixed format: cycle1::var1, [var2] (override + default pattern)
* DerivedVar: DerivedVar::[var1, var2] for calculated variables
- Documents range notation for categorical recodes:
* Integer ranges: [7,9] includes 7,8,9
* Continuous ranges: [18.5,25) for BMI categories
* Special values: 'else' as catch-all
- CHMS observations: Cycle 1 often used different variable names than Cycles 2-6 (handled via mixed format convention)
Purpose: These schemas document the metadata structure that MockData functions parse and validate. They are independently useful for understanding CHMS metadata conventions and serve as reference documentation for anyone working with variables.csv and variable-details.csv files.
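To make the range notation concrete, here is a small base R sketch of how a string such as [7,9] or [18.5,25) could be interpreted. The helpers `parse_range()` and `in_range()` are hypothetical names introduced for illustration; they are not functions from recodeflow, MockData, or chmsflow, and the 'else' catch-all is deliberately not handled.

```r
# Hypothetical helper: split a range string into bounds and inclusivity flags.
# A square bracket means the bound is included, a round bracket means excluded.
parse_range <- function(notation) {
  pattern <- "^(\\[|\\() *(-?[0-9.]+) *, *(-?[0-9.]+) *(\\]|\\))$"
  parts <- regmatches(notation, regexec(pattern, notation))[[1]]
  if (length(parts) != 5) {
    stop("Not a recognised range: ", notation)
  }
  list(
    lower = as.numeric(parts[3]),
    upper = as.numeric(parts[4]),
    lower_closed = parts[2] == "[",
    upper_closed = parts[5] == "]"
  )
}

# Hypothetical helper: vectorised membership test against a range string.
in_range <- function(x, notation) {
  r <- parse_range(notation)
  above <- if (r$lower_closed) x >= r$lower else x > r$lower
  below <- if (r$upper_closed) x <= r$upper else x < r$upper
  above & below
}

missing_codes <- in_range(c(6, 7, 8, 9, 10), "[7,9]")
normal_bmi <- in_range(c(18.5, 24.9, 25), "[18.5,25)")
```

Under these assumptions, 7, 8, and 9 all fall inside [7,9], while a BMI of exactly 25 falls outside [18.5,25) because the upper bound is open.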
…cles 1-6, as well as targeted sample size for cycle 7; corrected url for cycle3_meds and removed broken user guide links for cycles 3-6
Add Dublin Core metadata for CHMS database cycles
…ed mentions of validate-metadata.R in inst/metadata README
Add CHMS metadata schemas for variables and variable_details
README, explaining dependency restoration and local install process with renv and devtools. renv remains ignored from package build, as per typical CRAN approach.
CRAN prep: Add renv, update package metadata, and debug vignettes
…k-data-review branch
…most 100% test coverage; fixed NA handling of medication functions
Review summary
Nice work on test coverage, NA handling, and documentation improvements. I've created separate issues for the items that need discussion or are CRAN-blocking:
P0 (CRAN-blocking)
P1 (should fix before CRAN)
Additional items (smaller, can address in this PR)
Description column empty: The …
Parameter case mixing: Function parameters use inconsistent case …
Typo in variable-details.csv: …
Re: your checklist questions
Follow-up: Full review and CRAN readiness assessment
Answers to your checklist
LICENSE and DESCRIPTION
PR #10
CRAN-blocking decision
All five issues (#13–#17) need to be addressed before CRAN submission. The naming issues (#16, #17) would require deprecation cycles if changed after publication, so better to get them right now.
Must fix before CRAN:
Plus the smaller items from my previous comment (empty descriptions, parameter case, amymed2 typo).
Suggested sequence
Let's schedule a discussion to align on the naming conventions; that's the key decision that unblocks the rest.
Hello Doug,
Hope all is well. After spending the last few days with Claude improving test coverage and documentation, I have prepared one final pull request for you to review chmsflow before we submit it to CRAN for the first time.
Your main tasks are to:
- Check that LICENSE and DESCRIPTION are correct.
- Check whether the is_taking_drug_class function is needed for the package, as its use in the cycles1to2_* medication functions was removed during function vectorization.
Upon your approval, I will merge all these changes to main, from which I will submit the tarball (.tar.gz) to CRAN. Then, upon CRAN approval, I will create a release (branch) on GitHub for the package's first version (0.1.0).
From my end already, devtools::check() passes with no errors, warnings, or notes, and I already confirmed with Claude that chmsflow meets the CRAN guidelines outlined for source packages.
Please let me know what you think.
Sincerely,
Rafidul