This repository contains statistical code and data related to published manuscripts and book chapters. The goal is to increase transparency and reproducibility in research.
Reference:
File:
- `seamless-design-functions.R`: functions to conduct the seamless design simulations
- `seamless-design-simulation.R`: code to generate simulation results for the scenario with 5 dose levels where the MTD was dose 4
- `seamless-design-simulation-2.R`: code to generate simulation results for the scenario with 3 dose levels where the MTD was dose 3
- `seamless-design-simulation-3.R`: code to generate simulation results for the scenario with 3 dose levels where the MTD was dose 2
- `seamless-design-simulation-4.R`: code to generate simulation results for the scenario with 3 dose levels where the MTD was dose 1
Reference:
Code for the guided example and the simulation studies, for both single and competing events.
File:
- `single-event-guided-example.R`: code for the single event guided example data and analysis
- `single_example_dat.rda`: single simulated dataset for the single event guided example
- `single_example_ipcw_dat.rda`: single simulated dataset for the single event guided example, in long format with IPC weights
- `single-event-functions.R`: functions used in both the single event guided example and simulation study
- `single-event-simulation-study.R`: code to generate simulation study results for the single event setting
- `competing-risks-functions.R`
- `competing-risks-guided-example.R`
- `competing-risks-simulation-study.R`
Reference:
Mi, J., Tendulkar, R. D., Sittenfeld, S. M. C., Patil, S., & Zabor, E. C. (2025). Combining Missing Data Imputation and Internal Validation in Clinical Risk Prediction Models. Statistics in medicine, 44(18-19), e70203. https://doi.org/10.1002/sim.70203
Code to simulate and analyze the single synthetic dataset used in the guided example.
File:
- `sim-guided-example-data.R`: code to generate the single synthetic dataset with missing values used in the guided example
- `dat0.rda`: single synthetic dataset with missing values used in the guided example, generated by `sim-guided-example-data.R` and used in `analyze-guided-example.R`
- `analyze-guided-example.R`: code to bootstrap, impute, and obtain performance measures on the single synthetic dataset used in the guided example
- `simulation-functions.R`: collection of functions to run a simulation study
- `simulation-studies.R`: code to run the simulation studies included in the paper
Reference:
Zabor, E. C. (2025). Cancer Survival: Analysis and Reporting. In A. D. Singh & B. E. Damato (Eds.), Clinical Ophthalmic Oncology (Fourth ed., pp. 11-17). Switzerland: Springer.
Synthetic data file, R code, and resulting Quarto report associated with all results included in the book chapter are available:
Files:
- `uveal_survival_data.csv`: synthetic dataset in csv format
- `uveal_survival_data.rds`: synthetic dataset in rds format
- `cancer_survival_report.qmd`: R Quarto file containing R code to produce all results presented in the chapter
- `cancer_survival_report.docx`: Word report rendered from the associated .qmd file
Reference:
Zabor EC, Kaizer AM, Pennell NA, Hobbs BP. Optimal predictive probability designs for randomized biomarker-guided oncology trials. Front Oncol. 2022;12:955056. Published 2022 Dec 6. doi:10.3389/fonc.2022.955056
Data, data generating scripts, and analysis scripts for the three designs are available:
- Pooled control arm design
  - `data\1-pooled-generate-data.R`: script to generate the simulated trial data for the pooled control arm design
  - `data\p-sim-dat.rda`: resulting simulated pooled control arm trial data
  - `analysis\2-pooled-apply-ppseq.R`: script to run the analysis for the pooled control arm design
  - `analysis\p-ppseq-res.rda`: analysis results to obtain operating characteristics for the pooled control arm design
- Stratified control arm design
  - `data\1-stratified-generate-data.R`: script to generate the simulated trial data for the stratified control arm design
  - `data\s-sim-dat.rda`: resulting simulated stratified control arm trial data
  - `analysis\2-stratified-apply-ppseq.R`: script to run the analysis for the stratified control arm design
  - `analysis\s-ppseq-res.rda`: analysis results to obtain operating characteristics for the stratified control arm design
- Enrichment design
  - `analysis\1-enrichment-apply-ppseq.R`: script to run the analysis for the enrichment design
  - `analysis\e-ppseq-res.rda`: analysis results to obtain operating characteristics for the enrichment design
Notes:
- These files were organized in an elaborate folder structure with a top-level R project, and all filepaths used `here::here` relative to the R project. Please alter the filepaths accordingly.
- No data file or data generating script is included with the enrichment design, since data from the pooled control group design were utilized for stage 1 and data from the stratified control group design were utilized for stage 2.
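As a minimal sketch of what altering the filepaths can look like, the snippet below builds a path with base R's `file.path()` in place of `here::here()`. The `project_dir` value and the assumption that the `data` folder sits directly under it are hypothetical; adjust both to match where you saved the files.

```r
# Hypothetical: directory where the downloaded files were saved, assumed to
# contain data/ and analysis/ subfolders. Replace "." with your own location.
project_dir <- "."

# Base-R stand-in for here::here("data", "p-sim-dat.rda")
sim_dat_path <- file.path(project_dir, "data", "p-sim-dat.rda")

# load() restores the saved objects and returns their names
if (file.exists(sim_dat_path)) {
  loaded_objects <- load(sim_dat_path)
  print(loaded_objects)
} else {
  message("File not found, check project_dir: ", sim_dat_path)
}
```

`file.path()` joins components with forward slashes, which R also accepts on Windows, so the same line works across operating systems.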
Reference:
Eaton AA, Zabor EC. Analysis of composite endpoints with component-wise censoring in the presence of differential visit schedules. Stat Med. 2022 Apr 30;41(9):1599-1612. doi: 10.1002/sim.9312. Epub 2022 Jan 18. PMID: 35043427.
Files:
- `fn-create-single-dataset.R`: function to simulate a single interval censored dataset
- `fn-fit-models.R`: function to fit each of the models included in the paper
- `fn-tidy-ic_sp.R`: helper function to tidy the results from the `icenReg::ic_sp()` function
- `program1-run-simulation.R`: program to run a simulation for a given scenario and summarize results
Instructions:
- Download all files to the same directory
- Open `program1-run-simulation.R`
- Install the `cwcens` package in R from `remotes::install_github("anneae/cwcens")`
- Either set your working directory or alter the file paths for the 3 function files in the section "Load functions"
- Change the parameters of interest in the section "Generate many datasets", including possibly the number of simulated datasets and the seed
- In the section "Prepare data to summarize" you will need to change the truth to match the setting you are examining. In the example, the HRs for both recurrence and death were set to 1.5, so the true log HR in all cases is log(1.5)
- Run all code in the file to fit the models and summarize the results with a boxplot and table
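The "truth" step above can be sketched as follows. This is only an illustration: `summarize_vs_truth()` is a hypothetical helper, not a function from the repository, and `example_estimates` holds placeholder numbers rather than output from any fitted model.

```r
# True hazard ratio in the example scenario described above
true_hr <- 1.5
true_log_hr <- log(true_hr)

# Hypothetical helper (not part of the repository): compare a vector of
# estimated log HRs against the true value
summarize_vs_truth <- function(estimates, truth) {
  c(
    mean_estimate = mean(estimates),
    bias = mean(estimates) - truth,
    empirical_se = sd(estimates)
  )
}

# Placeholder values only, standing in for estimates from the fitted models
example_estimates <- c(0.35, 0.42, 0.47)
summarize_vs_truth(example_estimates, true_log_hr)
```

If you simulate with a different hazard ratio, only `true_hr` needs to change for the comparison to stay consistent.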
Reference:
Zabor EC, Kane MJ, Roychoudhury S, Nie L, Hobbs BP. Bayesian basket trial design with false-discovery rate control. Clin Trials. 2022 Feb 7:17407745211073624. doi: 10.1177/17407745211073624. Epub ahead of print. PMID: 35128970.
This paper contains three distinct parts:
- Model calibration
  - `heterogeneous_response_sim_fn.R`: A function to run the basket trial with different design parameters
  - `heterogeneous_response_sim_do.R`: A function that implements `heterogeneous_response_sim_fn.R` for the design parameters of interest
  - `heterogeneous_response_sim_res.rda`: The simulation results from `heterogeneous_response_sim_do.R`
  - `heterogeneous_response_sim_report.Rmd`: R Markdown report to summarize the results in `heterogeneous_response_sim_res.rda`
- Case study
  - `neratinib_mem_test.R`: The program to generate simulation results based on the neratinib case study
  - `nerat_basket_0.25.rda` and `nerat_basket_0.5.rda`: two files of simulation results generated by `neratinib_mem_test.R` for different prior probabilities of exchangeability
  - `neratinib_mem_results.Rmd`: R Markdown report to summarize the results in `nerat_basket_0.25.rda` and `nerat_basket_0.5.rda`
- Trial operating characteristics
  - `do_single_sim.R`: A function to generate MEM and frequentist results for baskets with given sample sizes and true response rates
  - `scenario1-alt0.R`: Executes the global alternative scenario
  - `scenario1-alt0.rda`: Results from `scenario1-alt0.R`
  - `scenario1-alt1.R`: Executes alternative 1 scenario
  - `scenario1-alt1.rda`: Results from `scenario1-alt1.R`
  - `scenario1-alt2.R`: Executes alternative 2 scenario
  - `scenario1-alt2.rda`: Results from `scenario1-alt2.R`
  - `scenario1-alt3.R`: Executes alternative 3 scenario
  - `scenario1-alt3.rda`: Results from `scenario1-alt3.R`
  - `scenario1-null.R`: Executes the global null scenario
  - `scenario1-null.rda`: Results from `scenario1-null.R`
  - `operating-chars-results.R`: Gets summary dataframes of the results from the `scenario1-****.rda` files
  - `scenario1-summary.rda`: Summary results from `operating-chars-results.R`
  - `operating-chars-results-report.Rmd`: R Markdown report to summarize the results in `scenario1-summary.rda`
Please note that these files were organized in an elaborate folder structure with a top-level R project and all filepaths used here::here relative to the R project. Please alter the filepaths accordingly.
Reference:
Zabor EC, Seshan VE, Wang S, Begg CB. Validity of a method for identifying disease subtypes that are etiologically heterogeneous. Stat Methods Med Res. 2021 Sep;30(9):2045-2056. doi: 10.1177/09622802211032704. Epub 2021 Jul 28. PMID: 34319833.
Files:
- `eh_cluster_sim_fun.R`: contains the primary function for conducting a clustering simulation
- `run_eh_cluster_sim_fun.R`: is where you can set the simulation parameters to desired values and run the simulation
Instructions:
- Download both files to the same directory
- Install the `riskclustr` package from CRAN using `install.packages("riskclustr")`. Note that this line of code is commented out in `eh_cluster_sim_fun.R` as it only needs to be run once.
- Change the filepath in `setwd()` in `run_eh_cluster_sim.R` to point to the directory where the files were downloaded
- Set simulation parameters in `run_eh_cluster_sim.R` to desired values
- Use `set.seed()` to set a seed. Note that a variety of seeds were used to produce results in the manuscript.
- Run `run_eh_cluster_sim.R` to produce results. See notes at bottom of code file for details on results structure.
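The steps above could be driven from a short wrapper like the one below. This is a sketch under assumptions: `sim_dir` is a hypothetical path, the seed value is arbitrary, and the script name follows the instructions (adjust it if your downloaded file is named differently). The script may also set its own working directory and seed internally.

```r
# install.packages("riskclustr")  # only needs to be run once

# Hypothetical path: replace with the directory holding the downloaded files
sim_dir <- "path/to/downloaded/files"
sim_script <- file.path(sim_dir, "run_eh_cluster_sim.R")

if (file.exists(sim_script)) {
  set.seed(123)  # arbitrary seed, not one used for the manuscript
  # chdir = TRUE temporarily switches to the script's directory while it runs
  source(sim_script, chdir = TRUE)
} else {
  message("Script not found, check sim_dir: ", sim_script)
}
```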