From fa17d22ed3f70a9a7ba8a640ef0a081f5716e015 Mon Sep 17 00:00:00 2001
From: xuewei cao <36172337+xueweic@users.noreply.github.com>
Date: Thu, 6 Nov 2025 22:27:33 -0500
Subject: [PATCH 1/3] Update ColocBoost_Wrapper_Pipeline.Rmd

---
 vignettes/ColocBoost_Wrapper_Pipeline.Rmd | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/vignettes/ColocBoost_Wrapper_Pipeline.Rmd b/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
index d54f2c8..41165ed 100644
--- a/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
+++ b/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
@@ -22,10 +22,10 @@ This vignette demonstrates how to use the bioinformatics pipeline for ColocBoost
 - See more details about input data preparation in `xqtl_protocol` with [link](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html).
-Step 1: Loading individual-level and summary statistics using `load_multitask_regional_data` function from multiple cohorts or datasets
+# 1. Loading Data using `colocboost_analysis_pipeline` function
+This function harmonizes the input data and prepares it for colocalization analysis.
-Step 2: Perform ColocBoost using `colocboost_analysis_pipeline` function
 In this section, we introduce how to load the regional data required for the ColocBoost analysis using the `load_multitask_regional_data` function.
 This function loads mixed datasets for a specific region, including individual-level data (genotype, phenotype, covariate data), summary statistics
@@ -84,7 +84,6 @@ xvar_cutoff = 0
 imiss_cutoff = 0.9
 # More advanced parameters see pecotmr::load_multitask_regional_data()
-
 region_data_individual <- load_multitask_regional_data(
   region = region,
   genotype_list = genotype_list,
@@ -143,7 +142,6 @@ n_controls = c(0, 40000)
 # More advanced parameters see pecotmr::load_multitask_regional_data()
-
 region_data_sumstat <- load_multitask_regional_data(
   sumstat_path_list = sumstat_path_list,
   column_file_path_list = column_file_path_list,
@@ -223,7 +221,8 @@ outputs:
 - **`colocboost_results`**: List of colocboost objects (with `xqtl_coloc`, `joint_gwas`, `separate_gwas`); Output of the `colocboost_analysis_pipeline` function. If the mode is not run, the corresponding element will be `NULL`.
 ```{r, colocboost-analysis, eval = FALSE}
-# load in individual-level and sumstat data
+#### Please check the example code below ####
+# # load in individual-level and sumstat data
 region_data_combined <- load_multitask_regional_data(
   region = region,
   genotype_list = genotype_list,
@@ -277,4 +276,4 @@ colocboost_plot(colocboost_results$joint_gwas)
 for (i in 1:length(colocboost_results$separate_gwas)) {
   colocboost_plot(colocboost_results$separate_gwas[[i]])
 }
-```
+```
\ No newline at end of file

From 83f42730e16e0e2686941d1e10e39b0da0971c99 Mon Sep 17 00:00:00 2001
From: xuewei cao <36172337+xueweic@users.noreply.github.com>
Date: Thu, 6 Nov 2025 22:45:18 -0500
Subject: [PATCH 2/3] Update ColocBoost_Wrapper_Pipeline.Rmd

---
 vignettes/ColocBoost_Wrapper_Pipeline.Rmd | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/vignettes/ColocBoost_Wrapper_Pipeline.Rmd b/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
index 41165ed..3662aad 100644
--- a/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
+++ b/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
@@ -38,7 +38,8 @@ Below are the input parameters for this function for loading individual-level da
 ## 1.1. Loading individual-level data from multiple cohorts
-inputs:
+Inputs:
+
 - **`region`**: String ; Genomic region of interest in the format of `chr:start-end` for the phenotype region you want to analyze.
 - **`genotype_list`**: Character vector; Paths for PLINK bed files containing genotype data (do NOT include .bed suffix).
 - **`phenotype_list`**: Character vector; Paths for phenotype file names.
@@ -55,7 +56,8 @@ inputs:
 - **`xvar_cutoff`**: Numeric; Minimum genotype variance cutoff. Default is 0.
 - **`imiss_cutoff`**: Numeric; Maximum individual missingness cutoff. Default is 0.
-outputs:
+Outputs:
+
 - **`region_data`**: List (with `individual_data`, `sumstat_data`); Output of the `load_multitask_regional_data` function. If only individual-level data is loaded, `sumstat_data` will be `NULL`.
@@ -108,7 +110,8 @@ region_data_individual <- load_multitask_regional_data(
 ## 1.2. Loading summary statistics from multiple cohorts or datasets
-inputs:
+Inputs:
+
 - **`sumstat_path_list`**: Character vector; Paths to the summary statistics.
 - **`column_file_path_list`**: Character vector; Paths to the column mapping files. See below for expected format.
 - **`LD_meta_file_path_list`**: Character vector; Paths to LD metadata files. See below for expected format.
@@ -119,7 +122,8 @@ inputs:
 - **`n_cases`**: Integer vector; Number of cases. Set a 0 if `n_samples` is passed explicitly. If unknown, set as 0 and include `n_cases` column in the column mapping file to retrieve from the sumstat file.
 - **`n_controls`**: Integer vector; Number of controls. Set a 0 if `n_samples` is passed explicitly. If unknown, set as 0 and include `n_controls` column in the column mapping file to retrieve from the sumstat file.
-outputs:
+Outputs:
+
 - **`region_data`**: List (with `individual_data`, `sumstat_data`); Output of the `load_multitask_regional_data` function. If only summary statistics data is loaded, `individual_data` will be `NULL`.
 **Summary statistics loading example**
@@ -158,6 +162,7 @@ region_data_sumstat <- load_multitask_regional_data(
 **Expected format for column mapping file**
+
 The column mapping file is YAML (`.yml`) with key: value pairs mapping your input column names to the standardized names expected by the loader.
 Required columns are `chrom`, `pos`, `A1`, and `A2`, and either `z` or `beta` and `sebeta`.
 Either 'n_case' and 'n_control' or 'n_samples' can be passed as part of the column mapping, but will be overwritten by the n_cases and n_controls or n_samples parameterspassed explicitly.
@@ -202,7 +207,8 @@ The colocalization analysis can be run in any one of three modes, or in a combin
 - **`joint GWAS mode`**: Perform colocalization analysis in disease-agnostic mode on the individual-level and summary statistics data together.
 - **`separate GWAS mode`**: Perform colocalization analysis in disease-prioritized mode on the the individual-level data and each summary statistics dataset separately, treating each summary statistics dataset as the focal trait.
-inputs:
+Inputs:
+
 - **`region_data`**: List (with `individual_data`, `sumstat_data`); Output of the `load_multitask_regional_data` function.
 - **`focal_trait`**: String; For xQTL-only mode, the name of the trait to perform disease-prioritized ColocBoost, from `conditions_list_individual`. If not provided, xQTL-only mode will be run without disease-prioritized mode.
 - **`event_filters`**: List of character vectors; Patterns for filtering events based on context names.
@@ -217,7 +223,8 @@ Example: for sQTL, `list(type_pattern = ".*clu_(\\d+_[+-?]).*", valid_pattern =
 - **`joint_gwas`**: Logical; if TRUE, performs joint GWAS mode, mapping all individual-level and sumstat data together.Default is `FALSE`.
 - **`separate_gwas`**: Logical; if TRUE, runs separate GWAS mode, where each sumstat dataset is analyzed separately with all individual-level data, treating each sumstat as the focal trait in disease-prioritized mode. Default is `FALSE`.
-outputs:
+Outputs:
+
 - **`colocboost_results`**: List of colocboost objects (with `xqtl_coloc`, `joint_gwas`, `separate_gwas`); Output of the `colocboost_analysis_pipeline` function. If the mode is not run, the corresponding element will be `NULL`.
 ```{r, colocboost-analysis, eval = FALSE}

From 600d04b6f685dd18c395e2b774f3c158fabe7c04 Mon Sep 17 00:00:00 2001
From: xuewei cao <36172337+xueweic@users.noreply.github.com>
Date: Thu, 6 Nov 2025 22:57:13 -0500
Subject: [PATCH 3/3] update_announcement

---
 vignettes/ColocBoost_Wrapper_Pipeline.Rmd | 1 +
 vignettes/announcements.Rmd               | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/vignettes/ColocBoost_Wrapper_Pipeline.Rmd b/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
index 3662aad..714f293 100644
--- a/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
+++ b/vignettes/ColocBoost_Wrapper_Pipeline.Rmd
@@ -21,6 +21,7 @@ This vignette demonstrates how to use the bioinformatics pipeline for ColocBoost
 `colocboost_pipeline` with [link](https://github.com/StatFunGen/pecotmr/blob/main/R/colocboost_pipeline.R).
 - See more details about input data preparation in `xqtl_protocol` with [link](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/colocboost.html).
+Acknowledgements: Thanks to Kate (Kathryn) Lawrence (GitHub:@kal26) for her contributions to this vignette.
 # 1. Loading Data using `colocboost_analysis_pipeline` function
diff --git a/vignettes/announcements.Rmd b/vignettes/announcements.Rmd
index c7e9f58..08fb927 100644
--- a/vignettes/announcements.Rmd
+++ b/vignettes/announcements.Rmd
@@ -14,6 +14,11 @@ vignette: >
 - *May 2, 2025*: `colocboost` R package is available on [CRAN](https://CRAN.R-project.org/package=colocboost).
 ## Software updates
+- `v1.0.7` Improvements to ColocBoost (check out the full details in [PR](https://github.com/StatFunGen/colocboost/pull/116)).
+    - Enhanced `colocboost_plot` function with flexible highlighting options and new visualization styles.
+    - Optimized performance and computational efficiency
+    - Improved documentation and examples for the wrapper pipeline
+    - Minor bug fixes for increased stability
 - `v1.0.6` Memory optimization and visualization improvements with bug fixes [CRAN](https://CRAN.R-project.org/package=colocboost).
     - Optimized LD-free version to reduce memory usage by eliminating large identity LD matrix generation
     - Enhanced `colocboost_plot` function with improved horizontal and vertical spacing labels