Ancestry Prediction Tool #2
Open: rnmitchell wants to merge 38 commits into main from ancestry
| Commit | Message | Author |
|---|---|---|
| 9a348c2 | Add ancestry prediction options/text to shiny app interface | rnmitchell |
| 8caf6bf | Building in 1000 G genotype data for ancestry SNPs | rnmitchell |
| 57c1664 | began adding ancestry prediction to run_workflow script | rnmitchell |
| 5dee69a | began integrating ancestry prediction | rnmitchell |
| 6aa6cf1 | update run_workflow | rnmitchell |
| db3847c | update .gitignore | rnmitchell |
| 15ebe59 | ancestry prediction running correctly for unconditioned analyses | rnmitchell |
| 8f0e653 | merge master | rnmitchell |
| 5508300 | ancestry prediction for conditioned analyses | rnmitchell |
| 2e0bca8 | fix PCA plot title, add ancestry prediction step to config file settings | rnmitchell |
| 5d16bfd | update test | rnmitchell |
| f48c1bc | testing ancestry pred with all snps | rnmitchell |
| 7fdda0a | 3D PCA plots | rnmitchell |
| 355debc | merge main | rnmitchell |
| 2f1e3e7 | merge main | rnmitchell |
| 8be5bf6 | updated with 3D plotting | rnmitchell |
| 418a8c4 | merge main | rnmitchell |
| 3ed7429 | option to use either ancestry SNPs or all SNPs for PCA | rnmitchell |
| 53d088f | updated shiny app for multiple features with PCA plots | rnmitchell |
| a4ac85b | including necessary data | rnmitchell |
| a1d6246 | data.R updated with included data in package | rnmitchell |
| 625332e | updated description/news with new version # | rnmitchell |
| 705b0ec | centroid analysis | rnmitchell |
| 9c38c51 | fixed bug with loading AF | rnmitchell |
| 2d170d9 | added superpopulation AF datasets | rnmitchell |
| 41c2ca0 | added line breaks to pop up messages | rnmitchell |
| f367f6e | removed unnecessary data; cleaned up scripts | rnmitchell |
| a845c4e | begin adding tests for ancestry | rnmitchell |
| 4f093b8 | added tests | rnmitchell |
| a8e1c58 | updated config | rnmitchell |
| 0ff2530 | updated with test | rnmitchell |
| 12998e7 | readthedocs | rnmitchell |
| b9cbe33 | updated readme | rnmitchell |
| 3e61cf1 | remove readthedocs | rnmitchell |
| e953da1 | merge main | rnmitchell |
| d2fc98c | updated docs | rnmitchell |
| d948b6f | removed hard coded path | rnmitchell |
| b95d031 | update scripts to pass checks | rnmitchell |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| ^.*\.Rproj$ | ||
| ^\.Rproj\.user$ | ||
| ^README\.Rmd$ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -4,3 +4,6 @@ | ||
| .Ruserdata | ||
| .DS_Store | ||
| inst/doc | ||
| .RDataTmp | ||
| docs/_build/html/.buildinfo | ||
| .github | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| # ------------------------------------------------------------------------------------------------- | ||
| # Copyright (c) 2024, DHS. | ||
| # | ||
| # This file is part of MixDeR and is licensed under the BSD license: see LICENSE. | ||
| # | ||
| # This software was prepared for the Department of Homeland Security (DHS) by the Battelle National | ||
| # Biodefense Institute, LLC (BNBI) as part of contract HSHQDC-15-C-00064 to manage and operate the | ||
| # National Biodefense Analysis and Countermeasures Center (NBACC), a Federally Funded Research and | ||
| # Development Center. | ||
| # ------------------------------------------------------------------------------------------------- | ||
| | ||
| #' Ancestry prediction using PCA | ||
| #' | ||
| #' @param report inferred genotypes | ||
| #' @param path write path | ||
| #' @param id sample ID | ||
| #' @param analysis_type mixture deconvolution type (conditioned vs. unconditioned) | ||
| #' @param contrib_status contributor status label, used in plot titles and output file names | ||
| #' @param testsnps SNP set used for PCA ("All Autosomal SNPs" or ancestry SNPs only) | ||
| #' @param groups How to color PCA plots (superpopulations and/or subpopulations) | ||
| #' @import kgp | ||
| #' @import plotly | ||
| #' | ||
| #' @return NA | ||
| #' @export | ||
| #' | ||
| ancestry_prediction = function(report, path, id, analysis_type, contrib_status, testsnps, groups) { | ||
| if (testsnps == "All Autosomal SNPs") { | ||
| plotid="AllSNPs" | ||
| geno=mixder::ancestry_1000G_allsamples | ||
| } else { | ||
| plotid="AncestrySNPsOnly" | ||
| geno=mixder::ancestrysnps_1000G_allsamples | ||
| } | ||
| ncols=ncol(geno) | ||
| geno_filt=geno[,c(7:ncols)] | ||
| snps = data.frame("snp_id"=colnames(geno_filt)) | ||
| snps = snps %>% | ||
| separate(.data$snp_id, c("rsid", "ref_allele"), remove=F) | ||
| snps$order = seq_len(nrow(snps)) | ||
| merged_alleles = merge(snps, report, by="rsid", all.x=T) %>% | ||
| arrange(order) | ||
| ## count alleles | ||
| merged_alleles$num_alt = ifelse(merged_alleles$Allele1==merged_alleles$ref_allele & merged_alleles$Allele2==merged_alleles$ref_allele, 2, ifelse(merged_alleles$Allele1==merged_alleles$ref_allele | merged_alleles$Allele2==merged_alleles$ref_allele, 1, 0)) | ||
| | ||
| ## re-format to match 1000G samples | ||
| formatted_sample = merged_alleles %>% | ||
| select(.data$snp_id, .data$num_alt) %>% | ||
| pivot_wider(names_from=.data$snp_id, values_from=.data$num_alt) | ||
| | ||
| ## add unknown to 1000G genotypes | ||
| geno_filt_unk = rbind(geno_filt, formatted_sample) | ||
| | ||
| message("Running PCA<br/>") | ||
| ## remove any SNPs with NA values (in unknown sample) | ||
| betaRedNAOmit <- geno_filt_unk %>% | ||
| select_if(~ !any(is.na(.))) | ||
| | ||
| ## perform PCA | ||
| pcaRed <- stats::prcomp(betaRedNAOmit, center=TRUE, scale.=FALSE) | ||
| | ||
| ## create data table of PCs | ||
| PCs = data.frame(pcaRed$x) | ||
| | ||
| ## add unknown to ancestry and genotype IDs | ||
| geno_unk = geno %>% | ||
| add_row(IID="Unk") | ||
| ## merge genotypes with ancestry info; need to preserve order to match to PCA data | ||
| geno_ancestry=merge(geno_unk, mixder::ancestry_colors, by.x="IID", by.y="id") | ||
| | ||
| ## add ancestry info to PC data | ||
| newcol=ncols+1 | ||
| newcol2=ncols+4 | ||
| PCs_anc = cbind(geno_ancestry[,c(newcol:newcol2)], data.frame(PCs[,c(1:10)])) | ||
| | ||
| ## create the output directory before anything is written to it | ||
| dir.create(file.path(path, "PCA_plots"), showWarnings = FALSE, recursive=TRUE) | ||
| | ||
| centroids(groups, PCs_anc, glue("{path}/PCA_plots"), glue("{id}_{contrib_status}_{analysis_type}_{plotid}")) | ||
| | ||
| if ("Superpopulations (AFR/AMR/EAS/EUR/SAS Only)" %in% groups) { | ||
| pal = unique(geno_ancestry$superpop_color) | ||
| pal = setNames(pal, unique(geno_ancestry$reg)) | ||
| | ||
| fig = plot_ly(PCs_anc, x = ~PC1, y = ~PC2, z = ~PC3, color = ~reg, colors=pal, size=10) | ||
| fig = fig %>% add_markers() | ||
| fig = fig %>% layout(scene = list(xaxis = list(title = 'PC1'), | ||
| yaxis = list(title = 'PC2'), | ||
| zaxis = list(title = 'PC3')), | ||
| title=list(text=glue("{ncol(betaRedNAOmit)} SNPs; {id} {contrib_status} {analysis_type} Superpopulations"))) | ||
| | ||
| htmlwidgets::saveWidget(as_widget(fig), glue("{path}/PCA_plots/{id}_{contrib_status}_{analysis_type}_{plotid}_superpop_3D_PCAPlot.html")) | ||
| } | ||
| if ("Subpopulations" %in% groups) { | ||
| pal_sub = unique(geno_ancestry$color) | ||
| pal_sub = setNames(pal_sub, unique(geno_ancestry$population)) | ||
| | ||
| fig_sub = plot_ly(PCs_anc, x = ~PC1, y = ~PC2, z = ~PC3, color = ~population, colors=pal_sub, size=10) | ||
| fig_sub = fig_sub %>% add_markers() | ||
| fig_sub = fig_sub %>% layout(scene = list(xaxis = list(title = 'PC1'), | ||
| yaxis = list(title = 'PC2'), | ||
| zaxis = list(title = 'PC3')), | ||
| title=list(text=glue("{ncol(betaRedNAOmit)} SNPs; {id} {contrib_status} {analysis_type} Subpopulations"))) | ||
| | ||
| htmlwidgets::saveWidget(as_widget(fig_sub), glue("{path}/PCA_plots/{id}_{contrib_status}_{analysis_type}_{plotid}_subpopulations_3D_PCAPlot.html")) | ||
| } | ||
| } | ||
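
As a reading aid for the allele-counting and PCA steps in the diff above, here is a minimal base-R sketch on toy data. It is not the package's code: the data frames `ref` and `report`, the column name `counted`, and every value are hypothetical, and it assumes panel columns are named `rsid_allele` as the `separate()` call in the diff suggests. It counts copies of the named allele for the unknown sample, appends that sample to the panel, drops SNPs with missing values, and runs `prcomp`.

```r
# Toy reference panel: 4 samples x 3 SNPs, coded as counts of the allele
# named in each column suffix (hypothetical data, not from 1000 Genomes).
ref <- data.frame(
  rs1_A = c(0, 1, 2, 1),
  rs2_G = c(2, 2, 0, 1),
  rs3_T = c(0, 0, 1, 2)
)

# Inferred genotypes for the unknown sample; rs3 was not typed.
report <- data.frame(
  rsid    = c("rs1", "rs2"),
  Allele1 = c("A", "C"),
  Allele2 = c("A", "G"),
  stringsAsFactors = FALSE
)

# Split "rsid_allele" column names into their two parts.
parts <- strsplit(colnames(ref), "_", fixed = TRUE)
snps <- data.frame(
  snp_id  = colnames(ref),
  rsid    = vapply(parts, `[`, character(1), 1),
  counted = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

# Look up the unknown sample's alleles in panel column order.
idx <- match(snps$rsid, report$rsid)
a1 <- report$Allele1[idx]
a2 <- report$Allele2[idx]

# Count copies (0, 1, or 2) of the counted allele; NA where untyped.
dosage <- (a1 == snps$counted) + (a2 == snps$counted)
names(dosage) <- snps$snp_id

# Append the unknown sample as the last row, then drop SNPs with any NA.
combined <- rbind(ref, as.data.frame(as.list(dosage)))
complete <- combined[, colSums(is.na(combined)) == 0, drop = FALSE]

# Centered, unscaled PCA, matching the settings used in the diff.
pca <- stats::prcomp(complete, center = TRUE, scale. = FALSE)
```

Because the unknown sample is appended last, its coordinates are the last row of `pca$x`. Any later join against sample metadata has to keep that row order for the labels to line up with the PCs.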
Code autoformatting could give a more consistent style in these files. Something to consider.
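
One possible way to act on this, sketched under the assumption that the `styler` package is acceptable for this project (the review does not name a tool). The `snippet` string and the file path in the comments are hypothetical; `style_text()`, `style_file()`, and `style_pkg()` are existing `styler` functions.

```r
# Hypothetical snippet in the unformatted style seen in the diff.
snippet <- "plotid=\"AllSNPs\"\nncols=ncol(geno)"

# Guarded so the example is a no-op when styler is not installed.
if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() returns the restyled code without touching any file.
  print(styler::style_text(snippet))
}

# To restyle files in place (review the resulting diff before committing):
# styler::style_file("R/ancestry_prediction.R")  # hypothetical file path
# styler::style_pkg()                            # all R files in the package
```

Running the formatter in a separate commit keeps style-only changes out of the functional diff.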