---
title: "Light/Dark Pathways Workflow for HPV +/- Patients"
author: "Ted Laderas"
date: "6/6/2019"
output: html_notebook
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Installing the packageDir package
The majority of the analysis code is contained in a package called `packageDir` authored by Sam Higgins.
To install, use the following code:
```{r eval=FALSE}
install.packages("remotes")
remotes::install_github("biodev/packageDir")
```
## Reproducing HNSCC notebook
This notebook and repo contain everything needed to reproduce the light/dark pathway analysis in Choonoo et al.
```{r}
library(here)
library("packageDir")
library("roxygen2")
library("DT")
library("dplyr")
```
## Processing Mutation Files
Before running the analysis, we need to generate two mutation files, one for each cohort: HPV-positive patients and HPV-negative patients.
```{r eval=FALSE}
hpv_patients <- read.csv("data/HPV_Annotation_MB.csv")
# Show the breakdown of patients by HPV call
table(hpv_patients$FINAL_HPV_CALL)
# Collect barcodes per group; the labels must match the values printed by table()
hpv_positive_patients <- hpv_patients %>%
  filter(FINAL_HPV_CALL == "Highest Confidence Positive") %>%
  pull(TCGA_BARCODE) %>% as.character()
hpv_negative_patients <- hpv_patients %>%
  filter(FINAL_HPV_CALL == "High Confidence Negative") %>%
  pull(TCGA_BARCODE) %>% as.character()
```
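As a quick sanity check on the split above, the sketch below uses a toy annotation table (the barcodes and the "Ambiguous" label are made up for illustration) to confirm that no patient lands in both groups and to list the patients that fall into neither. It uses base R only, so it does not depend on the real annotation file.

```{r eval=FALSE}
# Toy annotation table; barcodes and the "Ambiguous" label are hypothetical
toy_calls <- data.frame(
  TCGA_BARCODE = c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003", "TCGA-AA-0004"),
  FINAL_HPV_CALL = c("Highest Confidence Positive", "High Confidence Negative",
                     "Ambiguous", "High Confidence Negative"),
  stringsAsFactors = FALSE
)

# Base R equivalent of the filter() %>% pull() steps
toy_positive <- toy_calls$TCGA_BARCODE[toy_calls$FINAL_HPV_CALL == "Highest Confidence Positive"]
toy_negative <- toy_calls$TCGA_BARCODE[toy_calls$FINAL_HPV_CALL == "High Confidence Negative"]

# No patient should appear in both groups
stopifnot(length(intersect(toy_positive, toy_negative)) == 0)

# Patients with any other call are left out of both analyses
toy_excluded <- setdiff(toy_calls$TCGA_BARCODE, c(toy_positive, toy_negative))
toy_excluded
```

Running the same `intersect()` and `setdiff()` checks on `hpv_positive_patients` and `hpv_negative_patients` shows how many patients are dropped because their call matches neither label.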
```{r eval=FALSE}
# Read the full cohort MAF file
maf_file <- read.delim(here::here("data/TCGA.HNSC.mutect.84c7a87a-9dcc-48fb-bd69-ba9d6e6f3ca2.DR-7.0.somatic_cleaned.maf"))
# Keep only mutations from HPV-positive patients and write them out
maf_positive_patients <- maf_file %>% filter(TCGA_Patient_Barcode %in% hpv_positive_patients)
nrow(maf_positive_patients)
write.table(maf_positive_patients, here::here("data/TCGA.HNSC.HPV-positive.maf"), row.names=FALSE, sep="\t")
# Repeat for HPV-negative patients
maf_negative_patients <- maf_file %>% filter(TCGA_Patient_Barcode %in% hpv_negative_patients)
nrow(maf_negative_patients)
write.table(maf_negative_patients, here::here("data/TCGA.HNSC.HPV-negative.maf"), row.names=FALSE, sep="\t")
```
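The `%in%` filter above only works if the barcodes in the MAF file and in the annotation file have the same format. The sketch below assumes that `TCGA_Patient_Barcode` holds the 12-character patient-level barcode, which is the leading portion of the standard MAF `Tumor_Sample_Barcode` column. The toy rows are made up, and how the cleaned MAF in this repo was actually produced is not shown here.

```{r eval=FALSE}
# Toy MAF-like table; all barcodes are hypothetical
toy_maf <- data.frame(
  Hugo_Symbol = c("TP53", "PIK3CA", "NOTCH1"),
  Tumor_Sample_Barcode = c("TCGA-AA-0001-01A-11D-A111-08",
                           "TCGA-AA-0001-01A-11D-A111-08",
                           "TCGA-AA-0002-01A-11D-A111-08"),
  stringsAsFactors = FALSE
)

# The first 12 characters of a TCGA sample barcode identify the patient
toy_maf$TCGA_Patient_Barcode <- substr(toy_maf$Tumor_Sample_Barcode, 1, 12)

# Hypothetical patient list; the second patient has no rows in the toy MAF
toy_positive <- c("TCGA-AA-0001", "TCGA-AA-0009")

# Same filter as above, written in base R
matched <- toy_maf[toy_maf$TCGA_Patient_Barcode %in% toy_positive, ]

# Patients in the list that contribute no mutations at all
missing_patients <- setdiff(toy_positive, toy_maf$TCGA_Patient_Barcode)
nrow(matched)
missing_patients
```

A non-empty `missing_patients` on the real data would point to either a barcode format mismatch or patients without called mutations.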
## Processing Pathway Information
This step takes some time, so we've cached the results in `data/path_detail.rds`. Set `eval=TRUE` in the chunk header to rerun this part of the workflow.
```{r eval=FALSE}
path_detail <- getDefaultPaths("reference_data/paths/ReactomePathways 2015 02 17 13 46 25 2019.06.10 18.48.35.txt")
saveRDS(path_detail, file="data/path_detail.rds")
```
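The caching step above can also be written as a compute-once pattern. The sketch below is a minimal illustration using a hypothetical helper, `cache_rds()`, which is not part of `packageDir`; it uses a temporary file and a stand-in computation so that it runs on its own.

```{r eval=FALSE}
# Hypothetical helper: compute a result once, then reuse the cached .rds file
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, file = path)
  result
}

# Stand-in for the slow pathway-processing step
tmp <- tempfile(fileext = ".rds")
first <- cache_rds(tmp, function() list(n_paths = 3))

# The second call reads the cache, so the compute function is never invoked
second <- cache_rds(tmp, function() stop("should not be recomputed"))
identical(first, second)
```

With this pattern, deleting the cached file is what forces a recomputation, rather than toggling `eval` in the chunk header.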
## Running Overlap Analysis
Several settings have to be specified for this analysis. Most are keyed by the text of the interactive prompt they answer.
### Somatic Mutation Settings for HPV Positive Analysis
```{r}
path_detail <- readRDS("data/path_detail.rds")
settings <- getBasicSettings()
#modify the settings object
settings$somatic_mutation_aberration_summary$interactive <- "n"
settings$somatic_mutation_aberration_summary$`Analyze pathways for individual members of the cohort? (y or n) `<- "n"
settings$somatic_mutation_aberration_summary$`Select a .maf file containing the data set to be analyzed.` <- "data/TCGA.HNSC.HPV-positive.maf"
settings$somatic_mutation_aberration_summary$`Would you like to include PolyPhen analysis results in this analysis? (y/n) `<- "n"
settings$somatic_mutation_aberration_summary$`Have manual gene symbol corrections already been conducted? (y/n)` <- "y"
settings$somatic_mutation_aberration_summary$`
Please enter the row numbers of the variant types you would like to analyze (sepparated by a space).
` <- "\nFrame_Shift_Del; Frame_Shift_Ins; In_Frame_Del; In_Frame_Ins; Missense_Mutation; Nonsense_Mutation; Nonstop_Mutation; Splice_Site\n"
settings$somatic_mutation_aberration_summary$mutation_type <- c("Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation", "Splice_Site")
settings$somatic_mutation_aberration_summary$`Would you like to filter out hypermutators?
If yes, please enter a mutation count threshold.
If no just press enter n ` <- "n"
settings$somatic_mutation_aberration_summary$`Use special path significance analysis settings for this data type? (y/n)` <- "n"
```
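Because the settings are stored in a named list, assignment through `$` matches names exactly, including trailing spaces and line breaks. The sketch below uses a made-up prompt string (not a real `packageDir` prompt) to show how a name that differs by a single space silently creates a new entry instead of updating the existing one.

```{r eval=FALSE}
# Hypothetical prompt text (note the trailing space); not a real packageDir prompt
toy_settings <- list()
toy_settings[["Analyze individuals? (y or n) "]] <- "y"

# Same name, character for character: the existing entry is overwritten
toy_settings$`Analyze individuals? (y or n) ` <- "n"
n_after_exact <- length(toy_settings)

# Trailing space dropped: R adds a second entry instead of updating the first
toy_settings$`Analyze individuals? (y or n)` <- "n"
n_after_typo <- length(toy_settings)

names(toy_settings)
c(n_after_exact, n_after_typo)
```

Comparing `names()` of the settings list before and after modifying it is a cheap way to catch this kind of mismatch.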
### Functional Drug Screen Settings
```{r}
settings$functional_drug_screen_summary$`Please select a file with a drug screen results data set
` <- "data/Targetome_Level123_8_7_17.txt"
settings$functional_drug_screen_summary$interactive <- "n"
settings$functional_drug_screen_summary$`Have manual symbol corrections been performed yet for the current data set? (y/n)` <- "y"
settings$functional_drug_screen_summary$`Analyze pathways for individual members of the cohort? (y or n) ` <- "n"
settings$functional_drug_screen_summary$gene_stat <- "g"
settings$functional_drug_screen_summary$`
To analyze drug screen panel coverage (for a panel that has or has not been run), enter p
To process drug screen result set enter d
To save an HTML summary of the results enter h
To exit drug screen interface, enter q
` <- "p"
settings$functional_drug_screen_summary[['Enter \\"g\\" to examine coverage using a set of gene names.\nEnter \\"d\\" to examine coverage using drug names, along with a drug target matrix: ']] <- "g"
settings$functional_drug_screen_summary$`Please select a file` <- "data/Targetome_Level123_8_7_17.txt"
settings$functional_drug_screen_summary$`Please type in the name of the column with the gene symbols: ` <- "targets"
settings$functional_drug_screen_summary$`Have manual gene symbol corrections already been made? (y/n)` <- "y"
```
## Building the study with the settings
```{r}
study <- getStudyObject(study.name="hnscc_hpv_positive", geneIdentifierType="HUGO", path_detail = path_detail, settings = settings)
```
## Running the Analysis
`loadBasicArms()` loads the data from the specified locations.
`autoRunFromSettings()` runs all of the mutation processing and overlap analysis code.
```{r}
study <- loadBasicArms(study)
study <- autoRunFromSettings(study)
saveStudy(study)
```
## Inspecting the Output, HPV-Positive Patients
Output is generated in `output/study_hnscc_hpv_positive/results/overlap_analysis`. The relevant files are:
`Aberrationally enriched, containing drug targets.txt` - Light pathway file
```{r}
lightPaths <- read.delim("output/study_hnscc_hpv_positive/results/overlap_analysis/Aberrationally enriched, containing drug targets.txt", sep="\t")
DT::datatable(lightPaths)
```
`Aberration enriched, not drug targeted.txt` - Dark pathway file
```{r}
darkPaths <- read.delim("output/study_hnscc_hpv_positive/results/overlap_analysis/Aberration enriched, not drug targeted.txt", sep = "\t")
DT::datatable(darkPaths)
```
## Running the Analysis for HPV-Negative Patients
We can reuse the settings object, pointing it at the HPV-negative mutation file:
```{r}
settings$somatic_mutation_aberration_summary$`Select a .maf file containing the data set to be analyzed.` <- "data/TCGA.HNSC.HPV-negative.maf"
study <- getStudyObject(study.name="hnscc_hpv_negative", geneIdentifierType="HUGO", path_detail = path_detail, settings = settings)
study <- loadBasicArms(study)
study <- autoRunFromSettings(study)
saveStudy(study)
```
## Inspecting the Output, HPV-Negative Patients
Output is generated in `output/study_hnscc_hpv_negative/results/overlap_analysis`. The relevant files are:
`Aberrationally enriched, containing drug targets.txt` - Light pathway file
```{r}
lightPaths <- read.delim("output/study_hnscc_hpv_negative/results/overlap_analysis/Aberrationally enriched, containing drug targets.txt", sep="\t")
DT::datatable(lightPaths)
```
`Aberration enriched, not drug targeted.txt` - Dark pathway file
```{r}
darkPaths <- read.delim("output/study_hnscc_hpv_negative/results/overlap_analysis/Aberration enriched, not drug targeted.txt", sep = "\t")
DT::datatable(darkPaths)
```
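Once both cohorts have been run, the pathway lists can be compared directly. The sketch below is illustrative only: it assumes each output table has a pathway identifier column, here given the hypothetical name `path_id`, and uses made-up pathway names in place of the real output files.

```{r eval=FALSE}
# Toy stand-ins for the two light-pathway tables; `path_id` is a hypothetical column name
toy_light_positive <- data.frame(path_id = c("Pathway A", "Pathway B", "Pathway C"),
                                 stringsAsFactors = FALSE)
toy_light_negative <- data.frame(path_id = c("Pathway B", "Pathway C", "Pathway D"),
                                 stringsAsFactors = FALSE)

# Pathways found in both cohorts
shared <- intersect(toy_light_positive$path_id, toy_light_negative$path_id)

# Pathways specific to one cohort
positive_only <- setdiff(toy_light_positive$path_id, toy_light_negative$path_id)
negative_only <- setdiff(toy_light_negative$path_id, toy_light_positive$path_id)

list(shared = shared, positive_only = positive_only, negative_only = negative_only)
```

Because `lightPaths` and `darkPaths` are overwritten by the HPV-negative chunks, the HPV-positive tables would need to be read into separately named objects before a comparison like this.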