32 changes: 16 additions & 16 deletions _pages/labs/lab_7_1_R_intro.md
Expand Up @@ -18,7 +18,7 @@ R is a language and environment for statistical computing and graphics. R and it
* Excellent community support: mailing list, blogs, tutorials
* Easy to extend by writing new functions

### Lab Setup
**Lab Setup**

1. Option: Use **RStudio** (free and open-source integrated development environment for R)

Expand Down Expand Up @@ -55,19 +55,19 @@ q()
* Execute commands.
* Try tab completion.

#### Add 3 plus 3 in R.
**Add 3 plus 3 in R.**

```R
3 + 3
```

#### Calculate the square root of 7.
**Calculate the square root of 7.**

```R
sqrt(7)
```

#### Install and load the package ggplot2.
**Install and load the package ggplot2.**

```R
install.packages("ggplot2")
Expand All @@ -78,7 +78,7 @@ Alternative: In RStudio, go to the "Packages" tab and click the "Install" button.
Search in the pop-up window and click "Install".


#### Using R help.
**Using R help.**

```R
help(help)
Expand All @@ -91,7 +91,7 @@ help(sqrt)

### Exercise 1: R basics.

#### Variable assignment
**Variable assignment**

Values can be assigned names and used in subsequent operations.

Expand All @@ -103,7 +103,7 @@ sqrt(7) #calculate square root of 7; result is not stored anywhere
x <- sqrt(7) #assign result to a variable named x
```

#### Calling R functions and reading data
**Calling R functions and reading data**

We will use an example project of the most popular baby names in the United States and the United Kingdom. A cleaned and merged version of the data file is available at ***http://tutorials.iq.harvard.edu/R/Rintro/dataSets/babyNames.csv***.

Expand Down Expand Up @@ -177,7 +177,7 @@ baby.names[baby.names$Name == "jill",]
```


#### Relational and logical operators
**Relational and logical operators**

Operator | Meaning
--- | ---
Expand All @@ -197,7 +197,7 @@ How many babies were born after 2003? Save the subset in a new dataframe.



#### Adding columns
**Adding columns**

Add a new column specifying the country.

Expand Down Expand Up @@ -271,7 +271,7 @@ Output:
table(baby.names$Country)
```

#### Replacing data entries
**Replacing data entries**

```R
table(baby.names$Sex)
Expand All @@ -292,7 +292,7 @@ Check the output table again.

Now that we have made some changes to our data set, we might want to save those changes to a file.

#### Save the output as a csv file
**Save the output as a csv file**

```R
getwd() # Check current working directory. Is this where you want to save your file?
Expand All @@ -306,7 +306,7 @@ How would you save other file formats?
Locate and open the file outside of R.


#### Save the output as an R object
**Save the output as an R object**

```R
save(baby.names, file="babyNames.Rdata")
Expand Down Expand Up @@ -342,7 +342,7 @@ Which are the longest names?

Which are the shortest names?

#### Summary of the whole data.frame
**Summary of the whole data.frame**

```R
summary(baby.names)
Expand All @@ -353,7 +353,7 @@ summary(baby.names)
### Exercise 6: Simple graphs.


#### Boxplots
**Boxplots**

Compare the length of baby names for boys and girls using a boxplot.

Expand All @@ -375,7 +375,7 @@ Change the layout of the plot:
* http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3


### Save plot as a pdf
**Save plot as a pdf**

```R
pdf(file="boxplot.pdf")
Expand All @@ -390,7 +390,7 @@ What about other file formats?



#### Histograms
**Histograms**

How many names were recorded for each year?

38 changes: 19 additions & 19 deletions _pages/labs/lab_7_metagenomic_viz.md
Expand Up @@ -42,17 +42,17 @@ load(file="Data/metaphlan_merged_MGX_species_relAb.Rdata")



#### Load the packages that we will need for the tutorial. (Install them if necessary.)
**Load the packages that we will need for the tutorial. (Install them if necessary.)**

```R
library(vegan)
library(ggplot2)
library(grid)
```
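The "install them if necessary" step can be sketched as follows. This is a minimal sketch and not part of the lab itself: `missing_packages` is a hypothetical helper name, and the install call assumes a CRAN mirror is configured. The install is guarded by `interactive()` so that nothing is downloaded when the script runs non-interactively.

```R
# Hypothetical helper (not part of the lab): return the names of the
# packages in `pkgs` that are not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("vegan", "ggplot2", "grid"))

# Install only what is missing, and only in an interactive session
# (assumption: a CRAN mirror is already configured).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After this runs, the `library()` calls above should succeed for every package that could be installed.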

## 1. Visualization techniques: PCoAs and Biplots of microbial species data
### 1. Visualization techniques: PCoAs and Biplots of microbial species data

### Prepare the data
**Prepare the data**

* Extract the metadata

Expand All @@ -73,7 +73,7 @@ species[1:4,1:4] # check the output
```


### Ordination: PCoA with Bray-Curtis distance
#### Ordination: PCoA with Bray-Curtis distance
```R
data.bray=vegdist(species)
data.b.pcoa=cmdscale(data.bray,k=(nrow(species)-1),eig=TRUE)
Expand All @@ -94,7 +94,7 @@ p = ggplot(pcoa, aes(x=PC1, y=PC2)) + geom_point(size=4) + theme_bw()
p
```

#### Adding additional information to the plot: colours and shapes
**Adding additional information to the plot: colours and shapes**

Which metadata would be interesting to include in the plot?

Expand All @@ -111,7 +111,7 @@ pcoa$time_point = gsub("C","",metadata$Time_point)
head(pcoa) # check the data.frame
```

#### Colour by diagnosis
**Colour by diagnosis**

```R
p = ggplot(pcoa, aes(x=PC1, y=PC2, color=diagnosis)) + geom_point(size=4) + theme_bw()
Expand All @@ -131,7 +131,7 @@ p = p + guides(col=guide_legend(title="Diagnosis"))
p
```

#### Exercise: Display diagnosis as a shape instead!
**Exercise: Display diagnosis as a shape instead!**

<!-- p = ggplot(pcoa, aes(x=PC1, y=PC2, shape=diagnosis)) + geom_point(size=4) + theme_bw()
p = p + guides(shape=guide_legend(title="Diagnosis"))
Expand All @@ -145,7 +145,7 @@ p = p + scale_shape_manual(values=c(23,22,19))
p
```

#### Highlight inter-individual variation: Colour by individual
**Highlight inter-individual variation: Colour by individual**

How many participants are there and how many time points per participant?

Expand Down Expand Up @@ -188,7 +188,7 @@ For loops are not very efficient in R. A better way to do this is by using sappl
?sapply
```
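To illustrate the difference between a for loop and `sapply`, here is a small sketch on toy data. It is not the solution to the exercise: the vector `ids` and the variable names are hypothetical and only loosely modelled on participant IDs such as "C3001".

```R
# Hypothetical participant IDs (illustration only, not the lab data).
ids <- c("C3001", "C3002", "H4001")

# Version 1: for loop filling a pre-allocated vector.
prefix_loop <- character(length(ids))
for (i in seq_along(ids)) {
  prefix_loop[i] <- substr(ids[i], 1, 1)
}

# Version 2: sapply applies the function to each element and
# collects the results into a vector.
prefix_sapply <- sapply(ids, function(id) substr(id, 1, 1), USE.NAMES = FALSE)

# Check that both approaches give the same result.
identical(prefix_loop, prefix_sapply)
```

`USE.NAMES = FALSE` is set so that the result is an unnamed vector. Without it, `sapply` would name each element after its input and `identical()` would report a difference.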

#### Exercise: Create the colour variable using sapply. Check that the resulting variable is identical to the one we just created with the for loop.
**Exercise: Create the colour variable using sapply. Check that the resulting variable is identical to the one we just created with the for loop.**

<!---
longitudinal_colour_v2=NA
Expand Down Expand Up @@ -240,7 +240,7 @@ dev.off()
```


#### Adding time course information for some individuals
**Adding time course information for some individuals**

Next we will connect all samples from a particular patient and add time point numbers to the plot.

Expand All @@ -265,7 +265,7 @@ p
```


#### Exercise: Add time courses for 3 more participants!
**Exercise: Add time courses for 3 more participants!**

<!---
patient2="C3001"
Expand Down Expand Up @@ -347,7 +347,7 @@ p_biplot
```


#### Adding specific species to the plot
**Adding specific species to the plot**

```R
list=c("s__Collinsella_intestinalis",
Expand Down Expand Up @@ -435,9 +435,9 @@ head(MGX_pwys[,1:2], n=30)

What does the stratification "unclassified" mean?

#### Create a list of all pathways that were detected in the metagenomic and metatranscriptomic data, respectively. How many are there? ```grep``` the pathway totals
**Create a list of all pathways that were detected in the metagenomic and metatranscriptomic data, respectively. How many are there? ```grep``` the pathway totals**

#### Metagenomic data
**Metagenomic data**

```R
MGX_pwy_lst_all = unique(gsub("\\|.*", "", MGX_pwys$Pathway))
Expand All @@ -447,7 +447,7 @@ MGX_pwy_lst = MGX_pwy_lst_all[intersect(grep("UNMAPPED", MGX_pwy_lst_all, invert
length(MGX_pwy_lst)
```

#### Metatranscriptomic data
**Metatranscriptomic data**

```R
MTX_pwy_lst_all = unique(gsub("\\|.*", "", MTX_pwys$Pathway))
Expand All @@ -458,7 +458,7 @@ length(MTX_pwy_lst)
```
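The `gsub("\\|.*", "", ...)` step used above can be seen in isolation on toy data. This is a minimal sketch under stated assumptions: the pathway strings are made up for illustration, and the filter removes only "UNMAPPED" with `grepl`, which is a simplification and not necessarily the exact filter used in the lab.

```R
# Hypothetical pathway labels (illustration only): one pathway total,
# two stratified rows for the same pathway, and an UNMAPPED row.
example_pwys <- c("PWY-0000: hypothetical pathway",
                  "PWY-0000: hypothetical pathway|s__Example_species",
                  "PWY-0000: hypothetical pathway|unclassified",
                  "UNMAPPED")

# Remove everything from the first "|" onwards, then keep unique names.
pwy_totals <- unique(gsub("\\|.*", "", example_pwys))

# Drop the UNMAPPED entry (simplified filter, an assumption).
pwy_kept <- pwy_totals[!grepl("UNMAPPED", pwy_totals)]

length(pwy_kept)
```

The pattern `\\|.*` matches a literal `|` followed by the rest of the string, so stratified rows collapse onto the name of their pathway total.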


### Compute alpha diversity for each pathway in each sample
**Compute alpha diversity for each pathway in each sample**

We will define a function that we can apply to the metagenomic and metatranscriptomic data:

Expand Down Expand Up @@ -561,7 +561,7 @@ rownames(MTX_a_div)= sapply(seq(1,length(MTX_pwy_lst)), function(p) as.character
nrow(MTX_a_div)
```

#Remove rows that are all "-1" and are all contributed by "unclassified"
Remove rows that are all "-1" and are all contributed by "unclassified".

```R
MTX_a_div = MTX_a_div[apply(MTX_a_div, 1, function(x) sum(x==-1)!=ncol(MTX_a_div)),]
Expand Down Expand Up @@ -602,7 +602,7 @@ summary_MTX_a_div = apply(MTX_a_div_sub, 1, function(x) mean(x[x>=0]))
```


### Alpha diversity plots
**Alpha diversity plots**

We will compare the mean alpha-diversity for each pathway on the RNA level (y-axis) and DNA level (x-axis). Each point represents one pathway, and the color indicates the mean RNA/DNA ratio across all samples on a log scale.

Expand Down Expand Up @@ -643,7 +643,7 @@ g
dev.off()
```

### Now we want to colour the points by the log DNA/RNA ratio of each pathway.
**Now we want to colour the points by the log DNA/RNA ratio of each pathway.**

Create a variable that reflects the DNA/RNA ratio.
