🚧 Under Construction 🚧
The RATTACA method uses SNP genotypes produced from low-coverage DNA sequencing to predict phenotypes in HS rats, then uses these trait predictions to assign for study animals with predicted extreme phenotypes. This package provides helper tools for fitting linear mixed models (LMM) for prediction, and assessing LMM performance under the goals of the RATTACA project.
Predictions are calculated by estimating marker effects using the 'RR-BLUP' method of the rrBLUP package. The primary output of this package - a set of phenotype predictions - can then be used to assign rats using the RATTACA assignment package.
RATTACA predictions are conducted in R. You will need to install R >= 4.1
with the following R packages:
- genio (>=1.1.2) to read/write PLINK genotype data
- rrBLUP (>=4.6.2) to fit prediction models
- roxygen2 (>=2.7.0) (optional: to build package documentaion)
- viridis (>=0.6.2) (for optional plotting)
- scales (>=1.2.0) (for optional plotting)
- corrplot (>=0.92) (for optional plotting)
First, simply clone the repository:
git clone git@github.com:Palmer-Lab-UCSD/rattaca.gitTo build the package documentation (which allows dynamic generation
of RATTACA documentation in R), navigate into the cloned repository, start
R, and from the R command line:
library(roxygen2)
roxygenize()Next, exit R to build the package. Navigate outside the cloned repository.
With the repo inside the current working directory, execute from the command
line:
R CMD build rattaca... which will generate a tar ball rattaca_<version_number>.tar.gz in the
current working directory.
You can install from the command line or within R. To install from the command line, assuming the RATTACA tar ball is in the current working directory, execute:
R CMD INSTALL rattaca_<version_number>.tar.gz... to install the package binary in the default location on your local machine.
Alternatively, to save within a specific directory (such as within a conda environment
library), use the -l option to set the desired parent directory for the package binary:
R CMD INSTALL -l desired/path/to/location/ rattaca_<version_number>.tar.gzTo install from R, restart R and in the command line execute:
install.packages('path/to/rattaca_<version_number>.tar.gz', 'path/to/parent/directory', repos=NULL)... ensuring repos is set to NULL to install from a local package. The parent directory path
may be excluded when saving to a default location. The downloaded tarball can be deleted after installation.
On this GitHub page, on the right-hand side, click
Releases. Choose a release version,
and click on the compressed source code link for
.tar.gz. Given that this downloaded in the
Downloads folder, you can install or build as above:
cd ~/Downloads
R CMD INSTALL [-l library_path] rattaca-<tagged_version_number>.tar.gzHere we introduce rattaca package features with a series
of examples.
Using the R command line we can generate arbitrary simulated
biallelic genotypes
library(rattaca)
q_markers <- 100
n_samples <- 210
genotypes <- sample(c(0,1), q_markers * n_samples,
replace=TRUE)
genotypes <- matrix(genotype, ncol=q_markers, nrow=n_samples)from these data we can generate a simulation closure to sample trait values given a trait heritability and genotypes
heritability <- 0.4
simulator <- rattaca::gen_sim_closure_r_sq(heritability, genotypes)
trait_data <- simulator(genotypes)Using the genotype and trait pairs, we can fit the data using
the rattaca wrapper for rrBLUP's rrBLUP::mixed.solve.
out_pars <- rattaca::fit(trait_data,
genotypes)Then by using the ls function we see that the out_pars
list contains
ls(out_pars)
[1] "beta" "beta.SE" "LL" "pearson_corr"
[5] "r_sq" "spearman_corr" "u"
[9] "u.SE" "Ve" "Vu"where the parameter definitions are as follows:
| parameter | description |
|---|---|
| beta | Fixed effect size, i.e. the intercept term |
| beta.SE | Standard error of intercept |
| LL | log-likelihood of fit |
| pearson_corr | Pearson correlation between predictions and training data |
| r_sq | Coefficient of determination, R^2, between predictions and training data |
| spearman_corr | Spearman correlation between predictions and training data |
| u | BLUP for each marker random effect |
| u.SE | Standard error BLUP for each marker random effect |
| Ve | Variance of error term |
| Vu | Variance of random effects |
Note, that fitting is computationally expensive for large amounts of fitting data. Test on your system prior to running to estimate time.
Suppose that our current working direcotry contained a directory
data for which the plink and trait data files are located,
-data/
|- genotype.bed
|- genotype.bim
|- genotype.fam
|- traits.csvThen to load the genotypes and trait data for mass where
rfid is the sample id column,
genotypes <- rattaca::load_and_prepare_plink_data("data/genotypes")
trait <- rattaca::load_and_prepare_trait_data("data/traits.csv",
"rfid", "mass")rattaca scripts can be found in the inst directory of this repository.
To fit the model to data, you'll need to have on hand:
- genotype files
- trait measurements as csv
rat_fit.Rscript
Note that the animal identifies in the genotype files must match that of the trait measurements. Let us assume that the current working directory
- rat_fit.R
- data/
|- genotype.bed
|- genotype.bim
|- genotype.fam
|- traits.csvThen to run the script
$ Rscript rat_fit.R \
--plink_genotypes_prefix 'data/genotype' \
--trait_file 'data/traits.csv' \
--trait_rat_id_colname 'rfid' \
--trait 'mass' \
--out_dir 'results'which will print the results to file results/mass.bpar.
The .bpar file contains records separated by line. Records
come in three flavors and are distinguishable by line prefex:
| Prefix | Description |
|---|---|
## |
Meta data containing key value pairs delimited by =. |
# |
Header, defines the values for each variant record, comma delimited. |
| No prefix | Variant record, comma delimited. |
The header includes
| Column Name | Description |
|---|---|
| var_id | As specified in |
| random_effect | BLUP of the random effect for variant produced by rrBLUP:mixed.solve |
| random_effect_se | Standard error of BLUP produced by rrBLUP::mixed.solve |
Please clone or fork the repository. Submit changes as a new
branch and open a Pull Request. Also, please add new unit tests
and run them prior to submitting a recommended change. Recall that
an R package can be tested by, assuming that the cloned rattaca
package directory is in your current working directory,
R CMD check rattacaThis command will:
- print several helpful messages to the standard out,
- create directory
rattaca.Rcheckwith helpful log files, - build documentation, and
- run unit-tests.
- Johnson, Benjamin B., et al. "RATTACA: Genetic predictions in Heterogeneous Stock rats offer a new tool for genetic correlation and experimental design." bioRxiv (2023): 2023-09.
- Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250-255.