This repository contains the code used for the project. The project involves data preprocessing and model building, including a MOFA (Multi-Omics Factor Analysis) model and a Random Forest model.
The scripts are organised to follow the workflow of the project:
data_split.Rmd
Splits the dataset into two subsets:
- One subset for training the MOFA model.
- One subset for training the Random Forest model.
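The two-way split described above can be illustrated with a minimal sketch. It is not the code in data_split.Rmd (which is written in R). The function name `split_samples`, the 50/50 proportion, the seed, and the sample identifiers are all hypothetical and chosen only for illustration.

```python
import random


def split_samples(sample_ids, mofa_fraction=0.5, seed=42):
    """Randomly partition sample IDs into two disjoint subsets.

    mofa_fraction and seed are illustrative assumptions, not values
    taken from data_split.Rmd.
    """
    ids = sorted(sample_ids)      # sort first so the result is reproducible
    rng = random.Random(seed)     # local RNG, leaves global random state alone
    rng.shuffle(ids)
    n_mofa = round(len(ids) * mofa_fraction)
    return ids[:n_mofa], ids[n_mofa:]


# Hypothetical sample identifiers, used only for illustration.
samples = [f"sample_{i:03d}" for i in range(10)]
mofa_ids, rf_ids = split_samples(samples)
print(len(mofa_ids), len(rf_ids))
```

Keeping the two subsets disjoint means that no sample used to fit the MOFA model is reused when training the Random Forest model.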
RNA_preprocessing.Rmd
Preprocessing of the RNA dataset.
mutation_preprocessing.Rmd
Preprocessing of the mutational dataset.
methylation_preprocessing.Rmd
Preprocessing of the methylation dataset.
cna_preprocessing.Rmd
Preprocessing of the Copy Number Alteration (CNA) dataset.
MOFA_models.Rmd
- Identify the optimal MOFA model.
- Build the optimal MOFA model.
- Characterize the factors in the model.
extract_features.Rmd
Extract informative features from the MOFA model, particularly in relation to the treatment response covariate.
GSEA.Rmd
Perform Gene Set Enrichment Analysis (GSEA) using important factors identified from the MOFA model.
pathway_enrichment.Rmd
Conduct pathway enrichment analysis using the informative factors identified from the MOFA model.
Random_Forest.ipynb
- Build and evaluate a Random Forest model using the features extracted from the MOFA model.
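As a rough illustration of the build-and-evaluate step, the sketch below trains a Random Forest with scikit-learn on synthetic data. It is not the notebook's code. The feature matrix stands in for MOFA-derived features, the binary label stands in for treatment response, and the choice of a classifier, the hyperparameters, and the 75/25 hold-out split are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumptions): 100 samples, 5 MOFA-derived features,
# and a binary label playing the role of treatment response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)

# Hold out 25% of the samples for evaluation, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameters are illustrative, not those used in Random_Forest.ipynb.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Evaluating on samples the model has not seen gives a less optimistic estimate than accuracy on the training data.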
Preprocessed datasets and a constructed MOFA model are provided here so that the later scripts can be run.
- RNA_supervised_preprocessed
- RNA_unsupervised_preprocessed
- CNA_supervised_preprocessed
- CNA_unsupervised_preprocessed
- mutation_supervised_preprocessed
- mutation_unsupervised_preprocessed
- methylation_supervised_preprocessed
- methylation_unsupervised_preprocessed