Running the RCDML Code

Mauricio Ferrato edited this page Feb 23, 2022 · 1 revision

Once the Python environment is set up, you can run the RCDML pipeline. The pipeline supports multiple run configurations; select one by setting the parameters that best fit your experiment in the configuration file.

Configuration File

Here are the configuration file options and a description of what they do:

  • run_name: Name of the directory where the results will be stored

  • result_path: Path to where the results will be stored

  • dataset_path: Path where the drug response and RNA-seq count data is stored

  • drug_name: Name of the drug whose response the model will classify

  • project: If set to BeatAML, the pipeline uses the BeatAML drug response and RNA-seq count data. Otherwise, it uses the dataset and labels available in dataset_path

  • normalization: For the BeatAML project, set this to cpm or rpkm to choose the corresponding normalization of the RNA-seq data.

  • feature_selection: Feature selection techniques. Options available: [shap, pca, dge, random, swap, none]

    • shap: SHAP Value Feature Selection Technique
    • pca: Principal Component Analysis
    • dge: Differential Gene Expression Analysis (requires the path to the DGE files, set with the dge_path option)
    • random: Randomly selects features
    • none: No feature selection
    • swap: Swapping feature selection experiment (requires the feature list paths and the swapped drug name, set with the drug_feature_path, swapped_path, and swapped_label options)

    Multiple options can be passed at the same time by separating the feature selection techniques using a comma (shap,pca,random,etc).

  • feature_size: Number of features/genes that will be selected by the feature selection technique

  • classifiers: Classifiers. Options available: [rf, gdb]

    • rf: Random Forest
    • gdb: Gradient Boosting (XGBoost)

    Multiple options can be passed at the same time by separating the classifiers using a comma (rf,gdb,etc).

  • validation: Validation modes. Options available: [cv, loo, bootstrap]

    • cv: Cross-Validation. Breaks the dataset into folds based on the number of validation_iterations
    • loo: Leave-One-Out. Iterates through the dataset, always leaving one sample for testing.
    • bootstrap: Bootstrapping. Breaks the dataset into a training and test set based on the train_test_split value

  • validation_iterations: Number of iterations used by the validation mode. Ignored when the validation mode is loo

  • train_test_split: If the validation option is bootstrap, a number between 0 and 1 that determines the size of the train/test dataset split

  • debug: Debug verbosity. Options available: [0, 1, 2, 3, 4]

    • 0: No debug mode.
    • 1: Saves the input and output of the dataset splitting process.
    • 2: Saves the input and output of the feature selection technique process.
    • 3: Saves the input and output of the classifiers process.
    • 4: Saves the input and output of the feature counter option.

    Each debug mode option also performs the actions of the previous options.

  • feature_counter: If set to 1, creates a feature counter that shows how many times a feature/gene was selected during the pipeline run

  • dge_path: If feature selection option is dge, this is the path to where the differential gene expression files are stored

  • drug_feature_path: If feature selection option is swap, this is the path to where the feature list for the drug model is stored

  • swapped_label: If feature selection option is swap, this is the name of the drug used to swap the feature list

  • swapped_path: If feature selection option is swap, this is the path to where the feature list for the swapped drug is stored
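To make the three validation modes concrete, here is a minimal, standard-library-only sketch of how cv, loo, and bootstrap could produce train/test index splits. It is not the pipeline's own implementation. The function names are hypothetical, and two points are assumptions: that train_test_split is the fraction of samples used for training, and that bootstrap means repeated random splits without replacement.

```python
import random


def cv_splits(n_samples, n_folds):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test = indices[fold::n_folds]  # every n_folds-th sample, offset by fold
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def loo_splits(n_samples):
    """Yield (train, test) index lists, leaving exactly one sample out each time."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


def bootstrap_splits(n_samples, n_iterations, train_fraction, seed=0):
    """Yield n_iterations random (train, test) splits.

    Assumption: train_fraction is the share of samples placed in the training set.
    """
    rng = random.Random(seed)
    n_train = int(round(n_samples * train_fraction))
    for _ in range(n_iterations):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


if __name__ == "__main__":
    for train, test in cv_splits(n_samples=6, n_folds=3):
        print("cv test fold:", test)
    print("loo iterations:", len(list(loo_splits(6))))
    for train, test in bootstrap_splits(6, n_iterations=2, train_fraction=0.5):
        print("bootstrap train/test sizes:", len(train), len(test))
```

In this sketch, validation_iterations would map to n_folds for cv and to n_iterations for bootstrap, while loo always runs once per sample, which is why that setting has no effect there.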

Preset configuration files are found in the setup/config/ directory.
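Several options above depend on one another (dge needs dge_path, swap needs three extra options, and so on). The following sketch checks those dependencies on a plain dictionary of option names and string values before a run. It is illustrative only: check_config and split_list are hypothetical helpers, not part of RCDML, and the sketch makes no assumption about how the pipeline itself reads the .cfg file.

```python
# Allowed values, copied from the option descriptions above.
ALLOWED = {
    "feature_selection": {"shap", "pca", "dge", "random", "swap", "none"},
    "classifiers": {"rf", "gdb"},
    "validation": {"cv", "loo", "bootstrap"},
}


def split_list(raw):
    """Split a comma-separated option value such as 'shap,pca' into names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def check_config(cfg):
    """Return a list of problems found in a dict of option name -> string value."""
    problems = []
    for key, allowed in ALLOWED.items():
        for name in split_list(cfg.get(key, "")):
            if name not in allowed:
                problems.append(f"{key}: unknown value '{name}'")

    selections = split_list(cfg.get("feature_selection", ""))
    if "dge" in selections and not cfg.get("dge_path"):
        problems.append("dge selected but dge_path is not set")
    if "swap" in selections:
        for key in ("drug_feature_path", "swapped_label", "swapped_path"):
            if not cfg.get(key):
                problems.append(f"swap selected but {key} is not set")

    if cfg.get("project") == "BeatAML" and cfg.get("normalization") not in ("cpm", "rpkm"):
        problems.append("BeatAML project needs normalization set to cpm or rpkm")

    if cfg.get("validation") == "bootstrap":
        try:
            split = float(cfg.get("train_test_split", ""))
        except (TypeError, ValueError):
            split = None
        if split is None or not 0 <= split <= 1:
            problems.append("bootstrap needs train_test_split between 0 and 1")
    return problems


# Made-up example values: dge is requested without a dge_path.
example = {
    "feature_selection": "shap,dge",
    "classifiers": "rf,gdb",
    "validation": "bootstrap",
    "train_test_split": "0.8",
}
for problem in check_config(example):
    print(problem)
```

Returning a list of messages rather than raising on the first error lets you see every misconfigured option in one pass.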

Running the Pipeline

Once the configuration file has the desired parameters, run the pipeline using the command: python main.py -f <configuration file name>

If no file is provided, the pipeline uses parameters.cfg by default.
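For readers curious how a -f flag with a default value is commonly wired up in Python, here is a small argparse sketch. It only illustrates the behavior described above. It is not taken from main.py, and the names build_parser, config_file, and my_experiment.cfg are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with a -f option that falls back to parameters.cfg."""
    parser = argparse.ArgumentParser(description="Illustrative pipeline launcher")
    parser.add_argument(
        "-f",
        dest="config_file",       # hypothetical attribute name
        default="parameters.cfg",  # default file named in the text above
        help="name of the configuration file to use",
    )
    return parser


# Passing an explicit argument list keeps the example independent of sys.argv.
with_file = build_parser().parse_args(["-f", "my_experiment.cfg"])
without_file = build_parser().parse_args([])
print(with_file.config_file)
print(without_file.config_file)
```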
