PJMerlo/MachineLearning_PhD

MachineLearning_PhD

This repository contains a set of analyses that use supervised and unsupervised machine learning to process biological data.

CAP (Canonical Analysis of Principal Coordinates): a constrained ordination method for species abundance data that can be based on any chosen dissimilarity measure. This analysis was performed with the abundance_df.csv file.
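CAP starts from a matrix of pairwise dissimilarities between samples. The sketch below illustrates that first step only, not the full ordination. It assumes the Bray-Curtis measure, a common choice for abundance data that this repository does not confirm, and it uses made-up counts rather than the contents of abundance_df.csv. The function names are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    # Two empty samples are treated as identical (assumption for this sketch).
    return num / den if den else 0.0


def dissimilarity_matrix(samples):
    """Square, symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical abundance table: rows are samples, columns are species.
abundances = [
    [10, 0, 5],
    [8, 2, 5],
    [0, 12, 1],
]

D = dissimilarity_matrix(abundances)
for row in D:
    print([round(d, 3) for d in row])
```

Samples 0 and 1 share most of their composition, so their dissimilarity is low, while sample 2 is dominated by a different species and sits far from both.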

UPGMA (Unweighted Pair Group Method with Arithmetic Mean): one of the most common hierarchical clustering algorithms used in computational biology. At each step it merges the two closest clusters, where the dissimilarity between two clusters is defined as the arithmetic mean of all pairwise dissimilarities between their members (hence its name).
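The merging rule can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in this repository: the function name `upgma`, the labels, and the dissimilarity values are all hypothetical, and the reported height is the average dissimilarity at which each merge happens.

```python
def upgma(dist, labels):
    """Return the merge history as (cluster_a, cluster_b, height) tuples.

    dist is a symmetric matrix of pairwise dissimilarities between items.
    """
    # Active clusters: id -> tuple of member indices into dist.
    clusters = {i: (i,) for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        # Find the pair of clusters with the smallest average
        # pairwise dissimilarity between their members.
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                avg = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or avg < best[0]:
                    best = (avg, ids[x], ids[y])
        avg, id_a, id_b = best
        a, b = clusters.pop(id_a), clusters.pop(id_b)
        merges.append((tuple(labels[i] for i in a),
                       tuple(labels[i] for i in b),
                       avg))
        clusters[max(ids) + 1] = a + b  # new cluster gets a fresh id
    return merges


# Hypothetical dissimilarities between four samples.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 4],
    [10, 10, 4, 0],
]

merges = upgma(dist, labels)
for a, b, height in merges:
    print(a, b, height)
```

Here A and B merge first at 2.0, then C and D at 4.0, and the two pairs join at 8.0, the mean of the four cross-pair dissimilarities (6, 10, 6, 10). For real data, `scipy.cluster.hierarchy.linkage` with `method="average"` performs the same average-linkage clustering far more efficiently.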

Random Forest: an ML algorithm used for classification and regression tasks. It builds multiple decision trees during training and combines their predictions into a final prediction (ensemble modelling). Each decision tree in the forest is trained on a bootstrap sample of the training data and considers only a random subset of the features, which helps to reduce overfitting and improve generalization. For classification, each tree "votes" on the outcome and the most popular class among all the trees is chosen as the final prediction; for regression, the predictions of the trees are averaged. This ensemble approach often results in more accurate and robust predictions than individual decision trees.
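The bootstrap-and-vote idea can be sketched with the standard library alone. This is a deliberately simplified illustration, not the model used in this repository: each "tree" is a one-split decision stump on a single randomly chosen feature rather than a full decision tree, and the data, labels, and function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature):
    """Fit a one-split tree on one feature.

    Returns (feature, threshold, left_label, right_label).
    """
    values = sorted(set(row[feature] for row in X))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= thr]
        right = [lab for row, lab in zip(X, y) if row[feature] > thr]
        l_lab, r_lab = majority(left), majority(right)
        errors = (sum(lab != l_lab for lab in left)
                  + sum(lab != r_lab for lab in right))
        if best is None or errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    if best is None:  # only one distinct value: always predict the majority
        m = majority(y)
        return (feature, values[0], m, m)
    return (feature, best[1], best[2], best[3])


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # with replacement
        feature = rng.randrange(len(X[0]))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Each stump votes; the most popular class wins."""
    votes = [left if row[f] <= thr else right for f, thr, left, right in forest]
    return majority(votes)


# Hypothetical two-feature data with two well-separated classes.
X = [[1, 2], [2, 1], [3, 3], [4, 2], [7, 9], [8, 7], [9, 10], [10, 8]]
y = ["a"] * 4 + ["b"] * 4

forest = fit_forest(X, y)
print(predict(forest, [2, 2]), predict(forest, [9, 9]))
```

A real random forest grows full trees and draws a fresh random subset of features at every split. In practice a library implementation such as scikit-learn's `RandomForestClassifier` is the usual choice.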
