Keywords: random forest, decision tree, prediction
This project provides a Java implementation of random forests [1, 2]. A random forest builds an ensemble of decision trees from a training set. Given an input whose result is unknown (e.g. a person described by age, gender, medical background, and symptoms), the random forest predicts the corresponding result (e.g. a disease).
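As a toy illustration of how a forest combines its trees at prediction time (this is not this library's code; the "trees" below are hand-written rules standing in for trained decision trees), each tree votes and the majority class wins:

```java
import java.util.*;
import java.util.function.Function;

// Toy illustration only: a "forest" of hand-written decision rules that
// predicts by majority vote, the way a random forest combines its trees.
public class MajorityVoteDemo {

    // Each tree votes on the input; the class with the most votes wins.
    static String predict(List<Function<Integer, String>> trees, int input) {
        Map<String, Integer> votes = new HashMap<>();
        for (Function<Integer, String> tree : trees) {
            votes.merge(tree.apply(input), 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Three hand-written "trees" standing in for trained decision trees.
    static String demoPrediction(int age) {
        List<Function<Integer, String>> trees = Arrays.asList(
            a -> a < 30 ? "survived" : "died",
            a -> a < 50 ? "survived" : "died",
            a -> "died"
        );
        return predict(trees, age);
    }

    public static void main(String[] args) {
        // For age 40, two of the three trees vote "died".
        System.out.println(demoPrediction(40));
    }
}
```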
The following parameters are used to build random forests. The default values are:
int minSamplesSplit = 2;
int maxDepth = Integer.MAX_VALUE;
double minImpurityDecrease = 1e-07;
int minSampleLeaf = 1;
int maxFeatures = Integer.MAX_VALUE;
int nbTrees = 10;
Long seed = null;

Return a builder to set up the parameters of the random forest. The available functions to update the default values are:
// Builder example
Parameter p = new Parameter.Builder()
.nbTrees(200)
.maxFeatures(3)
    .build();

Constructor of the random forest.
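Parameters such as minSamplesSplit and maxDepth act as stopping criteria while each tree is grown. A toy sketch of that mechanism (not this library's code; the recursion here just halves the sample count at each level):

```java
// Toy illustration only: how minSamplesSplit and maxDepth act as stopping
// criteria when a decision tree is grown recursively.
public class StoppingCriteriaDemo {

    // Returns the depth actually reached when repeatedly halving the samples.
    static int grow(int nSamples, int depth, int minSamplesSplit, int maxDepth) {
        // A node becomes a leaf when it is too small or too deep to split.
        if (nSamples < minSamplesSplit || depth >= maxDepth) {
            return depth;
        }
        return grow(nSamples / 2, depth + 1, minSamplesSplit, maxDepth);
    }

    public static void main(String[] args) {
        // With maxDepth = 3, growth stops at depth 3 even though samples remain.
        System.out.println(grow(1000, 0, 2, 3));
    }
}
```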
Train the random forest using a list of tuples D.
Only the getters of D annotated with @Feature, whose type is either
ORDERED or CATEGORICAL, are taken into account.
The getter of the target (or result) must be annotated with @Target,
whose type is either CONTINUOUS or DISCRETE.
// Annotation example
@Feature(FeatureType.ORDERED)
public Integer getAge() {
return age;
}
@Target(TargetType.DISCRETE)
public Integer getSurvived() {
return survived;
}

Predict the result R according to the data D.
Get the list of features sorted by decreasing importance.
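Feature importance in random forests is commonly measured as the mean decrease in impurity brought by splits on that feature [2]. As a self-contained sketch (assuming Gini impurity; this is not this library's internals), the impurity decrease of a single split can be computed as:

```java
import java.util.*;

// Toy illustration only: the impurity decrease achieved by one split,
// the quantity that mean-decrease-in-impurity feature importance averages.
public class ImpurityDecreaseDemo {

    // Gini impurity of a set of class labels: 1 - sum of squared class proportions.
    static double gini(List<String> labels) {
        Map<String, Long> counts = new HashMap<>();
        for (String label : labels) counts.merge(label, 1L, Long::sum);
        double impurity = 1.0;
        for (long c : counts.values()) {
            double p = (double) c / labels.size();
            impurity -= p * p;
        }
        return impurity;
    }

    // Parent impurity minus the size-weighted impurity of the two children.
    static double impurityDecrease(List<String> parent, List<String> left, List<String> right) {
        double n = parent.size();
        return gini(parent)
             - (left.size() / n) * gini(left)
             - (right.size() / n) * gini(right);
    }

    public static void main(String[] args) {
        // A perfect split of a 50/50 node removes all impurity: decrease = 0.5.
        System.out.println(impurityDecrease(
            Arrays.asList("A", "A", "B", "B"),
            Arrays.asList("A", "A"),
            Arrays.asList("B", "B")));
    }
}
```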
A usage example based on Titanic survivor data is available at broceliande-example.
[1] Leo Breiman. Random Forests. Machine Learning. vol. 45, p. 5-32. 2001.
[2] Gilles Louppe. Understanding Random Forests: From Theory to Practice. arXiv preprint arXiv:1407.7502. 2014.