This repository contains the coursework code for COMP70050 Introduction to Machine Learning. The task is to implement a decision tree algorithm and evaluate it on two datasets, one clean and one noisy. The code is written in Python 3.7.3.
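As a rough illustration of what a decision tree implementation involves, the sketch below picks a split by information gain. It is a minimal sketch under assumptions, not the code in this repository: the function names (`entropy`, `information_gain`, `best_split`) are hypothetical, and it assumes numeric features, splits of the form `feature < threshold`, and entropy as the impurity measure.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def information_gain(labels, left_mask):
    """Entropy reduction obtained by splitting `labels` with a boolean mask."""
    left, right = labels[left_mask], labels[~left_mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that leaves one side empty gains nothing
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_split(features, labels):
    """Return (feature_index, threshold, gain) for the highest-gain split."""
    best = (None, None, 0.0)
    for j in range(features.shape[1]):
        values = np.unique(features[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            gain = information_gain(labels, features[:, j] < t)
            if gain > best[2]:
                best = (j, float(t), gain)
    return best
```

A tree builder would call `best_split` recursively on each side of the chosen split until a node is pure or no split yields a positive gain.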
The code requires the following dependencies:
- Python 3.7.3
- numpy 1.16.4
- pandas 0.24.2
- matplotlib 3.1.0
- scikit-learn 0.21.2
The authors of this coursework are:
- Mahanoor Syed
- Brendon Ferra
- Harry Phillips
- Ameen Izhac
To run the code on the clean and noisy datasets please run:
python3 main.py
This builds three trees for each dataset: unpruned, pruned, and pruned with a depth limit.
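This README does not say which pruning method is used. One common choice is reduced-error pruning, sketched below under stated assumptions: the tree is a nested dict (a hypothetical representation), each internal node stores a `majority` label recorded at build time, and a node whose children are both leaves is collapsed whenever doing so does not lower accuracy on a held-out validation set.

```python
import numpy as np


def predict(node, x):
    """Walk from `node` down to a leaf and return its label."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]


def accuracy(tree, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    return float(np.mean([predict(tree, x) == t for x, t in zip(X, y)]))


def prune(root, node, X_val, y_val):
    """Reduced-error pruning, bottom-up and in place (assumed method)."""
    if "label" in node:
        return  # leaves cannot be pruned further
    prune(root, node["left"], X_val, y_val)
    prune(root, node["right"], X_val, y_val)
    if "label" in node["left"] and "label" in node["right"]:
        before = accuracy(root, X_val, y_val)
        saved = dict(node)  # shallow copy so the node can be restored
        node.clear()
        node["label"] = saved["majority"]
        if accuracy(root, X_val, y_val) < before:
            # Collapsing hurt validation accuracy, so undo it.
            node.clear()
            node.update(saved)
```

Calling `prune(tree, tree, X_val, y_val)` prunes the whole tree. Because children are visited first, a subtree can collapse over several levels in a single pass.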
To run the code on a custom dataset, first move the dataset file into the wifi_db folder.
To generate an unpruned tree please run:
python3 classify_dataset.py <custom_dataset.txt> <k>
To generate a pruned tree please run:
python3 prune_dataset.py <custom_dataset.txt> <k>
To generate a pruned tree with a depth limit please run:
python3 prune_dataset_limit.py <custom_dataset.txt> <k> <limit>
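The three commands above share the same argument pattern: a dataset file name, an integer `<k>`, and for the last script a `<limit>`. The sketch below shows one way such arguments could be parsed with `argparse`. It is hypothetical and may differ from what the scripts actually do: `parse_args` is an invented name, `<k>` is treated as an opaque integer because this README does not define it, and the path is built on the assumption that the file sits in the wifi_db folder.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse `<custom_dataset.txt> <k> [<limit>]` (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Build a decision tree on a dataset in wifi_db."
    )
    parser.add_argument("dataset", help="file name inside the wifi_db folder")
    parser.add_argument("k", type=int, help="the integer <k> from the README")
    parser.add_argument(
        "limit", type=int, nargs="?", default=None,
        help="optional depth limit; omitted for the first two scripts",
    )
    args = parser.parse_args(argv)
    # Assumption: datasets are looked up relative to the wifi_db folder.
    args.path = os.path.join("wifi_db", args.dataset)
    return args
```

For example, `parse_args(["custom_dataset.txt", "10", "5"])` yields `k == 10`, `limit == 5`, and a `path` pointing inside wifi_db.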
The report for this coursework can be found here.
We scored 100% for this coursework.