Third work of the UB course "Introduction to Machine Learning", implementing clustering algorithms
Eva Veli, Andras Kasa and Niklas Long Schiefelbein
- PyCharm IDE (Professional or Community Edition)
- Python 3.9 installed on your system
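The Python prerequisite can be checked programmatically before setting anything up. The following is a minimal sketch, not part of the project: the helper name `is_required_python` is hypothetical, and the exact match on 3.9 is an assumption drawn from the prerequisite above (other versions may well work but are not mentioned here).

```python
import sys

# Version named in the prerequisites; treating it as an exact requirement is an assumption.
REQUIRED = (3, 9)


def is_required_python(version=None) -> bool:
    """Return True if the (major, minor) version matches REQUIRED.

    `version` is any tuple-like such as (3, 9, 7); it defaults to the
    interpreter that is running this code.
    """
    current = sys.version_info if version is None else version
    return tuple(current[:2]) == REQUIRED


if __name__ == "__main__":
    print("Python 3.9 detected:", is_required_python())
```

Run with the interpreter you intend to use for the virtual environment, this prints whether it is a 3.9 release.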
1. Open the project `work3` in PyCharm.
2. Open the terminal in PyCharm (View > Tool Windows > Terminal).
3. Optional: Verify that the current location is `work3` with `pwd`.
4. Optional: Navigate to `work3` with `cd`.
5. Create a virtual environment:
   # Windows
   py -3.9 -m venv venv
   # macOS/Linux
   python3.9 -m venv venv
6. Activate the virtual environment:
   # Windows
   venv\Scripts\activate
   # macOS/Linux
   source venv/bin/activate
   The terminal prompt should now start with (venv).
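Besides looking for the (venv) prefix, activation can be confirmed from Python itself. The sketch below relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside an environment created with `venv`; the helper name `in_virtualenv` is hypothetical and not part of the project.

```python
import sys
from typing import Optional


def in_virtualenv(prefix: Optional[str] = None,
                  base_prefix: Optional[str] = None) -> bool:
    """Return True if the interpreter appears to run inside a virtual environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix points at the base Python installation.
    The arguments exist only to make the check easy to test.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```

Running this with `python` in the activated terminal should report that a virtual environment is active; in a terminal without activation it should report the opposite.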
With the virtual environment activated:
pip install -r requirements.txt
From here you can jump directly to Run app.py.
With the virtual environment activated:
deactivate
The (venv) prefix in front of the terminal prompt should now be gone.
For this, follow the optional steps 3 and 4 from the Manual Virtual Environment Setup, then activate the virtual environment:
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
The terminal prompt should now start with (venv).
python app.py
The first execution takes longer than usual because the whole project is compiled initially. Once compilation has finished, the program prompts for input: choose the cmc, hepatitis or pen-based dataset for the analysis. Pressing Enter without typing anything selects the cmc dataset by default.
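The dataset prompt described above, with an empty input falling back to cmc, can be sketched as follows. This is an illustration only and not the code in `app.py`: the names `VALID_DATASETS`, `DEFAULT_DATASET` and `resolve_dataset` are hypothetical, and raising an error on unknown input is an assumption about how invalid choices might be handled.

```python
# Dataset names as listed in this README; the constant names are hypothetical.
VALID_DATASETS = ("cmc", "hepatitis", "pen-based")
DEFAULT_DATASET = "cmc"


def resolve_dataset(raw: str) -> str:
    """Map raw user input to a dataset name.

    An empty answer (just pressing Enter) selects the default dataset.
    Unknown names raise ValueError, which is an assumed behaviour.
    """
    choice = raw.strip().lower()
    if not choice:
        return DEFAULT_DATASET
    if choice not in VALID_DATASETS:
        raise ValueError(
            f"Unknown dataset {choice!r}; expected one of {VALID_DATASETS}"
        )
    return choice


if __name__ == "__main__":
    answer = input(f"Dataset {VALID_DATASETS} [default: {DEFAULT_DATASET}]: ")
    print("Selected dataset:", resolve_dataset(answer))
```

Keeping the parsing in a function that takes a string, separate from the `input()` call, makes the default behaviour easy to check without typing at a prompt.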
Next, the preprocessing pipeline runs. Afterwards, the user can choose from 9 menu items, including running the clustering algorithm, generating reports and exiting the program. Progress for each function is printed to the console, but because the computation is fast it may be hard to follow. The final reports are the recommended basis for evaluation.
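A menu of this kind is often built as a mapping from the key the user types to an action. The sketch below shows that general pattern under stated assumptions: it is not the menu in `app.py`, only two of the 9 items are stubbed out, and the labels, the helper names `build_menu` and `run_menu`, and the use of "9" as the exit key are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Menu = Dict[str, Tuple[str, Callable[[], None]]]


def build_menu(log: List[str]) -> Menu:
    """Build a hypothetical menu; each action only records that it ran."""
    return {
        "1": ("Run clustering algorithm", lambda: log.append("clustering")),
        "2": ("Generate report", lambda: log.append("report")),
    }


def run_menu(menu: Menu, choices: Iterable[str], exit_key: str = "9") -> List[str]:
    """Dispatch each choice in order until the exit key is seen.

    Returns the list of choices that did not match any menu item.
    """
    unknown: List[str] = []
    for choice in choices:
        key = choice.strip()
        if key == exit_key:
            break  # stop processing, mirroring an "exit" menu item
        entry = menu.get(key)
        if entry is None:
            unknown.append(key)
            continue
        _label, action = entry
        action()
    return unknown


if __name__ == "__main__":
    executed: List[str] = []
    rejected = run_menu(build_menu(executed), ["1", "x", "2", "9", "1"])
    print("executed:", executed)
    print("rejected:", rejected)
```

Passing the choices as an iterable rather than reading them with `input()` inside the loop keeps the dispatch logic testable; an interactive version would feed it keys read from the console.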
For deeper insights please consider reading the report of the project.
work3/
├── cluster_algorithms/ # K-Family, OPTICS and Spectral
├── datasets/ # Dataset files
├── metrics/ # Distance and evaluation metrics
├── plots/ # All plots
├── plotting/ # Code used to create plots
├── preprocessing/ # Preprocessing functions
├── results/ # CSV results for each algorithm
├── summary/ # LaTeX tables and their code
├── .gitignore # Gitignore file
├── app.py # Main application script
├── README.md # This file
├── requirements.txt # Dependencies
└── utils.py # Utility functions