This repository contains the full implementation of three course projects completed for the Computer Data Analysis course. The projects use the well-known Iris dataset and focus on the practical application of data exploration, visualization, and machine learning techniques in Python.
- Loaded and preprocessed the dataset.
- Calculated key descriptive statistics (mean, median, min, max, quartiles, standard deviation).
- Visualized data distributions using histograms and boxplots.
- Investigated relationships between features using Pearson correlation and linear regression.
- Normalized the dataset using min-max scaling.
- Used the elbow method to determine the optimal number of clusters.
- Applied the k-means algorithm and visualized the clusters across different feature pairings.
- Built a k-NN classifier using a custom implementation.
- Evaluated classification accuracy for `k` values from 1 to 15.
- Generated confusion matrices and plotted accuracy metrics.
- Repeated the classification for various pairs of features.
- Language: Python
- Libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`
- Custom Modules:
  - `lib_ksrednich.py` – for clustering
  - `lib_knn.py` – for classification
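
The min-max scaling and k-NN steps listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in `lib_knn.py`: the function names (`min_max_scale`, `knn_predict`, `accuracy`, `confusion_matrix`), the Euclidean distance metric, the tie-breaking rule, and the eight made-up sample points are all assumptions. Because the toy training set has only six points, the sweep covers k = 1 to 5 rather than the 1 to 15 used in the project.

```python
import math
from collections import Counter


def min_max_scale(rows):
    """Rescale every column of `rows` to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant column has no spread, so map it to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k):
    """Predict a label by majority vote among the k nearest training points.

    Assumption: Euclidean distance. Ties go to the label that appears
    first among the nearest neighbours.
    """
    by_distance = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true one."""
    hits = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return hits / len(test_y)


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


# Made-up two-feature samples for illustration only (not rows from Iris).
features = [
    [5.0, 1.4], [4.9, 1.5], [5.2, 1.3],
    [6.5, 4.6], [6.7, 4.4], [6.3, 4.7],
    [5.1, 1.6], [6.4, 4.5],
]
labels = ["a", "a", "a", "b", "b", "b", "a", "b"]

scaled = min_max_scale(features)
train_x, train_y = scaled[:6], labels[:6]
test_x, test_y = scaled[6:], labels[6:]

# Sweep k and record the accuracy for each value.
accuracies = {k: accuracy(train_x, train_y, test_x, test_y, k) for k in range(1, 6)}
predictions = [knn_predict(train_x, train_y, x, 3) for x in test_x]

print(accuracies)
print(confusion_matrix(test_y, predictions, ["a", "b"]))
```

For simplicity the sketch scales all samples together before splitting; a stricter evaluation would compute the minimum and maximum on the training split only and reuse them for the test split.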
This repository demonstrates a full data analysis workflow – from data preprocessing and visualization, through unsupervised clustering, to supervised classification and performance evaluation. It reflects the practical skills and analytical thinking developed throughout the Computer Data Analysis course.