Data analysis of NBA Draft Combine statistics using machine learning techniques.
Project developed for the Data Exploration (IP2) course in the Master's program at the Faculty of Mathematics, University of Belgrade.
The analysis includes:
- Exploratory Data Analysis (EDA)
- Anomaly Detection (Isolation Forest, LOF)
- Dimensionality Reduction (PCA, t-SNE)
- Clustering (K-Means, Hierarchical, DBSCAN)
- Classification (Random Forest, SVM, Neural Networks)
- Association Rules
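As an illustration of the anomaly detection step listed above, the following is a minimal sketch using scikit-learn's `IsolationForest` and `LocalOutlierFactor`. It runs on synthetic data rather than the project's dataset, and the `contamination=0.10` and `n_neighbors=20` settings are assumptions chosen for the example, not values confirmed by the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for combine measurements (NOT the project's data).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))
X[:10] += 8.0  # shift a few rows far from the rest so they look unusual

# Both detectors rely on distances or split ranges, so scale features first.
X_scaled = StandardScaler().fit_transform(X)

# Assumed setting: ask each detector to flag roughly 10% of the rows.
iso = IsolationForest(contamination=0.10, random_state=42)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.10)

# fit_predict returns -1 for outliers and 1 for inliers.
iso_outliers = iso.fit_predict(X_scaled) == -1
lof_outliers = lof.fit_predict(X_scaled) == -1

# Rows flagged by both methods are the strongest anomaly candidates.
both = iso_outliers & lof_outliers

print("Isolation Forest flagged:", int(iso_outliers.sum()))
print("LOF flagged:", int(lof_outliers.sum()))
print("Flagged by both:", int(both.sum()))
```

Comparing the two sets of flags is one way to separate rows that look unusual globally (Isolation Forest) from rows that are unusual relative to their local neighborhood (LOF).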
Key results:
- 2 player archetypes - identified through clustering
- 10% anomalies - 121 players with unusual characteristics
- 98.77% accuracy - Random Forest classifier
- 90% variance - preserved in 10 PCA components
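Figures such as "variance preserved in N components" and classifier accuracy are typically computed along the lines of the sketch below. It uses a synthetic dataset from `make_classification`, so it will not reproduce the numbers reported above. The split ratio, the number of trees, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the player feature table (NOT the project's data).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, n_redundant=10, random_state=0
)

# PCA: a float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction (here 90%).
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.90, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)
explained = float(np.sum(pca.explained_variance_ratio_))
print(f"{pca.n_components_} components keep {explained:.1%} of the variance")

# Random Forest: fit on a training split, report accuracy on held-out rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Hold-out accuracy: {accuracy:.2%}")
```

Reporting accuracy on a held-out split, rather than on the training rows, keeps the figure from being inflated by memorization.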
Full project documentation in Serbian is available in nba_projekat_final.pdf and includes:
- Data description and methodology
- Detailed analysis of all techniques
- Visualizations and interpretation
- Conclusions and recommendations
Requirements:
- Python 3.9+
- Jupyter Notebook
# Install dependencies
pip install pandas numpy scikit-learn matplotlib seaborn scipy mlxtend jupyter
# Run the notebook
jupyter notebook nba_data_analysis.ipynb
Technologies used:
- Python
- Jupyter Notebook
- scikit-learn
- pandas, numpy
- matplotlib, seaborn
- LaTeX
Dataset:
- Source: NBA Database (Kaggle)
- Size: 1,633 players x 61 features
Author:
- Anđela Jovanović - @andjixi