This project focuses on clustering techniques (K-Means and Hierarchical Clustering) applied to stock market data. By grouping stocks exhibiting similar characteristics, one can make more informed decisions about allocating investments and diversifying portfolios.
- USL_Project_LearnerNotebook_FullCode-1.ipynb: Jupyter Notebook with detailed exploratory data analysis, data preprocessing, and clustering implementation steps.
- stock_data.csv: Dataset containing stock data (current price, volatility, financial ratios, etc.) from several NYSE-listed companies.
- README.md: Documentation on project purpose, requirements, and usage guidelines.
Install dependencies:
- Python 3.x
- pandas, numpy, matplotlib, seaborn, scikit-learn, scipy
- yellowbrick
Set up environment (example with pip):
pip install pandas numpy matplotlib seaborn scikit-learn scipy yellowbrick
Run the Notebook:
- Open the Jupyter Notebook (USL_Project_LearnerNotebook_FullCode-1.ipynb).
- Restart the kernel and run all cells sequentially.
- Data Loading: Reads and cleans the stock dataset.
- Exploratory Analysis: Uses summary statistics and plots to explore the distribution of each variable.
- Preprocessing: Standardizes and prepares data for clustering.
- K-Means Clustering: Determines the optimal number of clusters using the elbow method and silhouette analysis.
- Hierarchical Clustering: Uses dendrograms to choose a linkage method and the number of clusters.
- Stocks are grouped into clusters based on similarities in their financial metrics; spreading holdings across clusters can help in building a diversified portfolio.
- K-Means provides direct cluster assignments, while Hierarchical Clustering offers a visual representation (dendrogram) of how stocks group together across different levels of similarity.
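The preprocessing and clustering steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses synthetic data from `make_blobs` as a stand-in for the numeric columns of stock_data.csv (whose column names are not listed here), picks the number of clusters by the highest silhouette score, and compares linkage methods by cophenetic correlation. That last criterion is an assumption made for this sketch; the notebook may select the linkage differently.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the numeric features in stock_data.csv.
X, _ = make_blobs(n_samples=120, centers=3, n_features=4,
                  cluster_std=0.6, random_state=42)

# Preprocessing: standardize so every feature has mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# K-Means: record inertia (for an elbow plot) and silhouette score per k.
inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X_scaled, km.labels_)

# Pick the k with the highest silhouette score and get direct assignments.
best_k = max(silhouettes, key=silhouettes.get)
kmeans_labels = KMeans(n_clusters=best_k, n_init=10,
                       random_state=42).fit_predict(X_scaled)

# Hierarchical clustering: compare linkage methods by cophenetic correlation
# (ASSUMPTION: illustrative criterion, not necessarily the notebook's).
distances = pdist(X_scaled)  # pairwise Euclidean distances
cophenetic = {}
for method in ("single", "complete", "average", "ward"):
    Z = linkage(X_scaled, method=method)
    cophenetic[method], _ = cophenet(Z, distances)

best_method = max(cophenetic, key=cophenetic.get)
Z_best = linkage(X_scaled, method=best_method)

# Cut the tree into at most best_k flat clusters (labels start at 1).
hc_labels = fcluster(Z_best, t=best_k, criterion="maxclust")

print("best k by silhouette:", best_k)
print("best linkage by cophenetic correlation:", best_method)
```

To view the tree itself, `Z_best` can be passed to `scipy.cluster.hierarchy.dendrogram`, and plotting `inertias` against k gives the elbow curve. With the real dataset, `X` would instead hold the numeric columns loaded with pandas.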
Contributions in the form of issue reports or suggestions are welcome.