Wine Quality Analysis 🍷

Welcome to the Wine Quality Analysis project! In this repository, we dive into the world of unsupervised learning to uncover patterns and clusters in wine quality data. Whether you're a data enthusiast, a wine lover, or a machine learning practitioner, there's something here for you! 🍇🍷

📌 Project Overview

This project applies unsupervised learning techniques to analyze and cluster wines based on their chemical and physical properties. Using the popular Red Wine Quality Dataset, we aim to:

- Identify hidden patterns in the data.
- Group wines with similar characteristics.
- Gain insights into the features that influence wine quality.

🛠️ Methodology

Our approach is divided into four key steps, documented in the Jupyter notebooks under notebooks/:

Data Cleaning & Preprocessing:
- Handle missing values and outliers.
- Scale and normalize features so that distance-based clustering (K-Means, DBSCAN) is not dominated by features with large numeric ranges.
Exploratory Data Analysis (EDA):
- Visualize distributions and correlations.
- Test which features differ significantly across wine quality levels using ANOVA and t-tests.
Unsupervised Learning:
- Apply clustering algorithms like K-Means and DBSCAN.
- Choose the number of clusters using the Elbow Method and the Silhouette Score.
Dimensionality Reduction:
- Use PCA to reduce feature dimensions and visualize clusters in 2D.
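The preprocessing and statistical-testing steps above can be sketched in Python with pandas, SciPy, and scikit-learn. This is a minimal sketch under stated assumptions, not the code in the notebooks: the target column is assumed to be named `quality`, rows with missing values are dropped, outliers are clipped with the 1.5 × IQR rule rather than removed, features are standardized with `StandardScaler`, and the hypothetical helper `anova_by_quality` runs a one-way ANOVA for each feature across quality levels.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, target: str = "quality", k: float = 1.5):
    """Drop rows with missing values, clip outliers with the IQR rule, and standardize.

    Returns the cleaned DataFrame (target column kept, unscaled) and the scaled feature matrix.
    """
    df = df.dropna().copy()
    features = [c for c in df.columns if c != target]
    values = df[features].to_numpy(dtype=float)
    # IQR rule (assumed choice): clip each feature to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows.
    q1, q3 = np.percentile(values, [25, 75], axis=0)
    iqr = q3 - q1
    df[features] = np.clip(values, q1 - k * iqr, q3 + k * iqr)
    # Zero mean and unit variance per feature, so no feature dominates Euclidean distances.
    X = StandardScaler().fit_transform(df[features])
    return df, X


def anova_by_quality(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """One-way ANOVA p-value per feature, grouping rows by target level (smallest p first)."""
    pvalues = {}
    for col in df.columns.drop(target):
        groups = [group[col].to_numpy() for _, group in df.groupby(target)]
        pvalues[col] = f_oneway(*groups).pvalue
    return pd.Series(pvalues).sort_values()
```

To apply it to the red-wine data, something like `pd.read_csv("data/winequality-red.csv", sep=";")` should work, assuming that file name inside `data/`; the UCI CSV uses semicolons as separators. For two-group comparisons (for example, low versus high quality), `scipy.stats.ttest_ind` is the pairwise counterpart to the ANOVA.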

💻 Repository Structure

wine-quality-analysis/
├── data/                   # Dataset files
├── notebooks/              # Jupyter notebooks for each stage
│   ├── 1_data_cleaning.ipynb
│   ├── 2_exploratory_analysis.ipynb
│   ├── 3_statistical_analysis.ipynb
│   ├── 4_supervised_learning.ipynb
│   └── 5_unsupervised_learning.ipynb
├── README.md               # Project overview
└── requirements.txt        # Python dependencies

🔬 Results

Clustering Performance:

K-Means: Optimal number of clusters = 2, Silhouette Score = 0.20
DBSCAN: Best parameters (eps=1.7, min_samples=8), Silhouette Score = 0.35
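Scores like these typically come from sweeping K-Means over candidate values of k and DBSCAN over a grid of `eps` and `min_samples`. The sketch below shows one way to run such a search with scikit-learn; the helper names `kmeans_sweep` and `dbscan_grid`, the range of k, and the grids are assumptions for illustration. The results above do not state how DBSCAN noise points (label `-1`) were handled when computing the silhouette; this sketch excludes them, which is one common convention but can make the score look higher than one computed over all points, so it is not directly comparable with the K-Means score.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score


def kmeans_sweep(X, k_values=range(2, 11), seed=42):
    """Return {k: (inertia, silhouette)} for each candidate number of clusters."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_ feeds the Elbow Method; a higher silhouette means better-separated clusters.
        results[k] = (km.inertia_, silhouette_score(X, km.labels_))
    return results


def dbscan_grid(X, eps_values, min_samples_values):
    """Return (best_silhouette, eps, min_samples), scoring only non-noise points (an assumption)."""
    X = np.asarray(X)
    best = (-1.0, None, None)
    for eps in eps_values:
        for min_samples in min_samples_values:
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
            mask = labels != -1  # DBSCAN labels noise points as -1
            n_clusters = len(set(labels[mask]))
            # silhouette_score needs 2 <= n_clusters <= n_points - 1.
            if n_clusters < 2 or n_clusters >= mask.sum():
                continue
            score = silhouette_score(X[mask], labels[mask])
            if score > best[0]:
                best = (score, eps, min_samples)
    return best
```

Plotting the inertia values against k shows the elbow, and the k with the highest silhouette is the silhouette-based choice. `X` should be the standardized feature matrix, so `eps` is measured in scaled-feature units; a grid that includes eps=1.7 and min_samples=8 would let the reported DBSCAN setting be re-checked.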

PCA Visualization:

The clusters appear well-separated when plotted on the first two principal components, giving a 2D view of the overall structure of the data.
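A 2D PCA view like the one described can be produced with scikit-learn as sketched below. It assumes `X` is the standardized feature matrix and `labels` are cluster assignments from K-Means or DBSCAN; the helper names `pca_2d` and `plot_clusters` are hypothetical. Reporting the explained-variance ratio of the two components helps judge how faithfully the 2D picture reflects distances in the full feature space.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_2d(X, seed=0):
    """Project X onto its first two principal components.

    Returns the (n_samples, 2) projection and the explained-variance ratio of each component.
    """
    pca = PCA(n_components=2, random_state=seed)
    Z = pca.fit_transform(np.asarray(X))
    return Z, pca.explained_variance_ratio_


def plot_clusters(Z, labels, ratio):
    """Scatter the 2D projection, coloring points by cluster label."""
    import matplotlib.pyplot as plt  # local import: pca_2d itself does not need matplotlib

    fig, ax = plt.subplots(figsize=(6, 5))
    points = ax.scatter(Z[:, 0], Z[:, 1], c=labels, cmap="viridis", s=10)
    ax.set_xlabel(f"PC1 ({ratio[0]:.0%} of variance)")
    ax.set_ylabel(f"PC2 ({ratio[1]:.0%} of variance)")
    ax.set_title("Clusters on the first two principal components")
    fig.colorbar(points, ax=ax, label="cluster label")
    return fig
```

For example, `Z, ratio = pca_2d(X)` followed by `plot_clusters(Z, labels, ratio)` draws the projection for whichever clustering produced `labels`.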

🚀 Future Steps

Explore supervised learning approaches for predicting wine quality.
Develop an interactive Streamlit dashboard for data exploration.
Investigate other clustering algorithms like Agglomerative Clustering.

📚 References

This project was built using the following references and resources:

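Wine Quality Dataset. UCI Machine Learning Repository.
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547-553.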

📬 Contact

If you have any questions, suggestions, or feedback about this project, feel free to reach out!

Author: Miguel Ángel Chamizo Sánchez
Email: miguelchamizo10@gmail.com
LinkedIn: https://www.linkedin.com/in/miguelangelchamizosanchez

Feel free to open an issue in this repository for any bugs or feature requests. Contributions are always welcome!

📦 Requirements

To set up the project environment, install the required dependencies:

pip install -r requirements.txt
