15 changes: 7 additions & 8 deletions Nilesh/Task_5/README.md
Neural style transfer is a technique that applies the artistic style of one image to the content of another.

## Link

1. **Custom Model:**
https://colab.research.google.com/drive/12zktQ8BdUF-3LDXvIEShWsQ-aIKQhofl?usp=sharing

2. **Pre-trained Model:**
https://colab.research.google.com/drive/1xZjAsrQj99h8KX61LPmendmphCPvpaqL?usp=sharing

## Features

- Loads and preprocesses content and style images

1. **Install Required Libraries**

If you're running this outside of Google Colab, install the necessary packages:

"""
pip install tensorflow numpy matplotlib pillow
"""

2. **Prepare Your Images**

- Choose a content image
- Choose a style image

4. **Execute the following code block at the end of the notebook:**

"""
result = style_transfer("/content/big_ben.jpeg", "/content/starry_night.jpeg", epochs=1000, lr=0.02)
show_image(result, "Final Stylized Image")
"""

5. **Show the output after every 50 epochs**
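The notebook's own helpers handle image loading; as a rough illustration, the preprocessing step might look like the sketch below. This is not the notebook's actual code, and `load_image` and `max_dim` are illustrative names.

```python
# Minimal sketch: load an image, resize it so its longest side is
# max_dim pixels, and scale pixel values to [0, 1] for the model.
import numpy as np
from PIL import Image

def load_image(path, max_dim=512):
    """Load an RGB image as a float32 array in [0, 1], longest side max_dim."""
    img = Image.open(path).convert("RGB")
    scale = max_dim / max(img.size)
    new_size = (round(img.size[0] * scale), round(img.size[1] * scale))
    img = img.resize(new_size)
    return np.asarray(img, dtype=np.float32) / 255.0
```

A content and a style image prepared this way can then be fed to the optimization loop that `style_transfer` runs.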

38 changes: 38 additions & 0 deletions Nilesh/Task_6/6(i)/README.md
# K-Means Clustering of Credit Card Customer Data

This project performs clustering on a credit card customer dataset. The dataset contains information about customer transactions and spending behavior, making it suitable for unsupervised learning.

## Overview

Customer clustering can help identify spending patterns, customer types, and behavioral groups without using labeled data. This can be useful for targeted marketing, fraud detection, risk analysis, and personalized financial services. The technique used here is K-Means, along with exploratory data analysis and preprocessing.

## Dataset link:
https://www.kaggle.com/datasets/arjunbhasin2013/ccdata

## Colab link:
https://colab.research.google.com/drive/1LRn00h-GlvxAlJWQkRypHQHHr66y4E7_?usp=sharing

## Steps Performed

1. Imported libraries (NumPy, Pandas, Matplotlib, Seaborn, and Scikit-Learn).
2. Loaded the dataset from the CSV file and inspected it using head, shape, info, and describe.
3. Removed the CUST_ID column, as it does not contribute to clustering.
4. Replaced missing values with each column's mean.
5. Performed exploratory data analysis: a histogram of each feature to observe its distribution, and a heatmap showing correlations between variables.
6. Scaled the features with StandardScaler, since K-Means clustering is distance-based.
7. Determined the optimal number of clusters with the Elbow Method: the K-Means model was trained for values of k from 1 to 10, the inertia scores were plotted, and the “elbow point”, where adding more clusters stops significantly reducing inertia, indicated 4 clusters.
8. Applied K-Means clustering, assigning each customer a cluster label (0, 1, 2, or 3).
9. Visualized the clusters through pairplots.
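The steps above can be sketched as a short script. This is a sketch, not the notebook's code: the helper names (`preprocess`, `elbow_inertias`, `cluster_customers`) are illustrative, and only the CUST_ID column name is assumed from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def preprocess(df: pd.DataFrame):
    """Drop the ID column, impute missing values with column means, and scale."""
    features = df.drop(columns=["CUST_ID"], errors="ignore")
    features = features.fillna(features.mean())
    return StandardScaler().fit_transform(features)  # K-Means is distance-based

def elbow_inertias(X, k_max=10):
    """Inertia for k = 1..k_max; the bend in this curve suggests the best k."""
    return [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, k_max + 1)]

def cluster_customers(df: pd.DataFrame, n_clusters: int = 4) -> pd.DataFrame:
    """Assign each customer a cluster label in 0..n_clusters-1."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    out = df.copy()
    out["CLUSTER"] = kmeans.fit_predict(preprocess(df))
    return out
```

With the Kaggle file downloaded, `cluster_customers(pd.read_csv("CC GENERAL.csv"))` would reproduce the final clustering, and plotting `elbow_inertias` against k reproduces the Elbow Method plot.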


## Requirements

- Python 3.x
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-Learn

## Conclusion
This project demonstrates how unsupervised learning techniques like K-Means can segment credit card customers based on spending habits. After preprocessing, visualization, and the use of the Elbow Method, customers are grouped into meaningful clusters, providing valuable insights for credit card services and data analysis.
37 changes: 37 additions & 0 deletions Nilesh/Task_6/6(ii)/README.md
# PCA on the Iris Dataset

This project demonstrates how to perform Principal Component Analysis (PCA) on the Iris dataset using Python, Scikit-Learn, and matplotlib.

## Overview

The Iris dataset has four numerical features: sepal length, sepal width, petal length, and petal width. In this project, the dataset is reduced from four dimensions to two using PCA. The two resulting components preserve most of the original variance, and the reduced data is plotted to show the separation between different Iris flower species.

## Link
https://colab.research.google.com/drive/1eciLwHYCS1u2g21pbPyqSfIfG4bsXKgX?usp=sharing

## Steps Performed

1. Imported required libraries (NumPy, Pandas, Matplotlib, and modules from Scikit-Learn).
2. Loaded the Iris dataset.
3. Extracted the input features and target class labels.
4. Standardized the feature values to zero mean and unit variance (PCA is sensitive to the relative magnitude of each feature).
5. Applied PCA to reduce the dataset from 4 features to 2 principal components.
6. Displayed the explained variance ratio (72.96% for the first principal component and 22.85% for the second, over 95% overall).
7. Visualized the transformed data using a 2D scatter plot, where the three kinds of irises are clearly segregated into three distinct groups.
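The steps above condense to a few lines with scikit-learn; this is a minimal sketch of the same workflow, not the notebook's exact code.

```python
# Standardize the four Iris features, project onto 2 principal
# components, and inspect how much variance the projection keeps.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)          # X: (150, 4)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                          # (150, 2)
print(pca.explained_variance_ratio_)       # roughly [0.73, 0.23]
```

A scatter plot of `X_2d` colored by `y` then shows the three species as distinct groups.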

## Output and Interpretation

- The output includes the variance explained by each of the principal components (72.96% and 22.85%).
- The sum of these values indicates how much of the original dataset’s information is preserved (95.81%).
- The 2D scatter plot shows clear clustering that corresponds to the three Iris species, demonstrating that even after reducing the dimensionality, the structure of the data remains well separated.

## Requirements

- Python 3.x
- NumPy
- matplotlib
- scikit-learn

## Conclusion

This project shows how PCA can significantly reduce data dimensionality while preserving important information.
52 changes: 52 additions & 0 deletions Nilesh/Task_6/6(iii)/README.md
# PCA + K-Means Clustering on the Digits Dataset

This project applies Principal Component Analysis (PCA) and K-Means clustering to the handwritten digits dataset available in scikit-learn. It aims to reduce the dimensionality of the image data while preserving most of the variance, and then group the images into clusters that represent different digits.

## Overview

The digits dataset consists of 1,797 grayscale handwritten digit images (0–9), each represented as an 8×8 image. Each image is flattened into a 64-feature vector of pixel intensities. Working with all 64 features can be inefficient and noisy, so PCA is used to reduce dimensionality.

After PCA reduces the dataset to a lower-dimensional subspace that keeps at least 95% of the variance, K-Means is applied to cluster the digits into 10 groups. The results are then visualized using a confusion matrix and sample images from each cluster.

## Colab Link
https://colab.research.google.com/drive/1rBpyd32QFudwq4chuSR0n1rdtUe1bamu?usp=sharing

## Steps Performed

1. The digits dataset was loaded from scikit-learn.
- X contains the flattened pixel values
- y contains the true digit labels
- images contains the original 8×8 images used for visualization

2. A 3×3 grid of sample digit images was plotted to understand what the dataset looks like before processing.

3. Standardized pixel values to ensure each feature has zero mean and unit variance because PCA and K-Means are scale-sensitive.

4. Found the optimal number of components for PCA:
- PCA was fitted on the standardized data without specifying the number of components.
- The cumulative explained variance was plotted to determine how many components were required to preserve most of the information.
- The model found that 40 principal components are enough to retain 95% or more of the variance.

5. Created a scikit-learn pipeline with:
- StandardScaler
- PCA with 40 components (as determined above)
- K-Means with 10 clusters (as there are 10 digits)

6. Fitted the pipeline to the original dataset, assigning each sample a cluster label.

7. Used a confusion matrix to compare actual digits with cluster assignments.

8. Visualized cluster results by displaying sample images from each of the 10 clusters.
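The pipeline from steps 5 and 6 can be sketched as follows; this mirrors the README's description rather than copying the notebook.

```python
# Scale -> PCA -> K-Means in one scikit-learn pipeline, so each
# sample of the digits dataset gets a cluster label in 0..9.
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)        # X: (1797, 64) flattened pixels

pipeline = make_pipeline(
    StandardScaler(),                      # zero mean, unit variance per pixel
    PCA(n_components=40),                  # keeps ~95% of the variance
    KMeans(n_clusters=10, n_init=10, random_state=42),  # one cluster per digit
)
cluster_labels = pipeline.fit_predict(X)
print(cluster_labels.shape)                # (1797,)
```

Comparing `cluster_labels` against the true labels `y` in a confusion matrix reproduces step 7.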

## Requirements

- Python 3.x
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn

## Conclusion

This project demonstrates how dimensionality reduction and clustering can be applied to image data.
By combining PCA and K-Means into an automated pipeline, handwritten digits can be grouped into visually meaningful clusters while keeping most of the dataset’s information intact.