diff --git a/Nilesh/Task_5/README.md b/Nilesh/Task_5/README.md
index 8425dee..23b3897 100644
--- a/Nilesh/Task_5/README.md
+++ b/Nilesh/Task_5/README.md
@@ -8,8 +8,12 @@ Neural style transfer is a technique that applies the artistic style of one imag
 ## Link
 
+1. **Custom Model:** https://colab.research.google.com/drive/12zktQ8BdUF-3LDXvIEShWsQ-aIKQhofl?usp=sharing
+2. **Pre-trained Model:**
+https://colab.research.google.com/drive/1xZjAsrQj99h8KX61LPmendmphCPvpaqL?usp=sharing
+
 ## Features
 
 - Loads and preprocesses content and style images
@@ -30,13 +34,9 @@ https://colab.research.google.com/drive/12zktQ8BdUF-3LDXvIEShWsQ-aIKQhofl?usp=sh
 1. **Install Required Libraries**
 
-   If you're running this outside of Google Colab, install the necessary packages:
-
-   """
    pip install tensorflow numpy matplotlib pillow
-   """
-2. **Prepare Your Images**
+2. **Preparing Images**
 
    Choose a content image
    Choose a style image
@@ -46,9 +46,8 @@ https://colab.research.google.com/drive/12zktQ8BdUF-3LDXvIEShWsQ-aIKQhofl?usp=sh
 4. **Execute the following code block at the end of the notebook:**
 
-   """
    result = style_transfer("/content/big_ben.jpeg", "/content/starry_night.jpeg", epochs=1000, lr=0.02)
    show_image(result, "Final Stylized Image")
-   """
-5. **Show the output after every 50 epochs**
\ No newline at end of file
+5. **Show the output after every 50 epochs**
+
diff --git a/Nilesh/Task_6/6(i)/README.md b/Nilesh/Task_6/6(i)/README.md
new file mode 100644
index 0000000..d47a9f6
--- /dev/null
+++ b/Nilesh/Task_6/6(i)/README.md
@@ -0,0 +1,38 @@
+# K-Means Clustering of Credit Card Customer Data
+
+This project performs clustering on a credit card customer dataset. The dataset contains information about customer transactions and spending behavior, making it suitable for unsupervised learning.
+
+## Overview
+
+Customer clustering can help identify spending patterns, customer types, and behavioral groups without using labeled data.
This can be useful for targeted marketing, fraud detection, risk analysis, and personalized financial services. The technique used here is K-Means, along with exploratory data analysis and preprocessing.
+
+## Dataset link:
+https://www.kaggle.com/datasets/arjunbhasin2013/ccdata
+
+## Colab link:
+https://colab.research.google.com/drive/1LRn00h-GlvxAlJWQkRypHQHHr66y4E7_?usp=sharing
+
+## Steps Performed
+
+1. Imported the required libraries (NumPy, Pandas, Matplotlib, Seaborn, and Scikit-Learn).
+2. Loaded the dataset from the CSV file and inspected it using head, shape, info, and describe.
+3. Removed the CUST_ID column, since it does not contribute to clustering.
+4. Replaced missing values with each column's mean.
+5. Performed exploratory data analysis: a histogram of every feature to observe the data distribution, and a heatmap showing correlations between variables.
+6. Scaled the features with StandardScaler, since K-Means clustering is distance-based.
+7. Determined the optimal number of clusters with the Elbow Method: the K-Means model was trained for k from 1 to 10 and the inertia scores were plotted; the "elbow point", where adding more clusters stops significantly reducing inertia, marks the ideal number of clusters (4 in this case).
+8. Applied K-Means clustering, assigning each customer a cluster label (0, 1, 2, or 3).
+9. Visualized the clusters through pairplots.
+
+
+## Requirements
+
+- Python 3.x
+- NumPy
+- Pandas
+- Matplotlib
+- Seaborn
+- Scikit-Learn
+
+## Conclusion
+This project demonstrates how unsupervised learning techniques like K-Means can segment credit card customers based on spending habits. After preprocessing, visualization, and the use of the Elbow Method, customers are grouped into meaningful clusters, providing valuable insights for credit card services and data analysis.
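The Elbow Method used in the steps above can be sketched as follows. This is an illustrative example on synthetic blobs standing in for the scaled credit card features, not the notebook's exact code:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card data: 4 underlying groups.
X, _ = make_blobs(n_samples=500, centers=4, n_features=5, random_state=42)
X_scaled = StandardScaler().fit_transform(X)  # K-Means is distance-based

# Train K-Means for k = 1..10 and record the inertia for each k.
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(km.inertia_)

# Inertia keeps falling as k grows; the "elbow" is where the drop flattens.
for k, inertia in zip(range(1, 11), inertias):
    print(k, round(inertia, 1))
```

Plotting inertia against k (as in the notebook) makes the elbow visible; here the synthetic data has four true centers by construction, so the curve flattens after k = 4.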
\ No newline at end of file
diff --git a/Nilesh/Task_6/6(ii)/README.md b/Nilesh/Task_6/6(ii)/README.md
new file mode 100644
index 0000000..d152d1d
--- /dev/null
+++ b/Nilesh/Task_6/6(ii)/README.md
@@ -0,0 +1,37 @@
+# PCA on the Iris Dataset
+
+This project demonstrates how to perform Principal Component Analysis (PCA) on the Iris dataset using Python, Scikit-Learn, and Matplotlib.
+
+## Overview
+
+The Iris dataset has four numerical features: sepal length, sepal width, petal length, and petal width. In this project, the dataset is reduced from four dimensions to two using PCA. The two resulting components preserve most of the original variance, and the reduced data is plotted to show the separation between the different Iris species.
+
+## Link
+https://colab.research.google.com/drive/1eciLwHYCS1u2g21pbPyqSfIfG4bsXKgX?usp=sharing
+
+## Steps Performed
+
+1. Imported the required libraries (NumPy, Pandas, Matplotlib, and modules from Scikit-Learn).
+2. Loaded the Iris dataset.
+3. Extracted the input features and target class labels.
+4. Standardized the feature values to zero mean and unit variance (because PCA is sensitive to the relative magnitude of each feature).
+5. Applied PCA to reduce the dataset from 4 features to 2 principal components.
+6. Displayed the explained variance ratio (72.96% for the first principal component and 22.85% for the second, more than 95% overall).
+7. Visualized the transformed data using a 2D scatter plot, in which the three Iris species separate into three distinct groups.
+
+## Output and Interpretation
+
+- The output includes the variance explained by each principal component (72.96% and 22.85%).
+- The sum of these values indicates how much of the original dataset's information is preserved (95.81%).
+- The 2D scatter plot shows clear clustering that corresponds to the three Iris species, demonstrating that the structure of the data remains well separated even after the dimensionality is reduced.
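The steps above can be condensed into a short sketch (plotting omitted; not the notebook's exact code):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)        # 150 samples, 4 features

# Standardize first: PCA is sensitive to feature magnitudes.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 standardized features onto 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                        # (150, 2)
print(pca.explained_variance_ratio_)     # ~[0.73, 0.23], >95% together
```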
+
+## Requirements
+
+- Python 3.x
+- NumPy
+- Matplotlib
+- Scikit-Learn
+
+## Conclusion
+
+This project shows how PCA can significantly reduce data dimensionality while preserving important information.
\ No newline at end of file
diff --git a/Nilesh/Task_6/6(iii)/README.md b/Nilesh/Task_6/6(iii)/README.md
new file mode 100644
index 0000000..e6329f9
--- /dev/null
+++ b/Nilesh/Task_6/6(iii)/README.md
@@ -0,0 +1,52 @@
+# PCA + K-Means Clustering on the Digits Dataset
+
+This project applies Principal Component Analysis (PCA) and K-Means clustering to the handwritten digits dataset available in scikit-learn. It aims to reduce the dimensionality of the image data while preserving most of the variance, and then group the images into clusters that represent the different digits.
+
+## Overview
+
+The digits dataset consists of 1,797 grayscale handwritten digit images (0–9), each represented as an 8×8 image. Each image is flattened into a 64-feature vector of pixel intensities. Working with all 64 features can be inefficient and noisy, so PCA is used to reduce dimensionality.
+
+After PCA reduces the dataset to a lower-dimensional subspace that keeps at least 95% of the variance, K-Means is applied to cluster the digits into 10 groups. The results are then visualized using a confusion matrix and sample images from each cluster.
+
+## Colab Link
+https://colab.research.google.com/drive/1rBpyd32QFudwq4chuSR0n1rdtUe1bamu?usp=sharing
+
+## Steps Performed
+
+1. Loaded the digits dataset from scikit-learn.
+- X contains the flattened pixel values
+- y contains the true digit labels
+- images contains the original 8×8 images used for visualization
+
+2. Plotted a 3×3 grid of sample digit images to see what the dataset looks like before processing.
+
+3. Standardized the pixel values so that each feature has zero mean and unit variance, because PCA and K-Means are scale-sensitive.
+
+4. 
Finding the optimal number of components for PCA
+- PCA was fitted on the standardized data without specifying the number of components.
+- The cumulative explained variance was plotted to determine how many components are required to preserve most of the information.
+- 40 principal components were found to be enough to retain 95% or more of the variance.
+
+5. Created a scikit-learn pipeline with:
+- StandardScaler
+- PCA with 40 components (as determined above)
+- K-Means with 10 clusters (as there are 10 digits)
+
+6. Fitted the pipeline to the original dataset, assigning each sample a cluster label.
+
+7. Used a confusion matrix to compare the actual digits with the cluster assignments.
+
+8. Visualized the cluster results by displaying sample images from each of the 10 clusters.
+
+## Requirements
+
+- Python 3.x
+- NumPy
+- Matplotlib
+- Seaborn
+- Scikit-Learn
+
+## Conclusion
+
+This project demonstrates how dimensionality reduction and clustering can be applied to image data.
+By combining PCA and K-Means into an automated pipeline, handwritten digits can be grouped into visually meaningful clusters while keeping most of the dataset's information intact.
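The pipeline described above can be sketched roughly as follows. Note one deliberate difference: instead of hard-coding the 40 components found in the elbow/variance analysis, this sketch gives PCA a 0.95 variance target, which makes scikit-learn pick roughly that number of components automatically:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)        # 1,797 flattened 8x8 images

# Scale -> PCA -> K-Means, as in the steps above. PCA(n_components=0.95)
# keeps just enough components to retain >= 95% of the variance (~40 here).
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),
    ("kmeans", KMeans(n_clusters=10, n_init=10, random_state=42)),
])
clusters = pipeline.fit_predict(X)

print(pipeline.named_steps["pca"].n_components_)  # components kept for 95% variance
print(len(set(clusters)))                          # number of distinct clusters
```

The cluster labels in `clusters` are arbitrary integers 0-9; comparing them against `y` with a confusion matrix, as in the notebook, reveals which cluster corresponds to which digit.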