This project covers two areas of data mining: graph theory and neural networks. It generates and visualizes random graphs with the NetworkX library, and it trains and evaluates a Convolutional Neural Network (CNN) built with PyTorch for image classification on the CIFAR-10 dataset.
- Project Overview
- Graph Analysis
- Neural Network for Image Classification
- Dataset
- Installation
- Usage
- Results
- Conclusion
- Contributing
- License
This Jupyter Notebook (`datamining_assignment3 (1).ipynb`) demonstrates two data mining techniques, one for structural analysis and one for pattern recognition:
- Graph analysis: generates random graphs with the NetworkX library and visualizes them. Graph structure matters in many real-world applications, from social network analysis and epidemiology to transportation logistics and biological pathways.
- Neural network for image classification: implements, trains, and evaluates a Convolutional Neural Network (CNN) in PyTorch to classify images from the CIFAR-10 dataset, a common computer vision benchmark.
The project uses the following Python libraries:
- `pandas` and `numpy` for data manipulation
- `NetworkX` for graph theory applications
- `torch` and `torchvision` for deep learning models
This section explores generating and visualizing a random directed graph.
- Graph Type: A directed random graph is generated, where edge direction matters (A → B ≠ B → A).
- Parameters:
  - Nodes: `n = 30`
  - Edge probability: `p = 0.2`
  - Random seed: `seed = 5` for reproducibility
- Visualization: The graph is visualized to highlight network density, isolated nodes, hubs, and connectivity.
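The parameters above can be sketched in code as follows. This is a minimal sketch, not the notebook's actual cell: the README does not name the generator, so `nx.gnp_random_graph` with `directed=True` is an assumption, and the helper names `build_graph`, `summarize`, and `draw_graph` are hypothetical.

```python
import matplotlib.pyplot as plt
import networkx as nx

N_NODES = 30      # n from the README
EDGE_PROB = 0.2   # p from the README
SEED = 5          # seed from the README


def build_graph(n=N_NODES, p=EDGE_PROB, seed=SEED):
    """Return a directed G(n, p) random graph.

    Assumption: the notebook may use a different NetworkX generator.
    """
    return nx.gnp_random_graph(n, p, seed=seed, directed=True)


def summarize(G, top_k=3):
    """Collect the properties the visualization is meant to highlight."""
    degrees = dict(G.degree())  # in-degree + out-degree per node
    hubs = sorted(degrees, key=degrees.get, reverse=True)[:top_k]
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        "isolated": list(nx.isolates(G)),
        "hubs": hubs,
        "weakly_connected": nx.is_weakly_connected(G),
    }


def draw_graph(G, path="random_graph.png", seed=SEED):
    """Draw the graph with a seeded spring layout and save it to `path`."""
    pos = nx.spring_layout(G, seed=seed)
    fig, ax = plt.subplots(figsize=(8, 6))
    nx.draw_networkx(G, pos, ax=ax, node_size=300, font_size=8, arrows=True)
    ax.set_axis_off()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


G = build_graph()
stats = summarize(G)
```

Calling `draw_graph(G)` writes the figure to `random_graph.png` (a hypothetical file name). Fixing the seed for both the generator and the layout keeps the edges and the node positions the same across runs.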
This section covers building, training, and evaluating a CNN using PyTorch.
A straightforward CNN is implemented using torch.nn, including:
- Convolutional layers
- ReLU activation
- Pooling layers
- Fully connected layers
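One possible arrangement of these building blocks is sketched below. The README does not state the number of layers, channel counts, or kernel sizes, so every size here is an assumption, and the class name `SimpleCNN` is hypothetical.

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    """Small CNN for 3x32x32 inputs; all layer sizes are illustrative assumptions."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32 -> 16x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                 # -> 2048 features
            nn.Linear(32 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),  # one logit per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = SimpleCNN()
logits = model(torch.randn(4, 3, 32, 32))  # batch of 4 random "images"
```

The model returns raw logits rather than probabilities, which is the input `CrossEntropyLoss` expects.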
Training:
- Runs for multiple epochs
- Loss function: `CrossEntropyLoss`
- Optimizer: `SGD` or `Adam`
- Training is monitored by observing the loss decrease
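A training loop along these lines might look like the sketch below. The function name `train`, the learning rate, the momentum, and the epoch count are assumptions. The smoke run at the end uses random tensors shaped like CIFAR-10 batches, not real data, so its loss values carry no meaning.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, epochs=5, lr=0.01, device="cpu"):
    """Train with CrossEntropyLoss and SGD; return the mean loss per epoch."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    # Assumption: SGD with momentum; torch.optim.Adam would be a drop-in swap.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    history = []
    for epoch in range(epochs):
        model.train()
        running_loss, seen = 0.0, 0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * labels.size(0)
            seen += labels.size(0)
        history.append(running_loss / seen)
        print(f"epoch {epoch + 1}: mean loss {history[-1]:.4f}")
    return history


# Smoke run on random tensors (not real images) to show the call pattern.
toy_data = TensorDataset(torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,)))
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
history = train(toy_model, DataLoader(toy_data, batch_size=8), epochs=2)
```

The returned `history` list is what one would watch, or plot, to confirm that the loss is going down on real data.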
Evaluation:
- Model performance is evaluated on unseen test data
- Metric: accuracy
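Accuracy on a held-out set can be computed roughly as follows. This is a sketch with a hypothetical helper name, `evaluate_accuracy`. The check at the end uses an identity "model" on one-hot inputs so that the expected accuracy is known in advance.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()  # no gradients are needed during evaluation
def evaluate_accuracy(model, loader, device="cpu"):
    """Return the fraction of samples whose arg-max prediction matches the label."""
    model.to(device)
    model.eval()
    correct, total = 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        predictions = model(inputs).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return correct / total


# Sanity check: an identity model on one-hot "logits" predicts the hot index.
labels = torch.tensor([0, 1, 2, 3])
one_hot = torch.eye(4)[labels]
perfect = evaluate_accuracy(
    nn.Identity(), DataLoader(TensorDataset(one_hot, labels), batch_size=2)
)
# Swap the last two labels so that exactly half of the predictions are wrong.
swapped = torch.tensor([0, 1, 3, 2])
half = evaluate_accuracy(
    nn.Identity(), DataLoader(TensorDataset(one_hot, swapped), batch_size=2)
)
```

For reference, random guessing on 10 balanced classes gives about 10% accuracy, which is a useful baseline when reading CIFAR-10 results.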
CIFAR-10 is a dataset of 60,000 color images (32x32 px) in 10 classes, split into:
- 50,000 training images
- 10,000 test images
Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
Install the required libraries using pip:
pip install networkx matplotlib torch torchvision pandas numpy