A comprehensive, step-by-step deep learning project for image classification using PyTorch. Learn to build and train a CNN from scratch, run it on your own images, and apply transfer learning with a pre-trained ResNet18 to reach roughly 85-92% accuracy.
- 8 Progressive Learning Steps - From data loading to running the model on your own images
- Custom CNN Architecture - Build a convolutional neural network from scratch
- Transfer Learning - Use pre-trained ResNet18 for ~90% accuracy
- Data Augmentation - Boost performance with image transformations
- GPU Support - Automatic CUDA/MPS detection for fast training
- Classify Your Own Images - Use the trained model on any image
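
The automatic CUDA/MPS detection listed above is commonly written along the following lines. This is a minimal sketch, not the project's actual code; the helper name `pick_device` is hypothetical.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is only present in newer PyTorch releases, so guard the lookup
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
x = torch.randn(4, 3, 32, 32, device=device)  # dummy batch of CIFAR-sized images
print(device, x.shape)
```

The model and every batch must live on the same device, so a training script would typically call `model.to(device)` once and move each batch inside the loop.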
| Step | File | What You'll Learn | Difficulty |
|---|---|---|---|
| 1 | steps/step1_data_loading.py | Datasets, transforms, DataLoaders | ⭐ |
| 2 | steps/step2_build_model.py | CNN architecture (Conv, Pool, FC layers) | ⭐⭐ |
| 3 | steps/step3_train_model.py | Training loop, loss functions, optimizers | ⭐⭐ |
| 4 | steps/step4_evaluate_and_predict.py | Evaluation, predictions, confusion matrix | ⭐⭐ |
| 5 | steps/step5_data_augmentation.py | Image augmentation techniques | ⭐⭐ |
| 6 | steps/step6_transfer_learning.py | Pre-trained models, fine-tuning | ⭐⭐⭐ |
| 7 | steps/step7_learning_rate_scheduler.py | Learning rate scheduling strategies | ⭐⭐⭐ |
| 8 | steps/step8_your_own_images.py | Classify your own images! | ⭐ |
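
Step 4 lists a confusion matrix among its topics. As a rough illustration (not the code in step4_evaluate_and_predict.py), one can be built from true and predicted labels as follows. The function name and the toy labels are made up for the example.

```python
import torch


def confusion_matrix(targets: torch.Tensor, preds: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Rows are true classes, columns are predicted classes."""
    # Encode each (true, predicted) pair as one index, then count how often each occurs
    flat = targets * num_classes + preds
    counts = torch.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


# Toy labels for three classes (illustrative only)
targets = torch.tensor([0, 0, 1, 1, 2, 2])
preds = torch.tensor([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(targets, preds, num_classes=3)
accuracy = cm.diag().sum().item() / cm.sum().item()  # correct predictions sit on the diagonal
print(cm)
print(f"Accuracy: {accuracy:.2%}")
```

Off-diagonal entries show which classes get mistaken for which, something a single accuracy number hides.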
```bash
git clone https://github.com/Anishyou/Imageclassifier.git
cd Imageclassifier
pip install -r requirements.txt
cd steps

# Learn the fundamentals
python step1_data_loading.py            # Understand data loading
python step2_build_model.py             # Explore CNN architecture

# Train and evaluate
python step3_train_model.py             # Train the model (~10 min CPU, ~2 min GPU)
python step4_evaluate_and_predict.py    # See results

# Advanced techniques
python step5_data_augmentation.py       # Data augmentation
python step6_transfer_learning.py       # Transfer learning with ResNet
python step7_learning_rate_scheduler.py # LR scheduling

# Use on your own images
python step8_your_own_images.py         # Classify any image!
```

| Property | Value |
|---|---|
| Total Images | 60,000 (50k train, 10k test) |
| Image Size | 32×32 RGB |
| Classes | 10 |

Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
```
Input (3×32×32)
    ↓
Conv1 (32 filters) → BatchNorm → ReLU → MaxPool → (32×16×16)
    ↓
Conv2 (64 filters) → BatchNorm → ReLU → MaxPool → (64×8×8)
    ↓
Conv3 (128 filters) → BatchNorm → ReLU → MaxPool → (128×4×4)
    ↓
Flatten (2048)
    ↓
FC1 (256) → ReLU → Dropout(0.5)
    ↓
FC2 (10) → Output (class scores)
```

Parameters: ~596K trainable parameters
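
The diagram above can be expressed as a PyTorch module roughly as follows. This is a sketch under stated assumptions, not the repository's `ImageClassifier`: the class name `SketchCNN`, the 3×3 kernels, and `padding=1` are guesses that reproduce the listed feature-map sizes, so the parameter count of this version may differ from the figure quoted above.

```python
import torch
import torch.nn as nn


class SketchCNN(nn.Module):
    """Illustrative CNN following the diagram; not the repository's ImageClassifier."""

    def __init__(self, num_classes: int = 10):
        super().__init__()

        def block(in_ch: int, out_ch: int) -> nn.Sequential:
            # Assumed 3x3 conv with padding=1 keeps the spatial size; MaxPool2d(2) halves it
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        # 32x32 -> 16x16 -> 8x8 -> 4x4
        self.features = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))
        self.classifier = nn.Sequential(
            nn.Flatten(),                 # 128 * 4 * 4 = 2048 features
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),  # raw class scores (logits)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = SketchCNN()
model.eval()
with torch.no_grad():
    feats = model.features(torch.randn(2, 3, 32, 32))
    logits = model.classifier(feats)
print(feats.shape, logits.shape)
```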
| Model | Accuracy | Training Time |
|---|---|---|
| Custom CNN (10 epochs) | ~70-75% | ~10 min (GPU) |
| With Data Augmentation | ~75-80% | ~15 min (GPU) |
| Transfer Learning (ResNet18) | ~85-92% | ~20 min (GPU) |
Pre-trained model weights are included in the models/ folder:
| File | Description | How to Use |
|---|---|---|
| models/best_model.pth | Custom CNN trained on CIFAR-10 | Load with step2_build_model.ImageClassifier |
| models/feature_extractor_best.pth | ResNet18 transfer learning | Load with torchvision.models.resnet18 |
```python
import sys

import torch

sys.path.append('steps')  # make the step modules importable from the project root
from step2_build_model import ImageClassifier

# Load custom CNN
model = ImageClassifier(num_classes=10)
# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
checkpoint = torch.load('models/best_model.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Check accuracy achieved
print(f"Best accuracy: {checkpoint['best_acc']:.2f}%")
```

Classify your own image from the command line:

```bash
cd steps
python step8_your_own_images.py --image path/to/your/image.jpg
```

Or use in code (from project root):
```python
import sys

import torch
import torchvision.transforms as transforms
from PIL import Image

sys.path.append('steps')  # make the step modules importable from the project root
from step2_build_model import ImageClassifier

# Classes
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

# Load model
model = ImageClassifier(num_classes=10)
# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
checkpoint = torch.load('models/best_model.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare image: resize to the 32x32 training resolution and normalize with CIFAR-10 statistics
transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))
])
image = Image.open('your_image.jpg').convert('RGB')
input_tensor = transform(image).unsqueeze(0)  # add a batch dimension

# Predict
with torch.no_grad():
    output = model(input_tensor)
    _, predicted = output.max(1)

print(f"Prediction: {classes[predicted.item()]}")
```

```
Imageclassifier/
├── steps/                                  # All learning step files
│   ├── step1_data_loading.py               # Data loading tutorial
│   ├── step2_build_model.py                # CNN architecture
│   ├── step3_train_model.py                # Training loop
│   ├── step4_evaluate_and_predict.py       # Evaluation
│   ├── step5_data_augmentation.py          # Augmentation
│   ├── step6_transfer_learning.py          # Transfer learning
│   ├── step7_learning_rate_scheduler.py    # LR scheduling
│   └── step8_your_own_images.py            # Use your own images
├── models/                                 # Trained model weights
│   ├── best_model.pth                      # Custom CNN weights
│   └── feature_extractor_best.pth          # Transfer learning weights
├── outputs/                                # Generated images & plots
│   └── (training curves, predictions, etc.)
├── data/                                   # CIFAR-10 dataset (auto-downloaded)
├── requirements.txt                        # Dependencies
├── LICENSE                                 # MIT License
└── README.md                               # This file
```
🔹 Data Transforms

```python
import torchvision.transforms as transforms

# CIFAR-10 per-channel statistics
mean = (0.4914, 0.4822, 0.4465)
std = (0.2470, 0.2435, 0.2616)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])
```

🔹 Training Loop

```python
for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()              # Reset gradients
        outputs = model(images)            # Forward pass
        loss = criterion(outputs, labels)  # Compute loss
        loss.backward()                    # Backward pass
        optimizer.step()                   # Update weights
```

🔹 Evaluation Mode

```python
model.eval()  # Disable dropout; BatchNorm uses its running statistics
with torch.no_grad():
    outputs = model(images)
    _, predicted = outputs.max(1)
```

🔹 Transfer Learning

```python
import torch.nn as nn
from torchvision import models

# Load pre-trained ResNet18
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Replace final layer for 10 classes
model.fc = nn.Linear(512, 10)
```

- Python 3.8+
- PyTorch 2.0+
- torchvision
- matplotlib
- numpy
- tqdm
- Pillow
Contributions are welcome! Feel free to:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- CIFAR-10 Dataset by Alex Krizhevsky
- PyTorch for the amazing deep learning framework
- torchvision for pre-trained models
Made with ❤️ for learning deep learning
