Skip to content

Advanced Computer Vision & Deep Learning: Complete implementations of blob detection (LoG/DoG/DoH), image processing pipelines, edge detection, histogram analysis, DCT compression, semantic segmentation, YOLO/Faster R-CNN object detection, and CNN classifiers for MNIST/CIFAR/Fashion-MNIST using PyTorch, TensorFlow, and OpenCV

Notifications You must be signed in to change notification settings

anshika1279/Computer-Vision-Implementation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Advanced Computer Vision and Video Analytics

A comprehensive implementation of advanced computer vision techniques and video analytics solutions using Python, OpenCV, and deep learning frameworks.

Table of Contents

Overview

This repository contains implementations of advanced computer vision algorithms and video analytics techniques including:

  • Image transformations and geometric operations
  • Object detection and tracking
  • Face recognition and analysis
  • Pose estimation
  • Video processing and frame analysis
  • Motion detection and activity recognition
  • Semantic and instance segmentation
  • 3D reconstruction

Features

Image Processing

  • Geometric transformations (rotation, scaling, translation, shearing, reflection)
  • Image filtering and enhancement
  • Morphological operations
  • Edge detection and contour analysis

Video Analytics

  • Real-time video processing
  • Frame-level analysis
  • Motion tracking
  • Temporal analysis

Deep Learning Integration

  • Pre-trained model support (YOLO, ResNet, etc.)
  • Custom model implementations
  • Transfer learning examples

Visualization Tools

  • Matplotlib-based visualization
  • Real-time plotting
  • Annotated frame display

Installation

Prerequisites

  • Python 3.8+
  • pip or conda

Setup

git clone https://github.com/anshika1279/Computer-Vision-Implementation.git
cd Computer-Vision-Implementation
pip install -r requirements.txt

Key runtime dependencies include:

  • Notebook tooling: nbformat, nbconvert, notebook (for cleaning/fixing metadata)
  • CV/DL extras: timm (MobileNetV1), ultralytics, torch/torchvision, tensorflow

Project Structure

  • README.md: Project overview and instructions.
  • requirements.txt: Python dependencies for all notebooks.
  • image_processing_and_digits_classification.ipynb: Image resizing/blur demo plus digits classification with multiple models.
  • ShapeAndImageTransformations.ipynb: Shape and image transformation examples.
  • edge_detection_and_image_segmentation.ipynb: Edge detection operators and image segmentation techniques.
  • Histogram_Analysis_Equalization_DFT.ipynb: Histogram analysis, contrast enhancement, and frequency domain transformations.
  • image_compression_techniques_DCT_Deep_learning_image_classification.ipynb: DCT compression and CNN-based digit/object classification.
  • segmentation_detection_classification.ipynb: Advanced CV pipeline with edge/region segmentation, Hough transform, YOLO/R-CNN detection, and Fashion-MNIST/CIFAR-100 classification.
  • blob_detection_image_enhancement_classification.ipynb: Blob detection algorithms (LoG, DoG, DoH), comprehensive image enhancement techniques, and transfer learning with AlexNet/VGG16 on CIFAR-100.
  • sift_orb_watershed_resnet_few_shot_learning.ipynb: Feature detection (SIFT, ORB), feature matching (BFMatcher), watershed segmentation, ResNet-18/34 classification on CIFAR-100, and few-shot/one-shot learning with elastic deformation augmentation.
  • Stitching_Denoising_GAN_SegmentationPlayground.ipynb: Colab-ready playground covering image stitching (simple/panorama/ORB + pose), inpainting, MNIST denoising autoencoders, GANs (MNIST, CIFAR-10), MobileNet V1/V2/V3 fine-tuning, and notebook metadata fixes for GitHub rendering.
  • Image_Denoising_Video_Action_Pipeline.ipynb: Image denoising comparison (Median, Wavelet, Noise2Void U-Net) and video action recognition pipeline (frame extraction, video processing, UCF101 subset with 3D CNN classification).

Modules & Implementations

  • Image resizing with multiple interpolation methods and blurring with box, Gaussian, and bilateral filters.
  • Digits classification using sklearn digits dataset with Gaussian Naive Bayes, RBF SVM, and Random Forest, including cross-validation and ROC visualization.

Notebook: Image Processing & Digits Classification

  • File: image_processing_and_digits_classification.ipynb
  • Part 1: Resize (linear, nearest, cubic) and blur (box, Gaussian, bilateral) a local image; expects image.png in the repo root and displays a comparison grid.
  • Part 2: Train/evaluate classifiers (Gaussian Naive Bayes, RBF SVM, Random Forest) on sklearn digits with 5-fold CV; prints metrics and shows ROC curves.

Notebook: Shape and Image Transformations

  • File: ShapeAndImageTransformations.ipynb
  • Part 1: 2D rectangle transformations (translate, scale, rotate, reflect, shear, composite) visualized with Matplotlib.
  • Part 2: Image transformations on input.jpg using OpenCV (translate, reflect, rotate, scale, crop, shear on x/y) with side-by-side plots.
  • Part 3: Additional 2D shape transformations with reusable helpers for translate/scale/rotate/reflect/shear and composite examples.

Notebook: Edge Detection and Image Segmentation

  • File: edge_detection_and_image_segmentation.ipynb
  • Edge Detection: Implements multiple edge detection operators including:
    • Sobel operator (combined X and Y gradients)
    • Prewitt edge detection
    • Roberts cross operator
    • Canny edge detector
  • Image Segmentation: Demonstrates various segmentation techniques:
    • Global thresholding (fixed threshold binarization)
    • Adaptive thresholding (local neighborhood-based)
    • Watershed segmentation with morphological operations
  • Preprocessing: Includes color space conversions (BGR→RGB→Grayscale→Binary) and image metrics calculation
  • Visualization: Displays all results in a comprehensive grid layout with labeled subplots
  • Outputs: Saves processed images (edge maps, segmented regions) for further analysis

Notebook: Histogram Analysis, Equalization & DFT

  • File: Histogram_Analysis_Equalization_DFT.ipynb
  • Histogram Analysis: Computes and plots histograms for both grayscale and color (RGB) images
    • Individual channel histograms for color images (B, G, R)
    • Histogram normalization to probability distributions
    • Visualization with matplotlib for histogram analysis
  • Contrast Enhancement: Implements histogram equalization for improving image contrast
    • Before/after comparison of grayscale images
    • Visual quality assessment with text annotations
    • Side-by-side display of original and equalized images
  • Discrete Fourier Transform (DFT): Frequency domain analysis and transformations
    • DFT computation with magnitude spectrum visualization
    • Inverse DFT for image reconstruction
    • Rotation property verification (45° rotation test)
    • Demonstrates spatial vs. frequency domain correspondence
  • Compatible with Google Colab: Uses cv2_imshow for Colab environments

Notebook: Image Compression & Deep Learning Classification

  • File: image_compression_techniques_DCT_Deep_learning_image_classification .ipynb
  • DCT-Based Image Compression: Implements both lossy and lossless compression techniques
    • Lossy compression with quantization using JPEG standard quantization matrix
    • Lossless compression preserving all DCT coefficients
    • Block-wise DCT/IDCT operations (8×8 blocks)
    • Compression ratio analysis and file size comparison
  • MNIST Digit Classification: CNN implementations for handwritten digit recognition
    • Basic 3-layer CNN architecture
    • Enhanced CNN with BatchNormalization, Dropout, and L2 regularization
    • Data augmentation (rotation, shifts, zoom)
    • Learning rate scheduling and early stopping
  • CIFAR-10 Classification: Color image classification with CNN
    • 10-class object recognition on 32×32 color images
    • Similar architecture adapted for RGB inputs
  • Model Evaluation: Comprehensive performance metrics
    • Classification reports with precision, recall, F1-score
    • Confusion matrices with heatmap visualization
    • ROC curves and AUC scores

Notebook: Segmentation, Detection & Classification

  • File: segmentation_detection_classification.ipynb
  • Image Segmentation: Multiple segmentation approaches
    • Edge-based segmentation using Canny edge detection
    • Region-based segmentation with thresholding techniques
    • Visualization with matplotlib for result comparison
  • Hough Transform: Line detection and feature extraction
    • Probabilistic Hough Line Transform for straight line detection
    • Configurable parameters for line detection sensitivity
    • Visual overlay of detected lines on original images
  • Object Detection: State-of-the-art detection models
    • YOLOv8: Real-time object detection with ultralytics framework
    • Faster R-CNN: Region-based detection with ResNet50-FPN backbone
    • Bounding box visualization with confidence scores
    • Pre-trained models on COCO dataset for 80+ object classes
  • Deep Learning Classification: Multi-dataset CNN training
    • Fashion-MNIST: Clothing classification (10 classes, 28×28 grayscale)
    • CIFAR-100: Fine-grained object classification (100 classes, 32×32 RGB)
    • Custom CNN architectures with Conv2D, MaxPooling, and Dense layers
    • 5-epoch training with validation accuracy tracking
    • Classification reports with precision, recall, and F1-scores
  • Integrated Pipeline: End-to-end processing workflow combining segmentation, detection, and classification
  • Dual Environment Support: Compatible with both local (matplotlib) and Google Colab (cv2_imshow) environments

Notebook: Blob Detection, Image Enhancement & Classification

  • File: blob_detection_image_enhancement_classification.ipynb
  • Blob Detection Algorithms: Implementation of three advanced blob detection methods
    • LoG (Laplacian of Gaussian): Scale-space blob detection with adjustable sigma parameters
    • DoG (Difference of Gaussian): Efficient approximation of LoG for faster computation
    • DoH (Determinant of Hessian): Hessian matrix-based blob detection for feature localization
    • Purple region extraction using HSV color masking
    • Morphological preprocessing pipeline (erosion, dilation, opening, closing, area operations)
    • Handles RGBA images with alpha channel conversion
    • Red circle overlay visualization of detected blobs
  • Image Enhancement Pipeline: Eight comprehensive image processing techniques
    • Brightness & Contrast adjustment with alpha/beta parameters
    • Image sharpening using custom convolution kernels
    • Denoising with Non-Local Means algorithm
    • Color enhancement using PIL ImageEnhance
    • Image resizing with interpolation
    • Inverse transform (bitwise NOT operation)
    • Histogram equalization (grayscale and color via YCrCb)
    • LAB color space-based color correction
    • Grid visualization (3×3 layout) of all enhancement results
  • Transfer Learning Classification: CIFAR-100 fine-grained classification
    • AlexNet: Pre-trained on ImageNet, fine-tuned for 100 classes
    • VGG16: Deep architecture with 16 layers, adapted for CIFAR-100
    • Modified final classifier layers for 100-class output
    • SGD optimizer with momentum (lr=0.0001, momentum=0.9)
    • Cross-entropy loss function
    • Training loop with tqdm progress bars
    • Batch size optimized for memory efficiency (batch_size=16)
    • Automatic device selection (CUDA/MPS/CPU)
    • ImageNet normalization for transfer learning compatibility
  • Model Evaluation: Comprehensive accuracy metrics on CIFAR-100 test set
  • Multi-Image Processing: Batch processing across multiple test images (p1.jpg, p2.jpg, p3.png, p4.png, p5.jpg)

Notebook: SIFT, ORB, Watershed, ResNet & Few-Shot Learning

  • File: sift_orb_watershed_resnet_few_shot_learning.ipynb
  • Feature Detection & Description: Classical computer vision feature extraction
    • SIFT (Scale-Invariant Feature Transform): Keypoint detection and descriptor computation
    • ORB (Oriented FAST and Rotated BRIEF): Fast binary descriptor for real-time applications
    • Rich keypoint visualization with OpenCV drawing flags
    • Multiple test images for robustness evaluation
  • Feature Matching: Correspondence finding between image pairs
    • BFMatcher (Brute-Force Matcher): Exhaustive descriptor matching
    • Hamming distance for ORB binary descriptors
    • L2 distance for SIFT float descriptors
    • Cross-check validation for bidirectional matching
    • Top-50 matches visualization with distance-based sorting
  • Watershed Segmentation: Marker-based region segmentation
    • Binary thresholding (THRESH_BINARY_INV) for preprocessing
    • Custom marker seeds for background/foreground separation
    • Watershed algorithm with boundary highlighting (red contours)
    • Contour detection with RETR_EXTERNAL mode
    • Multi-stage visualization (original → watershed → contours)
  • Deep Learning Classification: CIFAR-100 fine-grained recognition
    • ResNet-18: Residual network with 18 layers (ImageNet pre-trained)
    • ResNet-34: Deeper 34-layer residual architecture
    • Transfer learning with modified final FC layer (100 classes)
    • SGD optimizer with momentum (lr=0.0001, momentum=0.9)
    • Cross-entropy loss with 5-epoch training
    • ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    • Training progress tracking with tqdm progress bars
    • GPU/CPU automatic device selection
  • Few-Shot Learning: Meta-learning for low-data scenarios
    • Prototypical Networks: Metric learning with prototype computation
      • Simple FC encoder (784→256→64 dimensions)
      • Episode-based sampling (5-way, 5-shot setup)
      • Euclidean distance metric in embedding space
    • MNIST Dataset with 80/20 train/test split
    • Elastic deformation augmentation for data diversity
  • One-Shot Learning: Siamese network for similarity learning
    • Siamese CNN: Twin network architecture for pair comparison
      • Convolutional feature extractor (Conv2D→ReLU→MaxPool2D×2)
      • Fully connected similarity head (128×4×4→256→128)
      • Pairwise distance computation with L2 norm
    • Proper tensor flattening for CNN-to-FC transition
  • Elastic Transform Augmentation: Advanced data augmentation
    • Corrected Implementation: Gaussian-smoothed displacement fields
    • Alpha scaling for deformation magnitude control
    • Sigma parameter for smoothness adjustment
    • scipy.ndimage.gaussian_filter for proper elastic deformation
    • cv2.remap with bilinear interpolation and reflection borders
    • Side-by-side visualization of original vs. deformed images
  • Model Evaluation Metrics: Comprehensive performance analysis
    • Accuracy, Precision, Recall, F1-score from sklearn.metrics
    • Weighted averaging for multi-class scenarios
    • Custom Siamese evaluation with pairwise similarity threshold
    • Confusion matrix support for detailed error analysis
  • Dual Mode Support: Compatible with local and Colab environments

Notebook: Stitching, Denoising, GANs & Segmentation Playground

  • File: Stitching_Denoising_GAN_SegmentationPlayground.ipynb
  • Image Stitching: Simple and panorama modes using OpenCV Stitcher; ORB-based matching with essential matrix pose recovery; visualization of keypoints and matches.
  • Inpainting: Mask-based OpenCV inpainting helper for quick cleanup of noisy regions.
  • Denoising Autoencoders (MNIST): Two variants (basic and improved with BN/Dropout) for noise+blur restoration; PSNR/SSIM metrics and pixel-wise accuracy.
  • GANs: MNIST MLP GAN and CIFAR-10 DCGAN with loss plots and image grid visualization utilities.
  • Transfer Learning: MobileNet V1/V2/V3 comparison for dog-breed classification (timm/models), with frozen backbone and classifier fine-tuning.
  • Utilities: Notebook metadata cleaning snippets (nbformat) to keep GitHub rendering healthy; Colab upload helpers for stitching inputs.

Notebook: Image Denoising & Video Action Pipeline

  • File: Image_Denoising_Video_Action_Pipeline.ipynb
  • Task 1: Image Denoising Comparison
    • Median Filter Denoising: Channel-wise morphological filtering with disk-shaped structuring elements (size=3), RGB channel processing, PSNR/SSIM/MSE metrics, matplotlib visualization
    • Wavelet Denoising: BayesShrink soft thresholding in wavelet domain, adaptive sigma rescaling, per-channel decomposition/reconstruction, quality metrics comparison
    • Deep Learning Denoising: Noise2Void-inspired U-Net (Conv2D→MaxPooling→UpSampling), self-supervised patch training (64×64), center pixel masking strategy, 10-epoch training
    • Multi-Method Comparison: Quantitative metrics (MSE, PSNR, SSIM) and visual side-by-side evaluation
  • Task 2: Video Processing & Action Recognition
    • Frame Extraction: Configurable interval sampling, sequential frame storage, progress tracking
    • Frame Visualization: 2×5 grid display with BGR→RGB conversion
    • Video Operations: Adaptive thresholding (GAUSSIAN_C), Gaussian blur (5×5), Canny edges (100/200), bitwise NOT inversion
    • Frame Collage: Grid-based spatial sampling, adaptive sizing, half-resolution assembly
    • UCF101 Dataset: 5-class subset (Basketball, Biking, PlayingGuitar, Typing, JumpRope), 10 videos/class, 224×224 frames, 16-frame sequences
    • 3D CNN Classification: 3-layer Conv3D (64→128→256 filters), batch normalization, 2×2×2 max pooling, 512-unit FC + 50% dropout, L2 regularization
    • Training Pipeline: Data augmentation (±30° rotation/shifts/shear/zoom, horizontal flip), class weight balancing, Adam optimizer (lr=0.0001), 100 epochs, ModelCheckpoint callback
    • Evaluation: Test accuracy, per-class precision/recall/F1, confusion matrix, training/validation curves
  • Dependencies: PyWavelets, TensorFlow/Keras, scikit-learn, OpenCV, matplotlib
  • Colab Ready: Full environment setup with pip installations

About

Advanced Computer Vision & Deep Learning: Complete implementations of blob detection (LoG/DoG/DoH), image processing pipelines, edge detection, histogram analysis, DCT compression, semantic segmentation, YOLO/Faster R-CNN object detection, and CNN classifiers for MNIST/CIFAR/Fashion-MNIST using PyTorch, TensorFlow, and OpenCV

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published