A comprehensive implementation of advanced computer vision techniques and video analytics solutions using Python, OpenCV, and deep learning frameworks.
- Overview
- Features
- Installation
- Project Structure
- Modules & Implementations
- Notebook: Image Processing & Digits Classification
- Notebook: Shape and Image Transformations
- Notebook: Edge Detection and Image Segmentation
- Notebook: Histogram Analysis, Equalization & DFT
- Notebook: Image Compression & Deep Learning Classification
- Notebook: Segmentation, Detection & Classification
- Notebook: Blob Detection, Image Enhancement & Classification
- Notebook: SIFT, ORB, Watershed, ResNet & Few-Shot Learning
- Notebook: Stitching, Denoising, GANs & Segmentation Playground
- Notebook: Image Denoising & Video Action Pipeline
This repository contains implementations of advanced computer vision algorithms and video analytics techniques including:
- Image transformations and geometric operations
- Object detection and tracking
- Face recognition and analysis
- Pose estimation
- Video processing and frame analysis
- Motion detection and activity recognition
- Semantic and instance segmentation
- 3D reconstruction
✨ Image Processing
- Geometric transformations (rotation, scaling, translation, shearing, reflection)
- Image filtering and enhancement
- Morphological operations
- Edge detection and contour analysis
✨ Video Analytics
- Real-time video processing
- Frame-level analysis
- Motion tracking
- Temporal analysis
✨ Deep Learning Integration
- Pre-trained model support (YOLO, ResNet, etc.)
- Custom model implementations
- Transfer learning examples
✨ Visualization Tools
- Matplotlib-based visualization
- Real-time plotting
- Annotated frame display
- Python 3.8+
- pip or conda
git clone https://github.com/anshika1279/Computer-Vision-Implementation.git
cd Computer-Vision-Implementation
pip install -r requirements.txt

Key runtime dependencies include:
- Notebook tooling: nbformat, nbconvert, notebook (for cleaning/fixing metadata)
- CV/DL extras: timm (MobileNetV1), ultralytics, torch/torchvision, tensorflow
- README.md: Project overview and instructions.
- requirements.txt: Python dependencies for all notebooks.
- image_processing_and_digits_classification.ipynb: Image resizing/blur demo plus digits classification with multiple models.
- ShapeAndImageTransformations.ipynb: Shape and image transformation examples.
- edge_detection_and_image_segmentation.ipynb: Edge detection operators and image segmentation techniques.
- Histogram_Analysis_Equalization_DFT.ipynb: Histogram analysis, contrast enhancement, and frequency domain transformations.
- image_compression_techniques_DCT_Deep_learning_image_classification.ipynb: DCT compression and CNN-based digit/object classification.
- segmentation_detection_classification.ipynb: Advanced CV pipeline with edge/region segmentation, Hough transform, YOLO/R-CNN detection, and Fashion-MNIST/CIFAR-100 classification.
- blob_detection_image_enhancement_classification.ipynb: Blob detection algorithms (LoG, DoG, DoH), comprehensive image enhancement techniques, and transfer learning with AlexNet/VGG16 on CIFAR-100.
- sift_orb_watershed_resnet_few_shot_learning.ipynb: Feature detection (SIFT, ORB), feature matching (BFMatcher), watershed segmentation, ResNet-18/34 classification on CIFAR-100, and few-shot/one-shot learning with elastic deformation augmentation.
- Stitching_Denoising_GAN_SegmentationPlayground.ipynb: Colab-ready playground covering image stitching (simple/panorama/ORB + pose), inpainting, MNIST denoising autoencoders, GANs (MNIST, CIFAR-10), MobileNet V1/V2/V3 fine-tuning, and notebook metadata fixes for GitHub rendering.
- Image_Denoising_Video_Action_Pipeline.ipynb: Image denoising comparison (Median, Wavelet, Noise2Void U-Net) and video action recognition pipeline (frame extraction, video processing, UCF101 subset with 3D CNN classification).
- Image resizing with multiple interpolation methods and blurring with box, Gaussian, and bilateral filters.
- Digits classification using sklearn digits dataset with Gaussian Naive Bayes, RBF SVM, and Random Forest, including cross-validation and ROC visualization.
- File: image_processing_and_digits_classification.ipynb
- Part 1: Resize (linear, nearest, cubic) and blur (box, Gaussian, bilateral) a local image; expects image.png in the repo root and displays a comparison grid.
- Part 2: Train/evaluate classifiers (Gaussian Naive Bayes, RBF SVM, Random Forest) on sklearn digits with 5-fold CV; prints metrics and shows ROC curves.
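The classification workflow in Part 2 can be sketched as follows — a minimal version using the standard scikit-learn API, not the notebook's exact code:

```python
# Sketch of the digits-classification workflow: three classifiers
# evaluated with 5-fold cross-validation on the sklearn digits dataset.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "RBF SVM": SVC(kernel="rbf", gamma="scale"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```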
- File: ShapeAndImageTransformations.ipynb
- Part 1: 2D rectangle transformations (translate, scale, rotate, reflect, shear, composite) visualized with Matplotlib.
- Part 2: Image transformations on input.jpg using OpenCV (translate, reflect, rotate, scale, crop, shear on x/y) with side-by-side plots.
- Part 3: Additional 2D shape transformations with reusable helpers for translate/scale/rotate/reflect/shear and composite examples.
- File: edge_detection_and_image_segmentation.ipynb
- Edge Detection: Implements multiple edge detection operators including:
- Sobel operator (combined X and Y gradients)
- Prewitt edge detection
- Roberts cross operator
- Canny edge detector
- Image Segmentation: Demonstrates various segmentation techniques:
- Global thresholding (fixed threshold binarization)
- Adaptive thresholding (local neighborhood-based)
- Watershed segmentation with morphological operations
- Preprocessing: Includes color space conversions (BGR→RGB→Grayscale→Binary) and image metrics calculation
- Visualization: Displays all results in a comprehensive grid layout with labeled subplots
- Outputs: Saves processed images (edge maps, segmented regions) for further analysis
- File: Histogram_Analysis_Equalization_DFT.ipynb
- Histogram Analysis: Computes and plots histograms for both grayscale and color (RGB) images
- Individual channel histograms for color images (B, G, R)
- Histogram normalization to probability distributions
- Visualization with matplotlib for histogram analysis
- Contrast Enhancement: Implements histogram equalization for improving image contrast
- Before/after comparison of grayscale images
- Visual quality assessment with text annotations
- Side-by-side display of original and equalized images
- Discrete Fourier Transform (DFT): Frequency domain analysis and transformations
- DFT computation with magnitude spectrum visualization
- Inverse DFT for image reconstruction
- Rotation property verification (45° rotation test)
- Demonstrates spatial vs. frequency domain correspondence
- Compatible with Google Colab: Uses cv2_imshow for Colab environments
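The DFT round trip described above can be sketched with NumPy's FFT (equivalent for this demonstration to the cv2.dft calls a notebook might use):

```python
# Sketch: forward DFT, centered log-magnitude spectrum, and inverse
# DFT reconstruction on a synthetic striped image.
import numpy as np

# Synthetic grayscale image with a vertical stripe pattern.
img = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))

# Forward DFT; shift the zero-frequency component to the center and take
# the log-magnitude spectrum for visualization.
F = np.fft.fftshift(np.fft.fft2(img))
magnitude = 20 * np.log(np.abs(F) + 1e-9)

# Inverse DFT reconstructs the image up to floating-point error.
recon = np.fft.ifft2(np.fft.ifftshift(F)).real
```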
- File: image_compression_techniques_DCT_Deep_learning_image_classification.ipynb
- DCT-Based Image Compression: Implements both lossy and lossless compression techniques
- Lossy compression with quantization using JPEG standard quantization matrix
- Lossless compression preserving all DCT coefficients
- Block-wise DCT/IDCT operations (8×8 blocks)
- Compression ratio analysis and file size comparison
- MNIST Digit Classification: CNN implementations for handwritten digit recognition
- Basic 3-layer CNN architecture
- Enhanced CNN with BatchNormalization, Dropout, and L2 regularization
- Data augmentation (rotation, shifts, zoom)
- Learning rate scheduling and early stopping
- CIFAR-10 Classification: Color image classification with CNN
- 10-class object recognition on 32×32 color images
- Similar architecture adapted for RGB inputs
- Model Evaluation: Comprehensive performance metrics
- Classification reports with precision, recall, F1-score
- Confusion matrices with heatmap visualization
- ROC curves and AUC scores
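The block-wise DCT compression described above can be sketched as follows, assuming SciPy's `scipy.fft.dctn`/`idctn` (the notebook may use `cv2.dct` instead):

```python
# Sketch: lossy 8x8 DCT compression with the standard JPEG luminance
# quantization matrix, and the lossless DCT/IDCT round trip.
import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance quantization matrix.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=float)

def compress_block(block, Q):
    """Lossy path: DCT, quantize, dequantize, inverse DCT on one 8x8 block."""
    coeffs = dctn(block - 128, norm="ortho")
    quant = np.round(coeffs / Q)          # quantization discards detail
    return idctn(quant * Q, norm="ortho") + 128

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float)
recon = compress_block(block, Q)          # lossy reconstruction
```

Keeping all coefficients unquantized (the lossless path) makes the DCT/IDCT round trip exact up to floating-point error.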
- File: segmentation_detection_classification.ipynb
- Image Segmentation: Multiple segmentation approaches
- Edge-based segmentation using Canny edge detection
- Region-based segmentation with thresholding techniques
- Visualization with matplotlib for result comparison
- Hough Transform: Line detection and feature extraction
- Probabilistic Hough Line Transform for straight line detection
- Configurable parameters for line detection sensitivity
- Visual overlay of detected lines on original images
- Object Detection: State-of-the-art detection models
- YOLOv8: Real-time object detection with ultralytics framework
- Faster R-CNN: Region-based detection with ResNet50-FPN backbone
- Bounding box visualization with confidence scores
- Pre-trained models on COCO dataset for 80+ object classes
- Deep Learning Classification: Multi-dataset CNN training
- Fashion-MNIST: Clothing classification (10 classes, 28×28 grayscale)
- CIFAR-100: Fine-grained object classification (100 classes, 32×32 RGB)
- Custom CNN architectures with Conv2D, MaxPooling, and Dense layers
- 5-epoch training with validation accuracy tracking
- Classification reports with precision, recall, and F1-scores
- Integrated Pipeline: End-to-end processing workflow combining segmentation, detection, and classification
- Dual Environment Support: Compatible with both local (matplotlib) and Google Colab (cv2_imshow) environments
- File: blob_detection_image_enhancement_classification.ipynb
- Blob Detection Algorithms: Implementation of three advanced blob detection methods
- LoG (Laplacian of Gaussian): Scale-space blob detection with adjustable sigma parameters
- DoG (Difference of Gaussian): Efficient approximation of LoG for faster computation
- DoH (Determinant of Hessian): Hessian matrix-based blob detection for feature localization
- Purple region extraction using HSV color masking
- Morphological preprocessing pipeline (erosion, dilation, opening, closing, area operations)
- Handles RGBA images with alpha channel conversion
- Red circle overlay visualization of detected blobs
- Image Enhancement Pipeline: Eight comprehensive image processing techniques
- Brightness & Contrast adjustment with alpha/beta parameters
- Image sharpening using custom convolution kernels
- Denoising with Non-Local Means algorithm
- Color enhancement using PIL ImageEnhance
- Image resizing with interpolation
- Inverse transform (bitwise NOT operation)
- Histogram equalization (grayscale and color via YCrCb)
- LAB color space-based color correction
- Grid visualization (3×3 layout) of all enhancement results
- Transfer Learning Classification: CIFAR-100 fine-grained classification
- AlexNet: Pre-trained on ImageNet, fine-tuned for 100 classes
- VGG16: Deep architecture with 16 layers, adapted for CIFAR-100
- Modified final classifier layers for 100-class output
- SGD optimizer with momentum (lr=0.0001, momentum=0.9)
- Cross-entropy loss function
- Training loop with tqdm progress bars
- Batch size optimized for memory efficiency (batch_size=16)
- Automatic device selection (CUDA/MPS/CPU)
- ImageNet normalization for transfer learning compatibility
- Model Evaluation: Comprehensive accuracy metrics on CIFAR-100 test set
- Multi-Image Processing: Batch processing across multiple test images (p1.jpg, p2.jpg, p3.png, p4.png, p5.jpg)
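The three blob detectors above can be sketched with scikit-image's standard API; the image here is synthetic rather than the notebook's p1.jpg–p5.jpg test files:

```python
# Sketch: LoG, DoG, and DoH blob detection on two synthetic bright blobs.
import numpy as np
from skimage.draw import disk
from skimage.feature import blob_dog, blob_doh, blob_log

# Synthetic image with two bright disks on a dark background.
img = np.zeros((100, 100))
for center in [(30, 30), (70, 60)]:
    rr, cc = disk(center, 8)
    img[rr, cc] = 1.0

# Each detector returns rows of (y, x, sigma); sigma encodes blob scale.
blobs_log = blob_log(img, max_sigma=15, threshold=0.1)
blobs_dog = blob_dog(img, max_sigma=15, threshold=0.1)
blobs_doh = blob_doh(img, max_sigma=15, threshold=0.005)
print(len(blobs_log), len(blobs_dog), len(blobs_doh))
```

DoG approximates LoG at lower cost, which is why the notebook includes both for comparison.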
- File: sift_orb_watershed_resnet_few_shot_learning.ipynb
- Feature Detection & Description: Classical computer vision feature extraction
- SIFT (Scale-Invariant Feature Transform): Keypoint detection and descriptor computation
- ORB (Oriented FAST and Rotated BRIEF): Fast binary descriptor for real-time applications
- Rich keypoint visualization with OpenCV drawing flags
- Multiple test images for robustness evaluation
- Feature Matching: Correspondence finding between image pairs
- BFMatcher (Brute-Force Matcher): Exhaustive descriptor matching
- Hamming distance for ORB binary descriptors
- L2 distance for SIFT float descriptors
- Cross-check validation for bidirectional matching
- Top-50 matches visualization with distance-based sorting
- Watershed Segmentation: Marker-based region segmentation
- Binary thresholding (THRESH_BINARY_INV) for preprocessing
- Custom marker seeds for background/foreground separation
- Watershed algorithm with boundary highlighting (red contours)
- Contour detection with RETR_EXTERNAL mode
- Multi-stage visualization (original → watershed → contours)
- Deep Learning Classification: CIFAR-100 fine-grained recognition
- ResNet-18: Residual network with 18 layers (ImageNet pre-trained)
- ResNet-34: Deeper 34-layer residual architecture
- Transfer learning with modified final FC layer (100 classes)
- SGD optimizer with momentum (lr=0.0001, momentum=0.9)
- Cross-entropy loss with 5-epoch training
- ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- Training progress tracking with tqdm progress bars
- GPU/CPU automatic device selection
- Few-Shot Learning: Meta-learning for low-data scenarios
- Prototypical Networks: Metric learning with prototype computation
- Simple FC encoder (784→256→64 dimensions)
- Episode-based sampling (5-way, 5-shot setup)
- Euclidean distance metric in embedding space
- MNIST Dataset with 80/20 train/test split
- Elastic deformation augmentation for data diversity
- One-Shot Learning: Siamese network for similarity learning
- Siamese CNN: Twin network architecture for pair comparison
- Convolutional feature extractor (Conv2D→ReLU→MaxPool2D×2)
- Fully connected similarity head (128×4×4→256→128)
- Pairwise distance computation with L2 norm
- Proper tensor flattening for CNN-to-FC transition
- Elastic Transform Augmentation: Advanced data augmentation
- Corrected Implementation: Gaussian-smoothed displacement fields
- Alpha scaling for deformation magnitude control
- Sigma parameter for smoothness adjustment
- scipy.ndimage.gaussian_filter for proper elastic deformation
- cv2.remap with bilinear interpolation and reflection borders
- Side-by-side visualization of original vs. deformed images
- Model Evaluation Metrics: Comprehensive performance analysis
- Accuracy, Precision, Recall, F1-score from sklearn.metrics
- Weighted averaging for multi-class scenarios
- Custom Siamese evaluation with pairwise similarity threshold
- Confusion matrix support for detailed error analysis
- Dual Mode Support: Compatible with local and Colab environments
- File: Stitching_Denoising_GAN_SegmentationPlayground.ipynb
- Image Stitching: Simple and panorama modes using OpenCV Stitcher; ORB-based matching with essential matrix pose recovery; visualization of keypoints and matches.
- Inpainting: Mask-based OpenCV inpainting helper for quick cleanup of noisy regions.
- Denoising Autoencoders (MNIST): Two variants (basic and improved with BN/Dropout) for noise+blur restoration; PSNR/SSIM metrics and pixel-wise accuracy.
- GANs: MNIST MLP GAN and CIFAR-10 DCGAN with loss plots and image grid visualization utilities.
- Transfer Learning: MobileNet V1/V2/V3 comparison for dog-breed classification (timm/models), with frozen backbone and classifier fine-tuning.
- Utilities: Notebook metadata cleaning snippets (nbformat) to keep GitHub rendering healthy; Colab upload helpers for stitching inputs.
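A metadata-cleaning pass like the one mentioned above might look like this, using nbformat's read/write API (the exact keys stripped by the notebook may differ; `widgets` is a common culprit for broken GitHub rendering):

```python
# Sketch: strip Colab widget metadata that can break GitHub's
# notebook renderer, then write the notebook back in place.
import nbformat

def clean_notebook(path):
    nb = nbformat.read(path, as_version=4)
    # Remove notebook-level and cell-level "widgets" metadata.
    nb.metadata.pop("widgets", None)
    for cell in nb.cells:
        cell.metadata.pop("widgets", None)
    nbformat.write(nb, path)
```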
- File: Image_Denoising_Video_Action_Pipeline.ipynb
- Task 1: Image Denoising Comparison
- Median Filter Denoising: Channel-wise morphological filtering with disk-shaped structuring elements (size=3), RGB channel processing, PSNR/SSIM/MSE metrics, matplotlib visualization
- Wavelet Denoising: BayesShrink soft thresholding in wavelet domain, adaptive sigma rescaling, per-channel decomposition/reconstruction, quality metrics comparison
- Deep Learning Denoising: Noise2Void-inspired U-Net (Conv2D→MaxPooling→UpSampling), self-supervised patch training (64×64), center pixel masking strategy, 10-epoch training
- Multi-Method Comparison: Quantitative metrics (MSE, PSNR, SSIM) and visual side-by-side evaluation
- Task 2: Video Processing & Action Recognition
- Frame Extraction: Configurable interval sampling, sequential frame storage, progress tracking
- Frame Visualization: 2×5 grid display with BGR→RGB conversion
- Video Operations: Adaptive thresholding (GAUSSIAN_C), Gaussian blur (5×5), Canny edges (100/200), bitwise NOT inversion
- Frame Collage: Grid-based spatial sampling, adaptive sizing, half-resolution assembly
- UCF101 Dataset: 5-class subset (Basketball, Biking, PlayingGuitar, Typing, JumpRope), 10 videos/class, 224×224 frames, 16-frame sequences
- 3D CNN Classification: 3-layer Conv3D (64→128→256 filters), batch normalization, 2×2×2 max pooling, 512-unit FC + 50% dropout, L2 regularization
- Training Pipeline: Data augmentation (±30° rotation/shifts/shear/zoom, horizontal flip), class weight balancing, Adam optimizer (lr=0.0001), 100 epochs, ModelCheckpoint callback
- Evaluation: Test accuracy, per-class precision/recall/F1, confusion matrix, training/validation curves
- Dependencies: PyWavelets, TensorFlow/Keras, scikit-learn, OpenCV, matplotlib
- Colab Ready: Full environment setup with pip installations