🚗 CNN-Based Autonomous Navigation Training


End-to-End Deep Learning for Self-Driving Car Steering Prediction

Getting Started • Architecture • Models • Dataset • FAQ • Contributing



🎯 Overview

This project implements an end-to-end deep learning approach for autonomous vehicle navigation, specifically focusing on steering angle prediction from camera images. Inspired by NVIDIA's pioneering work on self-driving cars, this system learns to map raw pixel inputs directly to steering commands.

Key Features

  • ✅ Multiple CNN Architectures - Custom models + Transfer Learning (ResNet, AlexNet)
  • ✅ Comprehensive Data Augmentation - 10+ augmentation techniques for robust training
  • ✅ Multi-Camera Support - Center, Left, and Right camera fusion
  • ✅ PyTorch Implementation - Modern, efficient deep learning framework
  • ✅ GPU Acceleration - CUDA support for fast training
  • ✅ TorchScript Export - Production-ready model serialization

πŸ— System Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                        AUTONOMOUS NAVIGATION SYSTEM                        │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  ┌─────────────┐    ┌───────────────┐    ┌─────────────┐    ┌────────────┐ │
│  │   SENSORS   │───▶│ PREPROCESSING │───▶│  CNN MODEL  │───▶│  CONTROL   │ │
│  │  (Cameras)  │    │   PIPELINE    │    │  (PyTorch)  │    │  OUTPUT    │ │
│  └─────────────┘    └───────────────┘    └─────────────┘    └────────────┘ │
│         │                   │                   │                 │        │
│         ▼                   ▼                   ▼                 ▼        │
│  ┌─────────────┐    ┌───────────────┐    ┌─────────────┐    ┌────────────┐ │
│  │ • Center    │    │ • Cropping    │    │ • Conv2D    │    │ • Steering │ │
│  │ • Left      │    │ • Resize      │    │ • BatchNorm │    │   Angle    │ │
│  │ • Right     │    │ • Colorspace  │    │ • Pooling   │    │ (-1 to +1) │ │
│  │             │    │ • Normalize   │    │ • FC Layers │    │            │ │
│  └─────────────┘    └───────────────┘    └─────────────┘    └────────────┘ │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

📷 Sensors Used

Camera System

| Sensor | Position | Purpose | Resolution |
|--------|----------|---------|------------|
| Center Camera | Front-center dashboard | Primary driving view | Variable |
| Left Camera | Front-left | Recovery training (steering right) | Variable |
| Right Camera | Front-right | Recovery training (steering left) | Variable |

Camera Configuration

# Camera offset correction for multi-camera training
LEFT_CAMERA_CORRECTION = +0.2   # Steer right to recover
RIGHT_CAMERA_CORRECTION = -0.2  # Steer left to recover

Why Multi-Camera?

The multi-camera setup enables recovery learning - teaching the model how to recover when the vehicle drifts from the center of the lane. This is crucial for robust autonomous navigation.
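
A minimal sketch of how the correction above might be applied when sampling training examples; the helper name and the CSV column names are illustrative assumptions, not the notebook's exact code:

import random

LEFT_CAMERA_CORRECTION = +0.2   # steer right to recover
RIGHT_CAMERA_CORRECTION = -0.2  # steer left to recover

def choose_camera(row):
    """Randomly pick a camera view and correct the steering label accordingly."""
    choice = random.choice(["center", "left", "right"])
    if choice == "left":
        return row["left"], row["steering"] + LEFT_CAMERA_CORRECTION
    if choice == "right":
        return row["right"], row["steering"] + RIGHT_CAMERA_CORRECTION
    return row["center"], row["steering"]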


🧠 Algorithm Description

End-to-End Learning Approach

This project uses behavioral cloning - a supervised learning technique where:

  1. Human Expert drives the vehicle in simulation
  2. Camera Images are captured along with steering angles
  3. CNN Model learns the mapping: Image → Steering Angle
  4. Trained Model predicts steering for new, unseen images

Data Pipeline

Raw Image (Variable Size)
        ↓
   Crop ROI (Remove sky/hood)
        ↓
   Resize to 200×66 (NVIDIA format)
        ↓
   Color Conversion (RGB → YUV)
        ↓
   Gaussian Blur (Noise reduction)
        ↓
   Normalize to [-1, 1]
        ↓
   CNN Forward Pass
        ↓
   Steering Angle Output
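
A sketch of this pipeline with OpenCV; the crop margins are assumptions and the notebook may use different values:

import cv2
import numpy as np

def preprocess(image):
    """Raw RGB frame -> normalized 66x200 YUV input."""
    h = image.shape[0]
    image = image[60:h - 25, :, :]                  # crop ROI: drop sky and hood (assumed margins)
    image = cv2.resize(image, (200, 66))            # NVIDIA input size (width, height)
    image = cv2.cvtColor(image, cv2.COLOR_RGB2YUV)  # RGB -> YUV
    image = cv2.GaussianBlur(image, (3, 3), 0)      # mild noise reduction
    return image.astype(np.float32) / 127.5 - 1.0   # scale pixels to [-1, 1]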

Augmentation Strategy

To improve model generalization, we apply extensive data augmentation:

| Augmentation | Description | Probability |
|--------------|-------------|-------------|
| Random Flip | Horizontal mirror + negate steering | 50% |
| Random Shift | Translation in X/Y with steering correction | 100% |
| Random Rotation | ±5° rotation | 100% |
| Random Shadow | Simulated shadow overlay | 50% |
| Random Brightness | Brightness variation (0.25-1.25x) | 100% |
| Gaussian Blur | Random kernel size blur | 30% |
| Contrast Adjustment | Random contrast (0.5-1.5x) | 30% |
| Gaussian Noise | Additive noise (μ=0, σ=10) | 30% |
| Channel Dropout | Zero-out random color channel | 20% |
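
As one concrete example, a random horizontal flip must also negate the steering label so the image and target stay consistent. A sketch, not the notebook's exact implementation:

import random
import cv2

def random_flip(image, steering, p=0.5):
    """Mirror the frame horizontally and negate the steering angle."""
    if random.random() < p:
        image = cv2.flip(image, 1)   # 1 = flip around the vertical axis
        steering = -steering
    return image, steering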

🔧 Model Architectures

Model 1: Custom CNN

A lightweight custom architecture optimized for steering prediction.

Input: 3×66×200
    ↓
Conv2D(32, 7×7) → BatchNorm → ELU → MaxPool
    ↓
Conv2D(64, 5×5) → BatchNorm → ELU → MaxPool
    ↓
Conv2D(64, 3×3) → BatchNorm → ELU → MaxPool
    ↓
Conv2D(128, 3×3) → BatchNorm → ELU → MaxPool
    ↓
Flatten → FC(512) → Dropout(0.5) → ELU
    ↓
FC(124) → Dropout(0.5) → ELU
    ↓
FC(32) → ELU → FC(10) → ELU → FC(1)
    ↓
Output: Steering Angle

Model 2: NVIDIA-Inspired Architecture

Based on NVIDIA's paper "End to End Learning for Self-Driving Cars".

Input: 3×66×200
    ↓
Conv2D(24, 5×5, stride=2) → BatchNorm → LeakyReLU
    ↓
Conv2D(36, 5×5, stride=2) → BatchNorm → LeakyReLU
    ↓
Conv2D(48, 5×5, stride=2) → BatchNorm → LeakyReLU
    ↓
Conv2D(64, 3×3) → BatchNorm → LeakyReLU
    ↓
Conv2D(64, 3×3) → BatchNorm → LeakyReLU
    ↓
AdaptiveAvgPool → LayerNorm
    ↓
FC(100) → Dropout(0.4) → FC(50) → Dropout(0.3) → FC(10) → FC(1)
    ↓
Output: Steering Angle
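
A PyTorch sketch of this architecture; the 1×1 pooling target and the LeakyReLU slope are assumptions, so consult the notebook for the exact definition:

import torch.nn as nn

def conv_block(c_in, c_out, k, s=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=s),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class NvidiaNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 24, 5, 2),
            conv_block(24, 36, 5, 2),
            conv_block(36, 48, 5, 2),
            conv_block(48, 64, 3),
            conv_block(64, 64, 3),
            nn.AdaptiveAvgPool2d(1),       # assumed 1x1 pooling target
        )
        self.norm = nn.LayerNorm(64)
        self.head = nn.Sequential(
            nn.Linear(64, 100), nn.Dropout(0.4),
            nn.Linear(100, 50), nn.Dropout(0.3),
            nn.Linear(50, 10),
            nn.Linear(10, 1),
        )

    def forward(self, x):                  # x: (N, 3, 66, 200)
        x = self.features(x).flatten(1)
        return self.head(self.norm(x))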

Model 3: Efficient Architecture with Residual Blocks

A modern architecture using Swish activation and separable convolutions.

Input: 3×66×200
    ↓
Conv2D(32, 3×3, stride=2) → BatchNorm → Swish
    ↓
SeparableConv(64) → BatchNorm → Swish
    ↓
SeparableConv(128) → BatchNorm → Swish
    ↓
ResidualBlock(128) × 2
    ↓
AdaptiveAvgPool(1×1) → LayerNorm
    ↓
FC(64) → Swish → Dropout(0.4) → FC(1)
    ↓
Output: Steering Angle
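
The separable convolutions factor a standard convolution into a depthwise and a pointwise step, which cuts parameters and FLOPs. A sketch of that building block (PyTorch has no built-in SeparableConv; Swish is available as nn.SiLU):

import torch.nn as nn

class SeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, stride=stride,
                                   padding=1, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))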

Transfer Learning Models

  • ResNet18: Pretrained ImageNet backbone with custom regression head
  • AlexNet: Pretrained classifier adapted for steering regression
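
A minimal sketch of the ResNet18 variant: load ImageNet weights and replace the 1000-class classifier with a single-output regression head. The head layout here is an assumption (older torchvision versions use pretrained=True instead of the weights argument):

import torch.nn as nn
from torchvision import models

resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
resnet.fc = nn.Sequential(               # swap the classifier for a regression head
    nn.Linear(resnet.fc.in_features, 64),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(64, 1),                    # single steering-angle output
)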

πŸ“ Dataset Structure

dataset/
├── IMG/
│   ├── center_2024_01_01_00_00_00_000.jpg
│   ├── left_2024_01_01_00_00_00_000.jpg
│   ├── right_2024_01_01_00_00_00_000.jpg
│   └── ...
└── dataset.csv

CSV Format

| center | left | right | steering |
|--------|------|-------|----------|
| dataset/IMG/center_*.jpg | dataset/IMG/left_*.jpg | dataset/IMG/right_*.jpg | 0.0 |
| dataset/IMG/center_*.jpg | dataset/IMG/left_*.jpg | dataset/IMG/right_*.jpg | 0.15 |
| ... | ... | ... | ... |
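
Loading the CSV with pandas might look like this, assuming the column names shown above:

import pandas as pd

df = pd.read_csv("dataset/dataset.csv")
print(df[["center", "left", "right", "steering"]].head())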

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended)
  • 8GB+ RAM

Installation

  1. Clone the repository
git clone https://github.com/YOUR_USERNAME/CNN_based_Autonomous_Navigation_training.git
cd CNN_based_Autonomous_Navigation_training
  2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Prepare the dataset
# Place your dataset in the following structure:
# dataset/IMG/*.jpg
# dataset/dataset.csv
  5. Run training
jupyter notebook simulation_cnn_model_pytorch_3.ipynb

🔄 Training Pipeline

Training Parameters

batch_size = 64
samples_per_epoch = 1000
nb_epoch = 40
learning_rate = 1e-4
optimizer = Adam
loss_function = MSELoss

Training Process

  1. Data Loading: Images are loaded with multi-camera random selection
  2. Augmentation: Real-time augmentation applied during training
  3. Forward Pass: Image β†’ CNN β†’ Steering prediction
  4. Loss Calculation: MSE between predicted and ground truth
  5. Backpropagation: Gradient descent optimization
  6. Validation: Model evaluated on held-out test set
  7. Checkpointing: Best model saved based on validation loss
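
Put together, a minimal training loop covering these steps might look like the sketch below; model, train_loader, and val_loader are assumed to be defined elsewhere:

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

best_val = float("inf")
for epoch in range(40):
    model.train()
    for images, angles in train_loader:
        images, angles = images.to(device), angles.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images).squeeze(1), angles)  # forward pass + MSE
        loss.backward()                                     # backpropagation
        optimizer.step()

    model.eval()                                            # validation pass
    with torch.no_grad():
        val_loss = sum(
            criterion(model(x.to(device)).squeeze(1), y.to(device)).item()
            for x, y in val_loader
        ) / len(val_loader)

    if val_loss < best_val:                                 # checkpoint best model
        best_val = val_loss
        torch.save(model.state_dict(), "best_model.pth")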

Model Export

Models are exported using TorchScript for deployment:

scripted_model = torch.jit.script(model)
scripted_model.save("model.pt")
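
The exported file can then be loaded for inference without the original class definition, for example:

import torch

loaded = torch.jit.load("model.pt")
loaded.eval()
with torch.no_grad():
    angle = loaded(torch.randn(1, 3, 66, 200))  # dummy frame at training resolution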

📊 Results

Training produces loss curves showing model convergence. Lower validation loss indicates better generalization.

| Model | Parameters | Best Val Loss | Training Time* |
|-------|-----------|---------------|----------------|
| Model 1 | ~1.2M | TBD | ~30 min |
| Model 2 | ~800K | TBD | ~25 min |
| Model 3 | ~500K | TBD | ~20 min |
| ResNet18 | ~11M | TBD | ~45 min |
| AlexNet | ~60M | TBD | ~60 min |

*Training time on NVIDIA GTX 1080 Ti


πŸ“ Block Diagram

High-Level System Flow

┌────────────────────────────────────────────────────────────────────────────┐
│                           DATA COLLECTION PHASE                            │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│    ┌──────────┐        ┌─────────────┐        ┌──────────────┐             │
│    │ Driving  │───────▶│   Cameras   │───────▶│  Recording   │             │
│    │ Simulator│        │   (L/C/R)   │        │  Images+CSV  │             │
│    └──────────┘        └─────────────┘        └──────────────┘             │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                               TRAINING PHASE                               │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  ┌─────────┐    ┌────────────┐    ┌─────────┐    ┌────────────┐            │
│  │ Dataset │───▶│ DataLoader │───▶│  Model  │───▶│    Loss    │            │
│  │  (CSV)  │    │ + Augment  │    │   CNN   │    │ (MSE Loss) │            │
│  └─────────┘    └────────────┘    └─────────┘    └────────────┘            │
│                                        ▲               │                   │
│                                        │               ▼                   │
│                                   ┌─────────┐    ┌────────────┐            │
│                                   │ Updated │◀───│ Optimizer  │            │
│                                   │ Weights │    │   (Adam)   │            │
│                                   └─────────┘    └────────────┘            │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                              INFERENCE PHASE                               │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  ┌─────────┐    ┌────────────┐    ┌───────────┐    ┌───────────────┐       │
│  │ Camera  │───▶│ Preprocess │───▶│  Trained  │───▶│   Steering    │       │
│  │  Frame  │    │  Pipeline  │    │   Model   │    │   Command     │       │
│  └─────────┘    └────────────┘    └───────────┘    └───────────────┘       │
│                                                            │               │
│                                                            ▼               │
│                                                    ┌───────────────┐       │
│                                                    │   Vehicle     │       │
│                                                    │   Actuator    │       │
│                                                    └───────────────┘       │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

CNN Architecture Diagram

┌──────────────────────────────────────────────────────────────────────────┐
│                          CNN FEATURE EXTRACTION                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  INPUT IMAGE              CONVOLUTIONAL LAYERS          FULLY CONNECTED  │
│  ┌─────────┐                                                             │
│  │░░░░░░░░░│   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌───────┐    ┌───┐       │
│  │░░░░░░░░░│──▶│C1 │──▶│C2 │──▶│C3 │──▶│C4 │──▶│Flatten│───▶│FC │──▶ θ  │
│  │░░░░░░░░░│   │32 │   │64 │   │64 │   │128│   │       │    │   │       │
│  │░░░░░░░░░│   └───┘   └───┘   └───┘   └───┘   └───────┘    └───┘       │
│  └─────────┘     ↓       ↓       ↓       ↓                              │
│   3×66×200      BN      BN      BN      BN          Dropout + ELU       │
│                Pool    Pool    Pool    Pool                              │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Legend:
  C1-C4 = Convolutional Layers
  BN    = Batch Normalization
  Pool  = Max Pooling
  FC    = Fully Connected
  θ     = Steering Angle Output

❓ Frequently Asked Questions

General Questions

Q: What is end-to-end learning for autonomous driving?

A: End-to-end learning means the neural network learns to map directly from raw sensor inputs (camera images) to control outputs (steering angle) without explicit intermediate representations like lane detection or path planning. The model learns the entire perception-to-action pipeline in a single network.

Q: Why use CNNs for steering prediction?

A: CNNs are excellent at extracting spatial features from images. They can learn hierarchical representations - from edges and textures to complex patterns like road boundaries, lane markings, and environmental cues - that are essential for determining the correct steering angle.

Q: What simulation was used for data collection?

A: This project uses the Udacity Self-Driving Car Simulator, which provides a realistic driving environment with multiple camera views and steering angle recording.

Technical Questions

Q: Why crop the image before processing?

A: The top portion of the image contains sky and scenery that doesn't help with steering decisions. The bottom contains the car hood. Cropping removes these irrelevant regions, allowing the model to focus on the road and lane markings.

Q: Why convert RGB to YUV color space?

A: YUV separates luminance (Y) from chrominance (U, V), which can help the model be more robust to lighting variations. This follows NVIDIA's original approach in their self-driving car paper.

Q: Why use MSE loss instead of other loss functions?

A: MSE (Mean Squared Error) is a natural choice for regression problems where we want to minimize the squared difference between predicted and actual steering angles. It penalizes larger errors more heavily, which is desirable for smooth steering predictions.
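
Concretely, averaging over a batch of N samples: MSE = (1/N) · Σ (θ_pred - θ_true)².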

Q: How do you handle the imbalanced steering angle distribution?

A: The dataset typically has more straight driving (steering ≈ 0) than turns. We address this through:

  • Random camera selection (left/center/right) with steering correction
  • Data augmentation (flipping, shifting, rotation)
  • Avoiding oversampling of straight-driving segments

Q: Can I use this model for real-world deployment?

A: This model is trained on simulation data and is intended for research/educational purposes. Real-world deployment would require:

  • Training on real-world data
  • Extensive safety testing
  • Additional sensors (LIDAR, radar)
  • Redundancy systems
  • Regulatory compliance

Training Questions

Q: How long does training take?

A: Training time depends on your hardware:

  • With GPU (GTX 1080 Ti or better): ~20-45 minutes per model
  • Without GPU: Several hours per model (not recommended)

Q: What if I get CUDA out of memory errors?

A: Try:

  • Reducing batch_size (e.g., from 64 to 32)
  • Using a smaller model (Model 3 is most efficient)
  • Ensuring no other GPU processes are running

Q: How do I know if my model is overfitting?

A: Watch the training and validation loss curves:

  • If training loss keeps decreasing but validation loss starts increasing, you're overfitting
  • Use early stopping based on validation loss
  • Increase dropout or add more augmentation

🤝 Contributing

We welcome contributions from the community! Whether it's bug fixes, new features, or documentation improvements, your help is appreciated.

How to Contribute

  1. Fork the repository
# Click the 'Fork' button on GitHub
  2. Clone your fork
git clone https://github.com/YOUR_USERNAME/CNN_based_Autonomous_Navigation_training.git
  3. Create a feature branch
git checkout -b feature/your-feature-name
  4. Make your changes
# Edit files, add features, fix bugs
  5. Commit your changes
git add .
git commit -m "Add: Description of your changes"
  6. Push to your fork
git push origin feature/your-feature-name
  7. Create a Pull Request
# Go to GitHub and click 'New Pull Request'

Contribution Ideas

  • πŸ› Bug Fixes: Found a bug? Fix it!
  • πŸ“š Documentation: Improve README, add docstrings, create tutorials
  • πŸ§ͺ Testing: Add unit tests, integration tests
  • 🎨 New Models: Implement additional architectures (EfficientNet, Vision Transformers)
  • πŸ“Š Visualization: Add training dashboards, prediction visualizations
  • πŸš€ Performance: Optimize inference speed, reduce memory usage
  • πŸ“¦ Deployment: Add Docker support, TensorRT conversion

Code Style

  • Follow PEP 8 for Python code
  • Use meaningful variable names
  • Add docstrings to functions and classes
  • Keep functions focused and small

Commit Message Format

Type: Short description (max 50 chars)

Longer description if needed...

Types: Add, Fix, Update, Remove, Refactor, Document

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments


📞 Contact

For questions, suggestions, or collaboration opportunities:


⭐ Star this repository if you found it helpful! ⭐

Made with ❤️ for the autonomous driving community
