End-to-End Deep Learning for Self-Driving Car Steering Prediction
Getting Started • Architecture • Models • Dataset • FAQ • Contributing
- Overview
- System Architecture
- Sensors Used
- Algorithm Description
- Model Architectures
- Dataset Structure
- Getting Started
- Training Pipeline
- Results
- Block Diagram
- Frequently Asked Questions
- Contributing
- License
This project implements an end-to-end deep learning approach for autonomous vehicle navigation, specifically focusing on steering angle prediction from camera images. Inspired by NVIDIA's pioneering work on self-driving cars, this system learns to map raw pixel inputs directly to steering commands.
- ✅ Multiple CNN Architectures - Custom models + Transfer Learning (ResNet, AlexNet)
- ✅ Comprehensive Data Augmentation - 10+ augmentation techniques for robust training
- ✅ Multi-Camera Support - Center, Left, and Right camera fusion
- ✅ PyTorch Implementation - Modern, efficient deep learning framework
- ✅ GPU Acceleration - CUDA support for fast training
- ✅ TorchScript Export - Production-ready model serialization
```
┌──────────────────────────────────────────────────────────────────────────┐
│                       AUTONOMOUS NAVIGATION SYSTEM                       │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────┐   ┌──────────────┐   ┌─────────────┐   ┌────────────┐  │
│  │   SENSORS   │──▶│ PREPROCESSING│──▶│  CNN MODEL  │──▶│  CONTROL   │  │
│  │  (Cameras)  │   │   PIPELINE   │   │  (PyTorch)  │   │   OUTPUT   │  │
│  └─────────────┘   └──────────────┘   └─────────────┘   └────────────┘  │
│         │                 │                  │                │         │
│         ▼                 ▼                  ▼                ▼         │
│  ┌─────────────┐   ┌──────────────┐   ┌─────────────┐   ┌────────────┐  │
│  │ • Center    │   │ • Cropping   │   │ • Conv2D    │   │ • Steering │  │
│  │ • Left      │   │ • Resize     │   │ • BatchNorm │   │   Angle    │  │
│  │ • Right     │   │ • Colorspace │   │ • Pooling   │   │ (-1 to +1) │  │
│  │             │   │ • Normalize  │   │ • FC Layers │   │            │  │
│  └─────────────┘   └──────────────┘   └─────────────┘   └────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
| Sensor | Position | Purpose | Resolution |
|---|---|---|---|
| Center Camera | Front-center dashboard | Primary driving view | Variable |
| Left Camera | Front-left | Recovery training (steering right) | Variable |
| Right Camera | Front-right | Recovery training (steering left) | Variable |
```python
# Camera offset correction for multi-camera training
LEFT_CAMERA_CORRECTION = +0.2   # Steer right to recover
RIGHT_CAMERA_CORRECTION = -0.2  # Steer left to recover
```

The multi-camera setup enables recovery learning - teaching the model how to recover when the vehicle drifts from the center of the lane. This is crucial for robust autonomous navigation.
This project uses behavioral cloning - a supervised learning technique where:
- Human Expert drives the vehicle in simulation
- Camera Images are captured along with steering angles
- CNN Model learns the mapping: Image → Steering Angle
- Trained Model predicts steering for new, unseen images
```
Raw Image (Variable Size)
        ↓
Crop ROI (Remove sky/hood)
        ↓
Resize to 200×66 (NVIDIA format)
        ↓
Color Conversion (RGB → YUV)
        ↓
Gaussian Blur (Noise reduction)
        ↓
Normalize to [-1, 1]
        ↓
CNN Forward Pass
        ↓
Steering Angle Output
```
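A minimal OpenCV sketch of this pipeline; the crop rows (`60:-25`) are illustrative ROI values, not the project's exact settings:

```python
import cv2
import numpy as np

def preprocess(image_rgb):
    """Turn a raw RGB frame into the 3x66x200 network input."""
    roi = image_rgb[60:-25, :, :]                        # drop sky and hood (assumed ROI)
    resized = cv2.resize(roi, (200, 66))                 # NVIDIA input size (W, H)
    yuv = cv2.cvtColor(resized, cv2.COLOR_RGB2YUV)       # RGB -> YUV
    blurred = cv2.GaussianBlur(yuv, (3, 3), 0)           # light noise reduction
    scaled = blurred.astype(np.float32) / 127.5 - 1.0    # scale to [-1, 1]
    return np.ascontiguousarray(scaled.transpose(2, 0, 1))  # HWC -> CHW for PyTorch
```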
To improve model generalization, we apply extensive data augmentation:
| Augmentation | Description | Probability |
|---|---|---|
| Random Flip | Horizontal mirror + negate steering | 50% |
| Random Shift | Translation in X/Y with steering correction | 100% |
| Random Rotation | ±5° rotation | 100% |
| Random Shadow | Simulated shadow overlay | 50% |
| Random Brightness | Brightness variation (0.25-1.25x) | 100% |
| Gaussian Blur | Random kernel size blur | 30% |
| Contrast Adjustment | Random contrast (0.5-1.5x) | 30% |
| Gaussian Noise | Additive noise (μ=0, σ=10) | 30% |
| Channel Dropout | Zero-out random color channel | 20% |
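To make the table concrete, here is a sketch of two of these transforms; the parameter ranges follow the table, but the implementation details are assumptions:

```python
import random
import cv2
import numpy as np

def random_flip(image, steering, p=0.5):
    """Mirror the frame horizontally and negate the steering angle."""
    if random.random() < p:
        return cv2.flip(image, 1), -steering
    return image, steering

def random_brightness(image, low=0.25, high=1.25):
    """Scale brightness by a random factor in [low, high] via the HSV V channel."""
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[..., 2] = np.clip(hsv[..., 2] * random.uniform(low, high), 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
```

Note that the flip must negate the steering label as well as mirror the pixels, otherwise the augmented pair teaches the model the wrong turn direction.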
A lightweight custom architecture optimized for steering prediction.
```
Input: 3×66×200
        ↓
Conv2D(32, 7×7) → BatchNorm → ELU → MaxPool
        ↓
Conv2D(64, 5×5) → BatchNorm → ELU → MaxPool
        ↓
Conv2D(64, 3×3) → BatchNorm → ELU → MaxPool
        ↓
Conv2D(128, 3×3) → BatchNorm → ELU → MaxPool
        ↓
Flatten → FC(512) → Dropout(0.5) → ELU
        ↓
FC(124) → Dropout(0.5) → ELU
        ↓
FC(32) → ELU → FC(10) → ELU → FC(1)
        ↓
Output: Steering Angle
```
Based on NVIDIA's paper "End to End Learning for Self-Driving Cars".
```
Input: 3×66×200
        ↓
Conv2D(24, 5×5, stride=2) → BatchNorm → LeakyReLU
        ↓
Conv2D(36, 5×5, stride=2) → BatchNorm → LeakyReLU
        ↓
Conv2D(48, 5×5, stride=2) → BatchNorm → LeakyReLU
        ↓
Conv2D(64, 3×3) → BatchNorm → LeakyReLU
        ↓
Conv2D(64, 3×3) → BatchNorm → LeakyReLU
        ↓
AdaptiveAvgPool → LayerNorm
        ↓
FC(100) → Dropout(0.4) → FC(50) → Dropout(0.3) → FC(10) → FC(1)
        ↓
Output: Steering Angle
```
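A condensed PyTorch sketch of this architecture; the layer sizes follow the diagram, while padding and the LeakyReLU slope are assumptions the diagram does not specify:

```python
import torch
import torch.nn as nn

class NvidiaNet(nn.Module):
    """Sketch of the NVIDIA-style model diagrammed above."""
    def __init__(self):
        super().__init__()
        def block(c_in, c_out, k, s=1):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, k, stride=s),   # padding assumed 0
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(inplace=True),            # slope left at default
            )
        self.features = nn.Sequential(
            block(3, 24, 5, s=2),
            block(24, 36, 5, s=2),
            block(36, 48, 5, s=2),
            block(48, 64, 3),
            block(64, 64, 3),
            nn.AdaptiveAvgPool2d(1),                   # -> (N, 64, 1, 1)
        )
        self.head = nn.Sequential(
            nn.Flatten(),                              # -> (N, 64)
            nn.LayerNorm(64),
            nn.Linear(64, 100), nn.Dropout(0.4),
            nn.Linear(100, 50), nn.Dropout(0.3),
            nn.Linear(50, 10),
            nn.Linear(10, 1),
        )

    def forward(self, x):                              # x: (N, 3, 66, 200)
        return self.head(self.features(x)).squeeze(1)
```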
A modern architecture using Swish activation and separable convolutions.
```
Input: 3×66×200
        ↓
Conv2D(32, 3×3, stride=2) → BatchNorm → Swish
        ↓
SeparableConv(64) → BatchNorm → Swish
        ↓
SeparableConv(128) → BatchNorm → Swish
        ↓
ResidualBlock(128) × 2
        ↓
AdaptiveAvgPool(1×1) → LayerNorm
        ↓
FC(64) → Swish → Dropout(0.4) → FC(1)
        ↓
Output: Steering Angle
```
- ResNet18: Pretrained ImageNet backbone with custom regression head (sketched below)
- AlexNet: Pretrained classifier adapted for steering regression
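For the ResNet18 variant, a minimal sketch of the head swap using the torchvision API (the `weights` enum requires torchvision ≥ 0.13; the head sizes here are illustrative, not the project's exact values):

```python
import torch.nn as nn
from torchvision import models

# Replace the ImageNet classification head with a single-output
# regression head for steering prediction.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 64),  # in_features read before the swap
    nn.ReLU(inplace=True),
    nn.Dropout(0.4),
    nn.Linear(64, 1),                        # steering angle
)
```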
```
dataset/
├── IMG/
│   ├── center_2024_01_01_00_00_00_000.jpg
│   ├── left_2024_01_01_00_00_00_000.jpg
│   ├── right_2024_01_01_00_00_00_000.jpg
│   └── ...
└── dataset.csv
```
| center | left | right | steering |
|---|---|---|---|
| dataset/IMG/center_*.jpg | dataset/IMG/left_*.jpg | dataset/IMG/right_*.jpg | 0.0 |
| dataset/IMG/center_*.jpg | dataset/IMG/left_*.jpg | dataset/IMG/right_*.jpg | 0.15 |
| ... | ... | ... | ... |
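Putting the pieces together, a `torch.utils.data.Dataset` over this CSV might look like the following sketch, reusing the illustrative `sample_camera` and `preprocess` helpers from earlier sections:

```python
import cv2
import pandas as pd
import torch
from torch.utils.data import Dataset

class SteeringDataset(Dataset):
    """Sketch of a Dataset over dataset.csv (column names as in the table above)."""
    def __init__(self, csv_path="dataset/dataset.csv"):
        self.rows = pd.read_csv(csv_path)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows.iloc[idx]
        path, steering = sample_camera(row)              # random L/C/R choice
        image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        x = torch.from_numpy(preprocess(image))          # (3, 66, 200) float32
        return x, torch.tensor(steering, dtype=torch.float32)
```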
- Python 3.8+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
- Clone the repository

```bash
git clone https://github.com/YOUR_USERNAME/CNN_based_Autonomous_Navigation_training.git
cd CNN_based_Autonomous_Navigation_training
```

- Create virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Prepare dataset

```
# Place your dataset in the following structure:
# dataset/IMG/*.jpg
# dataset/dataset.csv
```

- Run training

```bash
jupyter notebook simulation_cnn_model_pytorch_3.ipynb
```

Training uses the following default configuration:

```
batch_size = 64
samples_per_epoch = 1000
nb_epoch = 40
learning_rate = 1e-4
optimizer = Adam
loss_function = MSELoss
```

- Data Loading: Images are loaded with multi-camera random selection
- Augmentation: Real-time augmentation applied during training
- Forward Pass: Image → CNN → Steering prediction
- Loss Calculation: MSE between predicted and ground truth
- Backpropagation: Gradient descent optimization
- Validation: Model evaluated on held-out test set
- Checkpointing: Best model saved based on validation loss (a condensed sketch of this loop follows below)
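The sketch below ties these steps together, reusing the illustrative `SteeringDataset` and `NvidiaNet` classes from earlier sections; the validation CSV path is an assumption:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = NvidiaNet().to(device)
train_loader = DataLoader(SteeringDataset(), batch_size=64, shuffle=True)
val_loader = DataLoader(SteeringDataset("dataset/val.csv"), batch_size=64)  # assumed split
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

best_val = float("inf")
for epoch in range(40):
    model.train()
    for x, y in train_loader:            # augmentation happens inside the Dataset
        x, y = x.to(device), y.to(device)
        loss = criterion(model(x), y)    # MSE between prediction and ground truth
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    model.eval()                         # validation on held-out data
    with torch.no_grad():
        val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                       for x, y in val_loader) / len(val_loader)
    if val_loss < best_val:              # checkpoint on best validation loss
        best_val = val_loss
        torch.save(model.state_dict(), "best_model.pth")
```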
Models are exported using TorchScript for deployment:
```python
scripted_model = torch.jit.script(model)
scripted_model.save("model.pt")
```

Training produces loss curves showing model convergence. Lower validation loss indicates better generalization.
| Model | Parameters | Best Val Loss | Training Time* |
|---|---|---|---|
| Model 1 | ~1.2M | TBD | ~30 min |
| Model 2 | ~800K | TBD | ~25 min |
| Model 3 | ~500K | TBD | ~20 min |
| ResNet18 | ~11M | TBD | ~45 min |
| AlexNet | ~60M | TBD | ~60 min |
*Training time on NVIDIA GTX 1080 Ti
```
┌──────────────────────────────────────────────────────────────────────────┐
│                          DATA COLLECTION PHASE                           │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌──────────┐         ┌───────────┐         ┌────────────┐              │
│   │ Driving  │────────▶│  Cameras  │────────▶│ Recording  │              │
│   │ Simulator│         │  (L/C/R)  │         │ Images+CSV │              │
│   └──────────┘         └───────────┘         └────────────┘              │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                              TRAINING PHASE                              │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌─────────┐    ┌────────────┐    ┌─────────┐    ┌────────────┐         │
│   │ Dataset │───▶│ DataLoader │───▶│  Model  │───▶│    Loss    │         │
│   │  (CSV)  │    │ + Augment  │    │   CNN   │    │ (MSE Loss) │         │
│   └─────────┘    └────────────┘    └─────────┘    └────────────┘         │
│        ▲                                                │                │
│        │                                                ▼                │
│   ┌─────────┐                                     ┌────────────┐         │
│   │ Updated │◀────────────────────────────────────│ Optimizer  │         │
│   │ Weights │                                     │   (Adam)   │         │
│   └─────────┘                                     └────────────┘         │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                             INFERENCE PHASE                              │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌─────────┐    ┌────────────┐    ┌───────────┐    ┌───────────────┐    │
│   │ Camera  │───▶│ Preprocess │───▶│  Trained  │───▶│   Steering    │    │
│   │ Frame   │    │  Pipeline  │    │   Model   │    │   Command     │    │
│   └─────────┘    └────────────┘    └───────────┘    └───────────────┘    │
│                                                             │            │
│                                                             ▼            │
│                                                     ┌───────────────┐    │
│                                                     │    Vehicle    │    │
│                                                     │   Actuator    │    │
│                                                     └───────────────┘    │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
```
┌──────────────────────────────────────────────────────────────────────────┐
│                          CNN FEATURE EXTRACTION                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   INPUT IMAGE         CONVOLUTIONAL LAYERS         FULLY CONNECTED       │
│                                                                          │
│   ┌─────────┐   ┌───┐  ┌───┐  ┌───┐  ┌───┐  ┌───────┐  ┌───┐            │
│   │         │──▶│C1 │─▶│C2 │─▶│C3 │─▶│C4 │─▶│Flatten│─▶│FC │──▶ θ        │
│   │3×66×200 │   │32 │  │64 │  │64 │  │128│  │       │  │   │            │
│   └─────────┘   └───┘  └───┘  └───┘  └───┘  └───────┘  └───┘            │
│                   │      │      │      │                 │               │
│                  BN     BN     BN     BN          Dropout + ReLU         │
│                 Pool   Pool   Pool   Pool                                │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
Legend:
C1-C4 = Convolutional Layers
BN = Batch Normalization
Pool = Max Pooling
FC = Fully Connected
θ = Steering Angle Output
Q: What is end-to-end learning for autonomous driving?
A: End-to-end learning means the neural network learns to map directly from raw sensor inputs (camera images) to control outputs (steering angle) without explicit intermediate representations like lane detection or path planning. The model learns the entire perception-to-action pipeline in a single network.
Q: Why use CNNs for steering prediction?
A: CNNs are excellent at extracting spatial features from images. They can learn hierarchical representations - from edges and textures to complex patterns like road boundaries, lane markings, and environmental cues - that are essential for determining the correct steering angle.
Q: What simulation was used for data collection?
A: This project uses the Udacity Self-Driving Car Simulator, which provides a realistic driving environment with multiple camera views and steering angle recording.
Q: Why crop the image before processing?
A: The top portion of the image contains sky and scenery that doesn't help with steering decisions. The bottom contains the car hood. Cropping removes these irrelevant regions, allowing the model to focus on the road and lane markings.
Q: Why convert RGB to YUV color space?
A: YUV separates luminance (Y) from chrominance (U, V), which can help the model be more robust to lighting variations. This follows NVIDIA's original approach in their self-driving car paper.
Q: Why use MSE loss instead of other loss functions?
A: MSE (Mean Squared Error) is a natural choice for regression problems where we want to minimize the squared difference between predicted and actual steering angles. It penalizes larger errors more heavily, which is desirable for smooth steering predictions.
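Formally, for $N$ samples with predicted angles $\hat{y}_i$ and ground-truth angles $y_i$:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2$$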
Q: How do you handle the imbalanced steering angle distribution?
A: The dataset typically has more straight driving (steering ≈ 0) than turns. We address this through:
- Random camera selection (left/center/right) with steering correction
- Data augmentation (flipping, shifting, rotation)
- Avoiding oversampling of straight segments (a rebalancing sketch follows this list)
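One common way to rebalance, sketched here with an assumed straight-driving threshold (0.05) and keep rate (30%) rather than the project's actual values:

```python
import pandas as pd

# Randomly drop most near-straight samples so steering ≈ 0 doesn't dominate.
rows = pd.read_csv("dataset/dataset.csv")
straight = rows["steering"].abs() < 0.05                  # assumed threshold
balanced = pd.concat([
    rows[~straight],                                      # keep all turning samples
    rows[straight].sample(frac=0.3, random_state=0),      # keep 30% of straight ones
]).reset_index(drop=True)
```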
Q: Can I use this model for real-world deployment?
A: This model is trained on simulation data and is intended for research/educational purposes. Real-world deployment would require:
- Training on real-world data
- Extensive safety testing
- Additional sensors (LIDAR, radar)
- Redundancy systems
- Regulatory compliance
Q: How long does training take?
A: Training time depends on your hardware:
- With GPU (GTX 1080 Ti or better): ~20-45 minutes per model
- Without GPU: Several hours per model (not recommended)
Q: What if I get CUDA out of memory errors?
A: Try:
- Reducing `batch_size` (e.g., from 64 to 32)
- Using a smaller model (Model 3 is the most efficient)
- Ensuring no other GPU processes are running
Q: How do I know if my model is overfitting?
A: Watch the training and validation loss curves:
- If training loss keeps decreasing but validation loss starts increasing, you're overfitting
- Use early stopping based on validation loss (see the sketch below)
- Increase dropout or add more augmentation
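A minimal early-stopping sketch; `train_one_epoch` and `evaluate` are hypothetical wrappers around the training and validation steps shown in the Training Pipeline section:

```python
import torch

patience, wait, best_val = 5, 0, float("inf")   # assumed patience window
for epoch in range(40):
    train_one_epoch(model)                      # hypothetical helper
    val_loss = evaluate(model)                  # hypothetical helper
    if val_loss < best_val:
        best_val, wait = val_loss, 0
        torch.save(model.state_dict(), "best_model.pth")
    else:
        wait += 1
        if wait >= patience:                    # no improvement for `patience` epochs
            print(f"Early stopping at epoch {epoch}")
            break
```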
We welcome contributions from the community! Whether it's bug fixes, new features, or documentation improvements, your help is appreciated.
- Fork the repository (click the 'Fork' button on GitHub)
- Clone your fork

```bash
git clone https://github.com/YOUR_USERNAME/CNN_based_Autonomous_Navigation_training.git
```

- Create a feature branch

```bash
git checkout -b feature/your-feature-name
```

- Make your changes (edit files, add features, fix bugs)
- Commit your changes

```bash
git add .
git commit -m "Add: Description of your changes"
```

- Push to your fork

```bash
git push origin feature/your-feature-name
```

- Create a Pull Request (go to GitHub and click 'New Pull Request')

Ways to contribute:

- 🐛 Bug Fixes: Found a bug? Fix it!
- 📝 Documentation: Improve README, add docstrings, create tutorials
- 🧪 Testing: Add unit tests, integration tests
- 🎨 New Models: Implement additional architectures (EfficientNet, Vision Transformers)
- 📊 Visualization: Add training dashboards, prediction visualizations
- 🚀 Performance: Optimize inference speed, reduce memory usage
- 📦 Deployment: Add Docker support, TensorRT conversion
- Follow PEP 8 for Python code
- Use meaningful variable names
- Add docstrings to functions and classes
- Keep functions focused and small
```
Type: Short description (max 50 chars)

Longer description if needed...
```
Types: Add, Fix, Update, Remove, Refactor, Document
This project is licensed under the MIT License - see the LICENSE file for details.
- NVIDIA's End-to-End Learning Paper - Original inspiration
- Udacity Self-Driving Car Simulator
- PyTorch team for the excellent deep learning framework
For questions, suggestions, or collaboration opportunities:
- GitHub Issues: Create an Issue
- Pull Requests: Always welcome!
⭐ Star this repository if you found it helpful! ⭐
Made with ❤️ for the autonomous driving community