End-to-End Deep Learning for Self-Driving Car Steering Prediction
Getting Started • Architecture • Models • Dataset • FAQ • Contributing
- Overview
- System Architecture
- Sensors Used
- Algorithm Description
- Model Architectures
- Dataset Structure
- Getting Started
- Training Pipeline
- Results
- Block Diagram
- Frequently Asked Questions
- Contributing
- License
This project implements an end-to-end deep learning approach for autonomous vehicle navigation, specifically focusing on steering angle prediction from camera images. Inspired by NVIDIA's pioneering work on self-driving cars, this system learns to map raw pixel inputs directly to steering commands.
- ✅ Multiple CNN Architectures - Custom models + Transfer Learning (ResNet, AlexNet)
- ✅ Comprehensive Data Augmentation - 10+ augmentation techniques for robust training
- ✅ Multi-Camera Support - Center, Left, and Right camera fusion
- ✅ PyTorch Implementation - Modern, efficient deep learning framework
- ✅ GPU Acceleration - CUDA support for fast training
- ✅ TorchScript Export - Production-ready model serialization
```
┌──────────────────────────────────────────────────────────────────────────┐
│                       AUTONOMOUS NAVIGATION SYSTEM                       │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────┐   ┌──────────────┐   ┌─────────────┐   ┌────────────┐  │
│  │   SENSORS   │──▶│ PREPROCESSING│──▶│  CNN MODEL  │──▶│  CONTROL   │  │
│  │  (Cameras)  │   │   PIPELINE   │   │  (PyTorch)  │   │   OUTPUT   │  │
│  └─────────────┘   └──────────────┘   └─────────────┘   └────────────┘  │
│         │                 │                  │                │         │
│         ▼                 ▼                  ▼                ▼         │
│  ┌─────────────┐   ┌──────────────┐   ┌─────────────┐   ┌────────────┐  │
│  │ • Center    │   │ • Cropping   │   │ • Conv2D    │   │ • Steering │  │
│  │ • Left      │   │ • Resize     │   │ • BatchNorm │   │   Angle    │  │
│  │ • Right     │   │ • Colorspace │   │ • Pooling   │   │ (-1 to +1) │  │
│  │             │   │ • Normalize  │   │ • FC Layers │   │            │  │
│  └─────────────┘   └──────────────┘   └─────────────┘   └────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
| Sensor | Position | Purpose | Resolution |
|---|---|---|---|
| Center Camera | Front-center dashboard | Primary driving view | Variable |
| Left Camera | Front-left | Recovery training (steering right) | Variable |
| Right Camera | Front-right | Recovery training (steering left) | Variable |
```python
# Camera offset correction for multi-camera training
LEFT_CAMERA_CORRECTION = +0.2   # Steer right to recover
RIGHT_CAMERA_CORRECTION = -0.2  # Steer left to recover
```

The multi-camera setup enables recovery learning - teaching the model how to recover when the vehicle drifts from the center of the lane. This is crucial for robust autonomous navigation.
This project uses behavioral cloning - a supervised learning technique where:
- Human Expert drives the vehicle in simulation
- Camera Images are captured along with steering angles
- CNN Model learns the mapping: Image → Steering Angle
- Trained Model predicts steering for new, unseen images
```
Raw Image (Variable Size)
        ↓
Crop ROI (Remove sky/hood)
        ↓
Resize to 200×66 (NVIDIA format)
        ↓
Color Conversion (RGB → YUV)
        ↓
Gaussian Blur (Noise reduction)
        ↓
Normalize to [-1, 1]
        ↓
CNN Forward Pass
        ↓
Steering Angle Output
```
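A minimal OpenCV sketch of this pipeline; the crop rows (`60:-25`) are illustrative ROI values, not the project's exact settings:

```python
import cv2
import numpy as np

def preprocess(image_rgb):
    """Turn a raw RGB frame into the 3x66x200 network input."""
    roi = image_rgb[60:-25, :, :]                        # drop sky and hood (assumed ROI)
    resized = cv2.resize(roi, (200, 66))                 # NVIDIA input size (W, H)
    yuv = cv2.cvtColor(resized, cv2.COLOR_RGB2YUV)       # RGB -> YUV
    blurred = cv2.GaussianBlur(yuv, (3, 3), 0)           # light noise reduction
    scaled = blurred.astype(np.float32) / 127.5 - 1.0    # scale to [-1, 1]
    return np.ascontiguousarray(scaled.transpose(2, 0, 1))  # HWC -> CHW for PyTorch
```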
To improve model generalization, we apply extensive data augmentation:
| Augmentation | Description | Probability |
|---|---|---|
| Random Flip | Horizontal mirror + negate steering | 50% |
| Random Shift | Translation in X/Y with steering correction | 100% |
| Random Rotation | ±5° rotation | 100% |
| Random Shadow | Simulated shadow overlay | 50% |
| Random Brightness | Brightness variation (0.25-1.25x) | 100% |
| Gaussian Blur | Random kernel size blur | 30% |
| Contrast Adjustment | Random contrast (0.5-1.5x) | 30% |
| Gaussian Noise | Additive noise (μ=0, σ=10) | 30% |
| Channel Dropout | Zero-out random color channel | 20% |
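To make the table concrete, here is a sketch of two of these transforms; the parameter ranges follow the table, but the implementation details are assumptions:

```python
import random
import cv2
import numpy as np

def random_flip(image, steering, p=0.5):
    """Mirror the frame horizontally and negate the steering angle."""
    if random.random() < p:
        return cv2.flip(image, 1), -steering
    return image, steering

def random_brightness(image, low=0.25, high=1.25):
    """Scale brightness by a random factor in [low, high] via the HSV V channel."""
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
    hsv[..., 2] = np.clip(hsv[..., 2] * random.uniform(low, high), 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
```

Note that the flip must negate the steering label as well as mirror the pixels, otherwise the augmented pair teaches the model the wrong turn direction.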
A lightweight custom architecture optimized for steering prediction.
```
Input: 3×66×200
        ↓
Conv2D(32, 7×7) → BatchNorm → ELU → MaxPool
        ↓
Conv2D(64, 5×5) → BatchNorm → ELU → MaxPool
        ↓
Conv2D(64, 3×3) → BatchNorm → ELU → MaxPool
        ↓
Conv2D(128, 3×3) → BatchNorm → ELU → MaxPool
        ↓
Flatten → FC(512) → Dropout(0.5) → ELU
        ↓
FC(124) → Dropout(0.5) → ELU
        ↓
FC(32) → ELU → FC(10) → ELU → FC(1)
        ↓
Output: Steering Angle
```
Based on NVIDIA's paper "End to End Learning for Self-Driving Cars".
```
Input: 3×66×200
        ↓
Conv2D(24, 5×5, stride=2) → BatchNorm → LeakyReLU
        ↓
Conv2D(36, 5×5, stride=2) → BatchNorm → LeakyReLU
        ↓
Conv2D(48, 5×5, stride=2) → BatchNorm → LeakyReLU
        ↓
Conv2D(64, 3×3) → BatchNorm → LeakyReLU
        ↓
Conv2D(64, 3×3) → BatchNorm → LeakyReLU
        ↓
AdaptiveAvgPool → LayerNorm
        ↓
FC(100) → Dropout(0.4) → FC(50) → Dropout(0.3) → FC(10) → FC(1)
        ↓
Output: Steering Angle
```
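A condensed PyTorch sketch of this architecture; the layer sizes follow the diagram, while padding and the LeakyReLU slope are assumptions the diagram does not specify:

```python
import torch
import torch.nn as nn

class NvidiaNet(nn.Module):
    """Sketch of the NVIDIA-style model diagrammed above."""
    def __init__(self):
        super().__init__()
        def block(c_in, c_out, k, s=1):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, k, stride=s),   # padding assumed 0
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(inplace=True),            # slope left at default
            )
        self.features = nn.Sequential(
            block(3, 24, 5, s=2),
            block(24, 36, 5, s=2),
            block(36, 48, 5, s=2),
            block(48, 64, 3),
            block(64, 64, 3),
            nn.AdaptiveAvgPool2d(1),                   # -> (N, 64, 1, 1)
        )
        self.head = nn.Sequential(
            nn.Flatten(),                              # -> (N, 64)
            nn.LayerNorm(64),
            nn.Linear(64, 100), nn.Dropout(0.4),
            nn.Linear(100, 50), nn.Dropout(0.3),
            nn.Linear(50, 10),
            nn.Linear(10, 1),
        )

    def forward(self, x):                              # x: (N, 3, 66, 200)
        return self.head(self.features(x)).squeeze(1)
```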
A modern architecture using Swish activation and separable convolutions.
```
Input: 3×66×200
        ↓
Conv2D(32, 3×3, stride=2) → BatchNorm → Swish
        ↓
SeparableConv(64) → BatchNorm → Swish
        ↓
SeparableConv(128) → BatchNorm → Swish
        ↓
ResidualBlock(128) × 2
        ↓
AdaptiveAvgPool(1×1) → LayerNorm
        ↓
FC(64) → Swish → Dropout(0.4) → FC(1)
        ↓
Output: Steering Angle
```
- ResNet18: Pretrained ImageNet backbone with custom regression head (sketched below)
- AlexNet: Pretrained classifier adapted for steering regression
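For the ResNet18 variant, a minimal sketch of the head swap using the torchvision API (the `weights` enum requires torchvision ≥ 0.13; the head sizes here are illustrative, not the project's exact values):

```python
import torch.nn as nn
from torchvision import models

# Replace the ImageNet classification head with a single-output
# regression head for steering prediction.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 64),  # in_features read before the swap
    nn.ReLU(inplace=True),
    nn.Dropout(0.4),
    nn.Linear(64, 1),                        # steering angle
)
```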
```
dataset/
├── IMG/
│   ├── center_2024_01_01_00_00_00_000.jpg
│   ├── left_2024_01_01_00_00_00_000.jpg
│   ├── right_2024_01_01_00_00_00_000.jpg
│   └── ...
└── dataset.csv
```
| center | left | right | steering |
|---|---|---|---|
| dataset/IMG/center_*.jpg | dataset/IMG/left_*.jpg | dataset/IMG/right_*.jpg | 0.0 |
| dataset/IMG/center_*.jpg | dataset/IMG/left_*.jpg | dataset/IMG/right_*.jpg | 0.15 |
| ... | ... | ... | ... |
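Putting the pieces together, a `torch.utils.data.Dataset` over this CSV might look like the following sketch, reusing the illustrative `sample_camera` and `preprocess` helpers from earlier sections:

```python
import cv2
import pandas as pd
import torch
from torch.utils.data import Dataset

class SteeringDataset(Dataset):
    """Sketch of a Dataset over dataset.csv (column names as in the table above)."""
    def __init__(self, csv_path="dataset/dataset.csv"):
        self.rows = pd.read_csv(csv_path)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows.iloc[idx]
        path, steering = sample_camera(row)              # random L/C/R choice
        image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        x = torch.from_numpy(preprocess(image))          # (3, 66, 200) float32
        return x, torch.tensor(steering, dtype=torch.float32)
```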
- Python 3.8+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
- Clone the repository

```bash
git clone https://github.com/YOUR_USERNAME/CNN_based_Autonomous_Navigation_training.git
cd CNN_based_Autonomous_Navigation_training
```

- Create virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Prepare dataset

```
# Place your dataset in the following structure:
# dataset/IMG/*.jpg
# dataset/dataset.csv
```

- Run training

```bash
jupyter notebook simulation_cnn_model_pytorch_3.ipynb
```

Training uses the following default configuration:

```
batch_size = 64
samples_per_epoch = 1000
nb_epoch = 40
learning_rate = 1e-4
optimizer = Adam
loss_function = MSELoss
```

- Data Loading: Images are loaded with multi-camera random selection
- Augmentation: Real-time augmentation applied during training
- Forward Pass: Image → CNN → Steering prediction
- Loss Calculation: MSE between predicted and ground truth
- Backpropagation: Gradient descent optimization
- Validation: Model evaluated on held-out test set
- Checkpointing: Best model saved based on validation loss (a condensed sketch of this loop follows below)
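The sketch below ties these steps together, reusing the illustrative `SteeringDataset` and `NvidiaNet` classes from earlier sections; the validation CSV path is an assumption:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = NvidiaNet().to(device)
train_loader = DataLoader(SteeringDataset(), batch_size=64, shuffle=True)
val_loader = DataLoader(SteeringDataset("dataset/val.csv"), batch_size=64)  # assumed split
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

best_val = float("inf")
for epoch in range(40):
    model.train()
    for x, y in train_loader:            # augmentation happens inside the Dataset
        x, y = x.to(device), y.to(device)
        loss = criterion(model(x), y)    # MSE between prediction and ground truth
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    model.eval()                         # validation on held-out data
    with torch.no_grad():
        val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                       for x, y in val_loader) / len(val_loader)
    if val_loss < best_val:              # checkpoint on best validation loss
        best_val = val_loss
        torch.save(model.state_dict(), "best_model.pth")
```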
Models are exported using TorchScript for deployment:
```python
scripted_model = torch.jit.script(model)
scripted_model.save("model.pt")
```

Training produces loss curves showing model convergence. Lower validation loss indicates better generalization.
| Model | Parameters | Best Val Loss | Training Time* |
|---|---|---|---|
| Model 1 | ~1.2M | TBD | ~30 min |
| Model 2 | ~800K | TBD | ~25 min |
| Model 3 | ~500K | TBD | ~20 min |
| ResNet18 | ~11M | TBD | ~45 min |
| AlexNet | ~60M | TBD | ~60 min |
*Training time on NVIDIA GTX 1080 Ti
```
┌──────────────────────────────────────────────────────────────────────────┐
│                          DATA COLLECTION PHASE                           │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌──────────┐         ┌───────────┐         ┌────────────┐              │
│   │ Driving  │────────▶│  Cameras  │────────▶│ Recording  │              │
│   │ Simulator│         │  (L/C/R)  │         │ Images+CSV │              │
│   └──────────┘         └───────────┘         └────────────┘              │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                              TRAINING PHASE                              │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌─────────┐    ┌────────────┐    ┌─────────┐    ┌────────────┐         │
│   │ Dataset │───▶│ DataLoader │───▶│  Model  │───▶│    Loss    │         │
│   │  (CSV)  │    │ + Augment  │    │   CNN   │    │ (MSE Loss) │         │
│   └─────────┘    └────────────┘    └─────────┘    └────────────┘         │
│        ▲                                                │                │
│        │                                                ▼                │
│   ┌─────────┐                                     ┌────────────┐         │
│   │ Updated │◀────────────────────────────────────│ Optimizer  │         │
│   │ Weights │                                     │   (Adam)   │         │
│   └─────────┘                                     └────────────┘         │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                             INFERENCE PHASE                              │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌─────────┐    ┌────────────┐    ┌───────────┐    ┌───────────────┐    │
│   │ Camera  │───▶│ Preprocess │───▶│  Trained  │───▶│   Steering    │    │
│   │ Frame   │    │  Pipeline  │    │   Model   │    │   Command     │    │
│   └─────────┘    └────────────┘    └───────────┘    └───────────────┘    │
│                                                             │            │
│                                                             ▼            │
│                                                     ┌───────────────┐    │
│                                                     │    Vehicle    │    │
│                                                     │   Actuator    │    │
│                                                     └───────────────┘    │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
```
┌──────────────────────────────────────────────────────────────────────────┐
│                          CNN FEATURE EXTRACTION                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   INPUT IMAGE         CONVOLUTIONAL LAYERS         FULLY CONNECTED       │
│                                                                          │
│   ┌─────────┐   ┌───┐  ┌───┐  ┌───┐  ┌───┐  ┌───────┐  ┌───┐            │
│   │         │──▶│C1 │─▶│C2 │─▶│C3 │─▶│C4 │─▶│Flatten│─▶│FC │──▶ θ        │
│   │3×66×200 │   │32 │  │64 │  │64 │  │128│  │       │  │   │            │
│   └─────────┘   └───┘  └───┘  └───┘  └───┘  └───────┘  └───┘            │
│                   │      │      │      │                 │               │
│                  BN     BN     BN     BN          Dropout + ReLU         │
│                 Pool   Pool   Pool   Pool                                │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
Legend:
C1-C4 = Convolutional Layers
BN = Batch Normalization
Pool = Max Pooling
FC = Fully Connected
θ = Steering Angle Output
Q: What is end-to-end learning for autonomous driving?
A: End-to-end learning means the neural network learns to map directly from raw sensor inputs (camera images) to control outputs (steering angle) without explicit intermediate representations like lane detection or path planning. The model learns the entire perception-to-action pipeline in a single network.
Q: Why use CNNs for steering prediction?
A: CNNs are excellent at extracting spatial features from images. They can learn hierarchical representations - from edges and textures to complex patterns like road boundaries, lane markings, and environmental cues - that are essential for determining the correct steering angle.
Q: What simulation was used for data collection?
A: This project uses the Udacity Self-Driving Car Simulator, which provides a realistic driving environment with multiple camera views and steering angle recording.
Q: Why crop the image before processing?
A: The top portion of the image contains sky and scenery that doesn't help with steering decisions. The bottom contains the car hood. Cropping removes these irrelevant regions, allowing the model to focus on the road and lane markings.
Q: Why convert RGB to YUV color space?
A: YUV separates luminance (Y) from chrominance (U, V), which can help the model be more robust to lighting variations. This follows NVIDIA's original approach in their self-driving car paper.
Q: Why use MSE loss instead of other loss functions?
A: MSE (Mean Squared Error) is a natural choice for regression problems where we want to minimize the squared difference between predicted and actual steering angles. It penalizes larger errors more heavily, which is desirable for smooth steering predictions.
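Formally, for $N$ samples with predicted angles $\hat{y}_i$ and ground-truth angles $y_i$:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2$$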
Q: How do you handle the imbalanced steering angle distribution?
A: The dataset typically has more straight driving (steering ≈ 0) than turns. We address this through:
- Random camera selection (left/center/right) with steering correction
- Data augmentation (flipping, shifting, rotation)
- Avoiding oversampling of straight segments (a rebalancing sketch follows this list)
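One common way to rebalance, sketched here with an assumed straight-driving threshold (0.05) and keep rate (30%) rather than the project's actual values:

```python
import pandas as pd

# Randomly drop most near-straight samples so steering ≈ 0 doesn't dominate.
rows = pd.read_csv("dataset/dataset.csv")
straight = rows["steering"].abs() < 0.05                  # assumed threshold
balanced = pd.concat([
    rows[~straight],                                      # keep all turning samples
    rows[straight].sample(frac=0.3, random_state=0),      # keep 30% of straight ones
]).reset_index(drop=True)
```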
Q: Can I use this model for real-world deployment?
A: This model is trained on simulation data and is intended for research/educational purposes. Real-world deployment would require:
- Training on real-world data
- Extensive safety testing
- Additional sensors (LIDAR, radar)
- Redundancy systems
- Regulatory compliance
Q: How long does training take?
A: Training time depends on your hardware:
- With GPU (GTX 1080 Ti or better): ~20-45 minutes per model
- Without GPU: Several hours per model (not recommended)
Q: What if I get CUDA out of memory errors?
A: Try:
- Reducing `batch_size` (e.g., from 64 to 32)
- Using a smaller model (Model 3 is the most efficient)
- Ensuring no other GPU processes are running
Q: How do I know if my model is overfitting?
A: Watch the training and validation loss curves:
- If training loss keeps decreasing but validation loss starts increasing, you're overfitting
- Use early stopping based on validation loss (see the sketch below)
- Increase dropout or add more augmentation
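A minimal early-stopping sketch; `train_one_epoch` and `evaluate` are hypothetical wrappers around the training and validation steps shown in the Training Pipeline section:

```python
import torch

patience, wait, best_val = 5, 0, float("inf")   # assumed patience window
for epoch in range(40):
    train_one_epoch(model)                      # hypothetical helper
    val_loss = evaluate(model)                  # hypothetical helper
    if val_loss < best_val:
        best_val, wait = val_loss, 0
        torch.save(model.state_dict(), "best_model.pth")
    else:
        wait += 1
        if wait >= patience:                    # no improvement for `patience` epochs
            print(f"Early stopping at epoch {epoch}")
            break
```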
We welcome contributions from the community! Whether it's bug fixes, new features, or documentation improvements, your help is appreciated.
- Fork the repository (click the 'Fork' button on GitHub)
- Clone your fork

```bash
git clone https://github.com/YOUR_USERNAME/CNN_based_Autonomous_Navigation_training.git
```

- Create a feature branch

```bash
git checkout -b feature/your-feature-name
```

- Make your changes (edit files, add features, fix bugs)
- Commit your changes

```bash
git add .
git commit -m "Add: Description of your changes"
```

- Push to your fork

```bash
git push origin feature/your-feature-name
```

- Create a Pull Request (go to GitHub and click 'New Pull Request')

Ways to contribute:

- 🐛 Bug Fixes: Found a bug? Fix it!
- 📝 Documentation: Improve README, add docstrings, create tutorials
- 🧪 Testing: Add unit tests, integration tests
- 🎨 New Models: Implement additional architectures (EfficientNet, Vision Transformers)
- 📊 Visualization: Add training dashboards, prediction visualizations
- 🚀 Performance: Optimize inference speed, reduce memory usage
- 📦 Deployment: Add Docker support, TensorRT conversion
- Follow PEP 8 for Python code
- Use meaningful variable names
- Add docstrings to functions and classes
- Keep functions focused and small
```
Type: Short description (max 50 chars)

Longer description if needed...
```
Types: Add, Fix, Update, Remove, Refactor, Document
This project is licensed under the MIT License - see the LICENSE file for details.
- NVIDIA's End-to-End Learning Paper - Original inspiration
- Udacity Self-Driving Car Simulator
- PyTorch team for the excellent deep learning framework
For questions, suggestions, or collaboration opportunities:
- GitHub Issues: Create an Issue
- Pull Requests: Always welcome!
⭐ Star this repository if you found it helpful! ⭐
Made with ❤️ for the autonomous driving community