The Electric Vehicle Routing Problem (EVRP) is a complex combinatorial optimization challenge central to logistics and urban transportation. It involves routing a fleet of battery-constrained electric vehicles (EVs) to serve customers within specified time windows while minimizing travel distance and respecting charging constraints. This project tackles EVRP using Deep Reinforcement Learning (DRL), inspired by the pioneering LIN202 framework, to develop adaptive and scalable routing solutions. Conducted as part of an academic internship focused on reinforcement learning approaches, this research aims to push the boundaries of intelligent routing under realistic EV operational constraints.
- Model the EVRP as a reinforcement learning environment reflecting battery limitations, time windows, and charging station availability.
- Develop, implement, and train RL agents (e.g., DQN, PPO, policy gradient methods) to generate feasible and energy-efficient routing policies.
- Benchmark RL models against classical optimization approaches such as heuristics and mixed-integer linear programming.
- Evaluate model scalability, route efficiency, and charging optimization across diverse problem instances.
The approach represents the routing problem as a graph, encompassing customers, charging stations, and depot nodes. Each node encodes local data (coordinates, customer demand, time windows) and global system states (current time, battery level, available EVs). The core RL model uses a graph embedding technique (Structure2Vec) combined with an attention mechanism and an LSTM decoder to estimate action probabilities for route construction. The training process employs policy gradient optimization with a rollout baseline, guided by a reward function balancing route distance minimization, constraint satisfaction, and penalty terms for infeasibilities.
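The policy-gradient update with a rollout baseline described above can be sketched in a few lines. This is a minimal illustration using a linear softmax policy over candidate next nodes and made-up reward values, not the project's actual Structure2Vec/LSTM model:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_with_baseline_grad(theta, features, action, reward, baseline):
    """REINFORCE gradient for one decoding step.

    theta:    (d,) linear policy weights
    features: (n_actions, d) per-node feature matrix
    action:   index of the node actually chosen
    reward:   return of the sampled rollout, e.g. negative tour
              length plus infeasibility penalties
    baseline: return of a baseline (e.g. greedy) rollout
    """
    probs = softmax(features @ theta)
    # grad log pi(a|s) for a linear softmax policy
    grad_log_pi = features[action] - probs @ features
    # advantage = sampled return minus rollout-baseline return
    return (reward - baseline) * grad_log_pi

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # 4 candidate nodes, 3 features each
theta = np.zeros(3)
grad = reinforce_with_baseline_grad(theta, features, action=2,
                                    reward=-10.5, baseline=-12.0)
theta += 0.01 * grad                 # gradient ascent on expected reward
```

The rollout baseline reduces gradient variance: when the sampled route is no better than the baseline route, the advantage (and hence the update) shrinks toward zero.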
```text
evrp-rl/
├── src/            # RL models, environment, and utility code
├── data/           # Benchmark datasets and synthetic instance generators
├── experiments/    # Experiment scripts and Jupyter notebooks
├── results/        # Model outputs, route visualizations, and metrics
├── docs/           # Literature reviews, reports, and academic papers
├── README.md
├── LICENSE
└── CODE_OF_CONDUCT.md
```
See docs/ARCHITECTURE_DIAGRAM.md for a Mermaid source, ASCII fallback, sequence diagram, data shapes, and extension points.
- Python 3.10 or higher
- TensorFlow or PyTorch
- NumPy
- Pandas
- Matplotlib
```bash
git clone https://github.com/sdley/evrp-rl.git
cd evrp-rl
pip install -r requirements.txt
```

The project now includes a modular RL framework for running configurable experiments:
```python
# Using YAML configuration
from src.framework import ConfigLoader, run_experiment

config = ConfigLoader.load('configs/experiment_config.yaml')
run_experiment(config)
```

Or programmatically:
```python
from src.framework import create_experiment_config, run_experiment

config = create_experiment_config(
    env_config={'num_customers': 20, 'num_chargers': 5},
    agent_config={'type': 'sac', 'encoder': {'type': 'gat'}},
    run_config={'epochs': 100, 'name': 'my_experiment'}
)
run_experiment(config)
```

See src/framework/README.md for complete documentation.
If you want a single script that runs a complete end-to-end experiment (parses YAML, initializes modular components, runs training on synthetic benchmarks, evaluates on held-out scenarios, plots convergence, and performs greedy inference), use the prompt below as a concise spec for a scripts/train_full.py implementation or for README examples:
"Complete end-to-end train script: parse YAML, init modular components, train on synthetic benchmarks (gen 1M instances like Node20/5). Eval on held-out, plot convergence. Inference: greedy decode (argmax no sample). Reference Reinforce-paper end-to-end GAT+attention decoder."
A scaffold script is available at scripts/train_full.py which demonstrates a safe, short demo run and can be scaled by setting num_train_episodes / num_eval_episodes in your YAML or using CLI flags.
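A configuration for such a run might look like the following. The keys are assumptions inferred from the `create_experiment_config` fields shown above; check `configs/experiment_config.yaml` for the authoritative schema:

```yaml
env_config:
  num_customers: 20
  num_chargers: 5
agent_config:
  type: sac
  encoder:
    type: gat
run_config:
  epochs: 100
  name: my_experiment
  num_train_episodes: 1000   # scale up for a full run
  num_eval_episodes: 100
```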
To train the RL agent using legacy scripts:

```bash
python src/train_agent.py
```

Other scripts and notebooks for evaluation and visualization are available in the experiments/ directory.
The modular framework (src/framework/) provides:
- Config-Driven: Define experiments in YAML or Python
- Factory Pattern: Create environments, encoders, and agents from configs
- Multiple Algorithms: A2C and SAC agents
- Flexible Encoders: MLP and GAT (Graph Attention Network) encoders
- Reward Shaping: Custom penalties and bonuses
- Action Masking: Battery-aware and cargo-aware constraints
- Metrics Logging: Comprehensive tracking and visualization
- Checkpointing: Automatic model saving and best model tracking
- Well-Tested: 32 unit tests covering all components
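Battery-aware masking of the kind listed above amounts to a feasibility check before each decoding step. The function below is a stand-alone illustrative sketch, not the framework's actual API; the energy-cost inputs are hypothetical:

```python
def battery_aware_mask(battery, energy_to_node, energy_node_to_nearest_charger):
    """Mask out nodes the EV cannot reach while keeping a recovery reserve.

    battery: current charge (kWh)
    energy_to_node: per-node energy cost of travelling there (kWh)
    energy_node_to_nearest_charger: per-node energy needed to reach the
        closest charger (or depot) after arriving (kWh)
    Returns a list of booleans; True means the action is allowed.
    """
    return [
        battery >= cost + reserve
        for cost, reserve in zip(energy_to_node, energy_node_to_nearest_charger)
    ]

mask = battery_aware_mask(
    battery=10.0,
    energy_to_node=[3.0, 8.0, 5.0],
    energy_node_to_nearest_charger=[2.0, 4.0, 5.0],
)
# node 1 needs 8 + 4 = 12 kWh > 10 available, so it is masked out
```

In practice the mask is applied to the policy's logits (e.g. by setting masked entries to negative infinity) so that infeasible nodes receive zero probability.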
Quick Example: See examples/ablation_study.ipynb for a complete ablation study.
- Environment Implementation:
  - EVRPEnvironment with battery and cargo constraints
  - Charging station and depot mechanics
  - Action masking for valid moves
- Agent Implementations:
  - A2C (Advantage Actor-Critic) with stable training
  - SAC (Soft Actor-Critic) with automatic entropy tuning
  - Fixed NaN gradient issues (see docs/NAN_GRADIENT_FIX.md)
- Encoder Architectures:
  - MLP encoder (baseline)
  - GAT encoder (graph attention for spatial relationships)
- Modular Framework:
  - Factory classes for env/encoder/agent creation
  - Reward shaping and action masking modules
  - Experiment runner with training/evaluation loops
  - Metrics logging and visualization
  - Comprehensive tests (32/32 passing)
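Factory classes of the kind listed above typically follow a registry pattern: a config's `type` string selects a registered constructor. A minimal sketch (with a hypothetical `MLPEncoder`, not the framework's real classes) could look like:

```python
class EncoderFactory:
    """Maps a config 'type' string to a registered encoder constructor."""
    _registry = {}

    @classmethod
    def register(cls, name):
        def wrap(encoder_cls):
            cls._registry[name] = encoder_cls
            return encoder_cls
        return wrap

    @classmethod
    def create(cls, config):
        encoder_cls = cls._registry[config['type']]
        # pass every key except the type selector to the constructor
        kwargs = {k: v for k, v in config.items() if k != 'type'}
        return encoder_cls(**kwargs)

@EncoderFactory.register('mlp')
class MLPEncoder:
    def __init__(self, hidden_dim=128):
        self.hidden_dim = hidden_dim

encoder = EncoderFactory.create({'type': 'mlp', 'hidden_dim': 64})
```

This keeps experiment definitions declarative: adding a new encoder only requires registering a class, with no changes to the experiment runner.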
- Hyperparameter tuning and optimization
- Additional encoder architectures (Transformer)
- More RL algorithms (PPO, DQN)
- Execute experiments with configurable environment parameters, such as the number of EVs, charging stations, and customer time windows.
- Visualize optimized routes and performance metrics with provided scripts and interactive Jupyter notebooks.
- Modify environment constraints or model architecture for tailored EVRP variants.
- RL models demonstrate robust performance on large-scale EVRP instances where classical methods falter.
- Stochastic decoding strategies yield high-quality routing solutions with significant efficiency gains.
- Visual route examples highlight the modelβs ability to balance time windows, battery constraints, and charging station visits.
- Benchmarks indicate superior scalability and adaptability suitable for real-time EV fleet operations.
Contributions, suggestions, and improvements are warmly welcomed. Please ensure adherence to the repositoryβs Code of Conduct. Feel free to open issues or submit pull requests for collaboration.
This project is licensed under the MIT License β see the LICENSE file for details.
- Authors of the paper "Deep Reinforcement Learning for the Electric Vehicle Routing Problem With Time Windows" (LIN202).
- Supporting academic institution and research lab.