Document Overview: 📌 Project Overview | 🎥 Demo | 🚀 How to Run | 📄 Report
Simulating and optimizing a dual-conveyor assembly line using Reinforcement Learning (PPO). The goal is to maximize throughput while minimizing energy consumption and mechanical wear.
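The trade-off between throughput, energy, and wear is typically folded into a single scalar reward for the PPO agent. The sketch below illustrates one plausible shaping; the penalty weights and signal names are illustrative assumptions, not the values used in this project's `train.py`:

```python
def step_reward(items_delivered: int,
                energy_used: float,
                wear: float,
                w_energy: float = 0.1,
                w_wear: float = 0.05) -> float:
    """Hypothetical reward: reward delivered items, penalize
    energy consumption and mechanical wear with fixed weights."""
    return items_delivered - w_energy * energy_used - w_wear * wear
```

With this shaping, an agent that delivers items while idling motors when possible scores higher than one that keeps both conveyors at full speed.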
The following GIFs demonstrate the agent's learning progress throughout the training process.
| Step 100k (Early Stage) | Step 300k (Learning) |
|---|---|
| ![]() | ![]() |
| Agent struggles to pick items; random movements. | Agent learns to pick, but timing is inconsistent. |
| Step 500k (Converging) | Step 1M (Final Policy) |
|---|---|
| ![]() | ![]() |
| Smooth operation; the bottleneck is minimized. | Optimal acceleration control and zero failure rate. |
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Train the agent (optional):

  ```bash
  python train.py
  ```

- Analyze training results: generate the learning curve and policy-behavior graphs (saved as PNG images).

  ```bash
  python plot.py
  ```

- Visualize the simulation result as a GIF: render the assembly-line animation from the trained model.

  ```bash
  python render.py
  ```
If you wish to retrain the agent using `train.py`, make sure your hardware supports the parallel-environment settings.

- Default setting: the script is configured to run 16 parallel environments (`N_ENVS=16`) for faster training.
- Adjustment: if you have fewer CPU cores, lower the `N_ENVS` parameter in `train.py` (e.g., to 4 or 8) to prevent system instability.
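One way to avoid over-subscribing the CPU is to cap the environment count at the number of available cores before launching training. The helper below is a hypothetical sketch, not part of the actual `train.py`:

```python
import os

def pick_n_envs(requested: int = 16) -> int:
    """Cap the number of parallel environments at the available CPU cores."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, cores))
```

Calling this once at startup and passing the result wherever `N_ENVS` is used keeps the default fast path on large machines while degrading gracefully on smaller ones.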
The detailed problem formulation, mathematical derivation, and simulation result analysis can be found in the project report:
👉 View Report (Public Version)
(Note: Personal contact details have been redacted in this public version for privacy. For the full version, please contact me directly.)