This project takes video recorded on dash-cams and, when the vehicle stops moving, predicts whether the vehicle will start moving again within 8 seconds. Eight seconds is the point at which stopping the engine becomes a fuel saving: for any stop shorter than 8 seconds, the extra fuel required to restart the engine is greater than the fuel saved while the engine is off.
The intention is to save fuel in vehicles with internal combustion engines by only activating start-stop/Intelligent Stop & Go (ISG) technology when a genuine fuel saving will be made.
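As a quick worked example, using the default fuel figures for the Gas Compact Sedan quoted in the docker section later in this readme, the break-even point falls out of a simple division (an illustrative sketch, not project code):

idle_fuel_ml_per_sec = 0.2020  # fuel burned per second while idling (ml)
start_fuel_ml = 1.6160         # extra fuel burned by one engine restart (ml)
break_even_seconds = start_fuel_ml / idle_fuel_ml_per_sec
print(break_even_seconds)      # 8.0 - stops shorter than this waste fuel if the engine is switched off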
This readme takes you through all the steps to run the project; alternatively, you can jump straight to the last sections to run the results and see the visualisations.
The following is required to run this project:
- A machine running Linux (target machine used was running OpenSuSE 15.3)
- At least 32GB RAM (more is better as training will occur faster, target machine had 64GB)
- At least 4 Core CPU (target machine was AMD Ryzen 5 3600X 6Cores/12Threads)
- Nvidia GPU with at least 12GB GDDR RAM (target machine used 3080Ti)
- Anaconda 3 (with the following modules installed)
- ffmpeg-python
- numpy
- opencv
- pandas
- pillow
- psycopg2
- pytorch
- pytorch-lightning
- pytorch-mutex
- pytorchvideo
- scipy
- seaborn
- tensorboard
- tensorboard-data-server
- tensorboard-plugin-wit
- torch
- torchmetrics
- torchvision
- ffmpeg (version 4.4 with ffprobe)
- openCV (version 4.5.5; note that a breaking change has been introduced in newer versions of OpenCV)
- Nvidia graphics card drivers for your selected GPU
- Nvidia CUDA 11.4
- PostgreSQL (any supported version)
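One possible way to create the Anaconda environment is sketched below; the environment name and channels are assumptions, and some of the listed modules may only be installable with pip inside the environment:

conda create -n stop-start
conda activate stop-start
conda install -c pytorch -c conda-forge pytorch torchvision pytorch-lightning torchmetrics pytorchvideo numpy pandas pillow psycopg2 opencv scipy seaborn tensorboard ffmpeg-python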
You will need to prime your postgres database with some of the relevant data; you can either:
- Create an empty database and generate your own data
- Prime the database with the existing data used within the project
To create an empty database run the following SQL files into your DB:
- sql/create_tables.sql
- sql/create_bdd_train.sql
- sql/insert_carla.sql
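For example, using the standard psql client (the database name and user shown here are placeholders for your own setup):

psql -U postgres -d stop_start -f sql/create_tables.sql
psql -U postgres -d stop_start -f sql/create_bdd_train.sql
psql -U postgres -d stop_start -f sql/insert_carla.sql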
You will now have a database primed with the data for the CARLA simulation and the list of BDD videos.
To import the existing data run the following SQL file into your DB:
- sql/pg_dump.sql
You will need to ensure you have downloaded the relevant carla-original files that go with this.
Processes that update the database can run on multiple machines at the same time; this allows you to increase the amount of data you process by running the same process on several machines.
You will need to write a configuration file containing the details of where you want your temporary data to be kept (this can be large and will be used extensively during training) and the location of your database. The default name of the config file is cfg/kastria-local.json (which is the name of the machine used for most of the training work). You can edit the checked-in file to fit your own setup.
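A minimal sketch of what such a config file might contain is shown below; apart from tempDir (which is referenced later in this readme), the key names are assumptions and may not match the real cfg/kastria-local.json:

# Illustrative only - writes an assumed config structure; check cfg/kastria-local.json for the real keys
import json

config = {
    "tempDir": "/data/stop-start-temp",  # large scratch area used heavily during extraction and training
    "database": {                        # assumed shape of the PostgreSQL connection details
        "host": "my-postgres-host",
        "port": 5432,
        "name": "stop_start",
        "user": "postgres",
        "password": "change-me"
    }
}

with open("cfg/my-config-file.json", "w") as f:
    json.dump(config, f, indent=2)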
This process will need access to the raw BDD videos and the postgres database; it will lock one record on the database, process it, then post its results back to the database. This means you can run multiple copies of this process on different machines to improve throughput.
WARNING: This process is VERY slow
Run the following command:
python src/meta-gen-bdd.py
The process is already multi-threaded and will use multiple cores on a single machine, only a very limited performance gain will be achieved by running multiple copies on the same machine (unless your machine has a very high core count).
There are two parts to this process:
- Running the CARLA Simulator (this is best run on a Windows 10 machine).
- Running the CARLA controller to run and record the simulation
As with extracting the stop times, this can be run concurrently on as many machines as you have. However, you cannot run multiple simulations concurrently within a single CARLA simulator; you must have multiple simulators to run concurrent simulations.
Run the following command:
python src/carla-gen.py
The following parameters can be added:
--config <path-to-config-file>
--carla <carla-server>
For example
python src/carla-gen.py --config cfg/my-config-file.json --carla carla-desktop-1.my.fully.qualified.domain
Video files will be written to a carla-orig directory within the temp directory defined in your configuration. The stop times will be written to the carla_stop table within the PostgreSQL database. All the video files will need to be combined together for further processing (note that the file names will be based on the unique IDs generated by the database and therefore two nodes processing data concurrently will not generate conflicting files).
There are two parts to this process:
- Extracting real world data from BDD
- Extracting synthetic data from the CARLA videos
There are two different processes to do this, but they both use the same functions inside DashcamMomentTracker to extract the training data. Data will be extracted into:
- <TYPE>-still - single still at the moment of the stop
- <TYPE>-multi-still - A single image using three different colour channels to represent the moment of the stop, 2 seconds before the stop and 4 seconds before the stop
- <TYPE>-video - Six second video preceding the stop
<TYPE> will be replaced by the data type (e.g. BDD or CARLA). The format of the file names is the same for all types:
- <IDENTIFIER>-<STOP_TIME>.mp4 - for video files
- <IDENTIFIER>-<STOP_TIME>/<INDEX>.jpeg - for image files
CARLA and BDD use different identifiers: CARLA uses the stop ID allocated by the database, while BDD uses the file name of the video. The INDEX values for images follow the same scheme, with 19 being the frame at the moment of the stop, 18 one frame before, 17 two frames before, and so on.
Therefore the following filenames can be taken as examples:
- bdd-video/02bb67ae-8c3d61f8.mov-14363.0.mp4 - BDD video file 02bb67ae-8c3d61f8.mov where the stop is 14363.0 milliseconds into the video
- bdd-multi-still/02cdc06d-5502f174.mov-22542.0/19.jpeg - BDD image from 02cdc06d-5502f174.mov for a stop that occurred 22542.0 milliseconds into the video at the moment the stop occurred
- carla-still/988-54.234/9.jpeg - CARLA image for stop ID 988 which occurred 54.234 seconds into the video, 10 frames back from the moment of stop.
NOTE: CARLA stop times are recorded in seconds and BDD stop times in milliseconds; this was an accidental oversight and could be corrected in the future.
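As an illustration of the naming convention (this helper is not part of the project code), the identifier and stop time can be recovered by splitting on the final hyphen:

# Illustrative helper for the file-name convention described above
import os

def parse_extract_name(path):
    """Return (identifier, stop_time) for an extracted video file or image directory."""
    name = os.path.basename(path.rstrip("/"))
    if name.endswith(".mp4"):
        name = name[:-len(".mp4")]
    identifier, stop_time = name.rsplit("-", 1)
    return identifier, float(stop_time)

print(parse_extract_name("bdd-video/02bb67ae-8c3d61f8.mov-14363.0.mp4"))  # ('02bb67ae-8c3d61f8.mov', 14363.0) - milliseconds
print(parse_extract_name("carla-still/988-54.234"))                       # ('988', 54.234) - seconds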
BDD data is extracted using the following python script:
src/bdd-extract.py
There are a number of parameters that you can provide to the script to control how the extract works:
--config <config-file> - Defaults to cfg/kastria-local.json (optional)
--perform-extract - Extracts data from the source videos
--process-all - If this is set then the videos will always be processed; if not set, only missing videos will be processed (optional)
--dense-optical-flow - Extract the dense optical flow files (optional)
--sparse-optical-flow - Extract the sparse optical flow files (optional)
--train-sparse-optical-flow - Train a classifier for long/short stops using the sparse optical flow files (optional)
--train-dense-optical-flow - Train a classifier for long/short stops using the dense optical flow files (optional)
--arch - Either [resnet50 or densenet121] (default is densenet121). Architecture to use for the sparse/dense optical flow models (optional)
Data is extracted to the tempDir listed in the configuration file
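For example, a typical extraction run using the flags documented above might be:

python src/bdd-extract.py --config cfg/kastria-local.json --perform-extract --dense-optical-flow --sparse-optical-flow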
CARLA data is extracted (from the raw generated videos) using the following python script (note this script is also used for training synthetic data and combined models):
src/carla-extract.py
There are a number of parameters that you can provide to the script to control how the extract works:
--perform-extract - Set this to run the extract
--dense-optical-flow - Set to extract the dense optical flow data (optional)
--sparse-optical-flow - Set to extract the sparse optical flow data (optional)
--perform-carla-mods - Perform random augmentations on the images before outputting (blur / lighting / contrast) (optional)
--perform-stop-start-extract - Extract the frames which are stationary / moving for training stationary / moving models (optional)
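For example, an extraction run that also applies the random augmentations (flags as documented above) might be:

python src/carla-extract.py --perform-extract --perform-carla-mods --dense-optical-flow --sparse-optical-flow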
NOTE: There are further parameters for carla-extract.py which are used for training models, these are explained in the training sections.
Regression models are trained using four different python scripts:
- src/stop-time-trainer-stills.py - Trains using the single still at the moment of stop using Resnet50 / Densenet121 / EfficientNet B7
- src/stop-time-trainer-multi-stills.py - Trains using the three channel still at the moment of stop using Resnet50 / Densenet121 / EfficientNet B7
- src/stop-time-trainer-video.py - Trains a video model using a 4D based resnet (without pre-trained weights)
- src/stop-time-trainer-video-pre-trained.py - Trains using extracted videos using Slowfast with pre-trained weights.
For the scripts which can train with different architectures, slight changes to the code are required to switch between the architectures. For example:
#BATCH_SIZE = 16 #Resnet50
#BATCH_SIZE = 12 #Densenet121
BATCH_SIZE = 3 #EfficientNetB7
# Resnet 50
#self.model = models.resnet50(pretrained=True)
#self.model.fc = nn.Linear(in_features=2048, out_features=1)
#Densenet121
#self.model = models.densenet121(pretrained=True)
#self.model.classifier = nn.Linear(in_features=1024, out_features=1)
#EfficientNetB7
self.model = models.efficientnet_b7(pretrained=True)
self.model.classifier[1] = nn.Linear(in_features=2560, out_features=1)
This code is set up to use the EfficientNetB7 architecture. To change this to use Resnet50, the following changes should be made:
BATCH_SIZE = 16 #Resnet50
#BATCH_SIZE = 12 #Densenet121
#BATCH_SIZE = 3 #EfficientNetB7
# Resnet 50
self.model = models.resnet50(pretrained=True)
self.model.fc = nn.Linear(in_features=2048, out_features=1)
#Densenet121
#self.model = models.densenet121(pretrained=True)
#self.model.classifier = nn.Linear(in_features=1024, out_features=1)
#EfficientNetB7
#self.model = models.efficientnet_b7(pretrained=True)
#self.model.classifier[1] = nn.Linear(in_features=2560, out_features=1)
The following arguments required by pytorch lightning should be added:
--accelerator <device> - Normally set to 'gpu' when training on a GPU
--devices <device-count> - Normally set to 1 (though can be set to a higher number if multiple training devices are available)
--max_epochs <epoch-count> - Set to 150 for our experiments to limit the number of epochs. (optional)
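For example, training the single-still regression model on one GPU for up to 150 epochs (an illustrative invocation using the arguments above):

python src/stop-time-trainer-stills.py --accelerator gpu --devices 1 --max_epochs 150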
Classification models are essentially the same as the regression models, except that they output two classes rather than a continuous value. Like the regression models, there are four different scripts:
- src/stop-time-trainer-stills-classifier.py - Trains using the single still at the moment of stop using Resnet50 / Densenet121 / EfficientNet B7
- src/stop-time-trainer-multi-stills-classifier.py - Trains using the three channel still at the moment of stop using Resnet50 / Densenet121 / EfficientNet B7
- src/stop-time-trainer-video-classifier.py - Trains a video model using a 4D based resnet (without pre-trained weights)
- src/stop-time-trainer-video-pre-trained-classifier.py - Trains using extracted videos using Slowfast with pre-trained weights.
The same notes apply regarding changes to the code as per the regression models.
The carla-extract script is used for training these models. It is suggested that the extract is completed before running any training (though you can run the extract and the training at the same time); however, when training with BDD data, the BDD data must have been extracted first. The script src/carla-extract.py should be run with the following arguments:
--accelerator <device> - Normally set to 'gpu' when training on a GPU
--devices <device-count> - Normally set to 1 (though can be set to a higher number if multiple training devices are available)
--max_epochs <epoch-count> - Set to 150 for our experiments to limit the number of epochs. (optional)
--arch <arch> - Network architecture to use, one of resnet50, densenet121, defaults to densenet121 (optional)
--single-frame-train - Trains based on a single still extracted at the moment of stop (plus the 19 previous stills) (optional)
--multi-frame-train - Trains based on the 3 channel multi still (plus the 19 previous stills) (optional)
--start-stop-train - Trains a network to classify stills into moving and not moving (optional)
--video-train - Trains a network based on videos (this will use the SlowFast architecture) (optional)
--use-bdd-and-carla - Indicates that the BDD and CARLA data should be combined (use with --carla and --bdd) (optional)
--carla <percentage> - A number between 0 and 1 to indicate how much CARLA data to train with (0 = nothing, 1 = all) (optional)
--bdd <percentage> - A number between 0 and 1 to indicate how much BDD data to train with (0 = nothing, 1 = all) (optional)
--oversample-training - Switch on the oversampling of the training set (optional)
--oversample-validation - Switch on the oversampling of the validation set (optional)
Two dedicated scripts are used for testing models:
- src/bdd-test.py - This tests a model against the validation set and outputs classification results and optionally GradCAM / ScoreCAM
- src/bdd-valid-info.py - This takes in the CSV of results (produced by bdd-test.py) and the settings of your target vehicle, and from these it can calculate the fuel used as a result of your model.
This step tests the model against the test set and generates its accuracy. The output from this can then optionally be fed into the second script to calculate the fuel usage. The following python script will need to be run:
src/bdd-test.py
The following arguments can be provided:
--model <file-name> - The model file to test
--config <config-file> - The config file to use, defaults to cfg/kastria-local.json (optional)
--regression - Test a regression model rather than a classification model (defaults to classification) (optional)
--images <image-type> - One of: [still, multi-still, video] depending on what type of images you want to test with
--arch <arch> - One of [resnet50, densenet121, efficientnet_b7, slowfast] (slowfast can only be used with --images video), the architecture of the model
--csv - Output the results in CSV format rather than in human readable format (default is human readable) (optional)
--cam <cam-type> - One of [GradCAM, ScoreCAM, SmoothGradCAMpp, GradCAMpp]. Generates CAM images (only GradCAM and ScoreCAM have been tested). NOTE: the GradCAM implementation in the library https://github.com/yiskw713/ScoreCAM has a memory leak and will crash when memory runs out. (optional)
Example (assuming Linux style command line):
python src/bdd-test.py --model models/my-model.ckpt --images still --csv --cam ScoreCAM > results.csv
This will write the results to a CSV file called results.csv. Keep the generated CSV if you want to use it in the next process.
This step uses the output from step 1 to calculate the fuel usage:
src/bdd-valid-info.py
The following arguments can be provided:
--config <config-file> - The config file to use, defaults to cfg/kastria-local.json (optional)
--idle-fuel-use <ml-per-second> - The amount of fuel (ml) your target vehicle burns per second when idling
--start-fuel-use <ml> - The amount of fuel used in starting the engine
--results-csv <file-name> - The name of a CSV file with the results of your test (optional)
--stop <stop-type> - One of [always, never, results] Assume the engine always stops, never stops or stops based on the decision in the results file
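For example, using the default Gas Compact Sedan figures from the docker section below and the results.csv produced in step 1:

python src/bdd-valid-info.py --idle-fuel-use 0.2020 --start-fuel-use 1.6160 --results-csv results.csv --stop results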
Pre-built docker images are provided to view the output of the three most successful models against the test set. These contain all the models, code and data needed to run the test. The docker image runs all inference on the CPU to simulate the lack of dedicated machine-learning hardware, which is a likely condition when running in a vehicle.
While the scripts are checked-in to git, it is not possible to build these directly from source as the data needs to be included. Data can be downloaded from:
Download the two files in the docker directory and place them in a "data" directory inside the project_coursework directory. Docker images can then be built (from the project_coursework directory) with:
docker build -f docker/results/Dockerfile .
This will use an ubuntu 22.04 base image then:
- add the binary dependencies
- add the python dependencies
- copy test data
- copy the trained models from the models directory in source control
- copy code from the src directory
- install a wrapper script
- set the run command to run the wrapper script
The pre-built docker image can be pulled using the following command:
docker pull s5324494/project_coursework:latest
This was built using the process mentioned above, but is deployed on docker hub for ease of use.
However the docker image was obtained, it can be run without arguments or environment variables with:
docker run <image-id>
for example
docker run s5324494/project_coursework:latest
This will run the most effective model, output the results for each record in the test set, then output the fuel consumption based on the Gas Compact Sedan (assuming the model is used). As a reference point, the fuel consumption for always stopping and for never stopping will also be output.
The following environment variables can be used to change the model, change the fuel consumption parameters and to output CAM images:
MODEL=[1|2|3] - Use the best, second best or third best model
CAM=[ScoreCAM|GradCAM|SmoothGradCAMpp|GradCAMpp] - Output CAM images (default disabled)
FUEL_ML_PER_SEC=<VALUE> - The fuel burned per second in ml for an idling engine (defaults to 0.2020)
FUEL_START=<VALUE> - The fuel burned to start the engine in ml (defaults to 1.6160)
To retrieve the CAM images (or the results CSV) a docker volume is required and should be mounted into:
/mnt/results
The command below can be used as an example which will:
- Use the 3rd model
- Produce ScoreCAM images
- Change the fuel used when idling to 1.2249 ml/sec
- Change the fuel used to start the engine to 9.7992 ml
- Save the results CSV / ScoreCAM images to /mnt/some-dir-on-my-computer
Command:
docker run -e MODEL=3 -e CAM=ScoreCAM -e FUEL_ML_PER_SEC=1.2249 -e FUEL_START=9.7992 -v /mnt/some-dir-on-my-computer:/mnt/results s5324494/project_coursework:latest
WARNING: Generating the CAM images can take a considerable amount of time as these are generated on the CPU
WARNING: The CAM library (https://github.com/yiskw713/ScoreCAM) has a memory leak which limits the number of GradCAM images that can be generated, the script will eventually crash when generating GradCAM images.
WARNING: SmoothGradCAMpp and GradCAMpp are included because the library (https://github.com/yiskw713/ScoreCAM) supports them out of the box, their use has not been tested.