# Preprocessor

The preprocess utility prepares datasets so that the different systems can use them. It takes its parameters primarily from the [workload definition](workload-definition.md) and applies them to the given datasets. Internally, the utility uses the chain-of-responsibility pattern to parse the datasets.
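To illustrate the pattern mentioned above, here is a minimal sketch of a chain of responsibility that dispatches on file suffix. The class names (`Handler`, `VectorHandler`, `RasterHandler`), the suffix lists, and the returned strings are assumptions made for this example; they are not taken from the actual preprocessor code.

```python
from pathlib import Path
from typing import Optional, Tuple


class Handler:
    """One link of the chain: process the file or pass it to the successor."""

    suffixes: Tuple[str, ...] = ()

    def __init__(self, successor: Optional["Handler"] = None) -> None:
        self.successor = successor

    def handle(self, path: Path) -> str:
        # Process the file if this link is responsible for its suffix.
        if path.suffix.lower() in self.suffixes:
            return self.process(path)
        # Otherwise delegate to the next link, if there is one.
        if self.successor is not None:
            return self.successor.handle(path)
        raise ValueError(f"no handler for {path.name}")

    def process(self, path: Path) -> str:
        raise NotImplementedError


class VectorHandler(Handler):
    # Hypothetical suffix list, chosen for illustration only.
    suffixes = (".shp", ".geojson")

    def process(self, path: Path) -> str:
        return f"vector:{path.name}"


class RasterHandler(Handler):
    # Hypothetical suffix list, chosen for illustration only.
    suffixes = (".tif", ".tiff")

    def process(self, path: Path) -> str:
        return f"raster:{path.name}"


# Vector files are tried first; everything else falls through to the raster link.
chain = VectorHandler(RasterHandler())
print(chain.handle(Path("roads.shp")))
```

Each link only knows its own formats and its successor, so supporting a further format would mean adding one more link rather than changing the existing ones.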

## Using the Preprocess Utility

The preprocess utility can be called using the following command, where each placeholder in curly braces is replaced with the corresponding value:

```bash
python preprocess.py --system {system} \
    --vector_path {vector_dir} \
    --vector_target_suffix {vector_target_format} \
    --vector_output_folder {vector_output_folder} \
    --vector_target_crs {vector_target_crs} \
    --vectorization_type {vectorize_type} \
    --raster_path {raster_dir} \
    --raster_target_suffix {raster_target_format} \
    --raster_output_folder {raster_output_folder} \
    --raster_target_crs {raster_target_crs}
```
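When the call is assembled from a script, the argument list can be built programmatically. The following is a minimal sketch; the helper `build_preprocess_command` and all option values (system name, paths, suffixes, CRS codes) are placeholders assumed for this example, and only the flag names come from the command above.

```python
import shlex


def build_preprocess_command(system, options):
    """Return the argument list for a preprocess.py call (hypothetical helper)."""
    command = ["python", "preprocess.py", "--system", system]
    for flag, value in options.items():
        # Every option is passed as "--flag value", mirroring the command above.
        command += [f"--{flag}", str(value)]
    return command


# Placeholder values; replace them with the ones from your workload definition.
options = {
    "vector_path": "data/vector",
    "vector_target_suffix": ".shp",
    "vector_output_folder": "out/vector",
    "vector_target_crs": "EPSG:4326",
    "raster_path": "data/raster",
    "raster_target_suffix": ".tif",
    "raster_output_folder": "out/raster",
    "raster_target_crs": "EPSG:4326",
}

command = build_preprocess_command("example-system", options)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to execute the utility without going through a shell.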


## Building the Preprocess Utility

The preprocess utility is packaged in its own Docker container. The image can be built using the following command:

```bash
docker build . --target=preprocess -t preprocess
```

**Note: To use a new version of the preprocessor, update the reference to the container in each `preprocess.sh` script. These scripts can be found at `hub/deployment/files/**/preprocess.sh`.** If RaVeN is used in its dockerized version, the version has to be updated there as well.
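To locate the lines that need updating, the scripts can be scanned for the image name. The sketch below assumes the image is referenced as `preprocess` (the tag used in the build command above) and that it is run from the repository root; the helper name `find_image_references` is made up for this example.

```python
from pathlib import Path


def find_image_references(root, image="preprocess"):
    """List (script, line number, line) for every line that mentions the image."""
    hits = []
    # Same glob as in the note above: every preprocess.sh below the root.
    for script in sorted(Path(root).glob("**/preprocess.sh")):
        lines = script.read_text().splitlines()
        for number, line in enumerate(lines, start=1):
            if image in line:
                hits.append((script, number, line.strip()))
    return hits


if __name__ == "__main__":
    for script, number, line in find_image_references("hub/deployment/files"):
        print(f"{script}:{number}: {line}")
```

The helper only reports candidate lines and leaves the actual edit to you, since the exact form of the reference may differ between scripts.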