In the current implementation, certain subcommands look for preprocessed data in the output directory. This works fine when the dataset was preprocessed in that same directory. However, if users want to tweak the parameters of those subcommands and save the output to a different directory, they have to preprocess the raw data again.
It would be better to give users an option to pass the preprocessed data directory as input, so that they don't have to repeat the preprocessing step for every new output directory. This would save both time and storage.
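
To make the request concrete, here is a rough sketch of how such an option could behave. It is not taken from the project's code: the flag names `--output-dir` and `--preprocessed-dir`, the program name, and the helper `resolve_preprocessed_dir` are all hypothetical, and the real subcommands may parse their arguments differently. The idea is only that the new option is optional and falls back to the output directory, so existing invocations keep working.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical subcommand (all names are assumptions)."""
    parser = argparse.ArgumentParser(prog="subcommand")
    parser.add_argument(
        "--output-dir",
        type=Path,
        required=True,
        help="Directory where this run writes its results.",
    )
    parser.add_argument(
        "--preprocessed-dir",
        type=Path,
        default=None,
        help="Directory holding already preprocessed data "
             "(defaults to --output-dir, i.e. the current behavior).",
    )
    return parser


def resolve_preprocessed_dir(args: argparse.Namespace) -> Path:
    """Return the directory to read preprocessed data from."""
    # Fall back to the output directory when the new option is not given,
    # so existing command lines behave exactly as before.
    if args.preprocessed_dir is not None:
        return args.preprocessed_dir
    return args.output_dir
```

With something like this, a second run could be invoked as `subcommand --output-dir run2 --preprocessed-dir run1`, reusing the data preprocessed during the first run while writing new results to `run2`.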