Using run.py with standard parameters, it takes roughly 20-30 min before my GPU starts working; before that, only 4 CPU cores show constant spikes. Judging by the timing of the logs, it seems that a lot of processing is done before the actual training starts. Could you provide any insight into what is happening and whether anything can be done to optimize it? I run this on a 12C/24T CPU @ 4.7 GHz and an RTX 4090 (CUDA enabled, and it starts using the GPU when the training starts).