About hard-coded dimensions

Hi authors, thanks for the great work!

I noticed that several key dimensions (e.g., video height/width and the latent shape `[21, 16, 60, 104]`) are currently hard-coded in the codebase. Could you provide a version where these values are configurable, so the code is easier to build on for further development? This would greatly improve extensibility for research and downstream applications.
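
To illustrate what I have in mind, here is a minimal sketch of deriving the latent shape from a config object instead of hard-coding it. Everything in it is my assumption rather than something taken from the repo: the `VideoConfig` name, the default resolution and frame count, and the 8x spatial / 4x temporal compression factors were simply chosen because they reproduce `[21, 16, 60, 104]`. Please correct me if the actual VAE strides differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VideoConfig:
    """Hypothetical config; all defaults are assumptions, not repo values."""

    num_frames: int = 81
    height: int = 480
    width: int = 832
    latent_channels: int = 16
    spatial_stride: int = 8   # assumed VAE spatial downsampling factor
    temporal_stride: int = 4  # assumed VAE temporal downsampling factor

    def latent_shape(self) -> list[int]:
        """Return [frames, channels, height, width] of the latent tensor."""
        if self.height % self.spatial_stride or self.width % self.spatial_stride:
            raise ValueError("height and width must be divisible by spatial_stride")
        if (self.num_frames - 1) % self.temporal_stride:
            raise ValueError("num_frames - 1 must be divisible by temporal_stride")
        # Assumes the first frame is encoded on its own, then every
        # `temporal_stride` frames map to one latent frame.
        latent_frames = (self.num_frames - 1) // self.temporal_stride + 1
        return [
            latent_frames,
            self.latent_channels,
            self.height // self.spatial_stride,
            self.width // self.spatial_stride,
        ]
```

With something like this, `VideoConfig().latent_shape()` would replace the literal shape, and trying another resolution would only require `VideoConfig(height=720, width=1280)`.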