Hi authors, thanks for the great work!
I noticed that several key dimensions (e.g., video height/width and the latent shape [21, 16, 60, 104]) are currently hard-coded in the codebase. Could you provide a version where these values are configurable, so that it is easier to build on for further development? This would greatly improve extensibility for research and downstream applications.
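
To illustrate the kind of parameterization I have in mind, here is a minimal sketch. Everything in it is hypothetical: the class name, the field names, and the defaults are mine, not from the repo. In particular, I am assuming a spatial downsampling factor of 8, a temporal factor of 4, and a 480x832, 81-frame input, only because those values reproduce [21, 16, 60, 104]. The actual factors in the codebase may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShapeConfig:
    """Hypothetical config; names and defaults are assumptions, not the repo's API."""

    height: int = 480          # assumed pixel height
    width: int = 832           # assumed pixel width
    num_frames: int = 81       # assumed number of input frames
    latent_channels: int = 16
    spatial_stride: int = 8    # assumed VAE spatial downsampling factor
    temporal_stride: int = 4   # assumed VAE temporal downsampling factor

    def latent_shape(self) -> list[int]:
        """Derive [frames, channels, height, width] of the latent from the fields."""
        if self.height % self.spatial_stride or self.width % self.spatial_stride:
            raise ValueError("height and width must be divisible by spatial_stride")
        if (self.num_frames - 1) % self.temporal_stride:
            raise ValueError("num_frames - 1 must be divisible by temporal_stride")
        # Assumes the first frame is encoded on its own, the rest in groups.
        latent_frames = (self.num_frames - 1) // self.temporal_stride + 1
        return [
            latent_frames,
            self.latent_channels,
            self.height // self.spatial_stride,
            self.width // self.spatial_stride,
        ]


if __name__ == "__main__":
    print(VideoShapeConfig().latent_shape())
    print(VideoShapeConfig(height=720, width=1280).latent_shape())
```

With something like this, the latent shape would be derived in one place from the configured resolution instead of being repeated as a literal throughout the code.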