
Questions regarding implementation details of latent decoder, blend and training resolution #3


Description

@leaf1170124460

Hi, @lightChaserX.

Thank you for your incredible work on this project. The paper is truly impressive and has provided a lot of insights.

I have a few questions regarding some implementation details:

Q1. Section 3.3 of the paper mentions that the cropped patch predictions are merged in latent space. Is the latent decoder the one from the pre-trained Stable Diffusion 1.5 model? Does it support high-resolution decoding, given that the original P3M images are mostly 1080p? If you designed your own decoder, could you please share some details about its implementation?
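
For reference, here is what I currently assume: decoding the merged latent with the stock SD 1.5 VAE via diffusers' `AutoencoderKL`. This is only a sketch; the checkpoint id and the latent shape are my assumptions, not something stated in the paper.

```python
import torch
from diffusers import AutoencoderKL

# Assumed checkpoint: the usual SD 1.5 id; substitute whatever you actually used.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)
vae.eval()

# Hypothetical merged latent for a ~1920x1080 image (4 channels, 8x VAE downsampling).
merged_latent = torch.randn(1, 4, 135, 240)

with torch.no_grad():
    image = vae.decode(merged_latent / vae.config.scaling_factor).sample

print(image.shape)  # torch.Size([1, 3, 1080, 1920])
```

Since the decoder is fully convolutional, I would expect it to handle 1080p latents as long as memory allows, but I'd like to confirm that this matches your setup.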

Q2. Regarding the blend operation shown in Figure 3 of the paper, could you clarify how it is implemented? As far as I can tell, it isn't specified in the paper or the supplementary materials.
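
My current guess, purely speculative since the paper doesn't describe it, is a weighted average of overlapping patch latents with a feathering mask, normalized by the accumulated weights. All names below are hypothetical:

```python
import torch

def blend_latent_patches(patch_latents, coords, canvas_hw):
    """patch_latents: list of (C, h, w) latents; coords: list of (top, left)
    offsets on the latent canvas; canvas_hw: (H, W) of the full latent."""
    C = patch_latents[0].shape[0]
    H, W = canvas_hw
    acc = torch.zeros(C, H, W)
    weight = torch.zeros(1, H, W)
    for lat, (top, left) in zip(patch_latents, coords):
        _, h, w = lat.shape
        # Separable linear ramp that fades toward the patch borders.
        wy = 1.0 - torch.linspace(-1.0, 1.0, h).abs()
        wx = 1.0 - torch.linspace(-1.0, 1.0, w).abs()
        mask = (wy[:, None] * wx[None, :]).clamp(min=1e-3)
        acc[:, top:top + h, left:left + w] += lat * mask
        weight[:, top:top + h, left:left + w] += mask
    return acc / weight.clamp(min=1e-8)
```

Is it something along these lines, or is the blend done differently (e.g. hard stitching or learned fusion)?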

Q3. The supplementary materials mention that all images are randomly cropped to 256 x 256. Why not use Stable Diffusion 1.5's default training resolution of 512 x 512? What advantages do lower-resolution patches have over higher-resolution ones?
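
My back-of-the-envelope reasoning (assuming the usual 8x VAE downsampling) is that a 256 crop gives a 32x32 latent versus 64x64 for a 512 crop, i.e. roughly 4x fewer spatial positions per sample, with attention cost growing roughly quadratically in that count. Is training cost the main motivation, or is there also a quality reason?

```python
# Assumed 8x VAE downsampling, as in standard SD 1.5.
for crop in (256, 512):
    latent = crop // 8
    print(f"crop {crop} -> latent {latent}x{latent} = {latent * latent} positions")
# crop 256 -> latent 32x32 = 1024 positions
# crop 512 -> latent 64x64 = 4096 positions
```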

I appreciate your time and look forward to your response.

Thank you!
