Some detailed issues

In your paper the transformation matrix W is 128x128, but the latent code that the flow model produces for an image (of size 3x128x128) is multi-scale (6x64x64; 12x32x32; 24x16x16; 96x8x8). How are W and the latent code multiplied together? This may be a naive question, but it is important to me. Thanks.
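To make the mismatch concrete, here is a small sketch of the shape arithmetic as I understand it. The shapes are the ones quoted above; the variable names are my own and do not come from the paper or its code.

```python
from math import prod

# Shapes quoted in the question (channels, height, width).
image_shape = (3, 128, 128)
latent_shapes = [(6, 64, 64), (12, 32, 32), (24, 16, 16), (96, 8, 8)]
w_shape = (128, 128)  # transformation matrix W as described in the paper

# Number of elements in each latent scale, and in total.
latent_sizes = [prod(shape) for shape in latent_shapes]
total_latent_dim = sum(latent_sizes)
image_dim = prod(image_shape)

print(latent_sizes)                      # per-scale element counts
print(total_latent_dim, image_dim)       # flattened latent size vs. image size
print(total_latent_dim == w_shape[1])    # would W apply to the flattened latent?
```

The flattened latent has the same number of elements as the image, which is far more than 128, so W cannot act on the whole flattened code directly. My guess is that W is applied to some 128-dimensional part or projection of the latent, but I could not find which one.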