EMA of VQGAN+ embeddings

Hi, I saw in the original VQGAN paper and many other implementations that the embeddings of a traditional vector quantizer are updated using EMA during training. However, in your code, I only see a separate EMAmodel, while the embeddings used to compute the loss are still updated by gradient descent. I would be grateful if you could share some insight into this design choice!
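
To make the distinction concrete, here is a minimal, framework-free sketch of the two things I am comparing. It is only my understanding of the usual formulation, not code from your repository: the function names, the `decay` and `eps` parameters, and the plain `+ eps` guard against division by zero are all my own assumptions (real implementations typically use tensors and a smoothed cluster size).

```python
def nearest_code(z, codebook):
    """Index of the codebook vector closest to z (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(z, codebook[i])),
    )


def ema_codebook_update(codebook, cluster_size, embed_sum, encodings,
                        decay=0.99, eps=1e-5):
    """EMA codebook update: no gradient reaches the codebook.

    Each code tracks a running count (cluster_size) and a running sum
    (embed_sum) of the encoder outputs assigned to it; the code itself
    is then set to their ratio. All three lists are modified in place.
    """
    dim = len(codebook[0])
    counts = [0] * len(codebook)
    sums = [[0.0] * dim for _ in codebook]

    # Assign every encoder output to its nearest code.
    for z in encodings:
        i = nearest_code(z, codebook)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += z[d]

    # Move the running statistics toward this batch, then renormalize.
    for i in range(len(codebook)):
        cluster_size[i] = decay * cluster_size[i] + (1 - decay) * counts[i]
        for d in range(dim):
            embed_sum[i][d] = decay * embed_sum[i][d] + (1 - decay) * sums[i][d]
            codebook[i][d] = embed_sum[i][d] / (cluster_size[i] + eps)
    return codebook


def ema_weight_update(shadow, params, decay=0.999):
    """Weight EMA: a separate averaged copy of parameters that are
    themselves still trained by gradient descent."""
    return [decay * s + (1 - decay) * p for s, p in zip(shadow, params)]
```

As far as I can tell, your EMAmodel corresponds to the second function (an averaged copy of all weights, presumably for evaluation), while the first one, which replaces the gradient-based codebook loss entirely, does not appear in the code.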