Dear developers, have you tried applying AQLM with vector quantization along the output dimension, i.e., grouping weights along the output dimension rather than the input dimension? I noticed that your code allows configuring out_group_size, so I tried the configuration below. However, this config leads to poor PPL for Llama-2-7b-hf: WikiText-2 40.2, C4 70.0. Do you have any idea why?
--num_codebooks=2
--nbits_per_codebook=8
--out_group_size=8
--in_group_size=1
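For context, this config should match the usual 2-bit AQLM budget, since the per-weight cost only depends on the product of the group sizes, not on which dimension the group spans. A minimal sketch of that accounting (codebook storage overhead ignored; variable names mirror the CLI flags):

```python
# Bits per weight for the AQLM config above: each group of
# out_group_size * in_group_size weights is encoded by one code
# per codebook, each code costing nbits_per_codebook bits.
num_codebooks = 2
nbits_per_codebook = 8
out_group_size = 8
in_group_size = 1

group_size = out_group_size * in_group_size  # weights per code tuple
bits_per_weight = num_codebooks * nbits_per_codebook / group_size
print(bits_per_weight)  # 2.0
```

So the bit budget is the same as the standard in_group_size=8, out_group_size=1 setup, and the PPL gap would come purely from grouping along the output dimension.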