For a VecZnx of $L$ limbs, the VmpPmat has size $\mathcal{O}(L^2)$ because it stores the gadget vector $\omega = [2^{-K}, 2^{-2K}, \dots, 2^{-LK}]$, with each $\omega[i]$ stored in a column (i.e. a VecZnx) of at least $L$ limbs.
This setup is the Base2K analogue of an RNS decomposition in which each digit is an individual prime $q_{i}$, i.e. dnum = L and dsize = 1.
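For reference, with dsize = 1 this corresponds to the standard gadget decomposition below (a restatement of the setup above, not a new definition); roughly speaking, the $i$-th digit is the one multiplied against the column of the VmpPmat holding $\omega[i]$ during the vector-matrix product:

$$
a \;\approx\; \sum_{i=1}^{L} a_i \, \omega[i] \;=\; \sum_{i=1}^{L} a_i \, 2^{-iK}, \qquad a_i \in [-2^{K-1}, 2^{K-1}).
$$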
Although the time complexity of the vector-matrix product is linear in $L$ in the Base2K representation, it still quickly leads to prohibitively large keys as the parameters increase.
For example, with $N=2^{16}$, $L=30$ and $\textsf{dsize}=1$, the size of a single key is on the order of $2^{16} \cdot \frac{30-\textsf{dsize}}{\textsf{dsize}} \cdot 30 \cdot 2 \approx 2^{26.8}$ coefficients, which is 0.870GB for FFT64 and 3.48GB for NTT120.
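As a back-of-the-envelope check of these figures, here is a small stand-alone sketch (a hypothetical helper written for this note, not an API of the library); the per-coefficient storage of 8 bytes for FFT64 and 32 bytes for NTT120 are assumptions inferred from the sizes quoted above:

```rust
/// Back-of-the-envelope key-size estimate following the formula above:
/// N * (L - dsize)/dsize * L * 2 coefficients.
/// Hypothetical helper for this note only, not part of the library's API.
fn key_size_coeffs(n: u64, l: u64, dsize: u64) -> u64 {
    assert!(dsize >= 1 && dsize <= l);
    n * ((l - dsize) / dsize) * l * 2
}

fn main() {
    // N = 2^16, L = 30, dsize = 1 (the baseline above).
    let coeffs = key_size_coeffs(1 << 16, 30, 1);
    // Assumed storage per coefficient, inferred from the figures quoted above:
    // FFT64 ~ 8 bytes, NTT120 ~ 32 bytes.
    println!("2^{:.1} coefficients", (coeffs as f64).log2()); // ~2^26.8
    println!("FFT64:  ~{} MiB", coeffs * 8 >> 20);            // ~870 MiB
    println!("NTT120: ~{} MiB", coeffs * 32 >> 20);           // ~3480 MiB
}
```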
Similar to what is done in RNS, we have the relation $|\omega| = \lceil L/\textsf{dsize}\rceil$ (i.e. $\textsf{dnum} = \lceil L/\textsf{dsize}\rceil$), so we can increase dsize to reduce the number of elements in $\omega$. For example, with dsize = 4 we have $\omega = [2^{-4K}, 2^{-8K}, \dots, 2^{-LK}]$. However, this adds noise proportional to $2^{4K}$ instead of $2^{K}$, so the number of limbs of the VmpPmat needs to be slightly increased (or the homomorphic capacity slightly decreased).
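One way to picture the effect of dsize is that it groups $d = \textsf{dsize}$ consecutive base-$2^K$ limbs into a single larger digit; a sketch of the grouping (assuming $d \mid L$ for simplicity):

$$
\tilde a_j \;=\; \sum_{t=1}^{d} a_{(j-1)d+t}\, 2^{(d-t)K},
\qquad
a \;\approx\; \sum_{j=1}^{L/d} \tilde a_j \, 2^{-jdK},
$$

so $|\omega| = L/d$ and each grouped digit $\tilde a_j$ can be as large as roughly $2^{dK}$, which is the source of the extra noise mentioned above.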
Still, this provides a noticeable reduction in the memory footprint of the keys. For example, with the previous parameters but $\textsf{dsize}=6$ (so $|\omega| = 5$), the size becomes $2^{16} \cdot \frac{30-\textsf{dsize}}{\textsf{dsize}} \cdot 30 \cdot 2 \approx 2^{23.9}$ coefficients, which is 120MB for FFT64 and 480MB for NTT120; in other words, more than a six-fold decrease in key size, at the cost of 6 limbs of homomorphic capacity.
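The same formula, evaluated for dsize = 6 with the sketch above (again only an estimate under the same assumed per-coefficient sizes):

```rust
fn main() {
    // Same back-of-the-envelope formula as above, now with dsize = 6
    // (so 5 grouped digits).
    let (n, l, dsize) = (1u64 << 16, 30u64, 6u64);
    let coeffs = n * ((l - dsize) / dsize) * l * 2;
    println!("2^{:.1} coefficients", (coeffs as f64).log2()); // ~2^23.9
    println!("FFT64:  ~{} MiB", coeffs * 8 >> 20);            // ~120 MiB
    println!("NTT120: ~{} MiB", coeffs * 32 >> 20);           // ~480 MiB
}
```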
The question is how to implement this feature in the library.