Description
Hello, dear author. Could you please update the code in the algorithm branch?

Currently, when a model quantized with the algorithm branch is loaded with the latest version of VPTQ, an error occurs; the checkpoint does not appear to be compatible with the current VPTQ code. The loading script is essentially the sketch below, and the failures look like the two tracebacks that follow it.
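For reference, this is a minimal sketch of the loading code, not the exact script: the import path is inferred from the tracebacks below, and the checkpoint path, dtype, and `device_map` are placeholder assumptions.

```python
# Minimal loading sketch. Assumption: VQAutoModelQuantization lives in
# vptq.layers.model_base, as the tracebacks below suggest. The checkpoint
# path, dtype, and device_map are placeholders, not the exact script.
import torch
from vptq.layers.model_base import VQAutoModelQuantization

model = VQAutoModelQuantization.from_pretrained(
    "path/to/algorithm-branch-quantized-model",  # placeholder checkpoint path
    torch_dtype=torch.float16,                   # assumed dtype
    device_map="auto",                           # assumed accelerate dispatch
)
```

Running this against an algorithm-branch checkpoint produces: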
```
INFO 05-22 02:56:39 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 05-22 02:56:39 [__init__.py:239] Automatically detected platform cuda.
Successfully loaded VPTQ CUDA kernels.
Replacing linear layers...:   1%|█▊ | 6/423 [00:00<00:00, 100663.30it/s]
Traceback (most recent call last):
  File "/dataST/users/gexinning/llm_test/evlu_vptq.py", line 50, in <module>
    model = VQAutoModelQuantization.from_pretrained(
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/vptq/layers/model_base.py", line 122, in from_pretrained
    make_quant_linear(
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/vptq/layers/model_base.py", line 46, in make_quant_linear
    new_module = target_layer(
TypeError: vptq.layers.vqlinear.VQuantLinear() got multiple values for keyword argument 'enable_proxy_error'
```
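This TypeError arises when the same keyword reaches a call twice, e.g. once explicitly and once through an unpacked kwargs dict. A hypothetical minimal reproduction of the error class follows; the function name and the "persisted config" framing are assumptions for illustration, not VPTQ's actual code:

```python
# Hypothetical illustration of "got multiple values for keyword argument":
# the same keyword is supplied explicitly and again via an unpacked dict.
def vq_linear(in_features, enable_proxy_error=False):
    return in_features, enable_proxy_error

saved_kwargs = {"enable_proxy_error": True}  # e.g. kwargs persisted in an old checkpoint

vq_linear(16, enable_proxy_error=False, **saved_kwargs)
# TypeError: vq_linear() got multiple values for keyword argument 'enable_proxy_error'
```

If that is what happens here, the algorithm-branch checkpoint's saved layer kwargs already contain `enable_proxy_error`, while the newer `make_quant_linear` also passes it explicitly.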
In another attempt, the layer replacement completes, but loading fails later during accelerate's device dispatch:
```
INFO 05-21 14:19:21 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 05-21 14:19:21 [__init__.py:239] Automatically detected platform cuda.
Successfully loaded VPTQ CUDA kernels.
Replacing linear layers...: 100%|████████████████████████████████████████| 215/215 [00:00<00:00, 1789.69it/s]
2025-05-21 14:19:24,207 - accelerate.utils.modeling - WARNING - The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
2025-05-21 14:19:24,211 - accelerate.utils.modeling - WARNING - The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Traceback (most recent call last):
  File "/dataST/users/gexinning/llm_test/evlu_vptq.py", line 50, in <module>
    model = VQAutoModelQuantization.from_pretrained(
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/vptq/layers/model_base.py", line 185, in from_pretrained
    model = accelerate.load_checkpoint_and_dispatch(
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/accelerate/big_modeling.py", line 642, in load_checkpoint_and_dispatch
    return dispatch_model(
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/accelerate/big_modeling.py", line 502, in dispatch_model
    model.to(device)
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3698, in to
    return super().to(*args, **kwargs)
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/dataST/users/gexinning/.conda/envs/ptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1336, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
```
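The NotImplementedError means some parameters are still on PyTorch's meta device (shape-only tensors with no storage) when accelerate calls `model.to(device)`; that typically happens when checkpoint keys do not match the module's parameter names, so those weights are never materialized. A minimal, standalone sketch of the underlying PyTorch behavior (not VPTQ-specific):

```python
import torch
import torch.nn as nn

# Build a module on the "meta" device: parameters have shapes but no storage.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

try:
    layer.to("cpu")  # fails: there is no data to copy out of a meta tensor
except NotImplementedError as err:
    print(err)  # "Cannot copy out of meta tensor; no data! ..."

# to_empty() allocates uninitialized storage on the target device instead;
# real weights must then be loaded into it (e.g. via load_state_dict).
layer = layer.to_empty(device="cpu")
```

If that diagnosis applies here, the algorithm-branch checkpoint's quantized-layer parameter names no longer match what the current `VQuantLinear` creates, so some parameters are never loaded and remain on meta when dispatch moves the model.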