
Large PPL Fluctuations in VPTQ 4.05-bit Quantization Despite Fixed Random Seed and Identical Setup #202

@Qingtian-Liu


Hi, I'm encountering significant fluctuations in perplexity (PPL) when reproducing the VPTQ 4.05-bit quantization results using the following configuration:

"--vector_lens", "-1", "6",
"--group_num", "1",
"--num_centroids", "-1", "4096",
"--num_res_centroids", "-1", "4096",
"--npercent", "0",
"--blocksize", "128",
"--new_eval",
"--seq_len", "2048",
"--kmeans_mode", "hessian",
"--num_gpus", "8",
# "--enable_perm",
"--enable_norm",
"--save_model",
"--save_packed_model",
"--hessian_path", "/workshop/Hessians/H",
"--inv_hessian_path", "/workshop/Hessians/INVH",
"--ktol", "1e-5",
"--kiter", "100"

Setup details:

  • Model: Llama3-8B
  • Dataset: wikitext-2
  • Hardware: 8× A100 GPUs
  • Random seed: default (0)
  • Hessian files are precomputed and reused across runs

Observed behavior:
Across multiple independent runs with the exact same command and environment, I obtained widely varying PPL scores: 29.83, 15.52, and 50.56.

To debug, I checked the following:

  • Evaluation is deterministic: when I load a saved quantized checkpoint and evaluate it repeatedly, the PPL is consistent across evaluations.
  • Quantization is not: separate quantization runs with identical seeds, inputs, and precomputed Hessians produce quantized models with drastically different PPLs.
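Since evaluating a fixed checkpoint is stable, one way to narrow down where two quantization runs diverge is to compare their saved outputs file by file. The sketch below hashes every file under two output directories with `hashlib.sha256` and lists the files that differ. The directory layout is an assumption; point it at wherever `--save_model` / `--save_packed_model` actually write. Identical digests prove identical bytes, but differing digests only show that something changed: serialized checkpoints may embed metadata, so a per-tensor comparison would be the natural next step.

```python
import hashlib
import sys
from pathlib import Path


def fingerprint_dir(path, pattern="*"):
    """Map each file under `path` (relative name) to the SHA-256 hex digest of its contents."""
    root = Path(path)
    digests = {}
    for f in sorted(root.rglob(pattern)):
        if f.is_file():
            h = hashlib.sha256()
            with f.open("rb") as fh:
                # Read in 1 MiB chunks so multi-GB checkpoints never need to fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[f.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def diff_runs(dir_a, dir_b):
    """Names of files whose contents differ between two runs, or that exist in only one run."""
    a, b = fingerprint_dir(dir_a), fingerprint_dir(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__":
    # Usage: python diff_runs.py <run1_output_dir> <run2_output_dir>
    for name in diff_runs(sys.argv[1], sys.argv[2]):
        print("differs:", name)
```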
This suggests that non-determinism is introduced during the quantization process, possibly in the k-means clustering step (--kmeans_mode hessian). Could this be due to:

  • Non-deterministic PyTorch/CUDA operations despite a fixed seed?
  • Initialization sensitivity in k-means when using Hessian-weighted distances?
  • Race conditions or other non-determinism across multi-GPU execution?
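To make the initialization question concrete, here is a minimal pure-Python sketch of Lloyd's k-means with per-point, per-dimension importance weights. It assumes, without checking the VPTQ source, that `--kmeans_mode hessian` weights each squared error by a diagonal Hessian entry and that `--kiter` and `--ktol` act as an iteration cap and a relative convergence tolerance; `weighted_kmeans` is a hypothetical helper, not VPTQ's implementation. With a fixed `seed` the result is repeatable bit for bit, while a different seed picks different initial centroids and can settle in a different local optimum. Any unseeded or order-dependent random draw during initialization would therefore surface as run-to-run differences in the codebooks.

```python
import random


def weighted_kmeans(points, weights, k, kiter=100, ktol=1e-5, seed=0):
    """Lloyd's k-means where point i, dimension j contributes weights[i][j] * (x - c)**2.

    Hypothetical helper: the weighting is an assumed stand-in for Hessian-weighted
    distances; kiter/ktol are assumed to be an iteration cap and a relative tolerance.
    Returns (centroids, objective of the last assignment step).
    """
    rng = random.Random(seed)  # private RNG: initialization depends only on `seed`
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    prev_obj = float("inf")
    obj = prev_obj
    for _ in range(kiter):
        # Assignment step: nearest centroid under the weighted squared distance.
        labels, obj = [], 0.0
        for x, w in zip(points, weights):
            dists = [sum(wj * (xj - cj) ** 2 for xj, wj, cj in zip(x, w, c))
                     for c in centroids]
            best = min(range(k), key=dists.__getitem__)  # ties go to the lowest index
            labels.append(best)
            obj += dists[best]
        # Update step: per-dimension weighted mean of the points assigned to each centroid.
        num = [[0.0] * dim for _ in range(k)]
        den = [[0.0] * dim for _ in range(k)]
        for x, w, lab in zip(points, weights, labels):
            for j in range(dim):
                num[lab][j] += w[j] * x[j]
                den[lab][j] += w[j]
        for c in range(k):
            for j in range(dim):
                if den[c][j] > 0:  # an empty cluster keeps its previous centroid
                    centroids[c][j] = num[c][j] / den[c][j]
        # Stop once the relative improvement of the objective falls below ktol.
        if prev_obj != float("inf") and prev_obj - obj <= ktol * prev_obj:
            break
        prev_obj = obj
    return centroids, obj


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Vector length 6 as in --vector_lens; the weights are random stand-ins for Hessian entries.
    pts = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
    wts = [[data_rng.uniform(0.1, 2.0) for _ in range(6)] for _ in range(200)]
    for s in range(3):
        _, obj = weighted_kmeans(pts, wts, k=16, seed=s)
        print(f"seed={s} objective={obj:.4f}")
```

Running the demo prints one objective per seed; on data without clear cluster structure the values typically differ across seeds, which is the kind of sensitivity the question above asks about.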
Could you help explain why the fluctuations are this large and how to obtain reproducible quantization results? Any guidance on ensuring determinism (e.g., additional seeding, disabling certain optimizations, or adjusting k-means parameters) would be greatly appreciated.
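On the seeding side, a common PyTorch determinism recipe is sketched below. `torch.manual_seed`, `torch.cuda.manual_seed_all`, `torch.use_deterministic_algorithms`, the cuDNN flags, and the `CUBLAS_WORKSPACE_CONFIG` environment variable are real PyTorch/cuBLAS settings; `seed_everything` and `layer_seed` are hypothetical helpers, and whether VPTQ's quantization script already does any of this is not confirmed here. With `--num_gpus 8`, the assumption is that each GPU worker process would need to call `seed_everything` itself, and that seeding each layer from its index (rather than drawing from one shared RNG stream) keeps results independent of which GPU finishes first.

```python
import os
import random
import zlib


def seed_everything(seed: int = 0, deterministic: bool = True) -> None:
    """Seed Python, NumPy and PyTorch RNGs; optionally request deterministic kernels."""
    # cuBLAS reads this when it is initialized, so it must be set before any CUDA work.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)           # seeds the CPU and all CUDA generators
    torch.cuda.manual_seed_all(seed)  # explicit; silently ignored without CUDA
    if deterministic:
        # Ops without a deterministic implementation now raise instead of varying silently.
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


def layer_seed(base_seed: int, layer_idx: int, module: str = "") -> int:
    """Hypothetical: derive a per-layer seed that does not depend on scheduling order.

    zlib.crc32 gives the same value in every process, unlike hash() on a str.
    """
    return zlib.crc32(f"{base_seed}:{layer_idx}:{module}".encode())
```

A worker might then call `seed_everything(layer_seed(0, layer_idx, "mlp.down_proj"))` right before clustering that layer (the module name is illustrative). Because `use_deterministic_algorithms(True)` raises a `RuntimeError` on unsupported ops, it can also help locate the non-deterministic step; passing `warn_only=True` logs a warning instead.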
