This is a library and script that adds padding tensors to safetensors files so that each genuine tensor dictionary entry starts on a 4k filesystem block boundary. The padding wastes some space, but the goal is to make block-level deduplication more effective when storing diffusion models.
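To make the alignment goal concrete, here is a minimal sketch that reports where each tensor's data begins in a safetensors file and how many bytes of padding would move it to a block boundary. It relies only on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor data). The names `tensor_file_offsets`, `padding_needed`, and `build` are hypothetical, "4k" is assumed to mean 4096 bytes, and this is not necessarily how the tool itself computes or inserts its padding.

```python
import json
import struct

BLOCK = 4096  # assumption: "4k" means 4096 bytes


def tensor_file_offsets(blob: bytes) -> dict[str, int]:
    """Return the absolute file offset at which each tensor's data begins."""
    # Layout: 8-byte little-endian header length, JSON header, raw tensor data.
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len])
    data_start = 8 + header_len
    return {
        name: data_start + entry["data_offsets"][0]
        for name, entry in header.items()
        if name != "__metadata__"  # optional metadata entry, not a tensor
    }


def padding_needed(offset: int, block: int = BLOCK) -> int:
    """Bytes to insert before `offset` so that it lands on a block boundary."""
    return -offset % block


def build(header: dict, data: bytes) -> bytes:
    """Assemble a tiny in-memory safetensors file for the demonstration."""
    raw = json.dumps(header, separators=(",", ":")).encode()
    return struct.pack("<Q", len(raw)) + raw + data


# One 4-byte uint8 tensor named "w" (an illustrative name).
blob = build({"w": {"dtype": "U8", "shape": [4], "data_offsets": [0, 4]}},
             b"\x01\x02\x03\x04")
for name, offset in tensor_file_offsets(blob).items():
    print(f"{name}: offset {offset}, padding needed {padding_needed(offset)}")
```

In an unaligned file most tensors report a non-zero padding value; after alignment every genuine tensor should report zero.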
Adds 4k block-alignment padding to existing .safetensors files.
# Single file
align_safetensors input/clip_l.safetensors output/clip_l.safetensors # destination filename
align_safetensors input/clip_l.safetensors output/ # destination directory
# Will process subdirectories recursively and preserve the directory structure in the output/ directory.
align_safetensors input/ output/ # destination must be a directory
# By default will not overwrite existing files
align_safetensors input/ output/ --overwrite
The input can be either a single file or a directory tree. If the input is a directory, the output path must also be a directory, and the tool will recursively process all .safetensors files while preserving the directory structure.
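The path handling described above can be sketched roughly as follows. `plan_outputs` is a hypothetical helper written for illustration, not the tool's actual code, and it assumes a destination directory already exists when one is given. The file names in the demonstration are made up.

```python
import tempfile
from pathlib import Path


def plan_outputs(src: Path, dst: Path) -> list[tuple[Path, Path]]:
    """Pair each input .safetensors file with its destination path."""
    if src.is_file():
        # A directory destination keeps the source filename;
        # otherwise dst is taken as the destination filename.
        return [(src, dst / src.name if dst.is_dir() else dst)]
    # Directory input: mirror each file's path relative to src under dst.
    return [
        (path, dst / path.relative_to(src))
        for path in sorted(src.rglob("*.safetensors"))
        if path.is_file()
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "input" / "sub").mkdir(parents=True)
    (root / "output").mkdir()
    (root / "input" / "clip_l.safetensors").touch()
    (root / "input" / "sub" / "vae.safetensors").touch()
    (root / "input" / "notes.txt").touch()  # ignored: not a .safetensors file
    for source, target in plan_outputs(root / "input", root / "output"):
        print(source.relative_to(root), "->", target.relative_to(root))
```

A real implementation would also need to create missing parent directories and skip existing targets unless --overwrite is given.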
Converts PyTorch checkpoint formats (.pt, .pth, .ckpt, .bin) to block-aligned .safetensors files. Has the same argument structure as align_safetensors.
Note: This tool is incomplete and may not support textual inversions or pickled objects that aren't torch.Tensor instances.
I am cowardly refusing to process input files unless you acknowledge the security risks of loading pickled PyTorch files by providing the --yeah-yeah flag.
When Python loads pickled files, it may execute arbitrary code. This is inherently unsafe and you should only process them inside a safe environment (like a container) unless you completely trust the source.
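As a small illustration of why this matters, the sketch below uses only the standard library to show a pickle that calls a function of its author's choosing at load time, followed by an unpickler subclass that refuses all global lookups. The `Payload`, `NoGlobalsUnpickler`, and `is_blocked` names are hypothetical, the payload is deliberately harmless, and none of this is claimed to be what the converter does internally. Real PyTorch checkpoints need some globals to load, so a blanket refusal like this one would reject them too.

```python
import io
import pickle


class Payload:
    # __reduce__ lets whoever wrote the pickle pick a callable to run on load.
    def __reduce__(self):
        # Harmless stand-in; a malicious file could name any importable callable.
        return (len, ("arbitrary call",))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # calls len("arbitrary call"); no Payload is rebuilt
print(type(result).__name__, result)


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so no callable can be resolved."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def is_blocked(blob: bytes) -> bool:
    """Return True if the restricted unpickler rejects the data."""
    try:
        NoGlobalsUnpickler(io.BytesIO(blob)).load()
    except pickle.UnpicklingError:
        return True
    return False


print("payload blocked:", is_blocked(data))
print("plain list blocked:", is_blocked(pickle.dumps([1, 2, 3])))
```

Restricting globals narrows the attack surface but is not a substitute for running untrusted files in an isolated environment.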