Repository for the paper *Exploring Lossy Compression of Activation Data for Emerging AI Accelerators: A Case Study on the Graphcore IPU*.
Dependencies:
- CUDA 12.4
- PopTorch 3.3
- PyTorch 2.0
- SciML-Bench 1.2.0: https://github.com/stfc-sciml/sciml-bench
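A quick sanity check that the expected framework versions are installed (this assumes a working Poplar SDK/PopTorch environment; it is not a script from the repository):

```python
import torch
import poptorch

# Versions expected by this repository (see the dependency list above)
print(torch.__version__)     # expect 2.0.x
print(poptorch.__version__)  # expect 3.3.x
```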
Repository structure:
- src/
  - accuracy_analysis/
    - accuracy.py
      - Entry point for training models; see the file for the specific parameters to use
  - compress_compile/
    - models/
      - Sample input files for the backward pass unroller
    - activation_compiler_token_dict.json
      - Specification for each supported operator
    - backwardpassunroller.py
      - Backward pass unroller CLI
    - graph.py
      - Generates unrolled network code
    - nodes.py
      - Classes for computational graph nodes
  - compressor/
    - gpu/
      - GPU compressor files
    - ipu/
      - IPU compressor files
  - operators/
    - gpu/
      - GPU custom forward+backward pass classes for each operator (see the sketch after this list)
    - ipu/
      - IPU custom forward+backward pass classes for each operator
  - operatorsv2/
    - Streamlined implementation of the forward+backward pass classes for the latest version of the backward pass unroller
  - unrolled/
    - Unrolled networks for each benchmark; both platforms contain unrolled networks for nocompression, dctfirst, dctdense, and quant
    - gpu/
      - modelstructs.py
        - Utility functions for model imports and dataloaders
      - gpurunner.py
        - Main script to run the GPU benchmarks
    - ipu/
      - modelstructs.py
        - Utility functions for model imports and dataloaders
      - ipurunner.py
        - Main script to run the IPU benchmarks
    - multiipu/
      - Directory for multi-IPU benchmarks; same scripts as above, plus strategy.py to map computation across IPUs
  - utils/
    - Additional utility functions for dataloaders
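To illustrate what the custom forward+backward pass classes under `operators/` do conceptually, here is a minimal sketch of a PyTorch `torch.autograd.Function` that stores a compressed copy of an activation in the forward pass and decompresses it for the backward pass. The `compress`/`decompress` helpers below are toy placeholders, not the repository's actual compressors (those live under `src/compressor/`):

```python
import torch

# Toy placeholders for a lossy compressor; the real DCT-based and
# quantization compressors live under src/compressor/ and are not shown here.
def compress(x):
    return x.half()       # stand-in: cast to fp16 as a simple "lossy" encoding

def decompress(c):
    return c.float()      # stand-in: restore fp32 for the backward computation

class CompressedReLU(torch.autograd.Function):
    """ReLU that saves a compressed activation instead of the raw tensor."""

    @staticmethod
    def forward(ctx, x):
        # Only the compressed tensor is kept alive for the backward pass.
        ctx.save_for_backward(compress(x))
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (c,) = ctx.saved_tensors
        x = decompress(c)  # reconstruct the (lossy) activation
        return grad_out * (x > 0).to(grad_out.dtype)

# Usage:
#   y = CompressedReLU.apply(x)
```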
You can test out the unroller on a sample network definition file as follows:
cd src/compress_compile
python cli.py --input_file=<path_to_model_definition> --optimization_level=<0,1> --model_name=<model_name> --compress_config=<path_to_compression_configuration> --platform=<"gpu"/"ipu">
This generates the unrolled network, inserting compress/decompress calls according to the compress_config.txt file, which specifies which activation tensors to compress, which compressor to use, and the corresponding compressor parameters.
Parameters:
- `--optimization_level`: if 0, no compression; if 1, compression is applied according to the compression configuration file.
- `--compress_config`: text file specifying the compression configuration. See `src/compress_compile/compress_config.txt` for an example.
- `--model_name`: `torch.nn.Module` name and output file name.
- `--input_file`: text file containing the model definition. See `src/compress_compile/models/` for examples.
- `--platform`: for PyTorch GPU operators, use "gpu". For the IPU, use "ipu".
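For example, an illustrative invocation with compression enabled (the definition file and model name below are placeholders; substitute a file from `src/compress_compile/models/`):

cd src/compress_compile
python cli.py --input_file=models/sample_model.txt --optimization_level=1 --model_name=SampleNet --compress_config=compress_config.txt --platform=ipu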
The following scripts run performance benchmarks that measure the execution time of a batch.
cd src/unrolled/gpu
python gpurunner.py --benchmark=<name> --iterations=<number of runs> --batch_size=<batch size> --get_memory=<get memory stats> --scheme=<nocompress/dct-first/dct-dense/quant>
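For example (the benchmark name and flag values below are illustrative):

python gpurunner.py --benchmark=em_denoise --iterations=10 --batch_size=32 --get_memory=1 --scheme=dct-dense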
cd src/unrolled/ipu
python ipurunner.py --benchmark=<name> --iterations=<number of runs> --batch_size=<micro batch size> --device_batch_size=<batch size> --graphcore_profile=<profile run> --scheme=<nocompress/dct-first/dct-dense/quant>
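For example (again with illustrative values):

python ipurunner.py --benchmark=em_denoise --iterations=10 --batch_size=8 --device_batch_size=32 --graphcore_profile=0 --scheme=dct-dense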
Multi-IPU benchmarks can be run as follows. Only the dct-dense and nocompress schemes are supported in this version.
cd src/unrolled/multiipu
python ipurunner.py --benchmark=<name> --iterations=<number of runs> --batch_size=<micro batch size> --device_batch_size=<batch size> --graphcore_profile=<profile run> --scheme=<nocompress/dct-dense> --num_ipus=<1,2,4,8,16> --replication_factor=<number of model replicas for data parallelism>
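For example, using 4 IPUs and a replication factor of 2 (illustrative values):

python ipurunner.py --benchmark=em_denoise --iterations=10 --batch_size=8 --device_batch_size=32 --graphcore_profile=0 --scheme=dct-dense --num_ipus=4 --replication_factor=2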