Amazon DocumentDB Compression Review Tool

The compression review tool reports a compressibility metric for each collection: the maximum compression achievable on its documents, measured by sampling the actual documents and applying the requested compression algorithm. The compression achieved in practice will be lower because of fragmentation and write amplification.

By default, the tool samples 1000 documents from each collection to determine the compressibility of the data. A larger number of documents can be sampled via the --sample-size parameter.
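
To illustrate what a compressibility measurement over sampled documents can look like, here is a minimal sketch. It is not the tool's implementation: the function name `compression_ratio` is hypothetical, the ratio definition (original bytes divided by compressed bytes) is an assumption, documents are serialized as JSON rather than BSON, and `zlib` from the standard library stands in for lz4 or zstandard so the example runs without extra packages.

```python
import json
import zlib
from typing import Any, Callable, Iterable, Mapping


def compression_ratio(
    documents: Iterable[Mapping[str, Any]],
    compress: Callable[[bytes], bytes] = zlib.compress,
) -> float:
    """Return total original bytes divided by total compressed bytes.

    Each document is compressed on its own (an assumption made for this
    sketch). A ratio above 1.0 means the data shrank.
    """
    original_bytes = 0
    compressed_bytes = 0
    for doc in documents:
        # JSON is a stand-in serialization; the database stores BSON.
        raw = json.dumps(doc, sort_keys=True).encode("utf-8")
        original_bytes += len(raw)
        compressed_bytes += len(compress(raw))
    if compressed_bytes == 0:
        raise ValueError("no documents were sampled")
    return original_bytes / compressed_bytes


if __name__ == "__main__":
    # Hypothetical sample of 1000 repetitive documents.
    sample = [{"status": "active" * 50, "n": i} for i in range(1000)]
    print(f"compression ratio: {compression_ratio(sample):.2f}")
```

Any callable that maps bytes to bytes can be passed as `compress`, so a different compressor can be swapped in without changing the measurement loop.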

Requirements

  • Python 3.7+
  • pymongo Python package - tested versions
    • MongoDB 2.6 - 3.4 | pymongo 3.10 - 3.12
    • MongoDB 3.6 - 5.0 | pymongo 3.12 - 4.0
    • MongoDB 5.1+ | pymongo 4.0+
    • DocumentDB | pymongo 3.10+
    • If not installed - "$ pip3 install pymongo"
  • lz4 Python package
    • If not installed - "$ pip3 install lz4"
  • zstandard Python package
    • If not installed - "$ pip3 install zstandard"
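
The requirements above can be checked before running the tool. The following is a small sketch, assuming the import names `pymongo`, `lz4`, and `zstandard` match the pip package names; the helper `missing_packages` is hypothetical and not part of the tool.

```python
import importlib.util
import sys
from typing import Iterable, List

REQUIRED_PACKAGES = ("pymongo", "lz4", "zstandard")


def missing_packages(names: Iterable[str] = REQUIRED_PACKAGES) -> List[str]:
    """Return the names of top-level packages that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python 3.7+ is required")
    for name in missing_packages():
        # Mirrors the install hints listed in the requirements.
        print(f"{name} is not installed - $ pip3 install {name}")
```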

Using the Compression Review Tool

python3 compression-review.py --uri <server-uri> --server-alias <server-alias>

  • The default compressions tested are lz4/fast/level1 and zstandard/level3/4K/Dictionary
  • To test other compression techniques provide --compressor <compression-type>
  • Run on any instance in the replica set
  • Use a different <server-alias> for each server analyzed; the output file name starts with <server-alias>
  • Creates one CSV file per compression tested (so the default run creates two)
  • The <server-uri> options can be found at https://www.mongodb.com/docs/manual/reference/connection-string/
    • If your URI contains ampersand (&) characters, they must be escaped with a backslash, or the URI must be enclosed in double quotes
  • For DocumentDB use either the cluster endpoint or any of the instance endpoints
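
The note on ampersands matters because an unquoted & is interpreted by the shell. As one possible way to avoid manual escaping, the sketch below builds the command line with `shlex.quote`, which wraps arguments in single quotes (an alternative to the backslash or double-quote approaches described above that also works in POSIX shells). The function `build_command`, the example URI, and the alias are hypothetical.

```python
import shlex
from typing import List


def build_args(uri: str, server_alias: str) -> List[str]:
    """Return the argument list for one run of the tool."""
    return [
        "python3",
        "compression-review.py",
        "--uri",
        uri,
        "--server-alias",
        server_alias,
    ]


def build_command(uri: str, server_alias: str) -> str:
    """Return a shell-safe command string with every argument quoted."""
    return " ".join(shlex.quote(arg) for arg in build_args(uri, server_alias))


if __name__ == "__main__":
    # Hypothetical URI containing an ampersand between options.
    example_uri = "mongodb://user:password@example-host:27017/?tls=true&retryWrites=false"
    print(build_command(example_uri, "server1"))
```

When launching the tool from another Python program, passing the list from `build_args` directly to `subprocess.run` avoids the shell altogether, so no quoting is needed.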

License

This tool is licensed under the Apache 2.0 License.