16 changes: 10 additions & 6 deletions operations/document-compression-updater/README.md
@@ -1,13 +1,15 @@
-# Python Updater tool
-This sample applications compresses pre-existing documents in an existing collection after compression is turned on that existing collection.
+# Python Updater tool
+This sample application compresses pre-existing documents in an existing collection after compression is turned on that existing collection.

 Single threaded application - issues **5000** (controlled by argument --batch-size) updates serially in a _round_, and sleeps for **60** (controlled by argument --wait-period) seconds before starting next _round_.

-Status of the updates are maintained in database **tracker_db** - for each collection there is a tracker collection named **<< collection >>__tracker_col**.
+After each batch, the temporary dummy field used to trigger compression is automatically removed from all updated documents. Use `--skip-cleanup` to disable this behaviour.

-The application can be restarted if it crashes and it will pick up from last successful _round_ based on data in **<< collection >>__tracker_col**.
+Status of the updates are maintained in database **tracker_db** - for each collection there is a tracker collection named **<< collection >>__tracker_col**. Each tracker entry includes a `cleanupComplete` flag indicating whether the dummy field was removed for that batch.

-The update statements use field **6nh63** (controlled by argument --update-field), for triggering compression on existing records.
+The application can be restarted if it crashes and it will pick up from last successful _round_ based on data in **<< collection >>__tracker_col**. On successful completion the tracker collection is automatically dropped, as it is no longer needed.

+The update statements use field **6nh63** (controlled by argument --update-field), for triggering compression on existing records. This field is removed from each document after compression is applied unless `--skip-cleanup` is set.
+
 The application uses **_id** field for tracking and updating existing documents. If you are using a custom value _id, the value should be sort-able.

@@ -24,7 +26,7 @@ cd amazon-documentdb-tools/operations/document-compression-updater
 ## Usage/Examples

 ```
-python3 update_apply_compression.py --uri "<<documentdb_uri>>" --database <<database>> --collection <<collection>> --update-field << field_name >> --wait-period << int >>> --batch-size << int >>
+python3 update_apply_compression.py --uri "<<documentdb_uri>>" --database <<database>> --collection <<collection>> --update-field << field_name >> --wait-period << int >> --batch-size << int >>
 ```

 The application has the following arguments:
@@ -40,4 +42,6 @@ Optional parameters
 --update-field Field used for updating an existing document. This should not conflict with any fieldname you are already using
 --wait-period Number of seconds to wait between each batch
 --batch-size Number of documents to update in a single batch
+--append-log Append to existing log file instead of overwriting it on startup
+--skip-cleanup Skip removing the dummy field after each batch (leaves update field permanently on documents)
 ```
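
For readers of this change, the round behaviour the updated README describes (batches ordered by `_id`, a dummy field written and then removed, a tracker entry per batch carrying `cleanupComplete`, and resuming from the last recorded batch) can be sketched as a small in-memory simulation. This is not the code of `update_apply_compression.py` and it does not talk to a database: the function `run_rounds`, the tracker keys `lastId` and `count`, and the injectable `sleep` parameter are hypothetical names chosen for illustration. Only `cleanupComplete`, the default field name, and the default batch size and wait period come from the README text.

```python
import time


def run_rounds(docs, tracker, update_field="6nh63", batch_size=5000,
               wait_period=60, skip_cleanup=False, sleep=time.sleep):
    """Simulate the update rounds over an in-memory list of dict documents.

    `tracker` is a plain list standing in for the tracker collection
    (an assumption of this sketch). Returns the number of tracker entries.
    """
    # Resume after the last _id recorded by an earlier, interrupted run.
    last_id = tracker[-1]["lastId"] if tracker else None
    # The README requires _id to be sortable; batches follow that order.
    ordered = sorted(docs, key=lambda d: d["_id"])

    while True:
        pending = [d for d in ordered
                   if last_id is None or d["_id"] > last_id]
        batch = pending[:batch_size]
        if not batch:
            break

        # Dummy write: in the real tool this update is what triggers
        # compression of the existing document.
        for doc in batch:
            doc[update_field] = 1

        # Remove the dummy field again unless cleanup is skipped.
        if not skip_cleanup:
            for doc in batch:
                doc.pop(update_field, None)

        last_id = batch[-1]["_id"]
        tracker.append({
            "lastId": last_id,          # hypothetical key
            "count": len(batch),        # hypothetical key
            "cleanupComplete": not skip_cleanup,
        })

        # Wait between rounds only when more documents remain.
        if len(pending) > len(batch):
            sleep(wait_period)

    return len(tracker)
```

Unlike the behaviour described in the README, the sketch keeps the tracker list after the final round so that it can be inspected; passing a no-op `sleep` (for example `lambda s: None`) makes it practical to run against a handful of test documents.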