Description
Steps to reproduce:
When running a DRS automation plan with a large number of targets, the logged result item can exceed the DynamoDB item size limit of the drs-plan-automation-results-prod table:
[ERROR] ClientError: An error occurred (ValidationException) when calling the PutItem operation: Item size has exceeded the maximum allowed size
Traceback (most recent call last):
  File "/var/task/drs_automation_plan.py", line 826, in handler
    record_status(event, ddb_client)
  File "/var/task/drs_automation_plan.py", line 112, in record_status
    raise err
  File "/var/task/drs_automation_plan.py", line 106, in record_status
    response = ddb_client.put_item(
  File "/var/runtime/botocore/client.py", line 553, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 1009, in _make_api_call
    raise error_class(parsed_response, operation_name)
Potential Fixes:
In looking at the code, the DynamoDB structure currently relies on the specific key to look up the status of recovery items. The table's partition key is a composite key made up of the AppId and PlanId, and the sort key is the execution ID.
- An additional property indicating a record number could be added to the item, and the data could be spread across multiple items.
- Different components of the current item could be broken out into separate items. Several properties currently hold lengthy data: log, planDetails, SourceServers, and Waves. Each of these could be written to its own item instead of all being stored in a single item.
These two solutions each have their own limitations. The record-number approach would require splitting the JSON data in a way that lets it be reassembled gracefully, either by closing each JSON document before finishing a record, or by treating the pieces as plain strings and concatenating them back together on read (a rough sketch of this approach follows below). Breaking the data out into separate items may also still run into the size limit on any individual item.
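For the first option, a minimal sketch of what record-number chunking could look like, assuming the serialized result is treated as an opaque string and split into numbered pieces. The chunk size, the RecordNumber/TotalRecords/Payload attribute names, the sort-key suffix, and the put_result_in_chunks helper are illustrative, not part of the existing code:

import json

MAX_CHUNK_CHARS = 350 * 1024  # rough chunk size kept well below the 400 KB item limit

def put_result_in_chunks(ddb_client, table_name, result):
    # Serialize once, then split the string into fixed-size pieces; individual
    # pieces are not valid JSON on their own and must be concatenated on read.
    data_str = json.dumps(result)
    chunks = [data_str[i:i + MAX_CHUNK_CHARS] for i in range(0, len(data_str), MAX_CHUNK_CHARS)]
    for record_number, chunk in enumerate(chunks):
        ddb_client.put_item(
            TableName=table_name,
            Item={
                'AppId_PlanId': {'S': result['AppId_PlanId']},
                # Suffix the sort key so each chunk is a distinct item; readers would
                # Query on the partition key and reassemble by RecordNumber.
                'ExecutionId': {'S': "{}#{:04d}".format(result['ExecutionId'], record_number)},
                'RecordNumber': {'N': str(record_number)},
                'TotalRecords': {'N': str(len(chunks))},
                'Payload': {'S': chunk}
            }
        )

The drawback, as noted above, is that the existing lookup by ExecutionId no longer maps to a single item, and every reader has to reassemble and re-parse the chunks.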
A better solution may instead be to include an S3 bucket in the deployment of the solution, write result records that approach the 400 KB threshold to an S3 object, and store the bucket name and object key in the DynamoDB item:
import json

import boto3

# logger, DRS_RESULTS_TABLE and python_obj_to_dynamo_obj come from the existing drs_automation_plan.py
s3 = boto3.resource('s3')

BUCKET = 'some_bucket'  # placeholder: the bucket name would be provided by the solution deployment

def record_status(event, ddb_client):
    MAX_SIZE_IN_KB = 400
    result = event['result']
    # find all datetimes and replace them:
    logger.debug("serializing {}".format(result))
    serialized_result = python_obj_to_dynamo_obj(result)
    logger.debug("serialized {}".format(serialized_result))
    data_str = json.dumps(serialized_result)
    data_size_kb = len(data_str) / 1024
    if data_size_kb >= 0.9 * MAX_SIZE_IN_KB:
        # Item is approaching the 400 KB limit: write the full result to S3 and
        # store only a pointer (bucket and key) in DynamoDB.
        s3_key = f"/{DRS_RESULTS_TABLE}/{result['AppId_PlanId']}/{result['ExecutionId']}"
        write_s3_data(s3_key, data_str)
        # The pointer item must still use DynamoDB attribute types for the low-level client.
        serialized_result = python_obj_to_dynamo_obj({
            'AppId_PlanId': result['AppId_PlanId'],
            'ExecutionId': result['ExecutionId'],
            's3Bucket': BUCKET,
            's3Key': s3_key
        })
    try:
        response = ddb_client.put_item(
            TableName=DRS_RESULTS_TABLE,
            Item=serialized_result
        )
    except Exception as err:
        logger.error("Error putting item {} into table: {}: {}".format(serialized_result, DRS_RESULTS_TABLE, err))
        raise err
    else:
        return response

def write_s3_data(s3_key, data) -> bool:
    try:
        s3.Object(bucket_name=BUCKET, key=s3_key).put(
            Body=data,
            ServerSideEncryption='aws:kms',
            ContentType='application/json'  # set the object content type rather than user metadata
        )
        return True
    except Exception as err:
        logger.error(err)
        raise err
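On the read side, whatever queries the results table would then need to follow the pointer when it is present. A minimal sketch under the same assumptions as above (get_result_record is a hypothetical helper; s3 and DRS_RESULTS_TABLE refer to the objects used earlier):

def get_result_record(ddb_client, app_id_plan_id, execution_id):
    # Look up the item by the composite partition key and the execution id sort key.
    response = ddb_client.get_item(
        TableName=DRS_RESULTS_TABLE,
        Key={
            'AppId_PlanId': {'S': app_id_plan_id},
            'ExecutionId': {'S': execution_id}
        }
    )
    item = response.get('Item')
    if item is None:
        return None
    # If the record overflowed, the item only holds a pointer to the S3 object.
    if 's3Key' in item:
        obj = s3.Object(bucket_name=item['s3Bucket']['S'], key=item['s3Key']['S'])
        return json.loads(obj.get()['Body'].read())
    return item

The Lambda role would also need permission to put and get objects in the new bucket, which the deployment template would have to grant alongside creating the bucket.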