DRS Automation Plan logs exceed DDB allowed size  #26

@Nathan-Kulas-Tyler

Description

Steps to reproduce:

When running a DRS automation plan with a large number of targets, the log item can overflow the item size limit of the drs-plan-automation-results-prod DynamoDB table:

[ERROR] ClientError: An error occurred (ValidationException) when calling the PutItem operation: Item size has exceeded the maximum allowed size
Traceback (most recent call last):
File "/var/task/drs_automation_plan.py", line 826, in handler
record_status(event, ddb_client)
File "/var/task/drs_automation_plan.py", line 112, in record_status
raise err
File "/var/task/drs_automation_plan.py", line 106, in record_status
response = ddb_client.put_item(
File "/var/runtime/botocore/client.py", line 553, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 1009, in _make_api_call
raise error_class(parsed_response, operation_name)
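
For context, the 400 KB limit applies to the serialized item, including attribute names. A rough pre-check like the following (a sketch only; result stands in for the plan execution record built by the handler) shows how a large plan's log/SourceServers/Waves payload crosses the threshold:

import json

MAX_ITEM_SIZE_KB = 400

def will_overflow(result: dict) -> bool:
    # json.dumps is only an approximation of DynamoDB's item-size accounting
    # (attribute names and type descriptors also count), but it is close enough
    # to flag plan results that will be rejected by PutItem.
    size_kb = len(json.dumps(result, default=str)) / 1024
    return size_kb >= MAX_ITEM_SIZE_KB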

Potential Fixes:

Looking at the code, the DynamoDB structure currently relies on the specific key to look up the status of recovery items. The table's partition key is a composite key that includes the AppId and PlanId, and the sort key is the execution id.
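
For reference, a status lookup against that schema is a single GetItem on the composite partition key plus the execution id (the key names are taken from record_status below; the id values shown are hypothetical):

import boto3

ddb_client = boto3.client('dynamodb')

# One item per execution: logs, plan details, source servers and waves all
# live on this single item, which is what hits the 400 KB ceiling.
response = ddb_client.get_item(
    TableName='drs-plan-automation-results-prod',
    Key={
        'AppId_PlanId': {'S': 'app-123_plan-456'},  # hypothetical composite key value
        'ExecutionId': {'S': 'exec-789'}            # hypothetical execution id
    }
)
item = response.get('Item')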

  • An additional property indicating a record number could be added to the item, and the data could be spread across multiple items.
  • Different components of the current item could be broken into separate items. Several properties currently hold lengthy data: log, planDetails, SourceServers, Waves. Each of these could be written to its own item instead of all being stored in a single item.

These two solutions each have their own limitations. The record-number approach would require splitting the JSON data in a way that it can be put back together gracefully, either by closing the JSON document before finishing each record, or by treating the chunks as strings and concatenating them back together (see the sketch below). Breaking the data out into separate items may still run into the size limit on any individual item.
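
As a rough illustration of the record-number option, the serialized result could be split into fixed-size string chunks, each stored as its own item and reassembled on read. This is a sketch only: the payload/chunkCount attribute names, the chunk size, and the #NNNN sort-key suffix are assumptions rather than part of the current schema, and query pagination is omitted.

import json

CHUNK_SIZE = 350 * 1024  # stay comfortably under the 400 KB item limit

def write_chunked(ddb_client, table, result):
    data_str = json.dumps(result, default=str)
    chunks = [data_str[i:i + CHUNK_SIZE] for i in range(0, len(data_str), CHUNK_SIZE)]
    for n, chunk in enumerate(chunks):
        ddb_client.put_item(
            TableName=table,
            Item={
                'AppId_PlanId': {'S': result['AppId_PlanId']},
                # record number appended to the sort key keeps chunks ordered
                'ExecutionId': {'S': f"{result['ExecutionId']}#{n:04d}"},
                'chunkCount': {'N': str(len(chunks))},
                'payload': {'S': chunk}
            }
        )

def read_chunked(ddb_client, table, app_plan_id, execution_id):
    resp = ddb_client.query(
        TableName=table,
        KeyConditionExpression='AppId_PlanId = :p AND begins_with(ExecutionId, :e)',
        ExpressionAttributeValues={
            ':p': {'S': app_plan_id},
            ':e': {'S': f"{execution_id}#"}
        }
    )
    # items come back sorted by sort key, so concatenation restores the JSON string
    return json.loads(''.join(item['payload']['S'] for item in resp['Items']))

A downside visible in the sketch is that existing GetItem lookups on the plain ExecutionId would no longer match, which is part of the limitation described above.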

A better solution may instead be to include an S3 bucket in the deployment of the solution, write log records that would exceed the 400 KB threshold to an S3 object, and store the bucket and key name in the DDB item:

import json
import os

import boto3

# logger, DRS_RESULTS_TABLE and python_obj_to_dynamo_obj already exist in drs_automation_plan.py
s3 = boto3.resource('s3')
BUCKET = os.environ['DRS_RESULTS_BUCKET']  # placeholder: the bucket added by the proposed deployment change

def record_status(event, ddb_client):
    MAX_SIZE_IN_KB = 400
    result = event['result']
    # find all datetimes and replace them:

    logger.debug("serializing {}".format(result))
    serialized_result = python_obj_to_dynamo_obj(result)
    logger.debug("serialized {}".format(serialized_result))
    data_str = json.dumps(serialized_result)
    data_size_kb = len(data_str) / 1024
    # offload to S3 when the serialized item approaches the 400 KB limit
    if data_size_kb >= 0.9 * MAX_SIZE_IN_KB:
        s3_key = f"{DRS_RESULTS_TABLE}/{result['AppId_PlanId']}/{result['ExecutionId']}"
        write_s3_data(s3_key, data_str)
        # keep only the keys plus a pointer to the offloaded payload
        serialized_result = python_obj_to_dynamo_obj({
            'AppId_PlanId': result['AppId_PlanId'],
            'ExecutionId': result['ExecutionId'],
            's3Bucket': BUCKET,
            's3Key': s3_key
        })

    try:
        response = ddb_client.put_item(
            TableName=DRS_RESULTS_TABLE,
            Item=serialized_result
        )
    except Exception as err:
        logger.error("Error putting item {} into table: {}: {}".format(serialized_result, DRS_RESULTS_TABLE, err))
        raise err
    else:
        return response
		

def write_s3_data(s3_key, data) -> bool:
    try:
        s3.Object(bucket_name=BUCKET, key=s3_key).put(
            Body=data,
            ServerSideEncryption='aws:kms',
            ContentType='application/json'
        )
        return True
    except Exception as err:
        logger.error(err)
        raise err
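
The read path would then need to resolve the pointer: if the stored item only carries s3Bucket/s3Key, fetch the full payload back from S3. A sketch under the same assumptions as above (get_execution_result is a hypothetical helper, not existing code):

def get_execution_result(ddb_client, app_plan_id, execution_id):
    resp = ddb_client.get_item(
        TableName=DRS_RESULTS_TABLE,
        Key={
            'AppId_PlanId': {'S': app_plan_id},
            'ExecutionId': {'S': execution_id}
        }
    )
    item = resp.get('Item', {})
    if 's3Key' in item:
        # oversized results were offloaded to S3; follow the pointer
        obj = s3.Object(bucket_name=item['s3Bucket']['S'], key=item['s3Key']['S']).get()
        return json.loads(obj['Body'].read())
    return item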
