This repository was archived by the owner on Jan 25, 2026. It is now read-only.

chore: Improve ECS service stability under Spot interruptions #14

Merged
jongwooo merged 4 commits into main from chore/improve-ecs-service-stability-under-spot-interruptions
Dec 29, 2025

@jongwooo (Contributor)

This pull request introduces several improvements to the AWS ECS and ALB infrastructure modules. The main changes focus on optimizing ECS resource allocation, enhancing scaling strategies, and adjusting ALB target group settings for better performance and reliability.

ECS Service & Capacity Provider Enhancements:

  • Added a capacity_provider_strategy block to the ECS service, specifying the use of the defined capacity provider with a base of 2 and weight of 1 to better control resource allocation.
  • Enabled managed scaling in the ECS capacity provider and set the target capacity to 90% (down from 100%), which leaves roughly 10% of instance capacity spare as headroom.
  • Introduced a placement_constraints block of type distinctInstance so that each task is placed on a different container instance, improving fault tolerance.
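
The three settings above could look roughly like the following Terraform. This is a minimal sketch, not the module's actual code: the resource labels, the names, desired_count, and the three input variables are hypothetical, and only base = 2, weight = 1, target_capacity = 90, and the distinctInstance constraint come from the description.

```hcl
# Hypothetical inputs; the PR does not show the module's real variable names.
variable "cluster_id" { type = string }
variable "task_definition_arn" { type = string }
variable "asg_arn" { type = string }

resource "aws_ecs_capacity_provider" "spot" {
  name = "example-spot" # placeholder name

  auto_scaling_group_provider {
    auto_scaling_group_arn = var.asg_arn

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 90 # aim for 90% utilization, keeping some spare capacity
    }
  }
}

resource "aws_ecs_service" "app" {
  name            = "example-app" # placeholder name
  cluster         = var.cluster_id
  task_definition = var.task_definition_arn
  desired_count   = 2 # assumed value, not stated in the PR

  # launch_type is left unset; it cannot be combined with a strategy block.
  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.spot.name
    base              = 2 # minimum number of tasks placed on this provider
    weight            = 1 # relative share of tasks beyond the base
  }

  # Each task must land on a different container instance.
  placement_constraints {
    type = "distinctInstance"
  }
}
```

The sketch omits the association between the capacity provider and the cluster (for example through aws_ecs_cluster_capacity_providers). Note that with distinctInstance the cluster needs at least as many registered instances as running tasks, or the extra tasks cannot be placed.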

ECS Task Definition Improvements:

  • Changed the memory configuration in the ECS task definition by adding memoryReservation (512 MiB, the soft limit) alongside memory (768 MiB, the hard limit), so the container reserves 512 MiB for placement but can burst up to 768 MiB.
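
A container definition with both limits might be written as in this sketch. The family, container name, and image variable are hypothetical placeholders; only the two memory values come from the description.

```hcl
# Hypothetical input; the real image reference is not shown in the PR.
variable "image" { type = string }

resource "aws_ecs_task_definition" "app" {
  family = "example-app" # placeholder family name

  container_definitions = jsonencode([
    {
      name              = "app" # placeholder container name
      image             = var.image
      essential         = true
      memory            = 768 # hard limit in MiB; the container is killed if it exceeds this
      memoryReservation = 512 # soft limit in MiB; the amount reserved on the instance
    }
  ])
}
```

When both values are set, ECS requires memory to be greater than memoryReservation, and it is the reservation that is subtracted from the instance's available memory when tasks are placed.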

ALB Target Group Configuration:

  • Increased the deregistration_delay for the ALB target group from 5 seconds to 30 seconds to allow more time for in-flight requests to complete during instance deregistration.
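
As an illustration, the setting lives on the target group resource. In this sketch the name, port, protocol, and vpc_id variable are assumed values; only deregistration_delay = 30 is taken from the description.

```hcl
# Hypothetical input; the module's real VPC reference is not shown in the PR.
variable "vpc_id" { type = string }

resource "aws_lb_target_group" "app" {
  name     = "example-app" # placeholder name
  port     = 80            # assumed port
  protocol = "HTTP"        # assumed protocol
  vpc_id   = var.vpc_id

  # Seconds the ALB waits for in-flight requests before completing deregistration.
  deregistration_delay = 30
}
```

A 30-second drain still fits comfortably inside the two-minute notice that EC2 gives before reclaiming a Spot instance.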

Copilot AI left a comment

Pull request overview

This PR enhances AWS ECS service resilience for Spot instance workloads by improving resource allocation, scaling efficiency, and connection draining. The changes adjust capacity provider strategies, task placement rules, memory management, and ALB deregistration timing to better handle infrastructure disruptions.

Key Changes:

  • Added capacity provider strategy with base allocation and distinctInstance placement constraints for better fault tolerance
  • Optimized ECS managed scaling target from 100% to 90% capacity and introduced memory reservation alongside hard memory limits
  • Increased ALB target group deregistration delay from 5 to 30 seconds for graceful connection draining

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| Modules/AWS/ECS/main.tf | Adds capacity provider strategy with base=2 and distinctInstance constraint; introduces memoryReservation=512 alongside memory=768; reduces managed scaling target_capacity to 90% |
| Modules/AWS/ALB/main.tf | Increases deregistration_delay from 5 to 30 seconds for better handling of in-flight requests during instance termination |

@jongwooo merged commit 2aff242 into main on Dec 29, 2025
7 checks passed
@jongwooo deleted the chore/improve-ecs-service-stability-under-spot-interruptions branch on December 29, 2025 03:19