Add scaling status endpoints and documentation for cpu-app SRE operations #103

Draft
Copilot wants to merge 3 commits into main from copilot/fix-102

Copilot AI commented Aug 19, 2025

This PR addresses the SRE agent issue in which scale-down operations on cpu-app fail because the app is already at its minimum instance count. It adds two API endpoints and operational documentation so that SRE automation systems can recognize these scale-down failures as expected behavior rather than actual errors.

Problem

The SRE agent was reporting scale-down operation failures for cpu-app with the message "Already at minimum instance count (1)". This was being treated as an error, but it is expected behavior for the Azure App Service Free tier (F1), which:

  • Cannot scale below 1 instance
  • Cannot scale above 1 instance
  • Does not support auto-scaling
  • Has limited metadata exposure
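An SRE agent could encode this distinction as a small classifier that treats the "already at minimum" message as an expected no-op on the Free tier. The following is a minimal sketch, not the agent's actual logic: the function name, the `tier` argument, and the substring match are all assumptions; only the quoted message text comes from the reported failure.

```python
def classify_scale_down_failure(message: str, tier: str) -> str:
    """Return "expected" for a known no-op failure on a fixed-size tier, else "error"."""
    # Message text is taken from the reported failure; substring matching is an assumption.
    at_minimum = "already at minimum instance count" in message.lower()
    # F1 (Free) is pinned to a single instance, so hitting the minimum is normal there.
    fixed_size_tier = tier.upper() == "F1"
    return "expected" if (at_minimum and fixed_size_tier) else "error"
```

Any other message, or the same message on a tier that can scale, still surfaces as an error.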

Solution

New API Endpoints

Added two new endpoints specifically designed for operational monitoring:

Health Endpoint (GET /api/app/health)

Returns general application health with basic scaling information including current instances, tier limitations, and scaling capabilities.

Scaling Status Endpoint (GET /api/app/scaling-status)

Returns detailed scaling information specifically for SRE automation systems, including:

  • Current instance count and scaling limits
  • Clear boolean flags for scaleDownPossible and scaleUpPossible
  • SRE-specific status flag: "statusForSRE": "EXPECTED_STATE_NO_ACTION_NEEDED"
  • Actionable recommendations for operations teams
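To make the fields above concrete, here is a hedged sketch of what a response might look like and how a client could summarize it. Only `scalingLimits.scaleDownPossible`, `scaleUpPossible`, and `statusForSRE` are named in this PR; the other keys, the nesting of `scaleUpPossible`, and the recommendation text are illustrative assumptions.

```python
import json

# Hypothetical payload: keys other than scaleDownPossible, scaleUpPossible
# and statusForSRE are assumed for illustration and may differ in the real API.
EXAMPLE_RESPONSE = json.loads("""
{
  "currentInstances": 1,
  "scalingLimits": {
    "minInstances": 1,
    "maxInstances": 1,
    "scaleDownPossible": false,
    "scaleUpPossible": false
  },
  "statusForSRE": "EXPECTED_STATE_NO_ACTION_NEEDED",
  "recommendations": ["Move to a paid tier to enable scaling"]
}
""")


def summarize(payload: dict) -> str:
    """Condense the scaling flags and SRE status into one log-friendly line."""
    limits = payload["scalingLimits"]
    return (
        f"down={limits['scaleDownPossible']} "
        f"up={limits['scaleUpPossible']} "
        f"status={payload['statusForSRE']}"
    )
```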

Infrastructure Updates

Enhanced the Bicep template (main.bicep) with:

  • Explicit capacity configuration for the App Service Plan
  • Detailed comments explaining Free tier limitations
  • Output variables providing scaling metadata for operational awareness
  • App settings that expose scaling configuration to the application
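App Service app settings reach the application as environment variables, so the scaling configuration can be read at startup. The sketch below assumes hypothetical setting names (`SCALING_TIER`, `SCALING_MIN_INSTANCES`, `SCALING_MAX_INSTANCES`); the names actually defined in main.bicep are not shown in this PR description, and the app itself may not be written in Python.

```python
import os
from typing import Mapping


def load_scaling_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read scaling metadata from app settings exposed as environment variables."""
    # All three variable names are hypothetical placeholders, not taken from main.bicep.
    min_instances = int(env.get("SCALING_MIN_INSTANCES", "1"))
    max_instances = int(env.get("SCALING_MAX_INSTANCES", "1"))
    return {
        "tier": env.get("SCALING_TIER", "F1"),
        "minInstances": min_instances,
        "maxInstances": max_instances,
        # Scaling is only meaningful when the plan allows more than one size.
        "canScale": max_instances > min_instances,
    }
```

The defaults fall back to the Free tier limits described above, so a missing setting never reports scaling as possible.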

Documentation

Created comprehensive operational documentation (SCALING-INFO.md) that includes:

  • Tier comparison table showing scaling capabilities across Azure App Service tiers
  • Troubleshooting guide for common scaling scenarios
  • Expected behavior explanations for SRE agents
  • Recommendations for production workload configuration

Benefits for SRE Operations

  1. Proactive Checks: SRE agents can call /api/app/scaling-status before attempting scaling operations
  2. Clear Status Indicators: Boolean flags and status messages eliminate ambiguity
  3. Automated Decision Making: The statusForSRE field enables automation to handle expected states
  4. Operational Knowledge: Documentation provides context for manual interventions
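The proactive check and automated decision described in this list could look roughly like the following. This is a sketch under stated assumptions: the helper names are hypothetical, the base URL is supplied by the caller, and treating a missing flag as "not possible" is a fail-safe choice made here rather than documented behavior.

```python
import json
import urllib.request


def fetch_scaling_status(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/api/app/scaling-status and parse the JSON body."""
    url = base_url.rstrip("/") + "/api/app/scaling-status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def should_attempt_scale_down(status: dict) -> bool:
    """Skip the operation when the app reports that it cannot scale down."""
    if status.get("statusForSRE") == "EXPECTED_STATE_NO_ACTION_NEEDED":
        return False
    # A missing flag is treated as "not possible" so the agent fails safe (assumption).
    return bool(status.get("scalingLimits", {}).get("scaleDownPossible", False))
```

An agent would call `fetch_scaling_status` first and issue the scale-down request only when `should_attempt_scale_down` returns `True`.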

Example Usage

# Check if scaling is possible before attempting (APP_URL is the app's base URL)
curl -s "$APP_URL/api/app/scaling-status" | jq '.scalingLimits.scaleDownPossible'
# Returns: false

# Get SRE-specific status for automation
curl -s "$APP_URL/api/app/scaling-status" | jq '.statusForSRE'
# Returns: "EXPECTED_STATE_NO_ACTION_NEEDED"

This change transforms the reported "failure" into properly documented expected behavior, enabling SRE systems to make informed decisions about scaling operations.

Fixes #102.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • aka.ms
    • Triggering command: bicep build main.bicep (dns block)


Copilot AI and others added 2 commits August 19, 2025 19:05
Co-authored-by: mrsharm <68247673+mrsharm@users.noreply.github.com>
Co-authored-by: mrsharm <68247673+mrsharm@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Scale-down operation for cpu-app: Already at minimum instance count" to "Add scaling status endpoints and documentation for cpu-app SRE operations" on Aug 19, 2025
Copilot AI requested a review from mrsharm August 19, 2025 19:09