Backend API Reference

Base URL: http://localhost:8080 (default)

All endpoints return JSON responses and support CORS.

Interactive Documentation

An OpenAPI 3.0 specification is available for this API:

  • OpenAPI Spec: backend/handlers/openapi.yaml
  • Swagger UI: Access /docs on the frontend (e.g., http://localhost:5173/docs or the deployed URL)
  • Validation: Run make openapi-validate to check spec syntax

The Swagger UI provides an interactive interface to explore endpoints, view schemas, and test API calls.


Health & Status

GET /api/v1/health

Health check endpoint.

Response:

{
  "cf_api": "ok",
  "bosh_api": "ok",
  "cache_status": {
    "cells_cached": false,
    "apps_cached": false
  }
}

| Field | Description |
| --- | --- |
| cf_api | CF API connectivity status |
| bosh_api | BOSH API status (ok or not_configured) |
| cache_status | Current cache state |
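
For example, a quick connectivity check from the command line (a minimal sketch; assumes jq is installed and the backend is on the default port):

# Check backend health and extract the CF API status
curl -s http://localhost:8080/api/v1/health | jq -r '.cf_api'
# Expected output when CF connectivity is healthy: ok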

Dashboard

GET /api/v1/dashboard

Returns live dashboard data from CF and BOSH APIs.

Response:

{
  "cells": [
    {
      "name": "diego_cell/0",
      "isolation_segment": "shared",
      "memory_mb": 32768,
      "allocated_mb": 24576,
      "used_mb": 18432,
      "cpu_percent": 45.2
    }
  ],
  "apps": [
    {
      "guid": "abc-123",
      "name": "my-app",
      "instances": 3,
      "requested_mb": 1024,
      "actual_mb": 780,
      "isolation_segment": "shared"
    }
  ],
  "segments": [
    {
      "guid": "seg-123",
      "name": "shared"
    }
  ],
  "metadata": {
    "timestamp": "2024-01-15T10:30:00Z",
    "cached": false,
    "bosh_available": true
  }
}

Data Sources:

  • cells: BOSH API (Diego cell VMs and vitals)
  • apps: CF API (applications and process stats)
  • segments: CF API (isolation segments)
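
For a quick look at cell utilization from the command line (a sketch; assumes jq is installed):

# Fetch the dashboard and list each cell with its allocated vs. total memory
curl -s http://localhost:8080/api/v1/dashboard \
  | jq -r '.cells[] | "\(.name): \(.allocated_mb)/\(.memory_mb) MB allocated"'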

Infrastructure

GET /api/v1/infrastructure

Returns live infrastructure data from vSphere.

Prerequisites: The following vSphere environment variables must be set:

  • VSPHERE_HOST
  • VSPHERE_USERNAME
  • VSPHERE_PASSWORD
  • VSPHERE_DATACENTER

Response:

{
  "name": "vcenter.example.com",
  "source": "vsphere",
  "timestamp": "2024-01-15T10:30:00Z",
  "cached": false,
  "clusters": [
    {
      "name": "TAS-Cluster",
      "host_count": 4,
      "memory_gb": 512,
      "cpu_cores": 128,
      "memory_gb_per_host": 128,
      "cpu_threads_per_host": 32,
      "ha_admission_control_percentage": 25,
      "ha_usable_memory_gb": 384,
      "ha_usable_cpu_cores": 96,
      "ha_host_failures_survived": 1,
      "cells": [
        {
          "name": "diego_cell/0",
          "memory_gb": 64,
          "cpu": 8,
          "disk_gb": 200
        }
      ]
    }
  ],
  "total_host_count": 4,
  "total_cell_count": 10,
  "total_cell_memory_gb": 640,
  "total_cell_cpu": 80,
  "total_cell_disk_gb": 2000,
  "total_app_memory_gb": 450,
  "total_app_disk_gb": 900,
  "total_app_instances": 150,
  "platform_vms_gb": 64
}

Error (503): vSphere not configured

{
  "error": "vSphere not configured. Set VSPHERE_HOST, VSPHERE_USERNAME, VSPHERE_PASSWORD, and VSPHERE_DATACENTER environment variables.",
  "code": 503
}
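
To avoid this error, set the variables in the backend's environment and query the endpoint (a sketch; the hostname and credentials below are placeholders):

# Placeholder values - substitute your own vCenter details
export VSPHERE_HOST=vcenter.example.com
export VSPHERE_USERNAME=administrator@vsphere.local
export VSPHERE_PASSWORD=changeme
export VSPHERE_DATACENTER=DC01

# After restarting the backend, fetch live infrastructure data
curl -s http://localhost:8080/api/v1/infrastructure | jq '.total_cell_count'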

POST /api/v1/infrastructure/manual

Set infrastructure state from manual input (JSON upload or form data).

Request Body:

{
  "name": "My Infrastructure",
  "clusters": [
    {
      "name": "TAS-Cluster",
      "host_count": 4,
      "memory_gb_per_host": 128,
      "cpu_threads_per_host": 32,
      "ha_admission_control_percentage": 25,
      "cells": [
        {
          "name": "diego_cell/0",
          "memory_gb": 64,
          "cpu": 8,
          "disk_gb": 200
        }
      ]
    }
  ],
  "platform_vms_gb": 64,
  "total_app_memory_gb": 450,
  "total_app_disk_gb": 900,
  "total_app_instances": 150
}

Response: Returns computed InfrastructureState (same format as GET /api/v1/infrastructure)
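
For example, with the request body above saved to a file (a sketch; infra.json is a placeholder filename):

# POST a manually collected infrastructure description
curl -s -X POST http://localhost:8080/api/v1/infrastructure/manual \
  -H "Content-Type: application/json" \
  -d @infra.json | jq '.total_cell_count'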


POST /api/v1/infrastructure/state

Set infrastructure state directly (accepts full InfrastructureState object).

Request Body: Full InfrastructureState object (same format as GET /api/v1/infrastructure response)

Response: Returns the stored state
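
One use of this endpoint is snapshotting live data and replaying it later, for example on an instance without vSphere or CF access (a sketch):

# Snapshot the current state to a file
curl -s http://localhost:8080/api/v1/infrastructure > state.json

# Later (or on another instance), restore it directly
curl -s -X POST http://localhost:8080/api/v1/infrastructure/state \
  -H "Content-Type: application/json" \
  -d @state.json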


GET /api/v1/infrastructure/status

Returns current infrastructure data source status and capacity metrics.

Response:

{
  "vsphere_configured": true,
  "has_data": true,
  "source": "vsphere",
  "name": "vcenter.example.com",
  "cluster_count": 2,
  "host_count": 8,
  "cell_count": 20,
  "timestamp": "2024-01-15T10:30:00Z",
  "constraining_resource": "memory",
  "bottleneck_summary": "Memory is the primary constraint at 78% utilization",
  "memory_utilization": 78.5,
  "n1_capacity_percent": 72.0,
  "n1_status": "ok",
  "ha_min_host_failures_survived": 1,
  "ha_status": "ok"
}

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| vsphere_configured | boolean | Whether vSphere credentials are configured |
| has_data | boolean | Whether infrastructure data has been loaded |
| source | string | Data source: "vsphere", "manual", or "json" |
| name | string | Infrastructure name (vCenter hostname or custom) |
| cluster_count | integer | Number of clusters |
| host_count | integer | Total ESXi hosts |
| cell_count | integer | Total Diego cells |
| timestamp | string | When data was loaded (ISO 8601) |
| constraining_resource | string | Primary bottleneck: "memory", "CPU", or "disk" |
| bottleneck_summary | string | Human-readable bottleneck description |
| memory_utilization | float | Host memory utilization percentage |
| n1_capacity_percent | float | Percentage of N-1 memory capacity used by cells |
| n1_status | string | N-1 capacity status: "ok", "warning", "critical", or "unavailable" |
| ha_min_host_failures_survived | integer | Number of host failures the cluster can survive |
| ha_status | string | HA status: "ok" or "at-risk" |

Note: n1_status is set to "unavailable" and n1_capacity_percent to 0 for single-host clusters where N-1 capacity cannot be calculated.
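
The endpoint suits lightweight monitoring, for example (a sketch; assumes jq is installed):

# Summarize the primary constraint and resilience status in one line
curl -s http://localhost:8080/api/v1/infrastructure/status \
  | jq -r '"\(.constraining_resource) constrained, n1=\(.n1_status), ha=\(.ha_status)"'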


GET /api/v1/infrastructure/apps

Returns detailed per-app breakdown of memory, disk, and instance allocation from Cloud Foundry.

Prerequisites: CF API credentials must be configured via CF_API_URL, CF_USERNAME, and CF_PASSWORD environment variables.

Response:

{
  "total_app_memory_gb": 5,
  "total_app_disk_gb": 12,
  "total_app_instances": 17,
  "apps": [
    {
      "name": "my-app",
      "guid": "abc-123-def",
      "instances": 3,
      "requested_mb": 1536,
      "actual_mb": 512,
      "requested_disk_mb": 3072,
      "isolation_segment": "default"
    },
    {
      "name": "worker-app",
      "guid": "xyz-456-ghi",
      "instances": 2,
      "requested_mb": 2048,
      "actual_mb": 1024,
      "requested_disk_mb": 2048,
      "isolation_segment": "shared"
    }
  ]
}

Response Fields:

| Field | Type | Description |
| --- | --- | --- |
| total_app_memory_gb | integer | Total requested memory across all apps (GB, rounded) |
| total_app_disk_gb | integer | Total requested disk across all apps (GB, rounded) |
| total_app_instances | integer | Total running instances across all apps |
| apps | array | Per-app details |
| apps[].name | string | Application name |
| apps[].guid | string | CF application GUID |
| apps[].instances | integer | Number of running instances |
| apps[].requested_mb | integer | Total requested memory (instances × memory per instance) |
| apps[].actual_mb | integer | Actual memory usage from Log Cache (if available) |
| apps[].requested_disk_mb | integer | Total requested disk (instances × disk per instance) |
| apps[].isolation_segment | string | Isolation segment name ("default" if none assigned) |

Note: GB totals are rounded to the nearest integer (e.g., 2047 MB → 2 GB, 2560 MB → 3 GB).

Error Responses:

| Code | Description |
| --- | --- |
| 503 | CF API not configured |
| 503 | CF authentication failed |
| 500 | Failed to fetch apps |
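
A quick way to inspect the per-app breakdown from a running backend (a sketch; assumes jq is installed):

# List the five apps with the largest requested memory
curl -s http://localhost:8080/api/v1/infrastructure/apps \
  | jq -r '.apps | sort_by(-.requested_mb) | .[:5][] | "\(.name): \(.requested_mb) MB"'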

Capacity Planning

POST /api/v1/infrastructure/planning

Calculate maximum deployable Diego cells given IaaS capacity constraints.

Prerequisites: Infrastructure data must be loaded first via /api/v1/infrastructure or /api/v1/infrastructure/manual

Request Body:

{
  "cell_memory_gb": 64,
  "cell_cpu": 8,
  "overhead_pct": 7
}

| Field | Type | Description |
| --- | --- | --- |
| cell_memory_gb | int | Desired memory per Diego cell (GB) |
| cell_cpu | int | Desired vCPUs per Diego cell |
| overhead_pct | float | Memory overhead percentage (default: 7) |

Response:

{
  "result": {
    "max_cells_by_memory": 12,
    "max_cells_by_cpu": 15,
    "deployable_cells": 12,
    "bottleneck": "memory",
    "memory_used_gb": 768,
    "memory_avail_gb": 896,
    "cpu_used": 96,
    "cpu_avail": 120,
    "memory_util_pct": 85.7,
    "cpu_util_pct": 80.0,
    "headroom_cells": 2
  },
  "recommendations": [
    {
      "action": "add_hosts",
      "description": "Add 2 hosts to increase capacity",
      "impact": "Adds 256 GB memory capacity"
    }
  ]
}
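
Example call, mirroring the request body above (a sketch; assumes jq is installed):

# Ask how many 64 GB / 8 vCPU cells the loaded infrastructure can hold
curl -s -X POST http://localhost:8080/api/v1/infrastructure/planning \
  -H "Content-Type: application/json" \
  -d '{"cell_memory_gb": 64, "cell_cpu": 8, "overhead_pct": 7}' \
  | jq '.result.deployable_cells, .result.bottleneck'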

Scenario Analysis

POST /api/v1/scenario/compare

Compare current infrastructure state against a proposed configuration.

Prerequisites: Infrastructure data must be loaded first

Request Body:

{
  "proposed_cell_memory_gb": 64,
  "proposed_cell_cpu": 8,
  "proposed_cell_disk_gb": 200,
  "proposed_cell_count": 15,
  "target_cluster": "",
  "selected_resources": ["memory", "cpu", "disk"],
  "overhead_pct": 7,
  "host_count": 15,
  "memory_per_host_gb": 2048,
  "ha_admission_pct": 10,
  "additional_app": {
    "name": "new-service",
    "instances": 10,
    "memory_gb": 2,
    "disk_gb": 4
  },
  "tps_curve": [
    { "cells": 1, "tps": 284 },
    { "cells": 3, "tps": 1964 },
    { "cells": 100, "tps": 1389 }
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| proposed_cell_memory_gb | int | Proposed memory per cell (GB) |
| proposed_cell_cpu | int | Proposed vCPUs per cell |
| proposed_cell_disk_gb | int | Proposed disk per cell (GB) |
| proposed_cell_count | int | Proposed number of cells |
| target_cluster | string | Target cluster (empty = all) |
| selected_resources | array | Resources to analyze: memory, cpu, disk |
| overhead_pct | float | Memory overhead % for Garden/OS inside each cell (default: 7). See note below. |
| host_count | int | Number of ESXi hosts (for HA calculations) |
| memory_per_host_gb | int | Memory per host in GB (for HA calculations) |
| ha_admission_pct | int | vSphere HA admission control % (for HA calculations) |
| additional_app | object | Optional hypothetical app to model |
| tps_curve | array | Optional custom TPS performance curve |

Note: overhead_pct vs ha_admission_pct

These operate at different layers and are not redundant:

  • overhead_pct (7%): Memory inside each Diego cell consumed by Garden runtime and OS processes. A 32GB cell has ~30GB available for app containers.
  • ha_admission_pct: Cluster-level memory reserved by vSphere to restart VMs after host failure. vSphere sees full VM footprint (32GB), not what's inside.

Both are needed: HA admission determines if you can deploy the VMs; memory overhead determines how much workload fits inside them.
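
A worked example with hypothetical numbers (illustrative only, not defaults):

# Cluster layer: 4 hosts x 256 GB with 10% HA admission control
#   usable for VMs: 4 * 256 * (1 - 0.10) = 921.6 GB -> at most 28 cells of 32 GB
# Cell layer: 7% Garden/OS overhead inside each 32 GB cell
#   usable for app containers: 32 * (1 - 0.07) = 29.76 GB per cell
awk 'BEGIN {
  vm_capacity_gb = 4 * 256 * (1 - 0.10)
  printf "Deployable 32 GB cells: %d\n", int(vm_capacity_gb / 32)
  printf "App capacity per cell:  %.2f GB\n", 32 * (1 - 0.07)
}'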

Response:

{
  "current": {
    "cell_count": 10,
    "cell_memory_gb": 64,
    "cell_cpu": 8,
    "total_capacity_gb": 640,
    "app_capacity_gb": 595,
    "utilization_pct": 75.6,
    "free_chunks": 450,
    "tps": 1800,
    "tps_status": "optimal",
    "fault_impact": 15,
    "n1_utilization_pct": 72.0
  },
  "proposed": {
    "cell_count": 15,
    "cell_memory_gb": 64,
    "cell_cpu": 8,
    "total_capacity_gb": 960,
    "app_capacity_gb": 893,
    "utilization_pct": 50.4,
    "free_chunks": 680,
    "tps": 1650,
    "tps_status": "optimal",
    "fault_impact": 10,
    "n1_utilization_pct": 68.0
  },
  "delta": {
    "cell_count": 5,
    "capacity_gb": 320,
    "utilization_pct": -25.2,
    "tps": -150,
    "fault_impact": -5
  },
  "warnings": [
    {
      "severity": "warning",
      "message": "Cell count (15) may cause scheduling latency (~1650 TPS)",
      "metric": "tps"
    }
  ],
  "recommendations": [
    {
      "action": "resize_cells",
      "priority": 2,
      "description": "Consider larger cells to improve TPS",
      "impact": "Reduces scheduler coordination overhead"
    }
  ]
}
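
A minimal example call that omits the optional fields (a sketch; assumes the omitted fields fall back to their defaults):

# Compare the current state against 15 proposed 64 GB / 8 vCPU cells
curl -s -X POST http://localhost:8080/api/v1/scenario/compare \
  -H "Content-Type: application/json" \
  -d '{
        "proposed_cell_memory_gb": 64,
        "proposed_cell_cpu": 8,
        "proposed_cell_disk_gb": 200,
        "proposed_cell_count": 15,
        "selected_resources": ["memory", "cpu"]
      }' \
  | jq '.delta, .warnings'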

Analysis

GET /api/v1/bottleneck

Returns multi-resource bottleneck analysis.

Prerequisites: Infrastructure data must be loaded first

Response:

{
  "constraining_resource": "memory",
  "resources": [
    {
      "name": "memory",
      "utilization_pct": 78.5,
      "status": "warning",
      "headroom_gb": 142
    },
    {
      "name": "cpu",
      "utilization_pct": 45.2,
      "status": "good",
      "headroom_cores": 66
    },
    {
      "name": "disk",
      "utilization_pct": 32.1,
      "status": "good",
      "headroom_gb": 1360
    }
  ],
  "summary": "Memory is the primary constraint at 78% utilization"
}

GET /api/v1/recommendations

Returns upgrade path recommendations based on current bottlenecks.

Prerequisites: Infrastructure data must be loaded first

Response:

{
  "constraining_resource": "memory",
  "recommendations": [
    {
      "action": "add_cells",
      "priority": 1,
      "description": "Add 4 Diego cells",
      "impact": "Adds 256 GB memory capacity"
    },
    {
      "action": "resize_cells",
      "priority": 2,
      "description": "Resize cells from 64 GB to 128 GB",
      "impact": "Doubles per-cell capacity, reduces scheduler overhead"
    },
    {
      "action": "add_hosts",
      "priority": 3,
      "description": "Add 2 ESXi hosts",
      "impact": "Adds infrastructure capacity and improves N-1 tolerance"
    }
  ]
}
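
The two analysis endpoints pair naturally in a small script (a sketch; assumes jq is installed):

# Show the primary constraint, then print the top-priority recommendation
curl -s http://localhost:8080/api/v1/bottleneck | jq -r '.summary'
curl -s http://localhost:8080/api/v1/recommendations \
  | jq -r '.recommendations | sort_by(.priority) | .[0].description'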

Error Responses

All endpoints return errors in a consistent format:

{
  "error": "Error message describing what went wrong",
  "details": "Additional details (optional)",
  "code": 400
}

| Code | Description |
| --- | --- |
| 400 | Bad Request - Invalid input |
| 405 | Method Not Allowed |
| 500 | Internal Server Error |
| 503 | Service Unavailable - External service not configured |

Caching

The backend implements in-memory caching with configurable TTLs:

| Cache Key | Default TTL | Environment Variable |
| --- | --- | --- |
| Dashboard data | 30s | DASHBOARD_CACHE_TTL |
| vSphere infrastructure | 300s | VSPHERE_CACHE_TTL |
| General cache | 300s | CACHE_TTL |

Cached responses include "cached": true in the metadata.

Mixed Data Source Caching

When using GET /api/v1/infrastructure with both vSphere and CF credentials configured:

  1. vSphere data is fetched first (infrastructure: hosts, clusters, cells)
  2. CF data is fetched to enrich app metrics (total_app_memory_gb, total_app_instances)
  3. Combined result is cached using VSPHERE_CACHE_TTL

This means:

  • Both vSphere and CF data share the same cache TTL (default: 300s)
  • If CF data changes frequently, consider lowering VSPHERE_CACHE_TTL
  • Cache invalidation clears both data sources together

To force a refresh, either wait for TTL expiration or restart the backend.
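
TTLs are supplied as environment variables to the backend process, for example (a sketch; assumes values are given in seconds, matching the defaults above):

# Shorten the vSphere/CF cache window to 60 seconds and the dashboard cache to 10 seconds
export VSPHERE_CACHE_TTL=60
export DASHBOARD_CACHE_TTL=10
# Restart the backend so it picks up the new values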


Manual Data Collection

When using the manual infrastructure input endpoint (POST /api/v1/infrastructure/manual), you may need to collect app-related metrics yourself. This section documents how to obtain these values.

Data Sources

The following fields require app workload data:

| Field | Description |
| --- | --- |
| total_app_memory_gb | Total memory allocated to all application instances |
| total_app_disk_gb | Total disk allocated to all application instances |
| total_app_instances | Total number of running application instances |

Option 1: From Healthwatch / Aria Operations Dashboards

If you have Healthwatch or Aria Operations for Applications deployed, these metrics are already collected.

Healthwatch (Grafana/Prometheus):

# Total allocated app memory across all Diego cells (result in GB)
sum(rep_CapacityAllocatedMemory) / 1024

Aria Operations for Applications (Wavefront):

# Total allocated app memory across all Diego cells (result in GB)
sum(ts("tas.rep.CapacityAllocatedMemory")) / 1024

Diego Rep Metrics Reference:

| Metric | Origin | Units | Description |
| --- | --- | --- | --- |
| rep.CapacityTotalMemory | rep | MiB | Max memory available for app allocation |
| rep.CapacityRemainingMemory | rep | MiB | Remaining allocatable memory |
| rep.CapacityAllocatedMemory | rep | MiB | Memory allocated to containers |

Formula: TotalMemory = AllocatedMemory + RemainingMemory

Option 2: CF CLI Commands

Use these commands to collect app metrics directly from the CF API:

# Authenticate to CF
cf login -a https://api.sys.example.com

# Get total allocated app memory (sum of memory_in_mb × instances across all processes)
cf curl "/v3/processes?per_page=5000" | jq '[.resources[] | .memory_in_mb * .instances] | add'
# Output: total MB (e.g., 10752)

# For large foundations, handle pagination:
total_mb=0
next_url="/v3/processes?per_page=5000"
while [ "$next_url" != "null" ]; do
  response=$(cf curl "$next_url")
  page_total=$(echo "$response" | jq '[.resources[] | .memory_in_mb * .instances] | add // 0')
  total_mb=$((total_mb + page_total))
  # pagination.next.href is an absolute URL; strip the scheme and host so cf curl gets a path
  next_url=$(echo "$response" | jq -r '.pagination.next.href // "null"' | sed -E 's|^https?://[^/]+||')
done
echo "Total app memory: $((total_mb / 1024)) GB"

# Get total app instances
cf curl "/v3/processes?per_page=5000" | jq '[.resources[].instances] | add'

# Get total disk
cf curl "/v3/processes?per_page=5000" | jq '[.resources[] | .disk_in_mb * .instances] | add'
# Convert to GB: divide by 1024

Understanding the Numbers

| Field | CF API Source | Calculation |
| --- | --- | --- |
| total_app_memory_gb | /v3/processes | Sum of (memory_in_mb × instances), convert to GB |
| total_app_instances | /v3/processes | Sum of instances |
| total_app_disk_gb | /v3/processes | Sum of (disk_in_mb × instances), convert to GB |

Example Manual JSON Input

After collecting the values above:

{
  "name": "Production TAS",
  "clusters": [
    {
      "name": "TAS-Cluster",
      "host_count": 8,
      "memory_gb_per_host": 2048,
      "cpu_threads_per_host": 64,
      "diego_cell_count": 250,
      "diego_cell_memory_gb": 32,
      "diego_cell_cpu": 4
    }
  ],
  "platform_vms_gb": 4800,
  "total_app_memory_gb": 10,
  "total_app_disk_gb": 50,
  "total_app_instances": 42
}
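
With the example above saved to a file, it can be uploaded and verified in two requests (a sketch; production-tas.json is a placeholder filename):

# Upload the manually collected data, then confirm the source and counts
curl -s -X POST http://localhost:8080/api/v1/infrastructure/manual \
  -H "Content-Type: application/json" \
  -d @production-tas.json > /dev/null

curl -s http://localhost:8080/api/v1/infrastructure/status \
  | jq '{source, host_count, cell_count, has_data}'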

Automatic Enrichment

When using live vSphere data (GET /api/v1/infrastructure), the backend automatically enriches infrastructure data with app metrics from the CF API if CF credentials are configured. This eliminates the need for manual data collection in most cases.