Base URL: http://localhost:8080 (default)
All endpoints return JSON responses and support CORS.
An OpenAPI 3.0 specification is available for this API:
- OpenAPI Spec: `backend/handlers/openapi.yaml`
- Swagger UI: Access `/docs` on the frontend (e.g., `http://localhost:5173/docs` or the deployed URL)
- Validation: Run `make openapi-validate` to check spec syntax

The Swagger UI provides an interactive interface to explore endpoints, view schemas, and test API calls.
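For a quick smoke test from the command line, any endpoint can be called with curl. A minimal sketch against the infrastructure endpoint (assumes the backend is running on the default port and `jq` is installed):

```bash
# Fetch the current infrastructure view and print a few summary fields
curl -s http://localhost:8080/api/v1/infrastructure \
  | jq '{name, total_host_count, total_cell_count}'
```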
Health check endpoint.
Response:
```json
{
"cf_api": "ok",
"bosh_api": "ok",
"cache_status": {
"cells_cached": false,
"apps_cached": false
}
}
```

| Field | Description |
|---|---|
| `cf_api` | CF API connectivity status |
| `bosh_api` | BOSH API status (`ok` or `not_configured`) |
| `cache_status` | Current cache state |
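A minimal sketch of consuming this response in a script, assuming it has been saved to a hypothetical local file named `health.json`:

```bash
# Fail early if the CF API is not reachable according to the health check
cf_status=$(jq -r '.cf_api' health.json)
if [ "$cf_status" != "ok" ]; then
  echo "CF API is not healthy: $cf_status" >&2
  exit 1
fi
```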
Returns live dashboard data from CF and BOSH APIs.
Response:
```json
{
"cells": [
{
"name": "diego_cell/0",
"isolation_segment": "shared",
"memory_mb": 32768,
"allocated_mb": 24576,
"used_mb": 18432,
"cpu_percent": 45.2
}
],
"apps": [
{
"guid": "abc-123",
"name": "my-app",
"instances": 3,
"requested_mb": 1024,
"actual_mb": 780,
"isolation_segment": "shared"
}
],
"segments": [
{
"guid": "seg-123",
"name": "shared"
}
],
"metadata": {
"timestamp": "2024-01-15T10:30:00Z",
"cached": false,
"bosh_available": true
}
}
```

Data Sources:

- `cells`: BOSH API (Diego cell VMs and vitals)
- `apps`: CF API (applications and process stats)
- `segments`: CF API (isolation segments)

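As an illustration of working with this payload, the sketch below summarizes cell allocation from a saved copy of the response (`dashboard.json` is a hypothetical filename):

```bash
# Compare allocated vs. total Diego cell memory, in GB
jq '{allocated_gb: ([.cells[].allocated_mb] | add / 1024),
    total_gb: ([.cells[].memory_mb] | add / 1024)}' dashboard.json
```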
Returns live infrastructure data from vSphere.
Prerequisites: Requires vSphere environment variables:
- `VSPHERE_HOST`
- `VSPHERE_USERNAME`
- `VSPHERE_PASSWORD`
- `VSPHERE_DATACENTER`

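For example, the backend could be started with these variables exported. A sketch with placeholder values; substitute credentials for your own vCenter:

```bash
# Placeholder vSphere settings; adjust for your environment
export VSPHERE_HOST="vcenter.example.com"
export VSPHERE_USERNAME="capacity-readonly@vsphere.local"
export VSPHERE_PASSWORD="change-me"
export VSPHERE_DATACENTER="DC01"
```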
Response:
```json
{
"name": "vcenter.example.com",
"source": "vsphere",
"timestamp": "2024-01-15T10:30:00Z",
"cached": false,
"clusters": [
{
"name": "TAS-Cluster",
"host_count": 4,
"memory_gb": 512,
"cpu_cores": 128,
"memory_gb_per_host": 128,
"cpu_threads_per_host": 32,
"ha_admission_control_percentage": 25,
"ha_usable_memory_gb": 384,
"ha_usable_cpu_cores": 96,
"ha_host_failures_survived": 1,
"cells": [
{
"name": "diego_cell/0",
"memory_gb": 64,
"cpu": 8,
"disk_gb": 200
}
]
}
],
"total_host_count": 4,
"total_cell_count": 10,
"total_cell_memory_gb": 640,
"total_cell_cpu": 80,
"total_cell_disk_gb": 2000,
"total_app_memory_gb": 450,
"total_app_disk_gb": 900,
"total_app_instances": 150,
"platform_vms_gb": 64
}
```

Error (503): vSphere not configured
```json
{
"error": "vSphere not configured. Set VSPHERE_HOST, VSPHERE_USERNAME, VSPHERE_PASSWORD, and VSPHERE_DATACENTER environment variables.",
"code": 503
}
```

Set infrastructure state from manual input (JSON upload or form data).
Request Body:
```json
{
"name": "My Infrastructure",
"clusters": [
{
"name": "TAS-Cluster",
"host_count": 4,
"memory_gb_per_host": 128,
"cpu_threads_per_host": 32,
"ha_admission_control_percentage": 25,
"cells": [
{
"name": "diego_cell/0",
"memory_gb": 64,
"cpu": 8,
"disk_gb": 200
}
]
}
],
"platform_vms_gb": 64,
"total_app_memory_gb": 450,
"total_app_disk_gb": 900,
"total_app_instances": 150
}
```

Response: Returns the computed InfrastructureState (same format as `GET /api/v1/infrastructure`).
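A minimal sketch of calling this endpoint with curl, assuming the request body above has been saved to a hypothetical file named `manual.json`:

```bash
# Upload a manually collected infrastructure description
curl -s -X POST http://localhost:8080/api/v1/infrastructure/manual \
  -H "Content-Type: application/json" \
  -d @manual.json | jq .
```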
Set infrastructure state directly (accepts full InfrastructureState object).
Request Body: Full InfrastructureState object (same format as GET /api/v1/infrastructure response)
Response: Returns the stored state
Returns current infrastructure data source status and capacity metrics.
Response:
```json
{
"vsphere_configured": true,
"has_data": true,
"source": "vsphere",
"name": "vcenter.example.com",
"cluster_count": 2,
"host_count": 8,
"cell_count": 20,
"timestamp": "2024-01-15T10:30:00Z",
"constraining_resource": "memory",
"bottleneck_summary": "Memory is the primary constraint at 78% utilization",
"memory_utilization": 78.5,
"n1_capacity_percent": 72.0,
"n1_status": "ok",
"ha_min_host_failures_survived": 1,
"ha_status": "ok"
}
```

Response Fields:

| Field | Type | Description |
|---|---|---|
| `vsphere_configured` | boolean | Whether vSphere credentials are configured |
| `has_data` | boolean | Whether infrastructure data has been loaded |
| `source` | string | Data source: "vsphere", "manual", or "json" |
| `name` | string | Infrastructure name (vCenter hostname or custom) |
| `cluster_count` | integer | Number of clusters |
| `host_count` | integer | Total ESXi hosts |
| `cell_count` | integer | Total Diego cells |
| `timestamp` | string | When data was loaded (ISO 8601) |
| `constraining_resource` | string | Primary bottleneck: "memory", "CPU", or "disk" |
| `bottleneck_summary` | string | Human-readable bottleneck description |
| `memory_utilization` | float | Host memory utilization percentage |
| `n1_capacity_percent` | float | Percentage of N-1 memory capacity used by cells |
| `n1_status` | string | N-1 capacity status: "ok", "warning", "critical", or "unavailable" |
| `ha_min_host_failures_survived` | integer | Number of host failures the cluster can survive |
| `ha_status` | string | HA status: "ok" or "at-risk" |
Note: n1_status is set to "unavailable" and n1_capacity_percent to 0 for single-host clusters where N-1 capacity cannot be calculated.
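A small sketch for monitoring these fields from a saved copy of the response (`status.json` is a hypothetical filename):

```bash
# Print the bottleneck summary and warn if N-1 or HA headroom is degraded
jq -r '.bottleneck_summary' status.json
jq -e '.n1_status == "ok" and .ha_status == "ok"' status.json > /dev/null \
  || echo "WARNING: N-1 or HA capacity is at risk" >&2
```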
Returns detailed per-app breakdown of memory, disk, and instance allocation from Cloud Foundry.
Prerequisites: CF API credentials must be configured via CF_API_URL, CF_USERNAME, and CF_PASSWORD environment variables.
Response:
```json
{
"total_app_memory_gb": 5,
"total_app_disk_gb": 12,
"total_app_instances": 17,
"apps": [
{
"name": "my-app",
"guid": "abc-123-def",
"instances": 3,
"requested_mb": 1536,
"actual_mb": 512,
"requested_disk_mb": 3072,
"isolation_segment": "default"
},
{
"name": "worker-app",
"guid": "xyz-456-ghi",
"instances": 2,
"requested_mb": 2048,
"actual_mb": 1024,
"requested_disk_mb": 2048,
"isolation_segment": "shared"
}
]
}
```

Response Fields:

| Field | Type | Description |
|---|---|---|
| `total_app_memory_gb` | integer | Total requested memory across all apps (GB, rounded) |
| `total_app_disk_gb` | integer | Total requested disk across all apps (GB, rounded) |
| `total_app_instances` | integer | Total running instances across all apps |
| `apps` | array | Per-app details |
| `apps[].name` | string | Application name |
| `apps[].guid` | string | CF application GUID |
| `apps[].instances` | integer | Number of running instances |
| `apps[].requested_mb` | integer | Total requested memory (instances × memory per instance) |
| `apps[].actual_mb` | integer | Actual memory usage from Log Cache (if available) |
| `apps[].requested_disk_mb` | integer | Total requested disk (instances × disk per instance) |
| `apps[].isolation_segment` | string | Isolation segment name ("default" if none assigned) |
Note: GB totals are rounded to the nearest integer (e.g., 2047 MB → 2 GB, 2560 MB → 3 GB).
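A sketch of the same nearest-integer rounding in shell arithmetic (this mirrors the examples above; it is not the backend's actual implementation):

```bash
# Round MB to the nearest GB by adding half a GB before integer division
mb=2560; echo $(( (mb + 512) / 1024 ))   # prints 3
mb=2047; echo $(( (mb + 512) / 1024 ))   # prints 2
```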
Error Responses:
| Code | Description |
|---|---|
| 503 | CF API not configured |
| 503 | CF authentication failed |
| 500 | Failed to fetch apps |
Calculate maximum deployable Diego cells given IaaS capacity constraints.
Prerequisites: Infrastructure data must be loaded first via /api/v1/infrastructure or /api/v1/infrastructure/manual
Request Body:
```json
{
"cell_memory_gb": 64,
"cell_cpu": 8,
"overhead_pct": 7
}
```

| Field | Type | Description |
|---|---|---|
| `cell_memory_gb` | int | Desired memory per Diego cell (GB) |
| `cell_cpu` | int | Desired vCPUs per Diego cell |
| `overhead_pct` | float | Memory overhead percentage (default: 7) |
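A hedged curl sketch for this request. The exact path is not shown in this section, so `/api/v1/calculate` below is an assumption; confirm it against the OpenAPI spec:

```bash
# Ask how many 64 GB / 8 vCPU cells the loaded infrastructure can hold
# NOTE: the path /api/v1/calculate is assumed, not confirmed by this document
curl -s -X POST http://localhost:8080/api/v1/calculate \
  -H "Content-Type: application/json" \
  -d '{"cell_memory_gb": 64, "cell_cpu": 8, "overhead_pct": 7}' | jq '.result'
```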
Response:
```json
{
"result": {
"max_cells_by_memory": 12,
"max_cells_by_cpu": 15,
"deployable_cells": 12,
"bottleneck": "memory",
"memory_used_gb": 768,
"memory_avail_gb": 896,
"cpu_used": 96,
"cpu_avail": 120,
"memory_util_pct": 85.7,
"cpu_util_pct": 80.0,
"headroom_cells": 2
},
"recommendations": [
{
"action": "add_hosts",
"description": "Add 2 hosts to increase capacity",
"impact": "Adds 256 GB memory capacity"
}
]
}
```

Compare current infrastructure state against a proposed configuration.
Prerequisites: Infrastructure data must be loaded first
Request Body:
```json
{
"proposed_cell_memory_gb": 64,
"proposed_cell_cpu": 8,
"proposed_cell_disk_gb": 200,
"proposed_cell_count": 15,
"target_cluster": "",
"selected_resources": ["memory", "cpu", "disk"],
"overhead_pct": 7,
"host_count": 15,
"memory_per_host_gb": 2048,
"ha_admission_pct": 10,
"additional_app": {
"name": "new-service",
"instances": 10,
"memory_gb": 2,
"disk_gb": 4
},
"tps_curve": [
{ "cells": 1, "tps": 284 },
{ "cells": 3, "tps": 1964 },
{ "cells": 100, "tps": 1389 }
]
}
```

| Field | Type | Description |
|---|---|---|
| `proposed_cell_memory_gb` | int | Proposed memory per cell (GB) |
| `proposed_cell_cpu` | int | Proposed vCPUs per cell |
| `proposed_cell_disk_gb` | int | Proposed disk per cell (GB) |
| `proposed_cell_count` | int | Proposed number of cells |
| `target_cluster` | string | Target cluster (empty = all) |
| `selected_resources` | array | Resources to analyze: memory, cpu, disk |
| `overhead_pct` | float | Memory overhead % for Garden/OS inside each cell (default: 7). See note below. |
| `host_count` | int | Number of ESXi hosts (for HA calculations) |
| `memory_per_host_gb` | int | Memory per host in GB (for HA calculations) |
| `ha_admission_pct` | int | vSphere HA admission control % (for HA calculations) |
| `additional_app` | object | Optional hypothetical app to model |
| `tps_curve` | array | Optional custom TPS performance curve |
Note: overhead_pct vs ha_admission_pct
These operate at different layers and are not redundant:
- `overhead_pct` (7%): Memory inside each Diego cell consumed by the Garden runtime and OS processes. A 32 GB cell has ~30 GB available for app containers.
- `ha_admission_pct`: Cluster-level memory reserved by vSphere to restart VMs after a host failure. vSphere sees the full VM footprint (32 GB), not what's inside.

Both are needed: HA admission determines if you can deploy the VMs; memory overhead determines how much workload fits inside them.
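A small worked sketch of the two layers, reusing the 25% admission control and 32 GB cell figures above (the 4 × 128 GB cluster size is assumed for illustration):

```bash
# Layer 1: vSphere HA admission control reserves cluster memory for failover
cluster_gb=$(( 4 * 128 ))                          # 512 GB raw cluster memory (assumed cluster size)
ha_usable_gb=$(( cluster_gb * (100 - 25) / 100 ))  # 384 GB left for deploying cell VMs

# Layer 2: Garden/OS overhead inside each 32 GB cell VM
app_capacity_gb=$(( 32 * (100 - 7) / 100 ))        # 29 GB with integer math, i.e. the ~30 GB cited above
echo "VM budget: ${ha_usable_gb} GB; app capacity per cell: ~${app_capacity_gb} GB"
```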
Response:
```json
{
"current": {
"cell_count": 10,
"cell_memory_gb": 64,
"cell_cpu": 8,
"total_capacity_gb": 640,
"app_capacity_gb": 595,
"utilization_pct": 75.6,
"free_chunks": 450,
"tps": 1800,
"tps_status": "optimal",
"fault_impact": 15,
"n1_utilization_pct": 72.0
},
"proposed": {
"cell_count": 15,
"cell_memory_gb": 64,
"cell_cpu": 8,
"total_capacity_gb": 960,
"app_capacity_gb": 893,
"utilization_pct": 50.4,
"free_chunks": 680,
"tps": 1650,
"tps_status": "optimal",
"fault_impact": 10,
"n1_utilization_pct": 68.0
},
"delta": {
"cell_count": 5,
"capacity_gb": 320,
"utilization_pct": -25.2,
"tps": -150,
"fault_impact": -5
},
"warnings": [
{
"severity": "warning",
"message": "Cell count (15) may cause scheduling latency (~1650 TPS)",
"metric": "tps"
}
],
"recommendations": [
{
"action": "resize_cells",
"priority": 2,
"description": "Consider larger cells to improve TPS",
"impact": "Reduces scheduler coordination overhead"
}
]
}
```

Returns multi-resource bottleneck analysis.
Prerequisites: Infrastructure data must be loaded first
Response:
```json
{
"constraining_resource": "memory",
"resources": [
{
"name": "memory",
"utilization_pct": 78.5,
"status": "warning",
"headroom_gb": 142
},
{
"name": "cpu",
"utilization_pct": 45.2,
"status": "good",
"headroom_cores": 66
},
{
"name": "disk",
"utilization_pct": 32.1,
"status": "good",
"headroom_gb": 1360
}
],
"summary": "Memory is the primary constraint at 78% utilization"
}
```

Returns upgrade path recommendations based on current bottlenecks.
Prerequisites: Infrastructure data must be loaded first
Response:
```json
{
"constraining_resource": "memory",
"recommendations": [
{
"action": "add_cells",
"priority": 1,
"description": "Add 4 Diego cells",
"impact": "Adds 256 GB memory capacity"
},
{
"action": "resize_cells",
"priority": 2,
"description": "Resize cells from 64 GB to 128 GB",
"impact": "Doubles per-cell capacity, reduces scheduler overhead"
},
{
"action": "add_hosts",
"priority": 3,
"description": "Add 2 ESXi hosts",
"impact": "Adds infrastructure capacity and improves N-1 tolerance"
}
]
}
```

All endpoints return errors in a consistent format:
```json
{
"error": "Error message describing what went wrong",
"details": "Additional details (optional)",
"code": 400
}
```

| Code | Description |
|---|---|
| 400 | Bad Request - Invalid input |
| 405 | Method Not Allowed |
| 500 | Internal Server Error |
| 503 | Service Unavailable - External service not configured |
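A sketch of generic error handling around any endpoint from a shell script, using the infrastructure endpoint as the example:

```bash
# Capture the HTTP status separately from the JSON body
body=$(mktemp)
code=$(curl -s -o "$body" -w '%{http_code}' http://localhost:8080/api/v1/infrastructure)
if [ "$code" -ge 400 ]; then
  echo "Request failed ($code): $(jq -r '.error' "$body")" >&2
fi
```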
The backend implements in-memory caching with configurable TTLs:
| Cache Key | Default TTL | Environment Variable |
|---|---|---|
| Dashboard data | 30s | DASHBOARD_CACHE_TTL |
| vSphere infrastructure | 300s | VSPHERE_CACHE_TTL |
| General cache | 300s | CACHE_TTL |
Cached responses include "cached": true in the metadata.
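For example, to refresh dashboard data more aggressively while keeping the other caches at their defaults (a sketch; the values are assumed to be plain seconds, matching the defaults above):

```bash
# Shorter dashboard cache; vSphere and general caches stay at 300 seconds (assumed units)
export DASHBOARD_CACHE_TTL=10
export VSPHERE_CACHE_TTL=300
export CACHE_TTL=300
```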
When using GET /api/v1/infrastructure with both vSphere and CF credentials configured:
- vSphere data is fetched first (infrastructure: hosts, clusters, cells)
- CF data is fetched to enrich app metrics (`total_app_memory_gb`, `total_app_instances`)
- The combined result is cached using `VSPHERE_CACHE_TTL`

This means:
- Both vSphere and CF data share the same cache TTL (default: 300s)
- If CF data changes frequently, consider lowering `VSPHERE_CACHE_TTL`
- Cache invalidation clears both data sources together

To force a refresh, either wait for TTL expiration or restart the backend.
When using the manual infrastructure input endpoint (POST /api/v1/infrastructure/manual), you may need to collect app-related metrics yourself. This section documents how to obtain these values.
The following fields require app workload data:
| Field | Description |
|---|---|
| `total_app_memory_gb` | Total memory allocated to all application instances |
| `total_app_disk_gb` | Total disk allocated to all application instances |
| `total_app_instances` | Total number of running application instances |
If you have Healthwatch or Aria Operations for Applications deployed, these metrics are already collected.
Healthwatch (Grafana/Prometheus):
```
# Total allocated app memory across all Diego cells (result in GB)
sum(rep_CapacityAllocatedMemory) / 1024
```
Aria Operations for Applications (Wavefront):
```
# Total allocated app memory across all Diego cells (result in GB)
sum(ts("tas.rep.CapacityAllocatedMemory")) / 1024
```

Diego Rep Metrics Reference:

| Metric | Origin | Units | Description |
|---|---|---|---|
| `rep.CapacityTotalMemory` | rep | MiB | Max memory available for app allocation |
| `rep.CapacityRemainingMemory` | rep | MiB | Remaining allocatable memory |
| `rep.CapacityAllocatedMemory` | rep | MiB | Memory allocated to containers |
Formula: TotalMemory = AllocatedMemory + RemainingMemory
Use these commands to collect app metrics directly from the CF API:
```bash
# Authenticate to CF
cf login -a https://api.sys.example.com
# Get total allocated app memory (sum of memory_in_mb × instances for all processes)
cf curl "/v3/processes" | jq '[.resources[] | .memory_in_mb * .instances] | add'
# Output: total MB (e.g., 10752)
# For large foundations, handle pagination:
total_mb=0
next_url="/v3/processes?per_page=5000"
while [ "$next_url" != "null" ]; do
response=$(cf curl "$next_url")
page_total=$(echo "$response" | jq '[.resources[] | .memory_in_mb * .instances] | add // 0')
total_mb=$((total_mb + page_total))
# pagination.next.href is an absolute URL; strip the scheme and host so cf curl receives a path
next_url=$(echo "$response" | jq -r '.pagination.next.href // "null" | sub("^https?://[^/]+"; "")')
done
echo "Total app memory: $((total_mb / 1024)) GB"
# Get total app instances
cf curl "/v3/processes?per_page=5000" | jq '[.resources[].instances] | add'
# Get total disk
cf curl "/v3/processes?per_page=5000" | jq '[.resources[] | .disk_in_mb * .instances] | add'
# Convert to GB: divide by 1024
```

| Field | CF API Source | Calculation |
|---|---|---|
| `total_app_memory_gb` | `/v3/processes` | Sum of (memory_in_mb × instances), convert to GB |
| `total_app_instances` | `/v3/processes` | Sum of instances |
| `total_app_disk_gb` | `/v3/processes` | Sum of (disk_in_mb × instances), convert to GB |
After collecting the values above:
```json
{
"name": "Production TAS",
"clusters": [
{
"name": "TAS-Cluster",
"host_count": 8,
"memory_gb_per_host": 2048,
"cpu_threads_per_host": 64,
"diego_cell_count": 250,
"diego_cell_memory_gb": 32,
"diego_cell_cpu": 4
}
],
"platform_vms_gb": 4800,
"total_app_memory_gb": 10,
"total_app_disk_gb": 50,
"total_app_instances": 42
}
```

When using live vSphere data (`GET /api/v1/infrastructure`), the backend automatically enriches infrastructure data with app metrics from the CF API if CF credentials are configured. This eliminates the need for manual data collection in most cases.