This directory contains scripts that interact with Azure services to collect operational data.

- [Data Collectors](#data-collectors)
- [get_costs.py](#get_costspy) - Azure Cost Management API
- [get_storage_usage.py](#get_storage_usagepy) - Storage metrics from Azure Monitor
- [get_pod_inventory.py](#get_pod_inventorypy) - Kubernetes pod inventory from Log Analytics
- [get_pod_inventory.py](#get_pod_inventorypy) - Kubernetes pod inventory from Log Analytics (legacy)
- [get_pod_node_inventory.py](#get_pod_node_inventorypy) - Kubernetes pod and node inventory from Log Analytics
- [get_ala_thor_timeline.py](#get_ala_thor_timelinepy) - Thor workunit timeline from Log Analytics
- [get_vm_pricing.py](#get_vm_pricingpy) - Azure VM pricing from Retail Prices API
- [Data Analyzers](#data-analyzers)
- [analyze_costs.py](#analyze_costspy) - Cost breakdown and visualization
- [analyze_storage_usage.py](#analyze_storage_usagepy) - Storage usage analysis
- [analyze_pod_node_inventory.py](#analyze_pod_node_inventorypy) - Pod and node inventory analysis with HPCC component identification
- [analyze_thor_timeline.py](#analyze_thor_timelinepy) - Thor timeline utilization and cost modeling
- [Quick Start](#quick-start)
- [Authentication](#authentication)

---

### get_pod_node_inventory.py

Get Kubernetes pod and node inventory from Azure Log Analytics.

#### Purpose

Query both `KubePodInventory` and `KubeNodeInventory` tables in Azure Log Analytics to retrieve comprehensive information about pods and nodes in a namespace during a given time range. This tool provides unified pod and node data for component-level resource analysis.

#### When to Use

- Analyzing HPCC component resource consumption
- Cross-referencing pods to nodes for capacity planning
- Understanding which components are using which nodes
- Generating data for component-level cost attribution
- Investigating resource allocation patterns

#### Features

- Queries both `KubePodInventory` and `KubeNodeInventory` tables via Azure Log Analytics REST API
- Supports workspace ID or AKS cluster discovery for workspace lookup
- Namespace filtering or all-namespaces mode
- Flexible time window specification (start/end or start+duration)
- CSV output format with metadata headers
- Comprehensive validation (RFC 1123 namespace names, datetime validation)
- KQL injection protection
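
The validation and injection protection listed above can be sketched as follows. This is an illustrative reimplementation, not the script's actual code; the helper names `validate_namespace` and `build_pod_query` are hypothetical:

```python
import re
from typing import Optional

# RFC 1123 label: lowercase alphanumerics and hyphens, starting and ending
# with an alphanumeric, at most 63 characters.
_RFC1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$")

def validate_namespace(name: str) -> str:
    """Reject names that are not valid RFC 1123 labels.

    Because only [-a-z0-9] can pass, a validated name is also safe to
    interpolate into a KQL string literal: quotes, pipes, and semicolons
    cannot survive validation, which blocks KQL injection.
    """
    if not _RFC1123_LABEL.match(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name

def build_pod_query(namespace: Optional[str]) -> str:
    """Assemble the KubePodInventory query; KubeNodeInventory is queried
    similarly (nodes are not namespaced, so it takes no namespace filter)."""
    if namespace is None:
        return "KubePodInventory"
    return f'KubePodInventory | where Namespace == "{validate_namespace(namespace)}"'
```

The same gate serves both requirements: anything that passes the RFC 1123 check is, by construction, inert inside a quoted KQL string.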

#### Usage

```bash
# Using workspace ID directly
./get_pod_node_inventory.py --workspace-id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
    -n hpcc --start-time "2025-11-04 12:00"

# Using cluster discovery
./get_pod_node_inventory.py --cluster my-aks-cluster \
    --resource-group my-resource-group -n hpcc --start-time "2025-11-04 12:00"

# With explicit time range
./get_pod_node_inventory.py --workspace-id <workspace-id> \
    -n hpcc --start-time "2025-11-04 09:00" --end-time "2025-11-04 17:00"

# Query all namespaces
./get_pod_node_inventory.py --workspace-id <workspace-id> \
    --all-namespaces --start-time "2025-11-04 12:00"

# Save to CSV file
./get_pod_node_inventory.py --workspace-id <workspace-id> \
    -n hpcc --start-time "2025-11-04 12:00" > inventory.csv
```

#### Command-Line Options

**Required Arguments:**
- `--start-time DATETIME` - Start time (YYYY-MM-DD or YYYY-MM-DD HH:MM)
- Either `-n, --namespace NAME` or `--all-namespaces` - Namespace(s) to query
- Either `--workspace-id ID` or `--cluster NAME` with `--resource-group RG`

**Workspace Identification (choose one):**
- `--workspace-id ID` - Log Analytics workspace ID (customer ID) directly
- `--cluster NAME` - AKS cluster name (discovers workspace from cluster)
  - Requires: `--resource-group RG`
  - Optional: `--subscription ID`

**Optional Arguments:**
- `--end-time DATETIME` - End time (YYYY-MM-DD or YYYY-MM-DD HH:MM)
- `--duration MINUTES` - Time window in minutes from start (default: 60)
- `--verbose` - Print KQL query for debugging
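
The time-window rules above (an explicit `--end-time` wins; otherwise the window is `--start-time` plus `--duration`, defaulting to 60 minutes) can be sketched as follows. The helper names `parse_time` and `resolve_window` are hypothetical, not the script's actual API:

```python
from datetime import datetime, timedelta

# The two accepted input shapes: YYYY-MM-DD HH:MM and YYYY-MM-DD.
_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d")

def parse_time(text):
    """Parse a datetime in either accepted format."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized datetime: {text!r}")

def resolve_window(start, end=None, duration_minutes=60):
    """Return (start, end); --end-time overrides --duration if both given."""
    start_dt = parse_time(start)
    if end is not None:
        end_dt = parse_time(end)
    else:
        end_dt = start_dt + timedelta(minutes=duration_minutes)
    if end_dt <= start_dt:
        raise ValueError("end time must be after start time")
    return start_dt, end_dt
```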

#### Output Format

CSV format with metadata header:
```csv
# Generated by: get_pod_node_inventory.py
# Date generated: 2025-12-12 09:00:00
# Workspace ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
# Time range: 2025-11-04 12:00:00 UTC to 2025-11-04 14:00:00 UTC
# Namespace: hpcc
#
TimeGenerated,RecordType,Name,Namespace,PodStatus,Computer,ContainerStatus,...
2025-11-04T12:00:00Z,Pod,hpcc-dali-0,hpcc,Running,aks-node-123,ready,...
2025-11-04T12:00:00Z,Node,aks-node-123,,,aks-node-123,,,,...
```
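
Downstream consumers must skip the `#` metadata lines before handing the stream to a CSV parser. A minimal sketch, assuming a hypothetical `read_inventory` helper:

```python
import csv
import io

def read_inventory(stream):
    """Yield data rows as dicts, skipping the '# ...' metadata header."""
    data_lines = (line for line in stream if not line.startswith("#"))
    yield from csv.DictReader(data_lines)

# Abbreviated sample in the format shown above.
sample = io.StringIO(
    "# Generated by: get_pod_node_inventory.py\n"
    "# Namespace: hpcc\n"
    "#\n"
    "TimeGenerated,RecordType,Name\n"
    "2025-11-04T12:00:00Z,Pod,hpcc-dali-0\n"
)
rows = list(read_inventory(sample))
```

Because the metadata lines precede the real header row, filtering them first lets `csv.DictReader` pick up the column names normally.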

#### Dependencies

- Python 3.7+
- Azure CLI (`az`) installed and authenticated
- Container Insights enabled on AKS cluster (if using cluster discovery)
- Standard library modules only

#### Required Permissions

- Reader role on the AKS cluster resource (if using cluster discovery)
- Log Analytics Reader role on the workspace

---

### get_ala_thor_timeline.py

Extract Thor workunit timeline from Azure Log Analytics audit logs.

---

### analyze_pod_node_inventory.py

Analyze pod and node inventory data with HPCC component identification.

#### Purpose

Analyze the CSV output from `get_pod_node_inventory.py` to identify HPCC components, cross-reference pods to nodes, and calculate resource consumption per component over time. This tool provides insights into which HPCC components are consuming which resources.

#### When to Use

- Understanding HPCC component resource consumption patterns
- Identifying which nodes are running which components
- Calculating pod-hours and duration for cost attribution
- Analyzing component deployment patterns
- Generating component-level resource usage reports

#### Features

- Identifies HPCC components from pod naming conventions
  - dali, esp, eclccserver, sasha, dfuserver, eclagent
  - Thor clusters (manager and worker pods)
  - Roxie clusters
- Cross-references pods to nodes (computers)
- Calculates pod count and node count per component
- Estimates resource consumption duration (pod-hours)
- Multiple output formats (CSV, text)
- Detailed component breakdown option (--by-component)
- Time range filtering

#### Component Identification

The tool identifies HPCC components based on pod naming conventions:
- `hpcc-dali-*` → dali
- `hpcc-esp-*` → esp
- `hpcc-thor-<cluster>-thormanager-*` → thor-<cluster>
- `hpcc-thor-<cluster>-thorworker-*` → thor-<cluster>-worker
- `hpcc-roxie-<cluster>-*` → roxie-<cluster>
- Other standard HPCC components
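
The mapping above can be sketched as a small pattern matcher. This is an illustrative reimplementation under the stated naming conventions (the function name `identify_component` is hypothetical), not the analyzer's actual code:

```python
import re

def identify_component(pod_name):
    """Map a pod name to a component label per the conventions above."""
    # Thor manager/worker pods carry the cluster name in the middle.
    m = re.match(r"hpcc-thor-(?P<cluster>.+)-thormanager-", pod_name)
    if m:
        return f"thor-{m.group('cluster')}"
    m = re.match(r"hpcc-thor-(?P<cluster>.+)-thorworker-", pod_name)
    if m:
        return f"thor-{m.group('cluster')}-worker"
    # Roxie pods: hpcc-roxie-<cluster>-*
    m = re.match(r"hpcc-roxie-(?P<cluster>.+?)-", pod_name)
    if m:
        return f"roxie-{m.group('cluster')}"
    # Single-instance components: hpcc-<component>-*
    m = re.match(
        r"hpcc-(dali|esp|eclccserver|sasha|dfuserver|eclagent)-", pod_name)
    if m:
        return m.group(1)
    # Anything else (non-HPCC pods) falls through to "Other".
    return "Other"
```

Order matters: the Thor patterns are tried before the generic `hpcc-<component>-` pattern so that cluster names are captured rather than swallowed.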

#### Usage

```bash
# Basic CSV analysis
cat inventory.csv | ./analyze_pod_node_inventory.py

# From file with CSV output
./analyze_pod_node_inventory.py inventory.csv

# Human-readable text report
./analyze_pod_node_inventory.py inventory.csv --format text

# Detailed component breakdown
./analyze_pod_node_inventory.py inventory.csv --format text --by-component

# Time-filtered analysis
./analyze_pod_node_inventory.py inventory.csv \
    --start-time "2025-11-04 12:00" --end-time "2025-11-04 18:00"

# Pipeline from collector
./get_pod_node_inventory.py --workspace-id <workspace-id> -n hpcc \
    --start-time "2025-11-04 12:00" | ./analyze_pod_node_inventory.py
```

#### Command-Line Options

**Positional Arguments:**
- `input` - Input CSV file from get_pod_node_inventory.py (or read from stdin if omitted)

**Optional Arguments:**
- `--start-time DATETIME` - Start time filter (YYYY-MM-DD or YYYY-MM-DD HH:MM)
- `--end-time DATETIME` - End time filter (YYYY-MM-DD or YYYY-MM-DD HH:MM)
- `--format {csv,text}` - Output format (default: csv)
- `--by-component` - Show detailed breakdown by component (text format only)

#### Output Formats

**CSV (default):**
```csv
# Generated by: analyze_pod_node_inventory.py
# Date generated: 2025-12-12 09:00:00
# Time range: 2025-11-04 12:00:00 to 2025-11-04 14:00:00
# Total pods: 42
# Total nodes: 10
#
Component,PodCount,NodeCount,DurationHours,PodHours
dali,1,1,2.00,2.00
esp,3,3,2.00,6.00
thor-mycluster,1,1,2.00,2.00
thor-mycluster-worker,8,8,2.00,16.00
roxie-cluster1,4,4,2.00,8.00
```

**Text:**
```
================================================================================
POD AND NODE INVENTORY ANALYSIS
================================================================================

SUMMARY
--------------------------------------------------------------------------------
Time Range: 2025-11-04 12:00:00 to 2025-11-04 14:00:00
Duration: 2.00 hours
Total Pods: 42
Total Nodes: 10

COMPONENT BREAKDOWN
--------------------------------------------------------------------------------
Component Pods Nodes Duration Pod-Hours
--------------------------------------------------------------------------------
dali 1 1 2.00h 2.00h
esp 3 3 2.00h 6.00h
thor-mycluster 1 1 2.00h 2.00h
thor-mycluster-worker 8 8 2.00h 16.00h
roxie-cluster1 4 4 2.00h 8.00h

NODE UTILIZATION
--------------------------------------------------------------------------------
Node Name Pod Count
--------------------------------------------------------------------------------
aks-nodepool1-12345 5
aks-nodepool1-12346 4
...
```

**Text with --by-component:**
Includes detailed pod-to-node mapping for each component.

#### Important Notes

- **Duration Calculation:** The tool estimates durations based on snapshot data from KubePodInventory. It assumes all pods in the snapshot were running for the entire time window. For more accurate pod lifecycle tracking, time-series data would be needed.
- **Component Identification:** Based on standard HPCC pod naming conventions. Non-HPCC pods are categorized as "Other".
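
The snapshot-based estimate described above reduces to a one-liner: every distinct pod seen in the window is credited with the full window duration. A minimal sketch (the `pod_hours` helper is hypothetical):

```python
def pod_hours(pod_names, duration_hours):
    """Snapshot-based estimate: each distinct pod observed in the window
    is assumed to run for the whole window, so
    pod-hours = distinct pods * window duration."""
    return len(set(pod_names)) * duration_hours

# Matches the sample report above: 8 thor workers over a 2-hour window
# yield 16.00 pod-hours.
workers = [f"hpcc-thor-mycluster-thorworker-{i}" for i in range(8)]
```

This is why the estimate overstates usage for short-lived pods: a pod present in a single snapshot is billed for the whole window.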

#### Dependencies

- Python 3.7+
- Standard library modules only (csv, datetime, collections)
- No Azure CLI required (works offline on saved CSV files)

---

### analyze_thor_timeline.py

Thor cluster utilization analysis and cost modeling from timeline data.