160 changes: 160 additions & 0 deletions skills/k8s-node-executor/SKILL.md
@@ -0,0 +1,160 @@
---
name: k8s-node-executor
description: Execute commands on Bottlerocket K8s nodes via kubectl debug
---

# Skill: K8s Node Executor

## Purpose

Execute commands directly on Bottlerocket nodes for debugging, testing, and exploration.

## When to Use

- Debugging node-level issues on Bottlerocket K8s nodes
- Inspecting host filesystem, processes, or network
- Running apiclient commands to view/modify Bottlerocket settings
- Container runtime inspection

## Prerequisites

- kubectl access to the K8s cluster with Bottlerocket nodes
- Target node name (get via `kubectl get nodes`)

## Procedure

Use `kubectl debug` with `--profile=sysadmin` for full host access including the Bottlerocket API socket.

### Execute Commands

```bash
# Single command
kubectl debug node/<node-name> -it --image=busybox --profile=sysadmin -- <command>

# Interactive shell
kubectl debug node/<node-name> -it --image=busybox --profile=sysadmin -- /bin/sh
```

**Note:** The `--profile=sysadmin` flag is required.
Without it, apiclient commands fail with "Permission denied" on the API socket.
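The invocation above can be wrapped in a small shell function so the flags are not retyped for every command. This is a minimal sketch, not part of the skill itself: the function name `node_debug` and the `NODE_DEBUG_IMAGE` override are hypothetical, and only the `kubectl debug` flags shown above are used.

```bash
# Hypothetical helper: run one command on a node through a sysadmin debug pod.
# NODE_DEBUG_IMAGE is an assumed override; busybox matches the examples in this skill.
node_debug() {
  local node="${1:?usage: node_debug NODE COMMAND...}"
  shift
  kubectl debug "node/${node}" -it \
    --image="${NODE_DEBUG_IMAGE:-busybox}" \
    --profile=sysadmin -- "$@"
}
```

Usage would look like `node_debug <node-name> uname -a`.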

### Cleanup

Debug pods are not deleted automatically when the session ends; they stay in the namespace in a completed state.
To remove them:

```bash
kubectl get pods -o name | grep node-debugger | xargs kubectl delete
```
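The removal pipeline above can be made safe to run when no debug pods exist. The sketch below assumes a hypothetical function name `cleanup_node_debuggers` and relies on the `node-debugger` name prefix used in the command above; it operates on the current namespace only.

```bash
# Hypothetical helper: delete node-debugger pods, doing nothing if none exist.
cleanup_node_debuggers() {
  local pods
  # grep exits non-zero when nothing matches; "|| true" keeps that from being an error.
  pods=$(kubectl get pods -o name | grep node-debugger || true)
  if [ -z "$pods" ]; then
    echo "no node-debugger pods found"
    return 0
  fi
  # Word splitting is intended here: one pod name per line, no whitespace in names.
  # shellcheck disable=SC2086
  kubectl delete $pods
}
```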

## Common Commands

Replace `<node>` with your node name.

### Bottlerocket Settings (apiclient)

```bash
# View all settings
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient get settings

# View specific setting
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient get settings.kubernetes

# View OS info
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient get os

# Modify setting
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient set motd="Debug session"
```

### Host Filesystem

```bash
# OS release
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- cat /host/etc/os-release

# Bottlerocket settings JSON
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- cat /host/etc/bottlerocket/settings.json

# List host binaries
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- ls /host/usr/bin/
```

### System Info

```bash
# Kernel version
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- uname -a

# Memory
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- free -h

# Disk
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- df -h

# Processes
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- ps aux
```

### Networking

```bash
# Interfaces
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- ip addr

# Routes
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- ip route

# Listening ports (busybox provides netstat; it does not ship ss)
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- netstat -tlnp
```

### Container Runtime

```bash
# List containers (k8s namespace)
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host ctr -n k8s.io containers list

# List images
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host ctr -n k8s.io images list
```

### Systemd Services

```bash
# List services
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host systemctl list-units --type=service

# Service status
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host systemctl status kubelet
```

## Security Warning

**This approach grants full node access.** It can:
- Read/modify any host file
- Access all processes and containers
- Change system configuration
- Affect node stability

**Best practices:**
- Use only in dev/test environments
- Clean up immediately after use

## Troubleshooting

### Permission denied on API socket

Ensure you're using `--profile=sysadmin`.
The default profile doesn't grant socket access.

### Command not found

Host binaries are not on the debug container's own filesystem; the node's root is mounted at `/host`, so they need a `chroot /host` prefix:
```bash
# Wrong
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- apiclient get os

# Right
kubectl debug node/<node> -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient get os
```
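
To avoid forgetting the prefix, the "Right" form can be captured in a function. This is a sketch under the assumption that every command should run inside the host root; the name `node_host_exec` is hypothetical.

```bash
# Hypothetical helper: run a host binary by entering the node's root filesystem,
# which the debug pod mounts at /host.
node_host_exec() {
  local node="${1:?usage: node_host_exec NODE COMMAND...}"
  shift
  kubectl debug "node/${node}" -it --image=busybox --profile=sysadmin -- \
    chroot /host "$@"
}
```

For example, `node_host_exec <node> /usr/bin/apiclient get os` expands to the "Right" command above.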
126 changes: 126 additions & 0 deletions skills/ssm-executor/SKILL.md
@@ -0,0 +1,126 @@
---
name: ssm-executor
description: Execute commands on Bottlerocket EC2 instances via AWS Systems Manager
---

# SSM Executor

Execute commands on Bottlerocket EC2 instances using AWS Systems Manager (SSM), with access to both the control container and host system.

## When to Use

- Debugging Bottlerocket instances (ECS, K8s, or standalone)
- Checking system state, logs, or configuration
- Running diagnostic commands
- When kubectl exec is not available or insufficient

## Prerequisites

- AWS credentials with SSM permissions
- Instance has SSM agent running (enabled by default in Bottlerocket)
- Instance has IAM role with `AmazonSSMManagedInstanceCore` policy
- Network path to SSM endpoints (internet or VPC endpoints)

## Procedure

### 1. Verify SSM Connectivity

```bash
./scripts/verify-connectivity.sh INSTANCE_ID REGION
```

Expected: `Online` status and `Bottlerocket` platform.
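
When the check needs to gate later steps in a script, the same `describe-instance-information` call can return a pass/fail result instead of a table. A minimal sketch, assuming a hypothetical function name `ssm_is_online`; it inspects only `PingStatus`, not the platform name.

```bash
# Hypothetical helper: succeed only if SSM reports the instance as Online.
ssm_is_online() {
  local instance_id="${1:?usage: ssm_is_online INSTANCE_ID REGION}"
  local region="${2:?usage: ssm_is_online INSTANCE_ID REGION}"
  local status
  # An unregistered instance yields an empty list, so the status will not be "Online".
  status=$(aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=${instance_id}" \
    --query 'InstanceInformationList[0].PingStatus' \
    --output text --region "$region")
  [ "$status" = "Online" ]
}
```

It can then be used as `ssm_is_online INSTANCE_ID REGION && ./scripts/control-container-command.sh INSTANCE_ID REGION "uname -a"`.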

### 2. Execute Commands

**Simple command (control container context):**
```bash
./scripts/control-container-command.sh INSTANCE_ID REGION "uname -a"
```

**Access host rootfs via sheltie (full host access):**
```bash
./scripts/sheltie-command.sh INSTANCE_ID REGION "containerd --version"
```

### 3. Understanding the Execution Context

SSM commands run through a chain of contexts:

```
SSM → Control Container → (optional) Admin Container → Sheltie → Host
```

- **Control container**: Limited environment where SSM commands land; has `apiclient`
- **Admin container**: Interactive shell, accessed via `apiclient exec admin bash`; it must be enabled before sheltie can be used
- **Sheltie**: Direct host access from the admin container via `apiclient exec admin sheltie -- <cmd>`
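
The difference between the contexts comes down to what string is sent to SSM. The sketch below shows that mapping for the two non-interactive cases; the function name `context_command` and the context labels `control` and `sheltie` are hypothetical, and the sheltie prefix is the one listed above.

```bash
# Hypothetical helper: print the command line SSM should run for a given context.
context_command() {
  local context="${1:?usage: context_command control|sheltie COMMAND...}"
  shift
  case "$context" in
    # Runs as-is inside the control container.
    control) printf '%s\n' "$*" ;;
    # Hops through the admin container into the host via sheltie.
    sheltie) printf 'apiclient exec admin sheltie -- %s\n' "$*" ;;
    *) echo "unknown context: $context" >&2; return 1 ;;
  esac
}
```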

## Common Commands

### Bottlerocket Settings (control container)

```bash
./scripts/control-container-command.sh INSTANCE_ID REGION "apiclient get settings.kubernetes"
./scripts/control-container-command.sh INSTANCE_ID REGION "apiclient set motd='Debug session'"
./scripts/control-container-command.sh INSTANCE_ID REGION "apiclient get os"
```

### Host Binaries (via sheltie)

```bash
./scripts/sheltie-command.sh INSTANCE_ID REGION "containerd --version"
./scripts/sheltie-command.sh INSTANCE_ID REGION "kubelet --version"
./scripts/sheltie-command.sh INSTANCE_ID REGION "systemctl list-units --type=service"
./scripts/sheltie-command.sh INSTANCE_ID REGION "systemctl status containerd"
```

### Filesystem Inspection

```bash
./scripts/sheltie-command.sh INSTANCE_ID REGION "cat /etc/os-release"
./scripts/sheltie-command.sh INSTANCE_ID REGION "df -h"
./scripts/sheltie-command.sh INSTANCE_ID REGION "free -h"
```

### Networking

```bash
./scripts/sheltie-command.sh INSTANCE_ID REGION "ip addr"
./scripts/sheltie-command.sh INSTANCE_ID REGION "ip route"
./scripts/sheltie-command.sh INSTANCE_ID REGION "ss -tlnp"
```

## Comparison with k8s-node-executor

| Feature | ssm-executor | k8s-node-executor |
|---------|--------------|-------------------|
| Works with | Any EC2 instance | K8s nodes only |
| Requires | SSM connectivity | kubectl access |
| Access level | Full host via sheltie | Host namespaces via pod |
| Best for | ECS, standalone, early boot | K8s-specific debugging |

## Validation

- [ ] Instance shows `Online` in SSM
- [ ] Control container commands execute
- [ ] Sheltie commands access host

## Common Issues

**Instance not showing in SSM:**
- Check IAM role has SSM permissions
- Verify network path to SSM endpoints
- Instance may need reboot after IAM role attachment

**Command timeout:**
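- Increase the timeout in `send-command`: `--timeout-seconds` bounds delivery to the instance, and the document's `executionTimeout` parameter bounds run time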
- Check instance is not overloaded
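
One way to raise the run-time limit is to pass `executionTimeout` alongside `commands` in the `--parameters` JSON for `AWS-RunShellScript`. This is a sketch only: the function name `ssm_parameters` and the 600-second default are assumptions, and the bundled scripts do not do this.

```bash
# Hypothetical helper: build the --parameters JSON with an execution timeout.
ssm_parameters() {
  local command="${1:?usage: ssm_parameters COMMAND [TIMEOUT_SECONDS]}"
  local timeout="${2:-600}"  # assumed default, chosen for illustration
  # Escape backslashes and double quotes so the command is a valid JSON string.
  command=${command//\\/\\\\}
  command=${command//\"/\\\"}
  printf '{"commands":["%s"],"executionTimeout":["%s"]}\n' "$command" "$timeout"
}
```

The result can be passed as `--parameters "$(ssm_parameters 'systemctl status containerd' 900)"`.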

**Permission denied:**
- Some commands require sheltie for host access
- Check if admin container is enabled

## Reference

- [Bottlerocket Admin Container](https://github.com/bottlerocket-os/bottlerocket#admin-container)
- [AWS SSM Run Command](https://docs.aws.amazon.com/systems-manager/latest/userguide/execute-remote-commands.html)
20 changes: 20 additions & 0 deletions skills/ssm-executor/scripts/control-container-command.sh
@@ -0,0 +1,20 @@
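#!/bin/bash
# Run a command in the Bottlerocket control container via SSM Run Command.
set -euo pipefail
INSTANCE_ID="${1:?Usage: control-container-command.sh INSTANCE_ID REGION COMMAND}"
REGION="${2:?Usage: control-container-command.sh INSTANCE_ID REGION COMMAND}"
COMMAND="${3:?Usage: control-container-command.sh INSTANCE_ID REGION COMMAND}"

# Escape backslashes and double quotes so COMMAND stays a valid JSON string.
ESCAPED=${COMMAND//\\/\\\\}
ESCAPED=${ESCAPED//\"/\\\"}

CMD_ID=$(aws ssm send-command \
    --instance-ids "$INSTANCE_ID" \
    --document-name "AWS-RunShellScript" \
    --parameters "{\"commands\":[\"$ESCAPED\"]}" \
    --region "$REGION" \
    --query 'Command.CommandId' \
    --output text)

# Wait for a terminal state rather than sleeping for a fixed interval.
# The waiter exits non-zero if the command fails; continue so the output is still shown.
aws ssm wait command-executed \
    --command-id "$CMD_ID" \
    --instance-id "$INSTANCE_ID" \
    --region "$REGION" || true

aws ssm get-command-invocation \
    --command-id "$CMD_ID" \
    --instance-id "$INSTANCE_ID" \
    --region "$REGION" \
    --query 'StandardOutputContent' \
    --output text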
20 changes: 20 additions & 0 deletions skills/ssm-executor/scripts/sheltie-command.sh
@@ -0,0 +1,20 @@
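#!/bin/bash
# Run a command on the Bottlerocket host (via the admin container's sheltie) using SSM Run Command.
set -euo pipefail
INSTANCE_ID="${1:?Usage: sheltie-command.sh INSTANCE_ID REGION COMMAND}"
REGION="${2:?Usage: sheltie-command.sh INSTANCE_ID REGION COMMAND}"
COMMAND="${3:?Usage: sheltie-command.sh INSTANCE_ID REGION COMMAND}"

# Escape backslashes and double quotes so COMMAND stays a valid JSON string.
ESCAPED=${COMMAND//\\/\\\\}
ESCAPED=${ESCAPED//\"/\\\"}

CMD_ID=$(aws ssm send-command \
    --instance-ids "$INSTANCE_ID" \
    --document-name "AWS-RunShellScript" \
    --parameters "{\"commands\":[\"apiclient exec admin sheltie -- $ESCAPED\"]}" \
    --region "$REGION" \
    --query 'Command.CommandId' \
    --output text)

# Wait for a terminal state rather than sleeping for a fixed interval.
# The waiter exits non-zero if the command fails; continue so the output is still shown.
aws ssm wait command-executed \
    --command-id "$CMD_ID" \
    --instance-id "$INSTANCE_ID" \
    --region "$REGION" || true

aws ssm get-command-invocation \
    --command-id "$CMD_ID" \
    --instance-id "$INSTANCE_ID" \
    --region "$REGION" \
    --query 'StandardOutputContent' \
    --output text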
9 changes: 9 additions & 0 deletions skills/ssm-executor/scripts/verify-connectivity.sh
@@ -0,0 +1,9 @@
#!/bin/bash
# Show the SSM ping status and platform name for one instance.
set -euo pipefail
INSTANCE_ID="${1:?Usage: verify-connectivity.sh INSTANCE_ID REGION}"
REGION="${2:?Usage: verify-connectivity.sh INSTANCE_ID REGION}"

aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$INSTANCE_ID" \
    --query 'InstanceInformationList[*].[InstanceId,PingStatus,PlatformName]' \
    --output table --region "$REGION"