diff --git a/skills/k8s-node-executor/SKILL.md b/skills/k8s-node-executor/SKILL.md
new file mode 100644
index 00000000..4cc9c088
--- /dev/null
+++ b/skills/k8s-node-executor/SKILL.md
@@ -0,0 +1,160 @@
+---
+name: k8s-node-executor
+description: Execute commands on Bottlerocket K8s nodes via kubectl debug
+---
+
+# Skill: K8s Node Executor
+
+## Purpose
+
+Execute commands directly on Bottlerocket nodes for debugging, testing, and exploration.
+
+## When to Use
+
+- Debugging node-level issues on Bottlerocket K8s nodes
+- Inspecting host filesystem, processes, or network
+- Running apiclient commands to view/modify Bottlerocket settings
+- Container runtime inspection
+
+## Prerequisites
+
+- kubectl access to the K8s cluster with Bottlerocket nodes
+- Target node name (get via `kubectl get nodes`)
+
+## Procedure
+
+Use `kubectl debug` with `--profile=sysadmin` for full host access, including the Bottlerocket API socket.
+
+### Execute Commands
+
+```bash
+# Single command
+kubectl debug node/ -it --image=busybox --profile=sysadmin --
+
+# Interactive shell
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- /bin/sh
+```
+
+**Note:** The `--profile=sysadmin` flag is required.
+Without it, apiclient commands fail with "Permission denied" on the API socket.
+
+### Cleanup
+
+Debug pods are not deleted when the session ends; they remain in the cluster, typically with status `Completed`.
+To remove them:
+
+```bash
+kubectl get pods -o name | grep node-debugger | xargs kubectl delete
+```
+
+## Common Commands
+
+Replace `` with your node name.
+
+### Bottlerocket Settings (apiclient)
+
+```bash
+# View all settings
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient get settings
+
+# View specific setting
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient get settings.kubernetes
+
+# View OS info
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient get os
+
+# Modify setting
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient set motd="Debug session"
+```
+
+### Host Filesystem
+
+```bash
+# OS release
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- cat /host/etc/os-release
+
+# Bottlerocket settings JSON
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- cat /host/etc/bottlerocket/settings.json
+
+# List host binaries
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- ls /host/usr/bin/
+```
+
+### System Info
+
+```bash
+# Kernel version
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- uname -a
+
+# Memory
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- free -h
+
+# Disk
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- df -h
+
+# Processes
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- ps aux
+```
+
+### Networking
+
+```bash
+# Interfaces
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- ip addr
+
+# Routes
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- ip route
+
+# Listening ports
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- netstat -tlnp
+```
+
+### Container Runtime
+
+```bash
+# List containers (k8s.io namespace)
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host ctr -n k8s.io containers list
+
+# List images
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host ctr -n k8s.io images list
+```
+
+### Systemd Services
+
+```bash
+# List services
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host systemctl list-units --type=service
+
+# Service status
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host systemctl status kubelet
+```
+
+## Security Warning
+
+**This approach grants full node access.** It can:
+- Read/modify any host file
+- Access all processes and containers
+- Change system configuration
+- Affect node stability
+
+**Best practices:**
+- Use only in dev/test environments
+- Clean up immediately after use
+
+## Troubleshooting
+
+### Permission denied on API socket
+
+Ensure you're using `--profile=sysadmin`.
+The default profile doesn't grant socket access.
+
+### Command not found
+
+Host binaries need the `chroot /host` prefix:
+```bash
+# Wrong
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- apiclient get os
+
+# Right
+kubectl debug node/ -it --image=busybox --profile=sysadmin -- chroot /host /usr/bin/apiclient get os
+```
diff --git a/skills/ssm-executor/SKILL.md b/skills/ssm-executor/SKILL.md
new file mode 100644
index 00000000..962122cb
--- /dev/null
+++ b/skills/ssm-executor/SKILL.md
@@ -0,0 +1,126 @@
+---
+name: ssm-executor
+description: Execute commands on Bottlerocket EC2 instances via AWS Systems Manager
+---
+
+# SSM Executor
+
+Execute commands on Bottlerocket EC2 instances using AWS Systems Manager (SSM), with access to both the control container and the host system.
+
+## When to Use
+
+- Debugging Bottlerocket instances (ECS, K8s, or standalone)
+- Checking system state, logs, or configuration
+- Running diagnostic commands
+- When `kubectl exec` or `kubectl debug` is not available or insufficient
+
+## Prerequisites
+
+- AWS credentials with SSM permissions
+- Instance has the SSM agent running (it runs in the control container, which is enabled by default in Bottlerocket)
+- Instance has an IAM role with the `AmazonSSMManagedInstanceCore` policy
+- Network path to SSM endpoints (internet or VPC endpoints)
+
+## Procedure
+
+### 1. Verify SSM Connectivity
+
+```bash
+./scripts/verify-connectivity.sh INSTANCE_ID REGION
+```
+
+Expected: `Online` status and `Bottlerocket` platform.
+
+### 2. Execute Commands
+
+**Simple command (control container context):**
+```bash
+./scripts/control-container-command.sh INSTANCE_ID REGION "uname -a"
+```
+
+**Access host rootfs via sheltie (full host access):**
+```bash
+./scripts/sheltie-command.sh INSTANCE_ID REGION "containerd --version"
+```
+
+### 3. Understanding the Execution Context
+
+SSM commands run through a chain of contexts:
+
+```
+SSM → Control Container → (optional) Admin Container → Sheltie → Host
+```
+
+- **Control container**: Limited environment, has `apiclient`
+- **Admin container**: Interactive shell, accessed via `apiclient exec admin bash`
+- **Sheltie**: Direct host access via `apiclient exec admin sheltie -- `
+
+## Common Commands
+
+### Bottlerocket Settings (control container)
+
+```bash
+./scripts/control-container-command.sh INSTANCE_ID REGION "apiclient get settings.kubernetes"
+./scripts/control-container-command.sh INSTANCE_ID REGION "apiclient set motd='Debug session'"
+./scripts/control-container-command.sh INSTANCE_ID REGION "apiclient get os"
+```
+
+### Host Binaries (via sheltie)
+
+```bash
+./scripts/sheltie-command.sh INSTANCE_ID REGION "containerd --version"
+./scripts/sheltie-command.sh INSTANCE_ID REGION "kubelet --version"
+./scripts/sheltie-command.sh INSTANCE_ID REGION "systemctl list-units --type=service"
+./scripts/sheltie-command.sh INSTANCE_ID REGION "systemctl status containerd"
+```
+
+### Filesystem Inspection
+
+```bash
+./scripts/sheltie-command.sh INSTANCE_ID REGION "cat /etc/os-release"
+./scripts/sheltie-command.sh INSTANCE_ID REGION "df -h"
+./scripts/sheltie-command.sh INSTANCE_ID REGION "free -h"
+```
+
+### Networking
+
+```bash
+./scripts/sheltie-command.sh INSTANCE_ID REGION "ip addr"
+./scripts/sheltie-command.sh INSTANCE_ID REGION "ip route"
+./scripts/sheltie-command.sh INSTANCE_ID REGION "ss -tlnp"
+```
+
+## Comparison with k8s-node-executor
+
+| Feature | ssm-executor | k8s-node-executor |
+|---------|--------------|-------------------|
+| Works with | Any SSM-managed Bottlerocket instance | K8s nodes only |
+| Requires | SSM connectivity | kubectl access |
+| Access level | Full host via sheltie | Host namespaces via pod |
+| Best for | ECS, standalone, early boot | K8s-specific debugging |
+
+## Validation
+
+- [ ] Instance shows `Online` in SSM
+- [ ] Control container commands execute
+- [ ] Sheltie commands access the host
+
+## Common Issues
+
+**Instance not showing in SSM:**
+- Check that the IAM role has SSM permissions
+- Verify the network path to SSM endpoints
+- The instance may need a reboot after the IAM role is attached
+
+**Command timeout:**
+- For long-running commands, pass a larger `executionTimeout` in the `AWS-RunShellScript` parameters
+- Check that the instance is not overloaded
+
+**Permission denied:**
+- Some commands require sheltie for host access
+- Sheltie needs the admin container, which is disabled by default; enable it with `apiclient set host-containers.admin.enabled=true`
+
+## Reference
+
+- [Bottlerocket Admin Container](https://github.com/bottlerocket-os/bottlerocket#admin-container)
+- [AWS SSM Run Command](https://docs.aws.amazon.com/systems-manager/latest/userguide/execute-remote-commands.html)
diff --git a/skills/ssm-executor/scripts/control-container-command.sh b/skills/ssm-executor/scripts/control-container-command.sh
new file mode 100755
index 00000000..a5b5b563
--- /dev/null
+++ b/skills/ssm-executor/scripts/control-container-command.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+set -euo pipefail
+INSTANCE_ID="${1:?Usage: control-container-command.sh INSTANCE_ID REGION COMMAND}"
+REGION="${2:?Usage: control-container-command.sh INSTANCE_ID REGION COMMAND}"
+COMMAND="${3:?Usage: control-container-command.sh INSTANCE_ID REGION COMMAND}"
+
+# COMMAND is inserted into the JSON parameters unescaped, so it must not contain double quotes or backslashes.
+CMD_ID=$(aws ssm send-command \
+  --instance-ids "$INSTANCE_ID" \
+  --document-name "AWS-RunShellScript" \
+  --parameters "{\"commands\":[\"$COMMAND\"]}" \
+  --region "$REGION" \
+  --query 'Command.CommandId' \
+  --output text)
+# Poll until the invocation finishes rather than sleeping for a fixed time.
+# The waiter exits non-zero if the command fails or polling gives up, so fetch the output regardless.
+aws ssm wait command-executed \
+  --command-id "$CMD_ID" \
+  --instance-id "$INSTANCE_ID" \
+  --region "$REGION" || true
+aws ssm get-command-invocation \
+  --command-id "$CMD_ID" \
+  --instance-id "$INSTANCE_ID" \
+  --region "$REGION" \
+  --query 'StandardOutputContent' \
+  --output text
diff --git a/skills/ssm-executor/scripts/sheltie-command.sh b/skills/ssm-executor/scripts/sheltie-command.sh
new file mode 100755
index 00000000..490f9748
--- /dev/null
+++ b/skills/ssm-executor/scripts/sheltie-command.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+set -euo pipefail
+INSTANCE_ID="${1:?Usage: sheltie-command.sh INSTANCE_ID REGION COMMAND}"
+REGION="${2:?Usage: sheltie-command.sh INSTANCE_ID REGION COMMAND}"
+COMMAND="${3:?Usage: sheltie-command.sh INSTANCE_ID REGION COMMAND}"
+
+# COMMAND is inserted into the JSON parameters unescaped, so it must not contain double quotes or backslashes.
+CMD_ID=$(aws ssm send-command \
+  --instance-ids "$INSTANCE_ID" \
+  --document-name "AWS-RunShellScript" \
+  --parameters "{\"commands\":[\"apiclient exec admin sheltie -- $COMMAND\"]}" \
+  --region "$REGION" \
+  --query 'Command.CommandId' \
+  --output text)
+# Poll until the invocation finishes rather than sleeping for a fixed time.
+# The waiter exits non-zero if the command fails or polling gives up, so fetch the output regardless.
+aws ssm wait command-executed \
+  --command-id "$CMD_ID" \
+  --instance-id "$INSTANCE_ID" \
+  --region "$REGION" || true
+aws ssm get-command-invocation \
+  --command-id "$CMD_ID" \
+  --instance-id "$INSTANCE_ID" \
+  --region "$REGION" \
+  --query 'StandardOutputContent' \
+  --output text
diff --git a/skills/ssm-executor/scripts/verify-connectivity.sh b/skills/ssm-executor/scripts/verify-connectivity.sh
new file mode 100755
index 00000000..e32eb6c4
--- /dev/null
+++ b/skills/ssm-executor/scripts/verify-connectivity.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+set -euo pipefail
+INSTANCE_ID="${1:?Usage: verify-connectivity.sh INSTANCE_ID REGION}"
+REGION="${2:?Usage: verify-connectivity.sh INSTANCE_ID REGION}"
+
+aws ssm describe-instance-information \
+  --filters "Key=InstanceIds,Values=$INSTANCE_ID" \
+  --query 'InstanceInformationList[*].[InstanceId,PingStatus,PlatformName]' \
+  --output table --region "$REGION"
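
The comparison table in `ssm-executor/SKILL.md` notes that both skills can reach Bottlerocket K8s nodes. To go from a node name in `kubectl get nodes` to the `INSTANCE_ID REGION` arguments the ssm-executor scripts expect, a minimal sketch follows. It assumes an AWS-backed cluster whose nodes report `spec.providerID` in the form `aws:///us-west-2a/i-0123456789abcdef0` and carry the well-known `topology.kubernetes.io/region` label. The function names `instance_id_from_provider_id` and `node_to_ssm_target` are hypothetical and are not part of either skill.

```shell
#!/bin/bash
# Hypothetical helpers (not part of either skill) that map a K8s node name to
# the INSTANCE_ID and REGION arguments used by the ssm-executor scripts.
# Assumes AWS-style providerIDs such as aws:///us-west-2a/i-0123456789abcdef0.

# Print the EC2 instance ID, i.e. the last path segment of an AWS providerID.
instance_id_from_provider_id() {
  local provider_id="$1"
  case "$provider_id" in
    aws://*/i-*) printf '%s\n' "${provider_id##*/}" ;;
    *)
      echo "unexpected providerID: $provider_id" >&2
      return 1
      ;;
  esac
}

# Print "INSTANCE_ID REGION" for a node; requires kubectl access to the cluster.
# Assumes the node has the well-known topology.kubernetes.io/region label.
node_to_ssm_target() {
  local node="$1" provider_id instance_id region
  provider_id=$(kubectl get node "$node" -o jsonpath='{.spec.providerID}') || return 1
  instance_id=$(instance_id_from_provider_id "$provider_id") || return 1
  region=$(kubectl get node "$node" \
    -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/region}') || return 1
  if [ -z "$region" ]; then
    echo "node $node has no topology.kubernetes.io/region label" >&2
    return 1
  fi
  printf '%s %s\n' "$instance_id" "$region"
}

# Example (my-node is a placeholder node name), run from skills/ssm-executor:
#   read -r INSTANCE_ID REGION <<< "$(node_to_ssm_target my-node)"
#   ./scripts/verify-connectivity.sh "$INSTANCE_ID" "$REGION"
```

Reading the region from the node label avoids deriving it from the availability zone embedded in `providerID`, which would need special handling for Local Zone names.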