Skip to content

Latest commit

 

History

History
460 lines (366 loc) · 11.9 KB

File metadata and controls

460 lines (366 loc) · 11.9 KB

AI-Generated Bash Script Review: Web3 Node Setup & Health Check

Executive Summary

This script automates Ethereum execution client provisioning on Ubuntu but contains critical production defects spanning error handling, idempotency, security, Web3 protocol compliance, and observability. Below is a systematic analysis with corrected implementations.


Critical Issues Breakdown

1. Missing Strict Mode ⚠️ CRITICAL

Problem:

#!/bin/bash  # No error handling directives

The script lacks set -euo pipefail, meaning:

  • Failed apt installs are silently ignored
  • Docker pull failures don't halt execution
  • Unset variables ($RPC_URL may be empty) pass through
  • Subsequent commands run against corrupted state

Impact: Script may report success while core dependencies are missing, leading to silent health-check failures in production.

Correction:

#!/bin/bash
set -euo pipefail
IFS=$'\n\t'

Rationale:

  • set -e: Exit on first error
  • set -u: Treat unset variables as errors
  • set -o pipefail: Pipe failures propagate
  • IFS prevents word-splitting in loops

2. Unsafe sudo Usage & Missing Privilege Checks ⚠️ CRITICAL

Problem:

sudo apt update
sudo apt upgrade -y

The script assumes:

  • Passwordless sudo is configured
  • User running it has sudo privileges
  • Running as non-root is acceptable

Impact: Script fails without clear error; may corrupt system if misused in restricted environments.

Correction:

if [[ $EUID -ne 0 ]]; then
  echo "ERROR: This script must run as root or with full sudo access" >&2
  exit 2
fi

# Verify sudo is properly configured if called via sudo
if [[ -n "${SUDO_USER:-}" ]]; then
  echo "Detected sudo execution. Proceeding as root."
fi

Rationale: Explicit privilege validation prevents silent failures and clarifies intent.


3. Non-Idempotent Docker Operations ⚠️ CRITICAL

Problem:

docker run -d --name $NODE_NAME -p 8545:8545 ethereum/client-go:stable ...

Running this twice fails because the container eth-node already exists.

Impact: Re-running the script for updates/repairs fails; CI/CD pipelines cannot safely retry.

Correction:

# Stop and remove existing container
if docker ps -a --format '{{.Names}}' | grep -q "^${NODE_NAME}$"; then
  echo "Removing existing container: $NODE_NAME"
  docker stop "$NODE_NAME" 2>/dev/null || true
  docker rm "$NODE_NAME" 2>/dev/null || true
fi

# Verify image is up to date
docker pull ethereum/client-go:stable

# Run with improved configuration
docker run \
  -d \
  --name "$NODE_NAME" \
  --restart unless-stopped \
  -p 127.0.0.1:8545:8545 \
  --log-driver json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  ethereum/client-go:stable \
    --http \
    --http.addr 127.0.0.1 \
    --http.api eth,net,web3,admin \
    --syncmode snap

Key Improvements:

  • Idempotent cleanup
  • RPC bound to localhost (security)
  • Auto-restart policy
  • Log rotation
  • Explicit sync mode

4. Unsafe apt Package Management ⚠️ HIGH

Problem:

sudo apt update
sudo apt upgrade -y
  • apt upgrade may break system libraries or kernel compatibility
  • No DEBIAN_FRONTEND=noninteractive in non-interactive context
  • No verification that upgrades completed

Impact: Uncontrolled system updates can render nodes inoperable or break dependent services.

Correction:

export DEBIAN_FRONTEND=noninteractive

# Update package lists
apt-get update -qq

# Install only required packages (no full upgrade)
apt-get install -y \
  curl \
  jq \
  docker.io \
  ca-certificates

# Optional: Apply security patches only (safer)
apt-get upgrade -y -o DPkg::options::="--force-confnew" || {
  echo "WARNING: Package upgrade failed, but continuing with installation" >&2
}

5. Missing Dependency Validation ⚠️ MEDIUM

Problem:

sudo apt install -y curl jq docker.io
# Script continues without verifying successful installation

Impact: If jq installation fails, the RPC health-check JSON parsing breaks silently.

Correction:

# Helper function to validate commands
command_exists() {
  command -v "$1" &>/dev/null
}

# Verify critical dependencies post-install
for cmd in curl jq docker; do
  if ! command_exists "$cmd"; then
    echo "ERROR: Required command '$cmd' not found after installation" >&2
    exit 1
  fi
done

# Verify Docker daemon is running
if ! docker ps &>/dev/null; then
  echo "ERROR: Docker daemon is not running or socket not accessible" >&2
  exit 1
fi

6. RPC Health Check Protocol Bugs ⚠️ CRITICAL

Problem:

RPC_URL=$1
# $RPC_URL used unquoted and unvalidated

CHAIN_ID=$(curl $RPC_URL -X POST ... | jq -r '.result')

if [ "$CHAIN_ID" != "$EXPECTED_CHAIN_ID" ]; then
  # Compares "0x1" (hex string) != 1 (decimal)
  echo "Wrong network detected"
  exit 1
fi

Multiple failures:

  1. $RPC_URL unquoted → breaks with spaces
  2. No URL validation (http/https check)
  3. No curl timeout (can hang indefinitely)
  4. No error handling if curl fails
  5. Critical: Chain ID comparison is type-unsafe (hex vs decimal)
    • eth_chainId returns "0x1" (hex string)
    • Script compares against 1 (decimal integer)
    • This check always fails on mainnet

Impact: Health check always fails, blocking production deployment.

Correction:

# Validate RPC URL format
validate_rpc_url() {
  local url="$1"
  if [[ ! "$url" =~ ^https?:// ]]; then
    echo "ERROR: Invalid RPC URL format. Must start with http:// or https://" >&2
    return 1
  fi
}

# Convert hex chain ID to decimal for comparison
hex_to_decimal() {
  printf "%d\n" "$1"
}

# Robust RPC health check
check_rpc_health() {
  local rpc_url="$1"
  local expected_chain_id="$2"
  local timeout=10
  
  validate_rpc_url "$rpc_url" || return 1
  
  # Fetch block number
  local block_response
  block_response=$(curl \
    --silent \
    --max-time "$timeout" \
    --fail \
    -X POST \
    -H "Content-Type: application/json" \
    "$rpc_url" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    2>/dev/null) || {
    echo "ERROR: RPC health check failed (block number request)" >&2
    return 1
  }
  
  local block=$(echo "$block_response" | jq -r '.result // .error.message' 2>/dev/null)
  if [[ -z "$block" ]] || [[ "$block" == "null" ]]; then
    echo "ERROR: No block number returned from RPC" >&2
    return 1
  fi
  
  # Fetch chain ID
  local chain_response
  chain_response=$(curl \
    --silent \
    --max-time "$timeout" \
    --fail \
    -X POST \
    -H "Content-Type: application/json" \
    "$rpc_url" \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
    2>/dev/null) || {
    echo "ERROR: RPC health check failed (chain ID request)" >&2
    return 1
  }
  
  local chain_id_hex=$(echo "$chain_response" | jq -r '.result // .error.message' 2>/dev/null)
  
  # Convert hex response to decimal for safe comparison
  local chain_id_dec
  chain_id_dec=$(hex_to_decimal "$chain_id_hex") || {
    echo "ERROR: Invalid chain ID format: $chain_id_hex" >&2
    return 1
  }
  
  if [[ "$chain_id_dec" != "$expected_chain_id" ]]; then
    echo "ERROR: Wrong network detected. Expected chain ID $expected_chain_id, got $chain_id_dec" >&2
    return 1
  fi
  
  echo "✓ RPC Health: Block $block on chain $chain_id_dec"
  return 0
}

7. Poor Observability & Logging ⚠️ MEDIUM

Problem:

echo "Starting setup for $NODE_NAME"
sleep 10
echo "Latest block: $BLOCK"
echo "Node setup and health check completed successfully"

Issues:

  • No timestamps
  • No structured logging for parsing/alerting
  • No exit codes conveyed to caller
  • No log level severity (info vs error)
  • Hard-coded 10-second sleep (arbitrary)

Impact: Debugging failures is difficult; CI/CD pipelines cannot easily detect partial success.

Correction:

# Structured logging with timestamps and severity
log() {
  local level="$1"
  shift
  local message="$*"
  local timestamp
  timestamp=$(date -u +'%Y-%m-%dT%H:%M:%SZ')
  
  case "$level" in
    INFO)  echo "[${timestamp}] [INFO] $message" ;;
    WARN)  echo "[${timestamp}] [WARN] $message" >&2 ;;
    ERROR) echo "[${timestamp}] [ERROR] $message" >&2 ;;
  esac
}

log INFO "Starting setup for $NODE_NAME"
log INFO "Installing dependencies..."
apt-get install -y ... || {
  log ERROR "Package installation failed"
  exit 1
}
log INFO "Pulling Docker image..."
docker pull ... || {
  log ERROR "Docker pull failed"
  exit 1
}

8. Security: RPC Exposure & Hardening ⚠️ HIGH

Problem:

docker run -d --name $NODE_NAME -p 8545:8545 ethereum/client-go:stable \
  --http --http.addr 0.0.0.0 --http.api eth,net,web3

Security Issues:

  • RPC exposed on 0.0.0.0 (all interfaces) → accessible from internet
  • No authentication (eth, net, web3 APIs are permissive)
  • No rate limiting or DOS protection
  • HTTP (not HTTPS) acceptable only for localhost

Impact: Node is vulnerable to state-reading attacks, RPC spam, and potential compromise.

Correction:

# Bind RPC to localhost only
docker run -d --name "$NODE_NAME" \
  -p 127.0.0.1:8545:8545 \
  ethereum/client-go:stable \
    --http \
    --http.addr 127.0.0.1 \
    --http.api eth,net,web3 \
    --http.vhosts localhost,127.0.0.1

# For production remote access, use reverse proxy with auth
# (Example: nginx with mutual TLS, API key auth, or WAF)

9. Error Handling & Exit Codes ⚠️ MEDIUM

Problem:

if [ "$CHAIN_ID" != "$EXPECTED_CHAIN_ID" ]; then
  echo "Wrong network detected"
  exit 1
fi

echo "Node setup and health check completed successfully"

Issues:

  • Exit code 1 is generic (indistinguishable from other failures)
  • No intermediate checkpoints; entire script failure opaque
  • Success message printed even if health check was skipped

Correction:

# Define clear exit codes
readonly EXIT_OK=0
readonly EXIT_GENERIC_ERROR=1
readonly EXIT_INVALID_ARGS=2
readonly EXIT_PRIV_ERROR=3
readonly EXIT_HEALTH_CHECK_FAILED=4
readonly EXIT_NETWORK_MISMATCH=5

trap 'log ERROR "Setup failed with exit code $?"' EXIT

# Use explicit exit codes
if [[ "$CHAIN_ID" -ne "$EXPECTED_CHAIN_ID" ]]; then
  log ERROR "Wrong network: expected $EXPECTED_CHAIN_ID, got $CHAIN_ID"
  exit $EXIT_NETWORK_MISMATCH
fi

log INFO "All checks passed"
exit $EXIT_OK

Summary Table

Issue Severity Category Fix Complexity
Missing set -euo pipefail CRITICAL Reliability Low
Unsafe sudo usage CRITICAL Security Low
Non-idempotent Docker CRITICAL Reliability Medium
Chain ID hex/decimal bug CRITICAL Logic Low
apt upgrade side effects HIGH Stability Low
RPC exposed to 0.0.0.0 HIGH Security Low
No dependency validation MEDIUM Reliability Medium
Poor logging MEDIUM Observability Medium
No exit code semantics MEDIUM Reliability Low

Production-Ready Best Practices

  1. Always use set -euo pipefail with proper variable handling
  2. Validate all inputs before use (RPC URLs, environment checks)
  3. Design for idempotency — scripts should be safely re-runnable
  4. Separate concerns — split setup, validation, and health-check into functions
  5. Implement structured logging with timestamps and severity levels
  6. Test Docker operations — ensure pull, run, and health checks work in isolation
  7. Security by default — bind services to localhost, use minimal API scope
  8. Define exit codes clearly for CI/CD integration
  9. Document assumptions (OS, privileges, dependencies) explicitly

References