Swarm integration #93
base: main
Conversation
… related to the check-config-files service, streamlining the project and reducing complexity.
* Add NATS service implementation: Introduce main.py for NATS service logic, manifest.yaml for service configuration, and setup script for SwarmDB integration. This commit establishes the foundational components for managing NATS within the system, including installation, configuration, and service management.
* Add NATS server installation and configuration: Introduce install_nats.sh script for installing NATS server version 2.12.2, including user/group setup and systemd service configuration. Update Dockerfile to integrate NATS installation and disable autostart during build. Modify swarm setup scripts to include --omit-command-init option for ClusterServices creation.
* Refactor NATS configuration generation: Update write_nats_config function to enhance cluster configuration handling. Introduce conditional clustering based on available routes and improve jetstream configuration formatting.
* Add ownership and permission management for NATS directories: Introduce helper functions to ensure that the NATS data and log directories are owned by the 'nats' user and group, enhancing security and functionality for JetStream operations.
* Add NATS JetStream synchronization test script: Introduce nats_js_sync_test.py for publishing and reading messages across NATS endpoints, ensuring stream existence and handling message acknowledgment. This script enhances testing capabilities for NATS JetStream operations.
* Add WireGuard and Geo-IP Measurement setup scripts: Introduce 10.setup-wireguard.sh and 40.setup-geo-ip-measurement.sh for bootstrapping respective services into SwarmDB via MySQL client. These scripts ensure proper service configuration and manifest validation, enhancing service management within the system.
* Enhance NATS JetStream sync test script: Add custom sender ID argument and update message payload structure to include sender information, host, timestamp, process ID, and message content. Improve message output formatting for clearer attribution during testing.
* Add NATS JetStream synchronization test documentation: Introduce nats_js_sync_test.md to provide example usage for running the nats_js_sync_test.py script with specific configurations for two nodes. This enhances the clarity and usability of the testing process.
…e build process while retaining necessary patch import for SuperProtocol.
…le serial console access, enhancing security and streamlining the container environment.
…tarts only after WireGuard tunnels and network are up by adding an override configuration file. This enhances service management and dependency handling within the container environment.
…n file for NATS server and update the installation script to ensure NATS starts only after WireGuard tunnels and network are up. This change simplifies service management within the Docker environment.
* Add signature for mrenclave-ae6a4d0cc13ddcba76e4758c068e1366028eb270cad4b9a58217b0201821b306 (#88)
* Add signature for mrenclave-f7011962644e7fe07444f60da66ae9c71a7b8ec9612eb13e804051ca1ebb707b (#89)
* Add signature for mrenclave-28a4fc33dc2f138b4b37c9e970a0c49a0dba0556e9fae30899224fbe0019ec15 (#90)
* Add signature for mrenclave-16c8b0d09780d9747e09c14694925ad6fd59edc5c5b1ef3c0e13e5d3e6fd67d9 (#92)
* Add signature for mrenclave-ee9954d74c8c0433b90c569fd7a5abf50011c85cd8d84ccb961faf0477a1e714 (#94)
* Add signature for mrenclave-a016eab7d21ac96e21b1b6b783604b744795b5647c0df8efa3aa006ed573c424 (#95)
* Add signature for mrenclave-421170d3df227cafb8a9c0c90e5b82774c12cdd6e3aef9165bc9f7f5136c2fee (#96)
* Add signature for mrenclave-6135687f253d6660845672fc293afc04f6719de612d959d1f9d0a71b2e573d86 (#97)
* Add signature for mrenclave-bd6b2ea5dbc4b6bae26a030a69d874d5f37f23221b620f74699ffbc30a2eb3bf (#98)
* Add signature for mrenclave-013520805788a990efd89e02d1111cea5c4c47a68a5c455df820096cee02b37d (#99)
* feature/SP-6787, local subroot launch (#64)
  * Add docker boot image build
  * Add iniramfs
  * Add base kernel
  * Add kernel config merge
  * Add build kernel!
  * Fix busybox sources mirror URL
  * Move files, add rootfs
  * Pre-restructure
  * Add working mounts in chroot in Docker!11!!!11!!!11111
  * Add image build step
  * Continue image
  * grub booting is working!
  * Fix initramfs
  * Fix initramfs, it's booting!
  * Add gve drivers, add working state disk mount
  * Add more configs
  * Add rancher
  * Working state and provider disk mount
  * it almost works!
  * Fix iscsi
  * Add readme
  * Fix UEFI boot
  * Move kata referencies to readme
  * Fix README
  * Fixes and fixes
  * Fix grub branch
  * Add bios amd build
  * Fix README
  * Add bios build, add ci-cd
  * Fix readme
  * remove sudo from ci
  * Fixes
  * And fixes
  * Add tdx bios build
  * Add other artifacts to output
  * Add upload other artifacts
  * Fix k8s.yaml rewriting
  * Remove debug
  * Fix SP_VM_IMAGE_VERSION
  * support more than 1TB memory with SEV SNP
  * draft
  * lxc support
  * added env replacing
  * read subroot env from root /sp
  * increase subroot version
  * update readme
  * review fix
  * buildx do not support from as arg
  * comments for mac/ip addresses
  * use build version instead of package.json version
  * strict mode for bash script
  * no need docker login to lxc access
  * increase default timeout
  * fixed bash style
  * quotes

  Co-authored-by: DrunkBatya <drunkbatya.js@gmail.com>
* Add signature for mrenclave-8f0bff733ae3c877e707e6080a6f3503df774e679b1e71b1118268cdb8b0cb3d (#101)
* Add signature for mrenclave-b276c54e013087cdf253b484d0cbece392ea5ad0f37628eddee1e329f2852db1 (#102)
* Add signature for mrenclave-430120e4b4b6d924d28075d06a78af58cd8331686671322a19d39a34e8c2c435 (#103)
* Add signature for mrenclave-a967bce5a48ec0f350050b1e0eafc8799f5a6619ebbd8831d1a6c81fb20e16a8 (#104)
* Add signature for mrenclave-fe962727c2453ea74bd04d9e0549ef2fbc5b6b9561b730c16132bf4ada6f244d (#105)
* feature/SP-6861, token challenge implementation (#100)
* Add signature for mrenclave-ce749e194db1efcd68a4154e874fc83d252f109ce7906b3a5c59f2cb9ecf7168 (#106)
* Add signature for mrenclave-27edd65a413bbf80376d42a29c79ae2967331f4df774e5cf0c4563eeab923498 (#107)
* Add signature for mrenclave-18b3167aca0eb5c4c9836fcb50392b3dbc7a0aebad674262d3b76d20026efeed (#108)
* Add signature for mrenclave-41ed132fcfcd40be122c46f2e3f0087accee6c486cdfb45e1f546ade5be4d746 (#109)
* Add signature for mrenclave-88e5ab69837ea9229170d78d0dd382c165f8d9cedb910bb7ee29035bda5c7804 (#110)
* Add signature for mrenclave-88e5ab69837ea9229170d78d0dd382c165f8d9cedb910bb7ee29035bda5c7804 (#112)
* Add signature for mrenclave-41ed132fcfcd40be122c46f2e3f0087accee6c486cdfb45e1f546ade5be4d746 (#113)
* Add signature for mrenclave-21f3e7184ba6b989b258ebcd7c66c2341883c64543ec07d591eada4e82938c10 (#114)
* Add signature for mrenclave-d8b956e13aa5e83b4887675124d96d8e5aeed8bf9dd5949219232d49cc9038b6 (#117)
* Add mkfs.ext4
* mount overlay state
* added logs handlers and mounting RO provider config
* OverlayFS
* Remove extra code
* Remove useless flag
* make coderabbitai happy

Co-authored-by: Petr Evstifeev <7293601+Villain88@users.noreply.github.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/Dockerfile (1)
262-271: Previously flagged issues remain unaddressed. Two issues were raised in prior reviews for this section:
- ADD vs COPY (lines 262, 263, 265, 268, 269, 270): Use `COPY` instead of `ADD` for local files per Hadolint DL3020.
- Image digest pinning (line 267): The external image `ghcr.io/super-protocol/tee-pki-authority-service-lxc:build-18725490828` should be pinned by digest for reproducibility and supply-chain safety.
🧹 Nitpick comments (5)
src/rootfs/files/configs/etc/systemd/system/check-config-files.service (1)
1-7: Consider adding an `[Install]` section if not present.
The service configuration appears to lack an `[Install]` section, which typically specifies when the service should be enabled (e.g., `WantedBy=multi-user.target`). If the file is complete as shown, consider adding this section to ensure the service can be properly enabled. If the service is started through another mechanism or the section exists but isn't shown, this comment can be disregarded.
src/initramfs/files/init.sh (4)
3-3: Consider adding `set -e` for fail-fast behavior.
While `set -x` provides useful debugging output, the script lacks `set -e` to exit on command failures. Currently, if a command fails before reaching a `log_fail` check, execution continues silently. However, since the script uses explicit error checking with `|| log_fail`, this is a minor concern.
```diff
-set -x
+set -ex
```
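The difference between the two failure-handling styles is easy to demonstrate in isolation. This is a minimal standalone sketch (not code from init.sh) contrasting a shell without and with `set -e`:

```shell
#!/bin/sh
# Without set -e, a failing command is skipped and execution continues:
sh -c 'false; echo continued'
# prints: continued

# With set -e, the shell exits at the first failure, so the echo never runs:
sh -c 'set -e; false; echo continued' || echo "exited early"
# prints: exited early
```

This is why scripts that do not guard every command with `|| log_fail` usually want `set -e` as a safety net.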
61-63: Clarify the difference in label lookup methods.
Line 61 uses `PARTLABEL` (partition label) while line 63 uses `-L` (filesystem label). This appears intentional but could be confusing for maintainers. Consider adding a brief comment explaining why different label types are used:
```diff
 # hash device can only be found by partition label (not fs label)
 hash_device_path="$(blkid -t PARTLABEL="rootfs_hash" --output device || echo)";
+# provider_config uses filesystem label
 provider_config_device_path="$(blkid -L provider_config --output device || echo)";
```
139-139: Hardcoded ext4 filesystem type for provider_config.
The mount explicitly specifies `-t ext4`, which will fail if the provider_config filesystem is a different type. Consider auto-detecting or documenting this requirement. Either remove the explicit type to let mount auto-detect:
```diff
-mount -t ext4 -o ro "$provider_config_device_path" /mnt/sp || log_fail "Mounting provider config failed";
+mount -o ro "$provider_config_device_path" /mnt/sp || log_fail "Mounting provider config failed";
```
Or add a comment documenting the ext4 requirement if intentional.
30-30: Minor: Incomplete comment.
The comment appears to be missing a closing quote in the example.
```diff
-LABEL_NAME="${1#*=}"; # 'LABEL=rootfs' > 'rootfs
+LABEL_NAME="${1#*=}"; # 'LABEL=rootfs' -> 'rootfs'
```
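For context, the expansion the comment documents is POSIX `${var#pattern}` prefix stripping. A standalone sketch of its behavior:

```shell
#!/bin/sh
# ${var#*=} removes the shortest leading match of '*=',
# so only the text after the first '=' remains.
arg="LABEL=rootfs"
echo "${arg#*=}"       # prints: rootfs

# A later '=' is kept, because only the shortest prefix is stripped:
arg2="A=B=C"
echo "${arg2#*=}"      # prints: B=C
```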
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (11)
- signatures/sev-snp/pre-release/mrenclave-d8b956e13aa5e83b4887675124d96d8e5aeed8bf9dd5949219232d49cc9038b6.json (1 hunks)
- src/Dockerfile (5 hunks)
- src/initramfs/files/init.sh (2 hunks)
- src/initramfs/files/initramfs.list (1 hunks)
- src/kernel/files/configs/fragments/x86_64/superprotocol.conf (2 hunks)
- src/rootfs/files/configs/etc/systemd/system/check-config-files.service (1 hunks)
- src/rootfs/files/configs/etc/systemd/system/local-registry.service (1 hunks)
- src/rootfs/files/configs/fstab (0 hunks)
- src/rootfs/files/configs/state_disk_mount/state_disk_mount.service (0 hunks)
- src/rootfs/files/configs/state_disk_mount/state_disk_mount.sh (0 hunks)
- src/rootfs/files/scripts/create_base_system.sh (1 hunks)
💤 Files with no reviewable changes (3)
- src/rootfs/files/configs/state_disk_mount/state_disk_mount.sh
- src/rootfs/files/configs/state_disk_mount/state_disk_mount.service
- src/rootfs/files/configs/fstab
🚧 Files skipped from review as they are similar to previous changes (1)
- src/rootfs/files/scripts/create_base_system.sh
🧰 Additional context used
🪛 Hadolint (2.14.0)
src/Dockerfile
[error] 262-262: Use COPY instead of ADD for files and folders
(DL3020)
[error] 263-263: Use COPY instead of ADD for files and folders
(DL3020)
[error] 265-265: Use COPY instead of ADD for files and folders
(DL3020)
[error] 268-268: Use COPY instead of ADD for files and folders
(DL3020)
[error] 269-269: Use COPY instead of ADD for files and folders
(DL3020)
[error] 270-270: Use COPY instead of ADD for files and folders
(DL3020)
🔇 Additional comments (15)
src/rootfs/files/configs/etc/systemd/system/local-registry.service (1)
3-3: Remove this review comment; no concern identified.
The local-registry.sh script uses storage paths under /opt/ (`/opt/hauler/store`, `/opt/hauler/registry`, `/opt/super/certs`) and does not reference any mount points that would have been provided by state_disk_mount.service. The removal of this dependency does not impact registry functionality.
Likely an incorrect or invalid review comment.
signatures/sev-snp/pre-release/mrenclave-d8b956e13aa5e83b4887675124d96d8e5aeed8bf9dd5949219232d49cc9038b6.json (1)
1-7: LGTM! Verify signature validity and uniqueness.
The JSON structure is correct, the mrenclave format is valid (64 hex characters matching the filename), and all required fields are present. Please ensure the signature is cryptographically valid for this MRENCLAVE and that this entry doesn't duplicate an existing record.
Run the following script to check for duplicate mrenclaves:
```shell
#!/bin/bash
# Description: Check for duplicate mrenclave entries in the repository
# Search for this specific mrenclave value in all JSON files
rg -t json '"mrenclave":\s*"d8b956e13aa5e83b4887675124d96d8e5aeed8bf9dd5949219232d49cc9038b6"' -C 1
```
src/Dockerfile (5)
24-29: LGTM!
The additions of `wipefs` and `lsblk` to the util_linux build stage follow the existing pattern. Static linking, stripping, and copying are consistent with the `blkid` binary.
58-59: LGTM!
Stripping `cryptsetup.static` is consistent with the existing `veritysetup.static` stripping pattern.
74-86: LGTM!
The new `e2fsprogs` stage follows the established build pattern. Using `mke2fs` as `mkfs.ext4` is correct since `mkfs.ext4` is typically a symlink/copy of `mke2fs`. Static linking configuration is appropriate for initramfs inclusion.
92-96: LGTM!
The `COPY` statements correctly reference the binaries built in the corresponding builder stages.
166-166: Verify the image tag exists.
The image tag `ghcr.io/super-protocol/tee-pki-curl:v1.7.5` must be published and accessible on GHCR before deployment. This change updates the tag from v1.7.4 to v1.7.5. Confirm the image exists by either checking the Super Protocol package registry or attempting a pull to validate the tag is available.
src/initramfs/files/initramfs.list (1)
15-19: LGTM!
The new binary entries correctly reference the source paths from the respective builder stages in the Dockerfile:
- `lsblk` and `wipefs` from `util_linux_builder`
- `cryptsetup` from `cryptsetup_builder` (via `cryptsetup.static`)
- `mkfs.ext4` from `e2fsprogs_builder`

Permissions (0755) and ownership (0 0) are consistent with existing entries.
src/rootfs/files/configs/etc/systemd/system/check-config-files.service (1)
src/rootfs/files/configs/etc/systemd/system/check-config-files.service (1)
3-3: The script does not depend on state disk mount. Verified that/usr/local/bin/check-config-files.shcontains no references to state disk paths or mount points. The script only accesses/etc/super,/var/lib/rancher,/etc/iscsi,/sp, and/proc—all of which exist independent ofstate_disk_mount.service. The removal of this dependency is safe and will not break functionality.Regarding the missing
[Install]section in the service file: this is intentional. The service is a oneshot unit triggered bycheck-config-files.timer, which has its own[Install]section specifyingWantedBy=timers.target. Services triggered by timers do not require their own installation section.src/kernel/files/configs/fragments/x86_64/superprotocol.conf (2)
2-2: LGTM! Building overlay FS into kernel.
Building `CONFIG_OVERLAY_FS` as `=y` (built-in) rather than `=m` (module) is appropriate since the init script relies on the overlay filesystem before module loading infrastructure is available.
229-234: LGTM! WireGuard kernel configuration is complete.
The crypto primitives (ChaCha20, Poly1305, Blake2s, Curve25519) are the required dependencies for WireGuard. Using the x86_64-optimized variants ensures hardware acceleration.
src/initramfs/files/init.sh (4)
34-50: LGTM! Well-structured logging helpers.
The logging functions provide consistent formatting and appropriate severity levels. Using `/dev/console` ensures output is visible during early boot.
74-89: LGTM! Proper dm-verity verification flow.
The implementation correctly handles both verified and unverified boot paths. Using `log_warn` for skipped verification provides visibility into the security posture.
128-135: LGTM! Overlay filesystem setup is correct.
The overlay configuration properly uses the verified read-only rootfs as the lower layer with an encrypted upper layer for writes.
146-151: LGTM! Clean transition to real init.
Unmounting `/proc` and `/sys` before `switch_root` and using `exec` ensures a clean handoff to the final init system.
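As a side note on why `exec` matters in that handoff: it replaces the current process rather than forking a child, so PID 1 is preserved for the real init. A tiny standalone sketch (plain `sh`, not the actual switch_root call):

```shell
#!/bin/sh
# The shell prints its PID, then execs a new shell that prints its PID.
# Both lines show the same PID, because exec does not fork.
sh -c 'echo $$ && exec sh -c "echo \$\$"'
```

With a plain (non-exec) invocation, the second line would instead show a child PID.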
```shell
state_block_device_count="$(lsblk -d -n -o NAME | grep -v "$main_block_device_name" | grep -v "$(basename "$provider_config_device_path")" | wc -l)";

if [ "$state_block_device_count" -lt 1 ]; then
    log_fail "Failed to get state block device, please attach an extra disk to this VM";
fi
if [ "$state_block_device_count" -gt 1 ]; then
    log_fail "Found more than one state block device, please remove an extra block device and restart the VM";
fi

state_block_device_name="$(lsblk -d -n -o NAME | grep -v "$main_block_device_name" | grep -v "$(basename "$provider_config_device_path")")";
if [ -z "$state_block_device_name" ]; then
    log_fail "Failed to get state block device, this error is abnomal and you should notify the SuperProtocol support team if you see this..";
fi
state_block_device_path="/dev/$state_block_device_name";
```
Potential issue with provider_config device exclusion logic.
If `provider_config_device_path` is a partition (e.g., `/dev/sdb1`), `basename` yields `sdb1`, but `lsblk -d` lists whole disks only (e.g., `sdb`). The `grep -v "sdb1"` won't exclude `sdb`, potentially selecting the provider_config's parent disk as the state device.
Consider extracting the parent device name for provider_config:
```diff
+provider_config_parent="$(lsblk -no PKNAME "$provider_config_device_path" 2>/dev/null || basename "$provider_config_device_path")";
+
-state_block_device_count="$(lsblk -d -n -o NAME | grep -v "$main_block_device_name" | grep -v "$(basename "$provider_config_device_path")" | wc -l)";
+state_block_device_count="$(lsblk -d -n -o NAME | grep -v "$main_block_device_name" | grep -v "$provider_config_parent" | wc -l)";
 ...
-state_block_device_name="$(lsblk -d -n -o NAME | grep -v "$main_block_device_name" | grep -v "$(basename "$provider_config_device_path")")";
+state_block_device_name="$(lsblk -d -n -o NAME | grep -v "$main_block_device_name" | grep -v "$provider_config_parent")";
```
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
provider_config_parent="$(lsblk -no PKNAME "$provider_config_device_path" 2>/dev/null || basename "$provider_config_device_path")";
state_block_device_count="$(lsblk -d -n -o NAME | grep -v "$main_block_device_name" | grep -v "$provider_config_parent" | wc -l)";
if [ "$state_block_device_count" -lt 1 ]; then
    log_fail "Failed to get state block device, please attach an extra disk to this VM";
fi
if [ "$state_block_device_count" -gt 1 ]; then
    log_fail "Found more than one state block device, please remove an extra block device and restart the VM";
fi
state_block_device_name="$(lsblk -d -n -o NAME | grep -v "$main_block_device_name" | grep -v "$provider_config_parent")";
if [ -z "$state_block_device_name" ]; then
    log_fail "Failed to get state block device, this error is abnomal and you should notify the SuperProtocol support team if you see this..";
fi
state_block_device_path="/dev/$state_block_device_name";
```
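The pitfall is reproducible without real block devices. The sketch below simulates `lsblk -d` output with a hard-coded list (the device names `sda`/`sdb`/`sdc` are made up, and a trailing-digit strip stands in for `lsblk -no PKNAME`, which needs an actual device):

```shell
#!/bin/sh
# Simulated `lsblk -d -n -o NAME` output: whole disks only.
disks="sda
sdb
sdc"
provider_config_device_path="/dev/sdb1"   # provider_config lives on a partition

# Buggy filter: grep -v "sdb1" matches nothing in a whole-disk list,
# so the provider_config's parent disk sdb survives as a candidate.
buggy="$(printf '%s\n' "$disks" | grep -v sda | grep -v "$(basename "$provider_config_device_path")")"
printf 'buggy candidates: %s\n' "$(printf '%s' "$buggy" | tr '\n' ' ')"
# prints: buggy candidates: sdb sdc

# Fixed filter: resolve the parent disk name first (here by stripping the
# trailing partition number; on a real system use `lsblk -no PKNAME`).
parent="$(basename "$provider_config_device_path" | sed 's/[0-9]*$//')"
fixed="$(printf '%s\n' "$disks" | grep -v sda | grep -v "$parent")"
printf 'fixed candidate: %s\n' "$fixed"
# prints: fixed candidate: sdc
```

The buggy filter leaves two candidates, tripping the "more than one state block device" check even though only one true state disk exists; the fixed filter leaves only `sdc`.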
```shell
random_key="$(dd if=/dev/urandom bs=1 count=32 2>/dev/null | base64)";

log_info "Wiping filesystem signatures from the device $state_block_device_path";
wipefs -a "$state_block_device_path" || log_warn "Failed to wipe $state_block_device_path";

log_info "Formatting the device $state_block_device_path as LUKS encrypted";
echo "$random_key" | cryptsetup luksFormat "$state_block_device_path" --batch-mode || log_fail "Failed to format $state_block_device_path";

log_info "Opening the LUKS encrypted device $state_block_device_path";
echo "$random_key" | cryptsetup luksOpen "$state_block_device_path" crypto || log_fail "Failed to open";
```
Security: `set -x` will expose the encryption key to console.
With `set -x` enabled at line 3, the shell will echo the `random_key` variable contents and the `echo "$random_key" | cryptsetup` commands to `/dev/console`, potentially exposing the encryption key in logs or to observers.
Disable tracing around sensitive operations:
```diff
 random_key="$(dd if=/dev/urandom bs=1 count=32 2>/dev/null | base64)";
 log_info "Wiping filesystem signatures from the device $state_block_device_path";
 wipefs -a "$state_block_device_path" || log_warn "Failed to wipe $state_block_device_path";
+set +x # Disable tracing to avoid exposing encryption key
 log_info "Formatting the device $state_block_device_path as LUKS encrypted";
 echo "$random_key" | cryptsetup luksFormat "$state_block_device_path" --batch-mode || log_fail "Failed to format $state_block_device_path";
 log_info "Opening the LUKS encrypted device $state_block_device_path";
 echo "$random_key" | cryptsetup luksOpen "$state_block_device_path" crypto || log_fail "Failed to open";
+set -x # Re-enable tracing
```
📝 Committable suggestion
```shell
random_key="$(dd if=/dev/urandom bs=1 count=32 2>/dev/null | base64)";
log_info "Wiping filesystem signatures from the device $state_block_device_path";
wipefs -a "$state_block_device_path" || log_warn "Failed to wipe $state_block_device_path";
set +x # Disable tracing to avoid exposing encryption key
log_info "Formatting the device $state_block_device_path as LUKS encrypted";
echo "$random_key" | cryptsetup luksFormat "$state_block_device_path" --batch-mode || log_fail "Failed to format $state_block_device_path";
log_info "Opening the LUKS encrypted device $state_block_device_path";
echo "$random_key" | cryptsetup luksOpen "$state_block_device_path" crypto || log_fail "Failed to open";
set -x # Re-enable tracing
```
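Whether the trace actually leaks can be checked in a sandbox. This sketch (using the placeholder value `supersecret` instead of a real key) captures a script's xtrace output and confirms the secret stays out of it when `set +x` is issued before the key exists:

```shell
#!/bin/sh
# Run a traced script, but disable xtrace before the secret is assigned.
# `2>&1 >/dev/null` captures the trace (stderr) and discards normal output.
trace="$(sh -c '
set -x
echo before
set +x                      # stop tracing before the key is generated
key=supersecret             # assignment is NOT echoed to the trace
printf %s "$key" >/dev/null
set -x                      # resume tracing afterwards
echo after
' 2>&1 >/dev/null)"

case "$trace" in
  *supersecret*) echo "key leaked into trace" ;;
  *)             echo "key stayed out of trace" ;;
esac
# prints: key stayed out of trace
```

Note that `set +x` must come before the key is generated: with tracing still on, the `key=...` assignment line is typically echoed with its expanded value, so disabling tracing only afterwards would still leak it.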
🤖 Prompt for AI Agents
In src/initramfs/files/init.sh around lines 111 to 120, tracing (set -x) will expose the generated random_key and the cryptsetup commands to the console. Disable shell tracing (set +x) before generating the key and before piping it to cryptsetup, then re-enable tracing (set -x) afterwards. Unset or zero the key variable immediately after use, and avoid printing or logging the key or the full cryptsetup command output.
* Add OpenResty service implementation: Introduce main.py for OpenResty service logic, manifest.yaml for service configuration, and setup script for SwarmDB integration. Update setup_runtime_tools.sh to include OpenResty dependencies (openssl) in the root filesystem. This commit establishes foundational components for managing OpenResty within the system, enhancing service management and configuration capabilities. * Add OpenResty installation to Dockerfile: Introduce install_openresty.sh script for installing OpenResty and its Lua dependencies within the VM root filesystem. Update Dockerfile to include OpenResty installation and disable autostart during build, enhancing service management capabilities. * Enhance cleanup_rootfs.sh: Add comment for best-effort removal of build toolchain and correct g++ package name to ensure proper cleanup during root filesystem preparation. * Refactor cleanup_rootfs.sh and install_openresty.sh: Remove outdated comment on toolchain cleanup, correct g++ package name, and update OpenResty installation process to use GPG keyring and signed-by repository format for improved security and compliance with Ubuntu standards. * Update cleanup_rootfs.sh: Add comments for best-effort removal of heavy toolchain packages while retaining core development libraries to ensure compatibility with OpenResty and other components. * Add OpenResty setup script: Introduce 63.setup-openresty.sh for bootstrapping the OpenResty service into SwarmDB via MySQL client. This script validates the service manifest and registers service records, enhancing service management and integration capabilities within the system. * Remove OpenResty setup script: Delete 45.setup-openresty.sh as it is no longer needed for bootstrapping the OpenResty service into SwarmDB. This cleanup simplifies the service management process and reduces redundancy in the codebase. 
* Update swarm-cloud subproject commit: Change subproject reference from 36c2c08 to 0f097bd, reflecting the latest updates in the swarm-cloud repository. * Add OpenResty K8s Router: Introduce main.py, manifest.yaml, and setup script for SwarmDB integration. The main.py implements service logic for managing ingress routing, while manifest.yaml defines service configuration. The setup script registers the service in SwarmDB, enhancing service management and integration capabilities. * Update ROUTE_DOMAIN_SUFFIX in OpenResty K8s Router: Change domain suffix from '.k8s.oresty.superprotocol.io' to '.oresty.superprotocol.io' for improved routing configuration. * Add swarm-cloud services and remove OpenResty components: Introduce setup scripts for CockroachDB, Knot, and swarm-cloud-api services into SwarmDB, enhancing service management. Remove OpenResty K8s Router files (main.py and manifest.yaml) as they are no longer needed, streamlining the codebase. * Update Dockerfile for OpenResty integration: Add OpenResty service files to the Docker image and create necessary directories for SSL storage. Remove obsolete OpenResty main.py and manifest.yaml files to streamline the codebase and enhance service management. * Update CockroachDB setup script: Modify ClusterPolicies creation parameters to set minSize to 1, allowing for more flexible cluster scaling while maintaining maxSize and maxClusters constraints. * Add CockroachDB and Knot DNS installation scripts: Introduce scripts for installing CockroachDB and Knot DNS into the VM root filesystem, enhancing service management capabilities. Update Dockerfile to include these installations during the build process. * Update Python runtime dependencies in setup_runtime_tools.sh: Add 'requests' package to the installation command, enhancing the functionality of the runtime tools. 
* Remove command initialization from OpenResty setup script: Simplify the service creation command in 63.setup-openresty.sh by omitting the --omit-command-init flag, streamlining the service registration process. * Add CockroachDB service implementation: Introduce main.py and manifest.yaml for CockroachDB, enabling service management and integration within SwarmDB. Update Dockerfile to remove obsolete CockroachDB files and ensure proper service provisioning. * Enhance runtime tools installation in setup_runtime_tools.sh: Add 'netcat' to the list of runtime packages being installed, improving network utility availability in the root filesystem. * Add SwarmSecrets management to swarm-cli: Implement create_swarm_secrets function and update CLI to support SwarmSecrets creation. Enhance setup-knot.sh to ensure 'base_domain' secret is created during setup. * Add Knot DNS service implementation: Introduce main.py and manifest.yaml for Knot DNS, enabling service management and integration within SwarmDB. Update Dockerfile to provision Knot service by removing obsolete components and ensuring proper setup. * Update base_domain secret value in setup-knot.sh: Change the value from 'oresty.superprotocol.io' to 'test.oresty.superprotocol.io' to reflect the new testing environment configuration. * Add OpenResty service implementation: Introduce main.py and manifest.yaml for OpenResty, enabling service management and integration within SwarmDB. Update Dockerfile to provision OpenResty by removing obsolete components and ensuring proper setup. * Add swarm-cloud-api runtime wrappers: Introduce Node.js entrypoints for swarm-cloud-api and schema-sync in the Dockerfile. Update build_swarm_cloud.sh to install swarm-cloud-api distribution and adjust logging for clarity. * Refactor swarm-cloud-api installation: Move runtime wrappers to /usr/local/lib/swarm-cloud-api in Dockerfile. 
Update build_swarm_cloud.sh to reflect new installation path and add Node.js entrypoints for swarm-cloud-api and schema-sync. * Add swarm-cloud-api implementation: Introduce main.py and manifest.yaml for swarm-cloud-api, enabling service management and integration within SwarmDB. Update Dockerfile to provision swarm-cloud-api by commenting out the obsolete swarm-cloud-api copy command. * Refactor Knot DNS configuration handling: Remove unused node_addrs parameter and streamline listen address assignment. Enhance zone creation with proactive transaction management and retry logic for configuration commits to handle busy resource errors. * Update Dockerfile to disable Knot DNS autostart during build: Remove symlinks for knot.service to prevent systemd dependencies, ensuring a smoother build process. * Update swarm-cloud-api installation path and logging: Change installation directory to /usr/local/lib/swarm-cloud and update logging messages in build_swarm_cloud.sh for clarity. Adjust main.py to reflect new API_BIN location. * Update Knot DNS configuration to listen on local WireGuard tunnel IP: Modify main.py to ensure Knot DNS only listens on the WireGuard tunnel IP for this node, preventing conflicts with other local DNS services. * Refactor Knot DNS configuration formatting: Adjust indentation in main.py for improved readability while maintaining the existing functionality of listening on the WireGuard tunnel IP. * Add schema-sync script for swarm-cloud: Introduce a new schema-sync script in /usr/local/lib/swarm-cloud for running the CLI, and update the existing schema-sync wrapper in /usr/local/lib/swarm-cloud-api to delegate to the new script. Enhance build_swarm_cloud.sh to install production-only Node.js dependencies for the swarm-cloud-api distribution. 
* Remove command initialization from ClusterServices creation in swarm-cloud API setup script
* Enhance Knot DNS configuration templates: Add detailed comments to clarify the purpose of 'catalog' and 'member' templates in master and member nodes. Update zone creation logic to use the 'member' template for proper catalog role assignment, ensuring compatibility with newer Knot versions.
* Refactor schema sync execution in swarm-cloud API: Update the schema sync process to use the bundled CLI directly, ensuring deterministic behavior. Enhance error handling and logging for improved clarity during execution.
* Enhance runtime tools setup script: Add dnsutils to the list of installed packages for improved DNS functionality. Update comments in Knot zone creation logic to clarify handling of transient control socket errors during reload operations, treating them as non-fatal.
* Add swarm-cloud-ui build and startup script: Enhance build_swarm_cloud.sh to include building and publishing artifacts for swarm-cloud-ui. Introduce a new script for starting the swarm-cloud-ui frontend in development mode, ensuring proper dependency management and setup.
* Update swarm-cloud-api.service dependencies: Add redis.service and cockroachdb.service to the After and Requires directives for improved service management and startup order.
* Refactor swarm-cloud-ui startup script: Update the script to start the swarm-cloud-ui frontend using the same layout as the VM image, removing the dependency on the local swarm-cloud repository and hardcoding the path to the built UI artifacts.
* Refactor build_swarm_cloud.sh and swarm-cloud-ui.sh: Streamline the publishing of artifacts and dependency management for swarm-cloud components. Remove unnecessary Node.js dependency installations during runtime and ensure proper copying of workspace-level dependencies.
* Update build_swarm_cloud.sh and swarm-cloud-ui.sh: Correct the path for copying built UI artifacts and modify the Node.js check to ensure proper installation before starting the swarm-cloud-ui. This enhances the reliability of the build and startup processes.
* Fix path for copying swarm-cloud-ui artifacts in build_swarm_cloud.sh: Update the source directory to ensure correct artifact publishing. This change improves the accuracy of the build process.
* Enhance build_swarm_cloud.sh and swarm-cloud-ui.sh: Update the build script to copy shared UI libraries and configuration files, and modify the swarm-cloud-ui startup to run in development mode. This improves the build process and facilitates local development.
* Refactor swarm-cloud-api.service and build_swarm_cloud.sh: Simplify service dependencies by removing redis.service and cockroachdb.service from the After and Requires directives. Update build script to clean up broken node_modules symlinks, ensuring a smoother build process for swarm-cloud-ui artifacts.
* Refactor build_swarm_cloud.sh and swarm-cloud-ui.sh: Update the build process to include publishing artifacts for swarm-cloud-api and swarm-cloud-ui, streamline the deployment of UI components, and enhance logging for better clarity during execution. This improves the overall build efficiency and artifact management.
* Fix import path in deployed UI library within build_swarm_cloud.sh: Adjust the breadcrumb component's import statement to resolve a known path issue, ensuring compatibility with the Next.js transpilation process. This change enhances the reliability of the deployed UI artifacts.
* Refine import handling in build_swarm_cloud.sh: Update the script to adjust TypeScript import paths in the deployed UI library, ensuring compatibility with Next.js expectations for module imports. This change enhances the reliability of the deployed artifacts.
* Refine TypeScript import adjustments in build_swarm_cloud.sh: Update the script to ensure proper handling of import paths in the deployed UI library, enhancing compatibility with Next.js expectations. This change improves the reliability of the build process and the resulting artifacts.
* Update build_swarm_cloud.sh to ensure proper copying of Node.js dependencies: Create the target directory for node_modules and copy its contents while preserving the structure. This change enhances the build process by maintaining the integrity of the Node.js module layout.
* Refactor build_swarm_cloud.sh to streamline artifact publishing: Update the script to copy both 'dist' and 'node_modules' for swarm-cloud-api, swarm-node, and swarm-cloud-ui, ensuring proper deployment structure. This change enhances the build process and artifact management.
* Update build_swarm_cloud.sh to create target directory for swarm-cloud-ui deployment: Adjust the script to ensure the target directory exists before copying artifacts, improving the deployment process and ensuring a smoother build experience.
* Comment out swarm-node and swarm-cloud-ui build steps in build_swarm_cloud.sh: Temporarily disable the building and publishing of swarm-node and swarm-cloud-ui artifacts to streamline the script for current development needs. This change helps focus on other aspects of the build process.
* Add test-app and test-app-route services with setup scripts and manifests: Introduce new services for test-app and test-app-route, including their respective Python scripts and YAML manifests. Implement setup scripts to bootstrap these services into SwarmDB, ensuring proper registration of cluster policies and services. This enhances the overall service architecture and provides a foundation for future development.
* Re-enable swarm-node build steps in build_swarm_cloud.sh: Restore the building and publishing of swarm-node artifacts, enhancing the build process to include necessary components for swarm-node. This change refocuses the script on the complete artifact management for swarm-cloud services.
* Update swarm-node script to correct execution path: Change the execution command in swarm-node.sh to point to the correct location of the main.js file, ensuring proper functionality of the swarm-node service.
* Reorganize Node.js dependency copying in build_swarm_cloud.sh: Update the script to streamline the copying of package.json, pnpm-lock.yaml, and pnpm-workspace.yaml before copying node_modules, ensuring a more logical order and maintaining the integrity of the build process.
* Enhance Python runtime dependencies in setup_runtime_tools.sh: Add 'redis' to the list of installed packages, ensuring the necessary components for Redis functionality are included during the setup process.
* Refactor ensure_route_in_redis function in main.py: Update route creation logic to derive backend URLs from WireGuard tunnel IPs of test-app cluster nodes, replacing the static target URL. This enhances dynamic routing capabilities based on the current cluster state.
…sure-kubectl-link.service and swarm-db.service by removing unnecessary service requirements. This enhances service management and startup order within the system.
…reate and manage Redis data and log directories with appropriate ownership and permissions. This improves security and ensures proper directory structure for Redis operations.
…db, and wireguard services during build. This enhances service management by preventing unnecessary systemd symlinks and streamlining the build process.
…g system boots into multi-user mode instead of graphical mode. This change enhances system startup behavior and aligns with service management practices.
- Modify Dockerfile to copy swarm scripts to the new directory for service launchers.
- Update swarm-services.service to reflect the new path for service scripts.
- Change swarm-services.sh to point to the updated directory for service execution.
- Adjust test documentation to use the new service launcher paths for consistency.
… launchers
- Introduce multiple setup scripts for new services including wireguard, hw-measurement, latency-measurement, geo-ip-measurement, rke2, redis, cockroachdb, knot, openresty, swarm-cloud-api, nats, test-app, test-app-route, and mongodb.
- Update Dockerfile to reflect the new path for swarm service launchers.
- Ensure all new scripts are executable and properly configured for service registration in SwarmDB.
- Introduce README_SWARM.md to provide a comprehensive overview of Swarm deployment architecture, including details on swarm-db, swarm-node, and swarm-cli.py usage.
- Add swarm.mdc to establish guidelines for handling user inquiries related to Swarm, ensuring that the README is referenced as the primary source of truth for deployment processes and entity management.
… ClusterPolicies and ClusterServices
- Update all service setup scripts to check for the existence of ClusterPolicies and ClusterServices before creation, preventing unnecessary duplication.
- Modify the swarm-cli.py to include new 'get' commands for fetching existing ClusterPolicies and ClusterServices, enhancing the CLI's functionality.
- Ensure consistent messaging across scripts to improve user experience during service setup.
* Refactor Knot and OpenResty services for improved DNS management and configuration
- Update Knot service to enhance zone management, including the addition of dynamic zone creation and improved error handling for zone configuration.
- Introduce new functions for adding and removing zones from the zones configuration file, ensuring persistent management of dynamic zones.
- Modify OpenResty service to utilize the Knot leader's tunnel IP for DNS updates, improving reliability in DNS record management.
- Update service scripts and configuration paths to align with the new directory structure for swarm services.
- Ensure consistent logging and error handling across services to enhance debugging and user experience.
* Update swarm-cloud submodule to latest commit d14169b6a503af4fcae5cf57fabb58a3167d8296
* Add ensure_redis_directories function to manage Redis data and log directories
- Implement a new function to create Redis data and log directories with appropriate ownership and permissions.
- Ensure the Redis data directory has restrictive permissions for security.
- Integrate the new function into the Redis configuration writing process to maintain directory structure consistency.
* Update Redis configuration to disable systemd supervision
- Modify the Redis configuration to disable systemd supervision, preventing "NOTIFY_SOCKET not found" warnings in the logs. This change is necessary as Redis is run under a regular systemd service without Type=notify.
* Add support for single-node Redis cluster initialization
- Implement logic to handle the creation of a single-node Redis cluster by manually assigning all slots (0..16383) to the node.
- Add error handling and logging for the initialization process to ensure successful setup and provide feedback on failures.
- Enhance the existing create_redis_cluster function to accommodate this special case, improving flexibility in cluster management.
* Add pwd import to main.py for user management functionality
* Add external IP support in Knot DNS configuration
- Enhance the write_knot_config function to accept an optional node_addrs parameter for determining the external IP of the node.
- Update the configuration generation to include listen directives for the external IP, allowing external clients to connect.
- Refactor the handle_apply function to pass the new node_addrs parameter when writing the Knot configuration.
* Update Dockerfile to ensure main.py in swarm-cloud services is executable
- Add a command to change permissions for main.py in the swarm-cloud services directory, ensuring it is executable during the build process.
* Update setup script for CockroachDB to correct service path references
- Change the service path in the setup script from /etc/swarm-cloud/services to /etc/swarm-services for consistency with the current directory structure.
- Ensure that the LOCATION_PATH variable reflects the updated service path, maintaining clarity in the script's functionality.
* Add functions for single-node cluster initialization and replica node addition in Redis
- Implement `ensure_single_node_cluster` to initialize a single-node Redis cluster by assigning all hash slots if not already set.
- Add `add_replica_node` function to facilitate the addition of a replica node to an existing Redis cluster.
- Refactor `create_redis_cluster` to utilize these new functions, improving cluster management and flexibility for single and multi-node setups.
* Refactor Redis cluster management functions for improved initialization and error handling
- Replace the `ensure_single_node_cluster` function with a more comprehensive `create_redis_cluster` function that handles single-node and two-node cluster setups.
- Implement enhanced error handling and logging for cluster initialization processes, ensuring clear feedback on failures.
- Update the Redis configuration to enable systemd supervision, aligning with best practices for service management.
- Adjust validation to require at least three nodes for multi-node cluster creation, improving robustness in cluster setup.
* Enhance Redis cluster creation logic with improved error handling and support for single-node setups
- Refactor `create_redis_cluster` to ensure proper initialization of single-node clusters and add replica nodes.
- Implement detailed logging for cluster state checks and initialization processes, providing clearer feedback on errors.
- Adjust validation to require at least one node for cluster creation, improving flexibility in cluster management.
* Refactor Redis cluster handling to improve initialization and error messaging
- Update logic for cluster leader to ensure proper readiness checks and idempotent cluster creation.
- Enhance error messages for cluster creation failures and node readiness, providing clearer feedback.
- Adjust handling of node properties to reflect readiness status during reconciliation processes.
* Update Dockerfile and main.py for swarm-cloud API configuration
- Add environment variables for SWARM_CLOUD_API installation and configuration directories in the Dockerfile.
- Modify main.py to use these environment variables for API installation and configuration paths, enhancing flexibility and configurability.
* Update swarm-cloud API script to correct path for main.js execution
- Modify the script to reference the correct path for the main.js file, changing from './dist/apps/swarm-cloud-api/main.js' to './apps/swarm-cloud-api/dist/main.js' to ensure proper execution of the swarm-cloud API.
* Add directory creation for swarm-cloud API configuration in startup script
- Introduce a command to create the /etc/swarm-cloud-api directory in the swarm-cloud API startup script, ensuring the necessary configuration directory is available before execution.
* Update swarm-cloud API setup script to omit command initialization during cluster creation
- Modify the cluster creation command in the setup script to include the --omit-command-init flag, allowing for more flexible initialization without executing the command immediately.
* Refactor Dockerfile and main.py for swarm-cloud API path adjustments
- Remove environment variable definitions for SWARM_CLOUD_API directories in the Dockerfile, simplifying the setup.
- Update main.py to use hardcoded paths for the SWARM_CLOUD_API installation and configuration directories, ensuring consistent access to the necessary resources.
* Refactor Dockerfile and scripts for swarm-cloud API and Redis integration
- Update Dockerfile to streamline the setup of swarm-cloud API, ensuring proper permissions for the schema-sync scripts.
- Modify swarm-cloud API script to remove unnecessary directory creation and enhance the schema-sync execution path.
- Introduce a new Redis management script to handle cluster initialization, configuration, and health checks, improving overall Redis integration and error handling.
* Update Dockerfile to correct schema-sync path for swarm-cloud API
- Adjust the path for the schema-sync entrypoint in the Dockerfile, ensuring it is correctly linked to the swarm-cloud directory.
- Remove duplicate ADD command for schema-sync, streamlining the Dockerfile and improving clarity.
* Fix schema-sync path in Dockerfile for swarm-cloud API
- Correct the path for the schema-sync script in the Dockerfile to ensure it points to the appropriate directory.
- Update permissions to reflect the new path, enhancing the clarity and functionality of the Dockerfile.
* Remove deprecated services and update submodule references
- Delete CockroachDB, Geo IP Measurement, Hardware Measurement, Knot, Latency Measurement, and Redis services along with their associated files and configurations.
- Update swarm-cloud submodule to the latest commit 28f75ffa63ad324b6382b5e7ad89893c2119bb9b.
- Adjust service setup scripts to reflect the new directory structure for swarm services, ensuring consistency across service management.
* Implement retry logic for Redis route configuration in ensure_route_in_redis function
- Add a retry mechanism to handle transient connectivity issues when setting routes in Redis Cluster.
- Introduce a maximum of three attempts with a delay between retries to improve reliability.
- Enhance error messaging to provide clearer feedback on failures during the route setting process.
* Enhance error handling in ensure_route_in_redis function
- Update error messages to be printed to stderr for better visibility during execution.
- Ensure that error messages are logged locally in the handle_apply function, improving traceability in node logs.
- Refactor error handling to use a consistent message format across different failure scenarios.
* Increase maximum retry attempts in ensure_route_in_redis function from 3 to 20 to improve resilience against transient connectivity issues when configuring routes in Redis.
* Enhance Redis initialization logic in handle_init function
- Update the init command to wait for Redis cluster readiness, providing early feedback in logs if the cluster is unavailable.
- Implement error handling for invalid state formats and Redis connection issues, with retries for cluster reachability.
- Improve logging for connection attempts and errors, ensuring clear visibility during the initialization process.
* Refactor Redis initialization checks in main.py
- Remove the is_redis_cluster_initialized function and its associated checks from ensure_route_in_redis and handle_init functions to streamline the Redis readiness logic.
- Update handle_init to focus on Redis reachability instead of cluster initialization, simplifying the initialization process.
- Enhance error handling and logging for Redis connection issues, ensuring clearer feedback during execution.
* Remove OpenResty and RKE2 services and their configurations
- Delete main.py and manifest.yaml files for both OpenResty and RKE2 services, effectively removing their implementations from the project.
- This cleanup is part of a broader effort to streamline service management and eliminate deprecated components from the codebase.
* Refactor handle_init function in main.py
- Remove redundant return statement in handle_init, ensuring a single return point for successful completion.
- Maintain error handling for service creation and improve clarity in the function's flow, enhancing overall readability and maintainability.
* Update swarm-cloud API and remove obsolete files
- Update submodule reference for swarm-cloud to commit fe20b4b, ensuring the latest changes are integrated.
- Remove main.py and manifest.yaml files for swarm-cloud-api, streamlining the project by eliminating deprecated components.
- Adjust setup script paths to reflect the new directory structure for swarm services, enhancing consistency across service management.
* Refactor Dockerfile and swarm scripts for improved service management
- Remove references to the obsolete swarm-cloud-api.service in the Dockerfile, streamlining service configurations.
- Update swarm scripts to eliminate the CLI variable assignment, enhancing clarity and reducing redundancy across multiple setup scripts.
- Adjust the swarm-cloud API setup script to omit command initialization during cluster creation, allowing for more flexible setup processes.
* Update Dockerfile to correct swarm-cloud API paths
- Modify symbolic link for swarm-cloud artifacts to point to the correct directory.
- Adjust the path for the schema-sync entrypoint to align with the updated directory structure for swarm-cloud API.
- Ensure runtime wrappers for swarm-cloud API are correctly set up for execution.
* Refactor swarm-cloud API entrypoints and update paths
- Remove obsolete schema-sync and swarm-cloud API entrypoint scripts from the project.
- Update the paths in the remaining entrypoint scripts to reflect the new directory structure, ensuring correct execution of the swarm-cloud API.
- Adjust Dockerfile to align with the changes in entrypoint locations, enhancing overall consistency in the setup.
* Refactor hardening-vm.sh to streamline iptables rules
- Remove conditional DEBUG check for firewall rules, applying DROP policy by default.
- Consolidate iptables rules for established connections, loopback traffic, DNS requests, and API server access.
- Add rules for WireGuard and swarm-db gossip traffic.
- Maintain DHCP support for LXC containers while ensuring clarity and consistency in the script's logic.
* Enhance hardening-vm.sh to allow SSH access
- Add iptables rule to permit incoming SSH connections on TCP port 22 when not in debug mode.
- Ensure SSH service starts only if the debug flag is not set, maintaining security while allowing remote access.
* Update hardening-vm.sh to allow HTTP and HTTPS traffic
- Modify iptables rules to permit incoming traffic on TCP ports 80 (HTTP) and 443 (HTTPS) without restrictions on source IPs.
- Simplify the existing rule for HTTPS by removing the specific source IP condition, enhancing accessibility for web services.
* Update hardening-vm.sh to allow additional IP range for incoming traffic
- Add iptables rule to accept incoming traffic from the 10.13.0.0/16 subnet, enhancing network accessibility for specific services.
- This change complements existing rules for other IP ranges, ensuring comprehensive traffic management.
* Update ClusterPolicy creation in setup-openresty.sh to limit maxSize to 1
- Adjust the maxSize parameter in the ClusterPolicy creation command from 3 to 1, ensuring a more controlled cluster scaling behavior during setup.
* chore: Update 64.setup-swarm-cloud-api.sh to exit early
* fix: Update ROUTE_KEY format in main.py for test app route
- Change ROUTE_KEY from "routes:{ROUTE_DOMAIN}" to "manual-routes:{ROUTE_DOMAIN}" to reflect the new routing structure for the test application.
* chore: Update swarm-cloud submodule to latest commit 3037aa2
* chore: Remove early exit from 64.setup-swarm-cloud-api.sh to enable proper script execution
Actionable comments posted: 15
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@src/rootfs/files/configs/etc/systemd/system/ensure-kubectl-link.service`:
- Around line 1-11: The unit uses ConditionPathExists=/usr/local/bin/kubectl
which prevents the service from running because /usr/local/bin/kubectl is the
symlink the script ensure-kubectl-link.sh must create; update the unit to check
the real RKE2 binary (e.g.
ConditionPathExists=/var/lib/rancher/rke2/bin/kubectl) or remove the
ConditionPathExists line entirely so
ExecStart=/usr/local/bin/ensure-kubectl-link.sh runs when local-fs.target is
reached and can create the symlink; ensure references to ConditionPathExists and
ensure-kubectl-link.sh are the ones you modify.
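A minimal sketch of the first option from the comment above, conditioning on the real RKE2 binary. Only the two paths quoted in the comment come from the review; the rest of the unit layout (description, oneshot type, install target) is an assumption:

```ini
[Unit]
Description=Ensure /usr/local/bin/kubectl symlink exists
After=local-fs.target
# Condition on the real binary, not on /usr/local/bin/kubectl --
# that symlink is exactly what ensure-kubectl-link.sh creates.
ConditionPathExists=/var/lib/rancher/rke2/bin/kubectl

[Service]
Type=oneshot
ExecStart=/usr/local/bin/ensure-kubectl-link.sh

[Install]
WantedBy=multi-user.target
```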
In `@src/rootfs/files/configs/usr/local/bin/hardening-vm.sh`:
- Around line 15-17: The rule in hardening-vm.sh that appends "iptables -A INPUT
-p udp --sport 53 -j ACCEPT" is insecure because source ports can be spoofed;
remove that line and rely on the existing ESTABLISHED,RELATED accept rule (or
explicitly allow UDP responses by matching connection state) instead of
permitting packets by source port; update or remove the iptables command
referencing "--sport 53" in the script to close this hole.
In `@src/rootfs/files/scripts/build_swarm_cloud.sh`:
- Around line 75-77: The script currently calls chroot_init; build_swarm_cloud;
chroot_deinit; but with set -e a failure in build_swarm_cloud will skip
chroot_deinit; add an EXIT trap to always call chroot_deinit on script exit (and
optionally restore previous trap or disable it after successful deinit) so
mounts are cleaned even on errors; place the trap definition before calling
chroot_init (e.g., trap 'chroot_deinit' EXIT) and ensure chroot_deinit is
idempotent/safe to run multiple times.
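The trap pattern described above can be sketched as follows; `chroot_init`, `build_swarm_cloud`, and `chroot_deinit` are stand-ins for the script's real functions, and a guard variable makes the cleanup idempotent so the explicit call and the EXIT trap can coexist:

```shell
set -eu

# Stand-ins for the real mount/build helpers (assumptions for this sketch).
chroot_init() { echo "mounting chroot"; }
build_swarm_cloud() { echo "building"; }

# Idempotent cleanup: safe to run more than once.
_deinit_done=0
chroot_deinit() {
  [ "$_deinit_done" -eq 1 ] && return 0
  _deinit_done=1
  echo "unmounting chroot"
}

# Register cleanup BEFORE any mounts happen, so a failure anywhere
# after chroot_init still unmounts on exit under set -e.
trap chroot_deinit EXIT

chroot_init
build_swarm_cloud
chroot_deinit   # explicit call; the EXIT trap then becomes a no-op
```

Because the guard makes `chroot_deinit` a no-op on the second invocation, there is no need to reset the trap after the happy path.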
In `@src/rootfs/files/scripts/install_cockroachdb.sh`:
- Around line 43-45: The script currently calls chroot_init,
install_cockroachdb, and chroot_deinit sequentially but can exit early on error
and leave mounts under sys/proc/tmp/dev; add a trap to ensure chroot_deinit
always runs on error or exit by registering trap 'chroot_deinit' for EXIT (and
optionally ERR) at the top of the script after defining chroot_deinit and
chroot_init so that regardless of failures in install_cockroachdb the cleanup
runs; ensure the trap is installed before calling
chroot_init/install_cockroachdb and that chroot_deinit is idempotent/safe to
call multiple times.
- Around line 23-38: The install script currently downloads a non-pinned
"cockroach-latest" tarball, skips checksum verification, and silently falls back
to amd64 for unknown architectures; update the chroot block to (1) detect
architecture using arch/variable cr_arch but instead of defaulting set supported
values only (x86_64 -> amd64, aarch64|arm64 -> arm64) and explicitly exit with
an error on unsupported architectures, (2) replace the wget URL to download a
pinned release (e.g., cockroach-vX.Y.Z.linux-${cr_arch}.tgz) and (3) fetch and
verify the published SHA256 checksum before extracting (use tar -xzf only after
checksum validation), failing the script if the checksum does not match; ensure
the cp to /usr/local/bin/cockroach and subsequent cleanup remain unchanged but
only execute after successful verification.
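A sketch of the pinned, checksum-verified flow; the version string and download URLs here are assumptions (substitute the release you actually ship), and the network steps are shown as comments:

```shell
set -eu

# Version pin is an assumption for this sketch.
CR_VERSION="v23.1.14"

# Map uname -m to CockroachDB arch names; fail on anything else
# instead of silently falling back to amd64.
cr_arch_for() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

cr_arch="$(cr_arch_for "$(uname -m)")"
tarball="cockroach-${CR_VERSION}.linux-${cr_arch}.tgz"

# Verified-download flow (URL layout is an assumption):
#   wget "https://binaries.cockroachdb.com/${tarball}"
#   wget "https://binaries.cockroachdb.com/${tarball}.sha256sum"
#   sha256sum -c "${tarball}.sha256sum"   # abort before extracting on mismatch
#   tar -xzf "${tarball}"
#   cp cockroach-*/cockroach "${OUTPUTDIR}/usr/local/bin/cockroach"
echo "would fetch: ${tarball}"
```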
In `@src/rootfs/files/scripts/install_pccs.sh`:
- Around line 15-21: Remove the hardcoded secrets by reading PCCS_API_KEY and
PCCS_PASSWORD from environment variables (use ${PCCS_API_KEY} and
${PCCS_PASSWORD}) and only default PCCS_PORT to 8081; ensure the script exits
with a clear error if PCCS_API_KEY or PCCS_PASSWORD are unset at
runtime/build-time, and compute USER_TOKEN from the environment-provided
PCCS_PASSWORD (still using sha512sum) so no plaintext credentials remain in the
repo.
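The environment-driven pattern above can be sketched like this; the `demo-*` values in the invocation are throwaway placeholders for the sketch, not real credentials:

```shell
set -eu

# Read secrets from the environment instead of hardcoding them;
# only PCCS_PORT keeps a default.
load_pccs_config() {
  PCCS_PORT="${PCCS_PORT:-8081}"
  : "${PCCS_API_KEY:?PCCS_API_KEY must be set in the environment}"
  : "${PCCS_PASSWORD:?PCCS_PASSWORD must be set in the environment}"
  # Derive the user token from the env-provided password.
  USER_TOKEN="$(printf '%s' "$PCCS_PASSWORD" | sha512sum | awk '{print $1}')"
}

# Demo invocation with throwaway values (assumptions for this sketch):
PCCS_API_KEY=demo-key
PCCS_PASSWORD=demo-pass
load_pccs_config
echo "port=${PCCS_PORT} token length=${#USER_TOKEN}"   # sha512 hex is 128 chars
```

With `:?`, an unset secret aborts immediately with the given message, giving the fail-fast behavior the comment asks for.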
In `@src/rootfs/files/scripts/setup_runtime_tools.sh`:
- Around line 33-35: Add an EXIT trap around the chroot sequence so
chroot_deinit is always invoked on script exit: after defining
chroot_init/chroot_deinit, register trap 'trap chroot_deinit EXIT' before
calling chroot_init and setup_runtime_tools, and ensure you remove or reset the
trap if chroot_deinit is called manually (e.g., trap - EXIT after successful
cleanup) to avoid double cleanup; reference the chroot_init, setup_runtime_tools
and chroot_deinit symbols when making the change.
In `@src/services/apps/mongodb/manifest.yaml`:
- Around line 20-29: The current cluster selection stream using
($swarmdb.clusters[] | select(.cluster_policy == "wireguard" ...)) can yield
multiple matching $wgCluster entries and produce duplicate state objects; make
it deterministic by sorting the candidate clusters (e.g., by a stable key like
.id or .name) and selecting only the first match. Update the pipeline around
$swarmdb.clusters[] / select(.cluster_policy == "wireguard" ...) so it applies
sort_by(<stable-key>) and then takes the first result to bind to $wgCluster,
keeping the node-existence check using $swarmdb.clusternodes[] and $mongoNodeIds
unchanged.
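The manifest uses a jq-style pipeline; the same determinism fix can be sketched in Python with made-up cluster records (field names and values below are assumptions, not the real SwarmDB schema):

```python
# Deterministic selection: sort candidates by a stable key and take the
# first, instead of iterating every match (which yields duplicate state
# objects when more than one cluster satisfies the predicate).
# jq equivalent: [$swarmdb.clusters[] | select(...)] | sort_by(.id) | first
clusters = [
    {"id": "wg-b", "cluster_policy": "wireguard", "deleted_ts": None},
    {"id": "wg-a", "cluster_policy": "wireguard", "deleted_ts": None},
    {"id": "db-1", "cluster_policy": "redis", "deleted_ts": None},
]

candidates = [
    c for c in clusters
    if c["cluster_policy"] == "wireguard" and c["deleted_ts"] is None
]
wg_cluster = min(candidates, key=lambda c: c["id"]) if candidates else None
print(wg_cluster["id"])  # → wg-a
```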
In `@src/services/apps/nats/main.py`:
- Around line 251-258: The code currently returns a postponed PluginOutput but
still calls mark_node_ready(), which sets nats_node_ready=true even when
wait_for_tcp failed; change the failure branch to NOT call mark_node_ready() (or
explicitly call a mark_node_not_ready()/clear flag helper if one exists) so the
returned PluginOutput does not advertise readiness when the client port is
unreachable; update the PluginOutput in the wait_for_tcp failure path (the block
using wait_for_tcp, PluginOutput, mark_node_ready, and local_state) to omit
node_properties or set it to a "not ready" value.
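A sketch of the corrected failure branch. `PluginOutput` and `mark_node_ready` here are simplified stand-ins modeled on the names in the comment, not the service's real plugin framework:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PluginOutput:
    postponed: bool = False
    message: str = ""
    node_properties: Optional[dict] = None
    local_state: Optional[dict] = None


def mark_node_ready() -> dict:
    return {"nats_node_ready": "true"}


def handle_client_port(port_open: bool, local_state: dict) -> PluginOutput:
    if not port_open:
        # Postpone WITHOUT advertising readiness: omit node_properties so
        # nats_node_ready is never set while the client port is unreachable.
        return PluginOutput(
            postponed=True,
            message="NATS client port not reachable yet",
            local_state=local_state,
        )
    return PluginOutput(
        message="NATS ready",
        node_properties=mark_node_ready(),
        local_state=local_state,
    )
```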
- Around line 225-273: The code assumes leader_node_id always exists and will
produce "Waiting for leader node None" loops; before computing is_leader or
using leader_node_id (after get_leader_node), add an explicit check for a
missing/empty leader_node_id and return a postponed PluginOutput indicating "No
leader elected yet" (or similar), include node_properties=mark_node_ready() and
local_state so the node remains marked ready while retrying; keep existing logic
to compute is_leader and cluster initialization only after leader_node_id is
present.
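The missing-leader guard described above, sketched with hypothetical names (the real `main.py` structure may differ):

```python
def reconcile(leader_node_id, node_id, local_state, mark_node_ready):
    # Guard against a missing leader BEFORE any leader-dependent logic,
    # so the node does not loop on "Waiting for leader node None".
    if not leader_node_id:
        return {
            "postponed": True,
            "message": "No leader elected yet",
            # Stay marked ready while retrying.
            "node_properties": mark_node_ready(),
            "local_state": local_state,
        }
    is_leader = leader_node_id == node_id
    return {"postponed": False, "is_leader": is_leader, "local_state": local_state}
```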
In `@src/services/apps/nats/manifest.yaml`:
- Around line 20-29: The selection currently checks any node in $natsNodeIds
across all clusters, so scope the clusternode check to the candidate WireGuard
cluster ($wgCluster): when building the inner array use $swarmdb.clusternodes[]
| select(.deleted_ts == null and (.node | IN($natsNodeIds[])) and .cluster ==
$wgCluster.id) (or .cluster_id if that is the clusternode key) so only nodes
belonging to the evaluated $wgCluster count toward the length > 0 predicate.
In `@src/services/apps/test-app-route/manifest.yaml`:
- Around line 20-46: Select a single deterministic cluster instead of allowing
multiple matches and make node-id prefix checks exact: when assigning
$redisCluster and $wgCluster, constrain the cluster-selection expression to a
single result (e.g., pipe the matching sequence into first or use limit(1; ...))
so only one cluster is bound, and replace fragile startswith(...) comparisons
used later for matching cluster-node IDs with an exact delimiter-aware test (for
example, use (.cluster | split("-")[0] == $redisCluster.id) or a regex anchored
with the hyphen like test("^" + $redisCluster.id + "-")) to avoid prefix
collisions; apply this change to the expressions that produce $redisCluster,
$wgCluster and the node-property filters that previously used startswith().
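The prefix-collision problem and the anchored-delimiter fix can be demonstrated in a few lines of Python (the IDs below are invented examples):

```python
import re


# startswith() alone lets cluster "wg-1" claim nodes of cluster "wg-10";
# anchoring the trailing delimiter makes the prefix test exact.
def belongs_to(cluster_node_id: str, cluster_id: str) -> bool:
    return re.match("^" + re.escape(cluster_id) + "-", cluster_node_id) is not None


print("wg-1-node3".startswith("wg-1"))     # → True
print("wg-10-node3".startswith("wg-1"))    # → True   (collision)
print(belongs_to("wg-1-node3", "wg-1"))    # → True
print(belongs_to("wg-10-node3", "wg-1"))   # → False  (fixed)
```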
In `@src/swarm-scripts/62.setup-knot.sh`:
- Around line 57-59: Replace the hardcoded domain passed to the swarm-cli call
by reading it from an environment variable (e.g. BASE_DOMAIN or
SWARM_BASE_DOMAIN) and use that variable in the --value argument to SwarmSecrets
base_domain; if you require a fail-fast behavior ensure the script validates the
variable (and exits with a clear error) before invoking python3 "$(dirname
"$0")/swarm-cli.py" create SwarmSecrets base_domain --value=..., otherwise
provide a sensible safe default via ${BASE_DOMAIN:-your-default} so other
environments can override it without editing the script.
- Line 58: Script 62 is the only provisioning script that passes DB_PASSWORD to
the swarm-cli.py invocation (DB_PASSWORD="${DB_PASSWORD-}") while other scripts
and earlier calls in this same file omit it; remove the
DB_PASSWORD="${DB_PASSWORD-}" token from the swarm-cli.py environment assignment
in script 62 so the call matches the other provisioning scripts, and ensure you
adjust surrounding whitespace/line continuation (backslashes) so the command
remains syntactically correct.
In `@src/swarm-scripts/swarm-cli.py`:
- Around line 69-77: The DSN built in create_engine_from_env is vulnerable to
breaking when user/password contain URL-reserved characters; replace the
f-string construction with SQLAlchemy's URL.create to safely percent-encode
credentials and other parts: import and call
sqlalchemy.engine.URL.create(drivername="mysql+pymysql", username=user,
password=password or None, host=host, port=port, database=name,
query={"charset":"utf8mb4"}) and pass that URL object into create_engine(...,
pool_pre_ping=True, future=True) instead of the manual dsn string.
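The `URL.create` fix above can be sketched as follows. This is a minimal illustration, not the actual `swarm-cli.py` code: the env-var names mirror the ones used elsewhere in these scripts, and the defaults are assumptions for the sketch.

```python
# Sketch: build a MySQL DSN with SQLAlchemy's URL.create so that credentials
# containing URL-reserved characters (e.g. '@', ':', '/') are encoded safely,
# instead of breaking a hand-built f-string DSN.
import os
from sqlalchemy.engine import URL

url = URL.create(
    drivername="mysql+pymysql",
    username=os.environ.get("DB_USER", "root"),
    password=os.environ.get("DB_PASSWORD") or None,  # None omits the password part
    host=os.environ.get("DB_HOST", "127.0.0.1"),
    port=int(os.environ.get("DB_PORT", "3306")),
    database=os.environ.get("DB_NAME", "swarmdb"),
    query={"charset": "utf8mb4"},
)

# The URL object can be passed directly to create_engine(url, pool_pre_ping=True, future=True).
print(url.render_as_string(hide_password=False))
```

A password such as `p@ss:word` would be percent-encoded in the rendered DSN rather than corrupting the host/port parsing.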
♻️ Duplicate comments (1)
src/Dockerfile (1)
251-251: Digest-pin the external image reference

Same concern as the previous review: the `COPY --from=ghcr.io/...:build-18725490828` reference is mutable and should be pinned to a digest for reproducibility and supply-chain safety.
🟡 Minor comments (14)
src/rootfs/files/scripts/setup_runtime_tools.sh (1)

6-14: Guard required OUTPUTDIR under strict mode.

With `set -euo pipefail`, any unset `OUTPUTDIR` will abort at line 17 with an unclear error. Add an explicit check so failures are immediate with a clear message.

🔧 Suggested guard

```diff
 # private
 BUILDROOT="/buildroot"
+
+: "${OUTPUTDIR:?OUTPUTDIR must be set before running this script}"
```

src/rootfs/files/scripts/install_cockroachdb.sh (1)

7-14: Add an explicit OUTPUTDIR guard for clearer failure.

With `set -u`, a missing `OUTPUTDIR` will terminate the script with a generic "unbound variable" error. Add a guard to surface a clear, actionable message before any chroot work.

✅ Suggested guard

```diff
 # init logging
 source "$BUILDROOT/files/scripts/log.sh"
 # chroot functions
 source "$BUILDROOT/files/scripts/chroot.sh"
+
+: "${OUTPUTDIR:?OUTPUTDIR must be set to the target rootfs path}"
```

src/rootfs/files/configs/usr/local/lib/swarm-cloud-api/schema-sync (1)

4-6: Fix the path mentioned in the comment.

The comment references `/opt/swarm-cloud-api`, but the script actually uses `/usr/local/lib/swarm-cloud`. This can mislead operators.

📝 Comment correction

```diff
-# /opt/swarm-cloud-api monorepo dist directory.
+# /usr/local/lib/swarm-cloud monorepo dist directory.
```

src/rootfs/files/configs/nvidia-persistenced.service (1)

6-12: Add a condition for the nvidia-persistenced binary.

ExecStartPre requires `/usr/bin/nvidia-persistenced`, but the unit only checks for `nvidia-smi`. If the daemon binary is missing, the service will hard-fail instead of being cleanly skipped.

🔧 Suggested guard

```diff
 ConditionPathExists=/dev/nvidiactl
 ConditionPathExists=/usr/bin/nvidia-smi
+ConditionPathExists=/usr/bin/nvidia-persistenced
```

src/rootfs/files/configs/usr/local/bin/swarm-services.sh (1)
22-24: Silent timeout may cause confusing downstream failures.

When the MySQL wait times out, the script silently continues and proceeds to run launcher scripts. If those scripts depend on MySQL being available, they'll fail with potentially confusing errors. Consider logging a warning or failing explicitly.

Suggested improvement

```diff
 if [ "$elapsed" -ge "$wait_timeout" ]; then
+    echo "[warning] MySQL at $mysql_host:$mysql_port not reachable after ${wait_timeout}s" >&2
     break
 fi
```

src/rootfs/files/configs/etc/systemd/system/swarm-services.service (1)

8-14: RemainAfterExit=no may cause dependency issues for dependent services.

With `Type=oneshot` and `RemainAfterExit=no`, this service will transition to "inactive" as soon as the script completes. If other services depend on `swarm-services.service` being "active", they may not start correctly or may start prematurely.

For a setup/orchestration service that signals "setup complete", `RemainAfterExit=yes` is typical so the service remains in the "active" state after successful execution.

Suggested fix

```diff
 [Service]
 Type=oneshot
 WorkingDirectory=/
 ExecStart=/usr/local/bin/swarm-services.sh
 StandardOutput=append:/var/log/swarm-services.log
 StandardError=append:/var/log/swarm-services-err.log
-RemainAfterExit=no
+RemainAfterExit=yes
```

src/rootfs/files/scripts/install_nats.sh (1)
40-56: Remove wg-quick.target dependency from the nats-server systemd unit.

The systemd unit references `wg-quick.target` (in both `After` and `Wants` directives), but this target does not exist in the deployment. The Dockerfile explicitly removes `wg-quick@wg0.service`, indicating WireGuard is not part of the system. While `Wants` prevents startup failure, this will cause systemd warnings about the missing target. Either remove this dependency or ensure WireGuard is part of the deployment.

src/swarm-scripts/64.setup-swarm-cloud-api.sh (1)

18-21: Add DB_PASSWORD variable definition and pass it to swarm-cli.py calls.

The script omits `DB_PASSWORD` entirely, while `swarm-cli.py` expects it (line 351 reads `os.environ.get("DB_PASSWORD", "")`). The related script `62.setup-knot.sh` correctly defines and passes `DB_PASSWORD="${DB_PASSWORD-}"` to every swarm-cli.py invocation. This script should follow the same pattern for consistency and to support password-protected database connections.

Required changes

Add the variable definition at line 22:

```diff
 DB_HOST=${DB_HOST:-127.0.0.1}
 DB_PORT=${DB_PORT:-3306}
 DB_USER=${DB_USER:-root}
+DB_PASSWORD=${DB_PASSWORD:-}
 DB_NAME=${DB_NAME:-swarmdb}
```

Then add `DB_PASSWORD` to all swarm-cli.py environment variable passes (lines 42-43, 47-48, 52-53, 57-58), for example:

```diff
-if DB_HOST="$DB_HOST" DB_PORT="$DB_PORT" DB_USER="$DB_USER" DB_NAME="$DB_NAME" \
+if DB_HOST="$DB_HOST" DB_PORT="$DB_PORT" DB_USER="$DB_USER" DB_PASSWORD="$DB_PASSWORD" DB_NAME="$DB_NAME" \
```

src/services/apps/mongodb/manifest.yaml (1)
57-65: Tighten WireGuard cluster_node prefix matching to prevent ID collisions.

`startswith($wgCluster.id)` can match unrelated clusters when IDs share prefixes (e.g., `wireguard` vs `wireguard-2`). Other cluster ID patterns in this file use `startswith("{{ clusterId }}:")` with a delimiter; align with this convention to ensure exact cluster matching.

🛠️ Suggested fix (delimiter-aware prefix)

```diff
-    (.cluster_node | startswith($wgCluster.id)) and
+    (.cluster_node | startswith($wgCluster.id + ":")) and
```

src/services/apps/test-app-route/manifest.yaml (1)

64-86: Add delimiter to cluster_node prefix filter for redisCluster.

The `startswith($redisCluster.id)` filter can incorrectly match unrelated clusters when IDs share prefixes (e.g., "redis" matches "redis-prod"). Other manifest files (`mongodb.yaml`, `nats.yaml`) use the delimiter pattern `startswith("{{ clusterId }}:")` for similar filtering. Apply the same pattern here:

Suggested fix

```diff
-    (.cluster_node | startswith($redisCluster.id)) and
+    (.cluster_node | startswith($redisCluster.id + ":")) and
```

src/services/apps/nats/manifest.yaml (1)
59-64: Avoid prefix collisions for wgNodeProperties

`cluster_node` looks to follow `<clusterId>:<nodeId>` (see the NATS block using `"{{ clusterId }}:"`). Using `startswith($wgCluster.id)` can match unintended clusters with shared prefixes. Align the prefix delimiter.

🔧 Proposed fix

```diff
-    (.cluster_node | startswith($wgCluster.id)) and
+    (.cluster_node | startswith($wgCluster.id + ":")) and
```

src/services/apps/mongodb/main.py (1)

233-240: Ensure local_state is always a dict

Other handlers default `local_state` to `{}`; here it may be `None`, which can cause inconsistent downstream handling.

🔧 Proposed fix

```diff
 def handle_init(input_data: PluginInput) -> PluginOutput:
-    try:
+    local_state = input_data.local_state or {}
+    try:
         if not is_mongo_available():
             install_mongodb()
         MONGO_LOG_DIR.mkdir(parents=True, exist_ok=True)
-        return PluginOutput(status="completed", local_state=input_data.local_state)
+        return PluginOutput(status="completed", local_state=local_state)
     except Exception as e:
-        return PluginOutput(status="error", error_message=str(e), local_state=input_data.local_state)
+        return PluginOutput(status="error", error_message=str(e), local_state=local_state)
```

src/services/apps/test-app/main.py (1)
314-319: Add daemon-reload after removing the systemd unit file

After deleting the unit file, systemd must be notified via `daemon-reload` to unregister the unit and clear cached metadata. Without it, subsequent provisioning or status checks may fail or behave inconsistently. The `disable` operation may implicitly reload on some systems, but explicitly reloading after file deletion is the most reliable approach.

🔧 Proposed fix

```diff
 if SYSTEMD_SERVICE_PATH.exists():
     SYSTEMD_SERVICE_PATH.unlink()
+    subprocess.run(["systemctl", "daemon-reload"], check=False)
```

src/services/apps/mongodb/main.py (1)

308-345: Handle already-initialized replica sets to avoid infinite postponement

If `mongodb_rs_initialized` is missing or stale but the replica set is already initialized, `rs_initiate()` returns `ok=0` with error code 23 ("already initialized") and the handler remains in `postponed` status indefinitely. Detect this by probing `rs.status()` when `rs_initiate()` fails: if it returns `ok=1`, mark the replica set as initialized to allow convergence.

🔧 Proposed fix

```diff
 # Initiate replica set (single or multi-node)
 if rs_initiate(local_tunnel_ip, desired_hosts):
     # Give it a moment to elect primary
     time.sleep(3)
     done_props = {"mongodb_rs_initialized": "true", **node_ready_props}
     return PluginOutput(status="completed", node_properties=done_props, local_state=local_state)
 else:
+    st, _ = rs_status(local_tunnel_ip)
+    if st and st.get("ok") == 1:
+        done_props = {"mongodb_rs_initialized": "true", **node_ready_props}
+        return PluginOutput(status="completed", node_properties=done_props, local_state=local_state)
     return PluginOutput(status="postponed", error_message="Failed to initiate replica set", node_properties=node_ready_props, local_state=local_state)
```
🧹 Nitpick comments (40)
src/rootfs/files/configs/cert/superprotocol-certs.service (1)
4-5: Consider removing commented-out dependencies or adding context.

Commented-out configuration lines are generally a maintenance burden. If the `rke2-server.service` dependency is no longer needed as part of the Swarm integration, consider removing these lines entirely. If they're kept intentionally for reference during the transition, a brief comment explaining why would help future maintainers.

Option 1: Remove the commented lines

```diff
 [Unit]
 Description=SuperProtocol add trusted self-signed certificate
 After=multi-user.target
-# Wants=rke2-server.service
-# After=rke2-server.service

 [Service]
```

Option 2: Add context if kept intentionally

```diff
 [Unit]
 Description=SuperProtocol add trusted self-signed certificate
 After=multi-user.target
+# NOTE: RKE2 dependencies removed as part of Swarm migration
 # Wants=rke2-server.service
 # After=rke2-server.service
```

.gitignore (1)

145-146: Optional: Remove the redundant comment line.

The `.DS_Store` pattern on line 146 is self-explanatory and correctly ignores macOS Finder metadata files. The comment on line 145 is redundant and could be removed for consistency with the rest of the file.

♻️ Proposed cleanup

```diff
-#.DS_Store
 .DS_Store
```

src/rootfs/files/scripts/cleanup_rootfs.sh (1)
29-29: Hardcoded kernel version may become stale.

The specific kernel headers version `6.12.13-nvidia-gpu-confidential` will need manual updating whenever the kernel is upgraded. Consider dynamically discovering the installed headers package name to avoid maintenance drift.

💡 Suggested approach

```diff
-    linux-headers-6.12.13-nvidia-gpu-confidential \
+    $(dpkg -l | grep -oP "linux-headers-\S+-nvidia-gpu-confidential" | head -1) \
```

Alternatively, if this version is intentionally pinned to a specific build, add a comment explaining why and where the version is tracked.
src/rootfs/files/scripts/install_rke2.sh (2)
19-28: Consider version‑pinning the installer source for reproducible builds.Even with
RKE2_INSTALL_SHA256,https://get.rke2.ioalways serves “latest,” so builds will break whenever the upstream script changes unless the SHA is updated. A versioned URL or vendored installer script would make upgrades explicit and reproducible.If you want a quick check step, you can add a CI job or run a local script to compare the current upstream SHA against your configured
RKE2_INSTALL_SHA256before releases.
38-42: Make alias injection idempotent to avoid duplicate lines.If this script can run more than once,
/etc/profilewill accumulate duplicate alias entries. Consider guarding with a marker or grep check.♻️ Suggested change
function add_aliases() { log_info "adding kubectl aliases" - echo "export KUBECONFIG=/etc/rancher/rke2/rke2.yaml" >> "$OUTPUTDIR/etc/profile"; - echo "alias k='/usr/local/bin/kubectl'" >> "$OUTPUTDIR/etc/profile"; - echo "alias kubectl='/usr/local/bin/kubectl'" >> "$OUTPUTDIR/etc/profile"; + if ! grep -q "KUBECONFIG=/etc/rancher/rke2/rke2.yaml" "$OUTPUTDIR/etc/profile"; then + cat >> "$OUTPUTDIR/etc/profile" <<'EOF' +export KUBECONFIG=/etc/rancher/rke2/rke2.yaml +alias k='/usr/local/bin/kubectl' +alias kubectl='/usr/local/bin/kubectl' +EOF + fi }src/rootfs/files/scripts/setup_runtime_tools.sh (2)
16-24: Remove policy-rc.d after package installation.

Keeping it in the final rootfs can block service starts for future in-VM package installs. Consider removing it once installs are done.

♻️ Proposed follow-up

```diff
 chroot "${OUTPUTDIR}" /usr/bin/apt install -y --no-install-recommends \
     mysql-client python3 python3-pip redis-server redis-tools openssl netcat-openbsd dnsutils
 chroot "${OUTPUTDIR}" /usr/bin/apt clean
+
+log_info "removing policy-rc.d after package installation"
+rm -f "${OUTPUTDIR}/usr/sbin/policy-rc.d"
```

20-24: Prefer apt-get plus a noninteractive env in scripts.

`apt` is not considered stable for scripting and can prompt. Use `apt-get` with `DEBIAN_FRONTEND=noninteractive` to avoid hangs in CI/build.

🔧 Suggested change

```diff
-    chroot "${OUTPUTDIR}" /usr/bin/apt update
-    chroot "${OUTPUTDIR}" /usr/bin/apt install -y --no-install-recommends \
+    chroot "${OUTPUTDIR}" /usr/bin/env DEBIAN_FRONTEND=noninteractive apt-get update
+    chroot "${OUTPUTDIR}" /usr/bin/env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
         mysql-client python3 python3-pip redis-server redis-tools openssl netcat-openbsd dnsutils
-    chroot "${OUTPUTDIR}" /usr/bin/apt clean
+    chroot "${OUTPUTDIR}" /usr/bin/env DEBIAN_FRONTEND=noninteractive apt-get clean
```

src/rootfs/files/configs/usr/local/bin/swarm-node.sh (1)
1-7: Make node and working-dir assumptions explicit.

Systemd environments can have a minimal PATH; if `node` or the directory is missing, the failure will be less clear. Consider using an absolute node path and simple preflight checks.

♻️ Optional hardening

```diff
 set -euo pipefail
-cd /usr/local/lib/swarm-cloud
-exec node ./apps/swarm-node/dist/main.js
+NODE_BIN=/usr/bin/node
+[ -x "$NODE_BIN" ] || { echo "node not found at $NODE_BIN" >&2; exit 127; }
+cd /usr/local/lib/swarm-cloud || { echo "missing /usr/local/lib/swarm-cloud" >&2; exit 1; }
+exec "$NODE_BIN" ./apps/swarm-node/dist/main.js
```

src/rootfs/files/configs/usr/local/lib/swarm-cloud-api/swarm-cloud-api (1)

1-5: Consider explicit node path and directory checks.

For consistency with systemd's minimal PATH, it's safer to use an absolute node path and a clear error if the directory is missing.

♻️ Optional hardening

```diff
 set -euo pipefail
-cd /usr/local/lib/swarm-cloud
-exec node ./apps/swarm-cloud-api/dist/main.js "$@"
+NODE_BIN=/usr/bin/node
+[ -x "$NODE_BIN" ] || { echo "node not found at $NODE_BIN" >&2; exit 127; }
+cd /usr/local/lib/swarm-cloud || { echo "missing /usr/local/lib/swarm-cloud" >&2; exit 1; }
+exec "$NODE_BIN" ./apps/swarm-cloud-api/dist/main.js "$@"
```

src/rootfs/files/scripts/install_mongodb.sh (1)
15-26: Consider consolidating chroot commands and adding error context.

A few observations:

- Multiple separate `chroot` invocations add overhead. Consider consolidating into a single heredoc-based chroot call for efficiency.
- The `|| true` on line 25 silently ignores chown failures. If the `mongodb` user/group doesn't exist (e.g., package install failed silently), this could lead to runtime permission issues.

♻️ Optional: Consolidated chroot approach

```diff
 function install_mongodb() {
     log_info "installing MongoDB (mongodb-org 7.0) inside VM rootfs"
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; apt update'
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; apt install -y --no-install-recommends gnupg curl'
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | gpg --dearmor -o /usr/share/keyrings/mongodb-server-7.0.gpg'
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" > /etc/apt/sources.list.d/mongodb-org-7.0.list'
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; apt update'
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; apt install -y --no-install-recommends mongodb-org mongodb-mongosh'
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; apt clean'
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; mkdir -p /var/lib/mongodb /var/log/mongodb'
-    chroot "$OUTPUTDIR" /bin/bash -lc 'set -e; chown -R mongodb:mongodb /var/lib/mongodb /var/log/mongodb || true'
+    chroot "$OUTPUTDIR" /bin/bash -l <<'EOF'
+set -e
+apt update
+apt install -y --no-install-recommends gnupg curl
+curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | gpg --dearmor -o /usr/share/keyrings/mongodb-server-7.0.gpg
+echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" > /etc/apt/sources.list.d/mongodb-org-7.0.list
+apt update
+apt install -y --no-install-recommends mongodb-org mongodb-mongosh
+apt clean
+mkdir -p /var/lib/mongodb /var/log/mongodb
+chown -R mongodb:mongodb /var/lib/mongodb /var/log/mongodb
+EOF
 }
```

src/rootfs/files/scripts/install_knot.sh (2)

15-31: LGTM with minor observations.

The installation logic is correct and follows the established pattern. A few minor observations:

- Inconsistency: This script uses `apt-get` while `install_mongodb.sh` uses `apt`. Both work, but consistency aids maintainability.
- No version pinning: Unlike MongoDB (7.0), the Knot DNS version is unpinned. This could lead to unexpected upgrades breaking compatibility.

These are minor and don't block the PR.

33-35: Same cleanup consideration as the MongoDB script.

If `install_knot` fails, `chroot_deinit` won't be called due to `set -e`. Consider using a trap for cleanup consistency across all install scripts.

src/rootfs/files/scripts/install_provision_plugin_sdk.sh (2)
15-19: Complex one-liner reduces maintainability.

The inline bash on line 18 is dense and hard to debug. Consider using a heredoc or multi-line approach for clarity.

Also, if neither wheels nor the source directory exist, the script will fail with an unhelpful pip error. Consider adding explicit validation.

♻️ Proposed refactor for readability

```diff
 function install_provision_sdk() {
     log_info "installing provision-plugin-sdk into VM Python environment"
-    # Install from prebuilt wheels if present; otherwise from source inside rootfs
-    chroot "$OUTPUTDIR" /bin/bash -lc 'shopt -s nullglob; set -e; files=(/opt/provision-wheels/*.whl); if (( ${#files[@]} )); then python3 -m pip install --break-system-packages "${files[@]}"; else echo "No wheels found, installing from source"; python3 -m pip install --break-system-packages /etc/swarm-cloud/provision-plugin-sdk; fi'
+    chroot "$OUTPUTDIR" /bin/bash -l <<'EOF'
+set -e
+shopt -s nullglob
+wheels=(/opt/provision-wheels/*.whl)
+
+if (( ${#wheels[@]} )); then
+    echo "Installing from prebuilt wheels..."
+    python3 -m pip install --break-system-packages "${wheels[@]}"
+elif [ -d /etc/swarm-cloud/provision-plugin-sdk ]; then
+    echo "No wheels found, installing from source..."
+    python3 -m pip install --break-system-packages /etc/swarm-cloud/provision-plugin-sdk
+else
+    echo "ERROR: No wheels in /opt/provision-wheels/ and no source in /etc/swarm-cloud/provision-plugin-sdk" >&2
+    exit 1
+fi
+EOF
 }
```

21-23: Same trap-based cleanup suggestion applies.

For consistency with the other install scripts, consider using a trap to ensure `chroot_deinit` runs on failure.
6-7: Consider validating OUTPUTDIR before use.The script documents
OUTPUTDIRas a required public variable but doesn't validate its presence. If unset, the script will fail with cryptic errors deep in chroot operations.Suggested validation
# public, required # OUTPUTDIR + +if [ -z "${OUTPUTDIR:-}" ]; then + log_fail "OUTPUTDIR is required but not set"; + exit 1; +fi
26-26: Declare and assign separately to avoid masking return values.Combining
localdeclaration with command substitution masks the exit status of the subshell. If the chroot command fails for a reason other than node not being installed, the failure is silently ignored.Proposed fix
- local NODE_VERSION=$(chroot "${OUTPUTDIR}" /bin/bash -c 'node --version' 2>/dev/null || true); + local NODE_VERSION; + NODE_VERSION=$(chroot "${OUTPUTDIR}" /bin/bash -c 'node --version' 2>/dev/null || true);src/rootfs/files/configs/usr/local/bin/swarm-services.sh (1)
34-35: Usemapfilefor safer array assignment.Word splitting on command output can cause issues with filenames containing spaces or special characters. Using
mapfileis safer and avoids the shellcheck warning.Proposed fix
-IFS=$'\n' sorted=($(printf "%s\n" "${scripts[@]}" | sort)) -unset IFS +mapfile -t sorted < <(printf "%s\n" "${scripts[@]}" | sort)src/rootfs/files/configs/etc/systemd/system/swarm-db.service (1)
12-12: Use an absolute path for mkdir in ExecStartPre.

Systemd may not have standard paths like `/bin` or `/usr/bin` in its execution environment depending on the context. Using an absolute path ensures reliability.

Proposed fix

```diff
-ExecStartPre=mkdir -p /var/lib/swarm-db/data
+ExecStartPre=/bin/mkdir -p /var/lib/swarm-db/data
```

src/rootfs/files/configs/etc/systemd/system/swarm-node.service (2)
1-6: Consider adding ConditionPathExists guards for consistency.

Unlike `swarm-db.service`, this unit lacks `ConditionPathExists` checks for the startup script and config file. Adding these guards would provide clearer failure messages when prerequisites are missing and maintain consistency across swarm services.

Suggested addition

```diff
 [Unit]
 Description=Swarm Node service
 After=network-online.target swarm-db.service
 Wants=network-online.target
 Requires=swarm-db.service
+ConditionPathExists=/usr/local/bin/swarm-node.sh
+ConditionPathExists=/sp/swarm/api.yaml
```

10-10: Use an absolute path for mkdir in ExecStartPre.

Same as noted for `swarm-db.service`: use an absolute path for reliability.

Proposed fix

```diff
-ExecStartPre=mkdir -p /var/lib/swarm-cloud-api/data
+ExecStartPre=/bin/mkdir -p /var/lib/swarm-cloud-api/data
```

src/swarm-scripts/test/nats_js_sync_test.py (1)
26-30: Consider removing unused noqa directives.

Ruff reports these `noqa: E402` directives as unused because the E402 rule isn't enabled. You can either remove the directives or add `# noqa: E402` comments without the type hints if you want to document the intentional late imports for readers.

src/rootfs/files/scripts/install_pccs.sh (2)
118-124: Declare and assign separately to avoid masking return values.

The shellcheck warning SC2155 indicates that combining the `local` declaration with command substitution masks the command's return value. While you have `|| true` as a fallback, separating declaration and assignment is cleaner.

Suggested fix

```diff
 # Find node binary path
-local NODE_PATH=$(chroot "${OUTPUTDIR}" /bin/bash -c 'which node' 2>/dev/null || true)
+local NODE_PATH
+NODE_PATH=$(chroot "${OUTPUTDIR}" /bin/bash -c 'which node' 2>/dev/null || true)
```

22-22: Minor typo in comment.

`loggggging` should be `logging`.

src/services/apps/test-app-route/main.py (2)
149-176: Consider adding retry logic for consistency with ensure_route_in_redis.

Unlike `ensure_route_in_redis`, this function has no retry mechanism. While cleanup operations often don't need retries (if the route doesn't exist, that's acceptable), transient network issues during cluster operations could still cause failures. Consider whether retry logic would be beneficial here for consistency.

372-390: handle_destroy ignores errors from delete_route_from_redis.

The delete operation's success/error result is not checked. If the deletion fails (e.g., network issues), the destroy still returns "completed" status with the property cleared. This could leave orphaned routes in Redis.

Consider whether to propagate the error or log it more explicitly:

Optional: Log or handle delete failures

```diff
 # Only leader attempts to clean up the route
 if isinstance(state_json, dict) and is_local_node_leader(local_node_id, state_json):
-    delete_route_from_redis(state_json)
+    success, error = delete_route_from_redis(state_json)
+    if not success:
+        print(f"[!] Warning: Failed to delete route during destroy: {error}", file=sys.stderr)
+        # Optionally return error status if cleanup is critical
```

src/rootfs/files/configs/etc/systemd/system/swarm-cloud-api.service (1)
1-18: Service unit looks correct; consider adding restart rate limiting.

The dependency ordering is correct, with `Requires=swarm-db.service` ensuring the database is running. However, with `Restart=always`, if the service enters a crash loop, systemd will keep restarting it indefinitely.

Consider adding rate limiting to prevent excessive restart attempts:

♻️ Suggested addition for restart rate limiting

```diff
 Restart=always
 RestartSec=5
+StartLimitIntervalSec=300
+StartLimitBurst=5
 Environment=NODE_ENV=production
```

src/rootfs/files/configs/usr/local/bin/swarm-cloud-api.sh (1)

7-14: Improve error messaging for a missing entrypoint when Node.js is available.

If `node` is available but `main.js` doesn't exist, the script falls through to the generic error message. This could make debugging harder since the actual issue (missing dist file) isn't clearly communicated.

♻️ Proposed improvement for clearer error handling

```diff
 if command -v node >/dev/null 2>&1; then
     if [[ -f "./apps/swarm-cloud-api/dist/main.js" ]]; then
         exec node ./apps/swarm-cloud-api/dist/main.js
+    else
+        echo "swarm-cloud-api: Node.js found but dist/main.js missing at ./apps/swarm-cloud-api/dist/main.js" >&2
+        exit 1
     fi
+else
+    echo "swarm-cloud-api: Node.js not found in PATH" >&2
+    exit 1
 fi
-
-echo "swarm-cloud-api: no runnable entrypoint found (binary or NodeJS dist)" >&2
-exit 1
```

src/swarm-scripts/61.setup-cockroachdb.sh (1)
14-55: Consider extracting common swarm bootstrap logic into a shared helper.

These setup scripts repeat the same env defaults, manifest check, and get/create flow; a small shared helper sourced by each script would reduce drift across services.
src/swarm-scripts/10.setup-wireguard.sh (2)
4-5: Outdated comment: the script uses swarm-cli.py, not the mysql client.

The comment states "via mysql client," but the script actually invokes `swarm-cli.py` for all database operations. Consider updating the comment for accuracy.

16-16: Unused variable: CLUSTER_ID is defined but never referenced.

This variable is set but not used anywhere in the script. Either remove it or use it where appropriate.

Proposed fix

```diff
-CLUSTER_ID=${CLUSTER_ID:-wireguard}
```

src/swarm-scripts/50.setup-rke2.sh (2)
4-5: Outdated comment: the script uses swarm-cli.py, not the mysql client.

Same as the other scripts: the comment references "mysql client," but operations are performed via `swarm-cli.py`.

24-24: Unused variable: CLUSTER_ID is defined but never referenced.

Consider removing this dead code for clarity.
src/swarm-scripts/40.setup-geo-ip-measurement.sh (2)
4-5: Outdated comment: the script uses swarm-cli.py, not the mysql client.

The comment mentions "mysql client," but this script uses `swarm-cli.py` for database operations.

24-24: Unused variable: CLUSTER_ID is defined but never referenced.

This variable can be removed as it serves no purpose in the current implementation.
src/swarm-scripts/20.setup-hw-measurement.sh (2)
4-5: Outdated comment: the script uses swarm-cli.py, not the mysql client.

Update the comment to accurately reflect that `swarm-cli.py` is used for database operations.

24-24: Unused variable: CLUSTER_ID is defined but never referenced.

Remove this dead code for consistency with the actual implementation.
src/swarm-scripts/62.setup-knot.sh (2)
4-5: Outdated comment: the script uses swarm-cli.py, not the mysql client.

The comment mentions "mysql client," but operations use `swarm-cli.py`.

23-23: Unused variable: CLUSTER_ID is defined but never referenced.

Remove this dead code.
src/Dockerfile (1)
339-340: Harden chmod to avoid glob failures

If any service directory lacks `main.py`, the glob can fail the build. Using `find` makes this robust and self-documenting.

♻️ Proposed refactor

```diff
-RUN chmod +x ${OUTPUTDIR}/etc/swarm-services/*/main.py
-RUN chmod +x ${OUTPUTDIR}/etc/swarm-cloud/services/*/main.py
+RUN find ${OUTPUTDIR}/etc/swarm-services -type f -name 'main.py' -exec chmod +x {} +
+RUN find ${OUTPUTDIR}/etc/swarm-cloud/services -type f -name 'main.py' -exec chmod +x {} +
```

src/services/apps/nats/main.py (1)
239-242: Remove unused needs_restart to avoid dead code.

♻️ Proposed cleanup

```diff
-    active, _ = is_service_active(NATS_SERVICE_NAME)
-    needs_restart = not active
+    active, _ = is_service_active(NATS_SERVICE_NAME)
```
```ini
[Unit]
Description=Ensure kubectl link in RKE2 bin after /var mount
After=local-fs.target
ConditionPathExists=/usr/local/bin/kubectl

[Service]
Type=oneshot
ExecStart=/usr/local/bin/ensure-kubectl-link.sh

[Install]
WantedBy=multi-user.target
```
ConditionPathExists may prevent the service from running when needed.
The condition checks for /usr/local/bin/kubectl, but this is the symlink that the script is supposed to create. On a fresh system where RKE2 installs kubectl to /var/lib/rancher/rke2/bin/kubectl, the symlink won't exist yet, so the service will be skipped and never create it.
If the intent is to run only when RKE2's kubectl is available, the condition should check the source binary instead:
Proposed fix

```diff
 [Unit]
 Description=Ensure kubectl link in RKE2 bin after /var mount
 After=local-fs.target
-ConditionPathExists=/usr/local/bin/kubectl
+ConditionPathExists=/var/lib/rancher/rke2/bin/kubectl

 [Service]
 Type=oneshot
 ExecStart=/usr/local/bin/ensure-kubectl-link.sh

 [Install]
 WantedBy=multi-user.target
```
WantedBy=multi-user.target📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| [Unit] | |
| Description=Ensure kubectl link in RKE2 bin after /var mount | |
| After=local-fs.target | |
| ConditionPathExists=/usr/local/bin/kubectl | |
| [Service] | |
| Type=oneshot | |
| ExecStart=/usr/local/bin/ensure-kubectl-link.sh | |
| [Install] | |
| WantedBy=multi-user.target | |
| [Unit] | |
| Description=Ensure kubectl link in RKE2 bin after /var mount | |
| After=local-fs.target | |
| ConditionPathExists=/var/lib/rancher/rke2/bin/kubectl | |
| [Service] | |
| Type=oneshot | |
| ExecStart=/usr/local/bin/ensure-kubectl-link.sh | |
| [Install] | |
| WantedBy=multi-user.target |
🤖 Prompt for AI Agents
In `@src/rootfs/files/configs/etc/systemd/system/ensure-kubectl-link.service`
around lines 1 - 11, The unit uses ConditionPathExists=/usr/local/bin/kubectl
which prevents the service from running because /usr/local/bin/kubectl is the
symlink the script ensure-kubectl-link.sh must create; update the unit to check
the real RKE2 binary (e.g.
ConditionPathExists=/var/lib/rancher/rke2/bin/kubectl) or remove the
ConditionPathExists line entirely so
ExecStart=/usr/local/bin/ensure-kubectl-link.sh runs when local-fs.target is
reached and can create the symlink; ensure references to ConditionPathExists and
ensure-kubectl-link.sh are the ones you modify.
```sh
# Allow DNS requests
iptables -A INPUT -p udp --dport 53 -j ACCEPT
iptables -A INPUT -p udp --sport 53 -j ACCEPT
```
Potential security issue with --sport 53 rule.
The rule iptables -A INPUT -p udp --sport 53 -j ACCEPT allows any incoming UDP packet with source port 53. This is problematic because:
- Source ports can be trivially spoofed by attackers
- This effectively bypasses the firewall for any UDP traffic claiming to originate from port 53
If the intent is to allow DNS responses, the ESTABLISHED,RELATED rule on line 9 should already handle legitimate DNS response traffic.
🔒 Recommended fix: Remove the sport 53 rule
# Allow DNS requests
iptables -A INPUT -p udp --dport 53 -j ACCEPT
-iptables -A INPUT -p udp --sport 53 -j ACCEPT
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Allow DNS requests | |
| iptables -A INPUT -p udp --dport 53 -j ACCEPT | |
| iptables -A INPUT -p udp --sport 53 -j ACCEPT | |
| # Allow DNS requests | |
| iptables -A INPUT -p udp --dport 53 -j ACCEPT |
🤖 Prompt for AI Agents
In `@src/rootfs/files/configs/usr/local/bin/hardening-vm.sh` around lines 15 - 17,
The rule in hardening-vm.sh that appends "iptables -A INPUT -p udp --sport 53 -j
ACCEPT" is insecure because source ports can be spoofed; remove that line and
rely on the existing ESTABLISHED,RELATED accept rule (or explicitly allow UDP
responses by matching connection state) instead of permitting packets by source
port; update or remove the iptables command referencing "--sport 53" in the
script to close this hole.
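(See the conntrack-based sketch placement note: a stateful ESTABLISHED,RELATED accept plus a dport-53 accept covers both queries and replies with no source-port trust.)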
| chroot_init; | ||
| build_swarm_cloud; | ||
| chroot_deinit; |
Ensure chroot cleanup on failure (missing trap).
With set -e, any failure in build_swarm_cloud exits before chroot_deinit, leaving mounts behind. Add an EXIT trap so cleanup always runs.
🧩 Proposed fix
-chroot_init;
-build_swarm_cloud;
-chroot_deinit;
+trap 'chroot_deinit' EXIT
+chroot_init;
+build_swarm_cloud;
| chroot_init; | |
| build_swarm_cloud; | |
| chroot_deinit; | |
| trap 'chroot_deinit' EXIT | |
| chroot_init; | |
| build_swarm_cloud; |
🤖 Prompt for AI Agents
In `@src/rootfs/files/scripts/build_swarm_cloud.sh` around lines 75 - 77, The
script currently calls chroot_init; build_swarm_cloud; chroot_deinit; but with
set -e a failure in build_swarm_cloud will skip chroot_deinit; add an EXIT trap
to always call chroot_deinit on script exit (and optionally restore previous
trap or disable it after successful deinit) so mounts are cleaned even on
errors; place the trap definition before calling chroot_init (e.g., trap
'chroot_deinit' EXIT) and ensure chroot_deinit is idempotent/safe to run
multiple times.
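The trap pattern can be demonstrated standalone. In this sketch the init/build/deinit helpers are stand-ins (the real ones mount, build, and unmount the chroot); a simulated build failure still leaves the cleanup having run exactly once:

```shell
# Sketch: an EXIT trap guarantees cleanup under `set -e`, even when a
# build step fails partway through.
run_build() (
  set -e
  trap 'echo deinit >> "$1"' EXIT   # cleanup fires on any subshell exit
  echo init >> "$1"                 # stands in for chroot_init
  false                             # simulated failure in build_swarm_cloud
  echo "never reached" >> "$1"
)

flag=$(mktemp)
set +e
run_build "$flag"                   # nonzero: the build failed...
status=$?
set -e
echo "exit status: $status"
cat "$flag"                         # ...but the log shows init then deinit
```

Making the real chroot_deinit idempotent (e.g. checking whether the mounts are still present before unmounting) keeps the trap safe even after a successful manual deinit.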
| chroot "$OUTPUTDIR" /bin/bash -lc ' | ||
| set -e; | ||
| arch=$(uname -m); | ||
| case "$arch" in | ||
| x86_64) cr_arch=amd64 ;; | ||
| aarch64|arm64) cr_arch=arm64 ;; | ||
| *) cr_arch=amd64 ;; | ||
| esac; | ||
| cd /tmp; | ||
| wget -q https://binaries.cockroachdb.com/cockroach-latest.linux-${cr_arch}.tgz -O cockroach.tgz; | ||
| tar -xzf cockroach.tgz; | ||
| dir=$(tar -tzf cockroach.tgz | head -1 | cut -d/ -f1); | ||
| cp "$dir/cockroach" /usr/local/bin/cockroach; | ||
| chmod 0755 /usr/local/bin/cockroach; | ||
| rm -rf "$dir" cockroach.tgz; | ||
| ' |
🧩 Analysis chain
🌐 Web query:
CockroachDB official installation documentation download tarball version checksum verification
💡 Result:
Short answer — official docs: Cockroach provides SHA256 checksum files alongside tarball downloads and shows SHA256 hashes on the release pages. Verify by fetching the .sha256 (or the listed SHA256 string) from the Cockroach docs/releases page and comparing it to a locally computed SHA256 value.
Steps (Linux/macOS)
- Download tarball and checksum (example names from the docs):
  curl -O https://binaries.cockroachdb.com/cockroach-vX.Y.Z.linux-amd64.tgz
  curl -O https://binaries.cockroachdb.com/cockroach-vX.Y.Z.linux-amd64.tgz.sha256
- Verify with sha256sum:
  sha256sum -c cockroach-vX.Y.Z.linux-amd64.tgz.sha256
  or compute and compare:
  sha256sum cockroach-vX.Y.Z.linux-amd64.tgz
- If the output says "OK" (or the computed hash equals the official hash), the download integrity is verified.
Windows (PowerShell)
- Get-FileHash .\cockroach-vX.Y.Z.windows-amd64.tgz -Algorithm SHA256
- Compare the resulting Hash value to the official SHA256 in the Cockroach release page.
Where to find the official checksums
- Release pages list tarballs and their SHA256 links/hashes. See CockroachDB Releases and individual release download entries. [1][2]
- General install docs reference downloading the binaries and include links to downloads. [1]
Sources
- Install CockroachDB (official docs). [1]
- CockroachDB Releases / individual release pages (show tarball (SHA256) links). [2]
If you want, tell me which CockroachDB version and OS/arch and I’ll fetch the exact download URL and checksum and show the verification commands for that specific file.
🌐 Web query:
CockroachDB binaries.cockroachdb.com -latest download available architectures supported
💡 Result:
The official CockroachDB self-hosted downloads (binaries.cockroachdb.com -latest) provide Intel (x86_64 / amd64) and ARM64 (aarch64) builds. macOS and Windows builds are offered as experimental (not production-qualified), and Docker images are multi-platform (Intel + ARM). [1][2]
🌐 Web query:
CockroachDB official releases vX.Y.Z download URL format cockroach-vX.Y.Z.linux-.sha256 checksum file
💡 Result:
Official CockroachDB release file names follow this pattern (examples shown for Linux Intel/ARM):
- Binary tarball: cockroach-vX.Y.Z.linux-amd64.tgz or cockroach-vX.Y.Z.linux-arm64.tgz. (cockroachlabs.com)
- SQL-only tarball: cockroach-sql-vX.Y.Z.linux-amd64.tgz (or -arm64). (cockroachlabs.com)
The docs list a SHA256 checksum for each downloadable file (displayed as “(SHA256)” next to the filename); checksum files/links are provided on the release download entries. Use the exact filename shown on the release page and the accompanying SHA256 entry to verify the download. (cockroachlabs.com)
If you want, I can fetch the exact download + checksum link for a specific version (e.g., v25.4.2).
Citations:
- https://www.cockroachlabs.com/docs/releases/v25.1?utm_source=openai
Pin version, verify checksum, and fail explicitly on unsupported architectures.
Pulling -latest tarballs without version pinning breaks reproducibility and increases supply-chain risk. Official CockroachDB guidance recommends versioned downloads (e.g., cockroach-vX.Y.Z.linux-amd64.tgz) with SHA256 verification using the checksums published on the release page. Additionally, silently defaulting to amd64 on unknown architectures risks installing an incompatible binary; only Intel (amd64) and ARM64 are officially supported.
Suggested hardening
chroot "$OUTPUTDIR" /bin/bash -lc '
set -e;
arch=$(uname -m);
case "$arch" in
x86_64) cr_arch=amd64 ;;
aarch64|arm64) cr_arch=arm64 ;;
- *) cr_arch=amd64 ;;
+ *) echo "Unsupported arch: $arch" >&2; exit 1 ;;
esac;
+ : "${CRDB_VERSION:?Set CockroachDB version (e.g., v25.1.0)}";
+ : "${CRDB_SHA256:?Set CockroachDB tarball SHA256}";
cd /tmp;
- wget -q https://binaries.cockroachdb.com/cockroach-latest.linux-${cr_arch}.tgz -O cockroach.tgz;
+ wget -q "https://binaries.cockroachdb.com/cockroach-${CRDB_VERSION}.linux-${cr_arch}.tgz" -O cockroach.tgz;
+ echo "${CRDB_SHA256} cockroach.tgz" | sha256sum -c -;
tar -xzf cockroach.tgz;
dir=$(tar -tzf cockroach.tgz | head -1 | cut -d/ -f1);
cp "$dir/cockroach" /usr/local/bin/cockroach;
chmod 0755 /usr/local/bin/cockroach;
rm -rf "$dir" cockroach.tgz;
'🤖 Prompt for AI Agents
In `@src/rootfs/files/scripts/install_cockroachdb.sh` around lines 23 - 38, The
install script currently downloads a non-pinned "cockroach-latest" tarball,
skips checksum verification, and silently falls back to amd64 for unknown
architectures; update the chroot block to (1) detect architecture using
arch/variable cr_arch but instead of defaulting set supported values only
(x86_64 -> amd64, aarch64|arm64 -> arm64) and explicitly exit with an error on
unsupported architectures, (2) replace the wget URL to download a pinned release
(e.g., cockroach-vX.Y.Z.linux-${cr_arch}.tgz) and (3) fetch and verify the
published SHA256 checksum before extracting (use tar -xzf only after checksum
validation), failing the script if the checksum does not match; ensure the cp to
/usr/local/bin/cockroach and subsequent cleanup remain unchanged but only
execute after successful verification.
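The verify-before-extract step can be exercised on its own. In this sketch the tarball and the "pinned" hash are faked with a local file so it runs anywhere; in the real script the hash is a constant taken from the CockroachDB release page, not computed from the download:

```shell
# Sketch: refuse to proceed unless the artifact matches a pinned SHA256.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf 'fake tarball bytes' > cockroach.tgz             # stands in for the wget download
CRDB_SHA256=$(sha256sum cockroach.tgz | cut -d' ' -f1)  # in practice: a pinned constant

# sha256sum -c exits nonzero on mismatch, aborting the script before any tar -xzf.
echo "${CRDB_SHA256}  cockroach.tgz" | sha256sum -c -

# Tampering with the file makes the same check fail.
printf 'tampered' >> cockroach.tgz
if echo "${CRDB_SHA256}  cockroach.tgz" | sha256sum -c - >/dev/null 2>&1; then
  verified_after_tamper=yes
else
  verified_after_tamper=no
fi
echo "verified after tamper: $verified_after_tamper"
```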
| chroot_init | ||
| install_cockroachdb | ||
| chroot_deinit |
Ensure chroot_deinit runs on failure (mount leak risk).
If any step fails, set -e exits before cleanup, leaving sys/proc/tmp/dev mounted. A trap makes cleanup reliable.
🧹 Suggested cleanup trap
-chroot_init
-install_cockroachdb
-chroot_deinit
+trap 'chroot_deinit' EXIT
+chroot_init
+install_cockroachdb🤖 Prompt for AI Agents
In `@src/rootfs/files/scripts/install_cockroachdb.sh` around lines 43 - 45, The
script currently calls chroot_init, install_cockroachdb, and chroot_deinit
sequentially but can exit early on error and leave mounts under
sys/proc/tmp/dev; add a trap to ensure chroot_deinit always runs on error or
exit by registering trap 'chroot_deinit' for EXIT (and optionally ERR) at the
top of the script after defining chroot_deinit and chroot_init so that
regardless of failures in install_cockroachdb the cleanup runs; ensure the trap
is installed before calling chroot_init/install_cockroachdb and that
chroot_deinit is idempotent/safe to call multiple times.
| ( | ||
| $swarmdb.clusters[] | | ||
| select(.cluster_policy == "wireguard" and .deleted_ts == null) | | ||
| select( | ||
| ( | ||
| [$swarmdb.clusternodes[] | select(.deleted_ts == null and (.node | IN($natsNodeIds[])))] | | ||
| length > 0 | ||
| ) | ||
| ) | ||
| ) as $wgCluster | |
Scope WireGuard cluster selection to the candidate cluster
The predicate currently checks for any node in $natsNodeIds without constraining the node to the wireguard cluster being evaluated, so every wireguard cluster can match once any NATS node exists anywhere. That can mis-associate WG data (or emit multiple outputs) in multi-WG environments.
🔧 Proposed fix
- (
- $swarmdb.clusters[] |
- select(.cluster_policy == "wireguard" and .deleted_ts == null) |
- select(
- (
- [$swarmdb.clusternodes[] | select(.deleted_ts == null and (.node | IN($natsNodeIds[])))] |
- length > 0
- )
- )
- ) as $wgCluster |
+ (
+ $swarmdb.clusters[] |
+ select(.cluster_policy == "wireguard" and .deleted_ts == null) as $wg |
+ select(
+ (
+ [$swarmdb.clusternodes[] |
+ select(.deleted_ts == null and .cluster == $wg.id and (.node | IN($natsNodeIds[])))] |
+ length > 0
+ )
+ )
+ ) as $wgCluster |
| ( | |
| $swarmdb.clusters[] | | |
| select(.cluster_policy == "wireguard" and .deleted_ts == null) | | |
| select( | |
| ( | |
| [$swarmdb.clusternodes[] | select(.deleted_ts == null and (.node | IN($natsNodeIds[])))] | | |
| length > 0 | |
| ) | |
| ) | |
| ) as $wgCluster | | |
| ( | |
| $swarmdb.clusters[] | | |
| select(.cluster_policy == "wireguard" and .deleted_ts == null) as $wg | | |
| select( | |
| ( | |
| [$swarmdb.clusternodes[] | | |
| select(.deleted_ts == null and .cluster == $wg.id and (.node | IN($natsNodeIds[])))] | | |
| length > 0 | |
| ) | |
| ) | |
| ) as $wgCluster | |
🤖 Prompt for AI Agents
In `@src/services/apps/nats/manifest.yaml` around lines 20 - 29, The selection
currently checks any node in $natsNodeIds across all clusters, so scope the
clusternode check to the candidate WireGuard cluster ($wgCluster): when building
the inner array use $swarmdb.clusternodes[] | select(.deleted_ts == null and
(.node | IN($natsNodeIds[])) and .cluster == $wgCluster.id) (or .cluster_id if
that is the clusternode key) so only nodes belonging to the evaluated $wgCluster
count toward the length > 0 predicate.
| # Find Redis cluster that shares at least one node with this app cluster | ||
| ( | ||
| $swarmdb.clusters[] | | ||
| select(.cluster_policy == "redis" and .deleted_ts == null) | | ||
| select( | ||
| ( | ||
| [$swarmdb.clusternodes[] | select(.deleted_ts == null and (.node | IN($appNodeIds[])))] | | ||
| length > 0 | ||
| ) | ||
| ) | ||
| ) as $redisCluster | | ||
| # Redis cluster nodes | ||
| ([$swarmdb.clusternodes[] | select(.cluster == $redisCluster.id and .deleted_ts == null)]) as $redisClusterNodes | | ||
| ($redisClusterNodes | map(.node)) as $redisNodeIds | | ||
| # Find WireGuard cluster that contains any Redis nodes | ||
| ( | ||
| $swarmdb.clusters[] | | ||
| select(.cluster_policy == "wireguard" and .deleted_ts == null) | | ||
| select( | ||
| ( | ||
| [$swarmdb.clusternodes[] | select(.deleted_ts == null and (.node | IN($redisNodeIds[])))] | | ||
| length > 0 | ||
| ) | ||
| ) | ||
| ) as $wgCluster | |
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and read the relevant section
if [ -f "src/services/apps/test-app-route/manifest.yaml" ]; then
echo "=== File exists, checking line count ==="
wc -l src/services/apps/test-app-route/manifest.yaml
echo ""
echo "=== Lines 1-50 of the file ==="
head -50 src/services/apps/test-app-route/manifest.yaml
else
echo "File not found, searching for similar files..."
find . -name "manifest.yaml" -type f 2>/dev/null | head -20
fi
Repository: Super-Protocol/sp-vm
Length of output: 1613
🏁 Script executed:
# Search for any startswith usage in the file to find the prefix issue mentioned
if [ -f "src/services/apps/test-app-route/manifest.yaml" ]; then
echo "=== Searching for startswith in manifest.yaml ==="
grep -n "startswith" src/services/apps/test-app-route/manifest.yaml || echo "No startswith found"
fi
🏁 Script executed:
# Read the complete stateExpr to understand the full context
sed -n '10,87p' src/services/apps/test-app-route/manifest.yaml
🏁 Script executed:
# Let me verify jq's behavior with stream assignment by testing a simple case
cat > /tmp/test_jq.json <<'EOF'
{
"items": [
{"id": "a", "value": 1},
{"id": "b", "value": 2},
{"id": "c", "value": 3}
]
}
EOF
echo "=== Testing jq stream assignment behavior ==="
echo "Test 1: Multiple values with 'as' assignment (should get last value)"
jq '.items[] | select(.value > 0) as $item | $item' /tmp/test_jq.json
echo ""
echo "Test 2: Multiple values without assignment (should get all values)"
jq '.items[] | select(.value > 0)' /tmp/test_jq.json
echo ""
echo "Test 3: How many times does 'as' assignment execute?"
jq '[.items[] | select(.value > 0) as $item | $item]' /tmp/test_jq.json
🏁 Script executed:
# Also verify how startswith behaves with prefix matching
cat > /tmp/test_startswith.jq <<'EOF'
# Test startswith prefix matching
{
"cluster_nodes": [
{"id": "redis-cluster-1-node-1"},
{"id": "redis-node-1"},
{"id": "redis-backup-1"},
{"id": "postgres-1"}
],
"cluster_id": "redis"
}
EOF
echo "=== Testing startswith behavior ==="
jq '.cluster_nodes[] | select(.id | startswith("redis")) | .id' /tmp/test_startswith.jq
🏁 Script executed:
# Fix the jq test syntax
cat > /tmp/test_startswith.jq <<'EOF'
{
"cluster_nodes": [
{"id": "redis-cluster-1-node-1"},
{"id": "redis-node-1"},
{"id": "redis-backup-1"},
{"id": "postgres-1"}
],
"cluster_id": "redis"
}
EOF
echo "=== Testing startswith behavior ==="
jq '.cluster_nodes[] | select(.id | startswith("redis")) | .id' /tmp/test_startswith.jq
echo ""
echo "=== Example of prefix collision with startswith ==="
jq '.cluster_nodes[] | select(.id | startswith("redis-")) | .id' /tmp/test_startswith.jq
🏁 Script executed:
# Let's create a more realistic test to understand the actual issue
cat > /tmp/test_manifest_issue.jq <<'EOF'
# Simulate what happens when multiple clusters match
{
"clusters": [
{"id": "redis-001", "cluster_policy": "redis", "deleted_ts": null},
{"id": "redis-002", "cluster_policy": "redis", "deleted_ts": null},
{"id": "wireguard-001", "cluster_policy": "wireguard", "deleted_ts": null}
],
"appNodeIds": ["node1", "node2"]
}
EOF
echo "=== Simulating multiple matching clusters ==="
jq '
# Get all redis clusters (simulating the stream)
.clusters[] |
select(.cluster_policy == "redis" and .deleted_ts == null) as $redisCluster |
# Show what we get assigned
{assigned_cluster: $redisCluster}
' /tmp/test_manifest_issue.jq
🏁 Script executed:
# Test the actual behavior with multiple matching clusters
echo '{"clusters": [{"id": "redis-001", "cluster_policy": "redis", "deleted_ts": null}, {"id": "redis-002", "cluster_policy": "redis", "deleted_ts": null}]}' | jq '
# Simulate: multiple redis clusters match, assigned to $redisCluster
(.clusters[] | select(.cluster_policy == "redis" and .deleted_ts == null)) as $redisCluster |
{assigned_cluster: $redisCluster}
'
echo ""
echo "=== Key insight: how jq handles the 'as' pattern with multiple matches ==="
echo '{"items": ["a", "b", "c"]}' | jq '
# When multiple values match the stream before "as", what happens?
(.items[]) as $item |
{current: $item}
'
🏁 Script executed:
# Now understand the actual problem: startswith prefix matching
echo '{"cluster_id": "redis-001", "properties": [{"cluster_node": "redis-001-node-1"}, {"cluster_node": "redis-001-node-2"}, {"cluster_node": "redis-0011-node-1"}]}' | jq '
.cluster_id as $cid |
.properties[] |
select((.cluster_node | startswith($cid))) |
.cluster_node
'
echo ""
echo "=== Showing the prefix collision issue ==="
echo "If cluster_id is 'redis-001', startswith matches both 'redis-001-node-1' AND 'redis-0011-node-1'"
echo "This is the real issue mentioned in the scratchpad!"
Address multiple matching clusters and prefix collision in cluster node filtering.
This expression can emit multiple complete state objects if multiple Redis or WireGuard clusters match the node-sharing criteria. Additionally, the startswith() checks on lines 67 and 81 for filtering cluster node properties are vulnerable to prefix collisions—for example, a cluster ID "redis-001" will match both "redis-001-node-1" (correct) and "redis-0011-node-1" (incorrect).
To fix both issues, make cluster selection deterministic and use exact matching with a delimiter:
🛠️ Suggested fix
# Find Redis cluster that shares at least one node with this app cluster
(
$swarmdb.clusters[] |
select(.cluster_policy == "redis" and .deleted_ts == null) |
select(
(
[$swarmdb.clusternodes[] | select(.deleted_ts == null and (.node | IN($appNodeIds[])))] |
length > 0
)
)
- ) as $redisCluster |
+ ) | sort_by(.id) | .[0]
+ ) as $redisCluster |
 # Find WireGuard cluster that contains any Redis nodes
(
$swarmdb.clusters[] |
select(.cluster_policy == "wireguard" and .deleted_ts == null) |
select(
(
[$swarmdb.clusternodes[] | select(.deleted_ts == null and (.node | IN($redisNodeIds[])))] |
length > 0
)
)
- ) as $wgCluster |
+ ) | sort_by(.id) | .[0]
+ ) as $wgCluster |
 redisNodeProperties: [
$swarmdb.clusternodeproperties[] |
select(
- (.cluster_node | startswith($redisCluster.id)) and
+ (.cluster_node | startswith($redisCluster.id + "-")) and
.deleted_ts == null and
.name == "redis_node_ready"
) |
wgNodeProperties: [
$swarmdb.clusternodeproperties[] |
select(
- (.cluster_node | startswith($wgCluster.id)) and
+ (.cluster_node | startswith($wgCluster.id + "-")) and
.deleted_ts == null and
.name == "tunnel_ip"
) |🤖 Prompt for AI Agents
In `@src/services/apps/test-app-route/manifest.yaml` around lines 20 - 46, Select
a single deterministic cluster instead of allowing multiple matches and make
node-id prefix checks exact: when assigning $redisCluster and $wgCluster,
constrain the cluster-selection expression to a single result (e.g., pipe the
matching sequence into first or use limit(1; ...)) so only one cluster is bound,
and replace fragile startswith(...) comparisons used later for matching
cluster-node IDs with an exact delimiter-aware test (for example, use (.cluster
| split("-")[0] == $redisCluster.id) or a regex anchored with the hyphen like
test("^" + $redisCluster.id + "-")) to avoid prefix collisions; apply this
change to the expressions that produce $redisCluster, $wgCluster and the
node-property filters that previously used startswith().
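The delimiter-aware match is easy to sanity-check outside jq; the same idea in plain shell, with made-up node IDs, shows that anchoring at the hyphen rejects the colliding prefix:

```shell
# Matching "<cluster-id>-" instead of bare "<cluster-id>" avoids the collision:
# "redis-001" must be followed by a hyphen, so "redis-0011-..." cannot match.
cid="redis-001"
matched=""
for node in redis-001-node-1 redis-0011-node-1 postgres-1; do
  case "$node" in
    "$cid"-*) matched="$matched $node" ;;   # anchored at the delimiter
  esac
done
echo "matched:$matched"
```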
| echo "Ensuring SwarmSecret 'base_domain' is present via swarm-cli..." | ||
| DB_HOST="$DB_HOST" DB_PORT="$DB_PORT" DB_USER="$DB_USER" DB_NAME="$DB_NAME" DB_PASSWORD="${DB_PASSWORD-}" \ | ||
| python3 "$(dirname "$0")/swarm-cli.py" create SwarmSecrets base_domain --value="test.oresty.superprotocol.io" |
Hardcoded test domain should be configurable.
The domain test.oresty.superprotocol.io is hardcoded. This appears to be a test/development value that will likely cause issues in production or other environments. Make it configurable via an environment variable.
Proposed fix
+BASE_DOMAIN=${BASE_DOMAIN:-test.oresty.superprotocol.io}
+
echo "Ensuring SwarmSecret 'base_domain' is present via swarm-cli..."
DB_HOST="$DB_HOST" DB_PORT="$DB_PORT" DB_USER="$DB_USER" DB_NAME="$DB_NAME" DB_PASSWORD="${DB_PASSWORD-}" \
- python3 "$(dirname "$0")/swarm-cli.py" create SwarmSecrets base_domain --value="test.oresty.superprotocol.io"
+ python3 "$(dirname "$0")/swarm-cli.py" create SwarmSecrets base_domain --value="$BASE_DOMAIN"
| echo "Ensuring SwarmSecret 'base_domain' is present via swarm-cli..." | |
| DB_HOST="$DB_HOST" DB_PORT="$DB_PORT" DB_USER="$DB_USER" DB_NAME="$DB_NAME" DB_PASSWORD="${DB_PASSWORD-}" \ | |
| python3 "$(dirname "$0")/swarm-cli.py" create SwarmSecrets base_domain --value="test.oresty.superprotocol.io" | |
| BASE_DOMAIN=${BASE_DOMAIN:-test.oresty.superprotocol.io} | |
| echo "Ensuring SwarmSecret 'base_domain' is present via swarm-cli..." | |
| DB_HOST="$DB_HOST" DB_PORT="$DB_PORT" DB_USER="$DB_USER" DB_NAME="$DB_NAME" DB_PASSWORD="${DB_PASSWORD-}" \ | |
| python3 "$(dirname "$0")/swarm-cli.py" create SwarmSecrets base_domain --value="$BASE_DOMAIN" |
🤖 Prompt for AI Agents
In `@src/swarm-scripts/62.setup-knot.sh` around lines 57 - 59, Replace the
hardcoded domain passed to the swarm-cli call by reading it from an environment
variable (e.g. BASE_DOMAIN or SWARM_BASE_DOMAIN) and use that variable in the
--value argument to SwarmSecrets base_domain; if you require a fail-fast
behavior ensure the script validates the variable (and exits with a clear error)
before invoking python3 "$(dirname "$0")/swarm-cli.py" create SwarmSecrets
base_domain --value=..., otherwise provide a sensible safe default via
${BASE_DOMAIN:-your-default} so other environments can override it without
editing the script.
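A minimal sketch of the override-with-default pattern; the BASE_DOMAIN name and the sanity check are assumptions for illustration, not existing script behavior:

```shell
# Default unless the caller exports BASE_DOMAIN; a crude sanity check fails
# fast rather than provisioning the secret with a junk value.
BASE_DOMAIN=${BASE_DOMAIN:-test.oresty.superprotocol.io}
case "$BASE_DOMAIN" in
  *.*) ;;                                             # must contain a dot
  *) echo "invalid BASE_DOMAIN: $BASE_DOMAIN" >&2; exit 1 ;;
esac
echo "base_domain secret value: $BASE_DOMAIN"
```

Callers then override with `BASE_DOMAIN=prod.example.com ./62.setup-knot.sh` style invocation, no script edit needed.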
| fi | ||
| echo "Ensuring SwarmSecret 'base_domain' is present via swarm-cli..." | ||
| DB_HOST="$DB_HOST" DB_PORT="$DB_PORT" DB_USER="$DB_USER" DB_NAME="$DB_NAME" DB_PASSWORD="${DB_PASSWORD-}" \ |
🧩 Analysis chain
🏁 Script executed:
rg -n 'DB_PASSWORD' src/swarm-scripts/*.sh
🏁 Script executed:
ls -la src/swarm-scripts/*.sh | head -20
🏁 Script executed:
# Check the context of DB_PASSWORD usage in script 62
head -n 65 src/swarm-scripts/62.setup-knot.sh | tail -n 15
🏁 Script executed:
# Check if other scripts pass database credentials in different ways
rg -n 'DB_' src/swarm-scripts/*.sh
Script 62 inconsistently passes DB_PASSWORD to swarm-cli.py while other provisioning scripts do not.
On line 58, this script passes DB_PASSWORD="${DB_PASSWORD-}" to a swarm-cli.py call, but all other provisioning scripts (10, 20, 30, 40, 50, 60, 61, 63, 64, 65, 66, 67, 70) omit it entirely. Additionally, earlier swarm-cli.py calls within script 62 itself (lines 38-53) do not pass DB_PASSWORD. This inconsistency should be resolved for clarity and maintainability.
🤖 Prompt for AI Agents
In `@src/swarm-scripts/62.setup-knot.sh` at line 58, Script 62 is the only
provisioning script that passes DB_PASSWORD to the swarm-cli.py invocation
(DB_PASSWORD="${DB_PASSWORD-}") while other scripts and earlier calls in this
same file omit it; remove the DB_PASSWORD="${DB_PASSWORD-}" token from the
swarm-cli.py environment assignment in script 62 so the call matches the other
provisioning scripts, and ensure you adjust surrounding whitespace/line
continuation (backslashes) so the command remains syntactically correct.
| def create_engine_from_env(db: dict) -> Engine: | ||
| user = db["user"] | ||
| password = db.get("password") or "" | ||
| pw_part = f":{password}" if password else "" | ||
| host = db["host"] | ||
| port = db["port"] | ||
| name = db["name"] | ||
| dsn = f"mysql+pymysql://{user}{pw_part}@{host}:{port}/{name}?charset=utf8mb4" | ||
| return create_engine(dsn, pool_pre_ping=True, future=True) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if file exists and read the relevant lines
if [ -f "src/swarm-scripts/swarm-cli.py" ]; then
echo "File found. Reading lines 69-77:"
sed -n '69,77p' src/swarm-scripts/swarm-cli.py
else
echo "File not found at specified path. Searching for swarm-cli.py:"
find . -name "swarm-cli.py" -type f
fi
🏁 Script executed:
#!/bin/bash
# Also check for any other DSN-related code in the file
echo "=== Checking for create_engine_from_env function ==="
rg -n "create_engine_from_env" src/swarm-scripts/swarm-cli.py -A 15
🏁 Script executed:
#!/bin/bash
# Check imports to see what's currently imported from sqlalchemy
echo "=== Checking SQLAlchemy imports in swarm-cli.py ==="
rg -n "from sqlalchemy|import.*sqlalchemy" src/swarm-scripts/swarm-cli.py
🌐 Web query:
SQLAlchemy URL.create credentials special characters encoding
💡 Result:
Short answer — two options:
- If you pass a full connection string to create_engine(), you must percent‑encode special characters in username/password (e.g. @ → %40, / → %2F). Use urllib.parse.quote or quote_plus to build the string. [1]
- If you use sqlalchemy.engine.URL.create(...) and pass username/password as arguments, do NOT pre-encode them — pass the raw string (URL.create will handle it without needing manual percent-encoding). [1][6]
Examples
- String URL (encode credentials):
  from urllib.parse import quote_plus
  pw = quote_plus("kx@jj5/g")
  url = f"postgresql+pg8000://dbuser:{pw}@pghost10/appdb"
  engine = create_engine(url) [1]
- URL.create (pass raw credentials):
  from sqlalchemy import URL, create_engine
  url = URL.create("postgresql+pg8000", username="dbuser", password="kx@jj5/g", host="pghost10", database="appdb")
  engine = create_engine(url) [1][6]
Notes
- quote_plus replaces spaces with +; use quote(...) if you prefer %20 for spaces.
- The SQLAlchemy docs explicitly state password should not be URL-encoded when passed to URL.create(). [1][6]
Sources
[1] SQLAlchemy Engine Configuration — Escaping special characters in passwords (official docs).
[6] SQLAlchemy docs — URL.create() password note.
🌐 Web query:
SQLAlchemy DSN password contains special characters like @ colon
💡 Result:
Short answer: percent-encode the password in the DSN (or avoid a single-string DSN by using SQLAlchemy's URL object / passing credentials as parameters).
Options:
- URL-encode the password and use the string DSN:
- Python: urllib.parse.quote_plus or urllib.parse.quote to encode special chars.
- Example:
from urllib.parse import quote_plus
pw = quote_plus("p@ss:word/with#chars")
dsn = f"postgresql+psycopg2://user:{pw}@dbhost/dbname"
engine = create_engine(dsn)
(Special chars: @ → %40, : → %3A, / → %2F, etc.) [1][2]
- Use SQLAlchemy's URL.create (no manual escaping required):
  from sqlalchemy import URL, create_engine
  url = URL.create("postgresql+psycopg2", username="user", password="p@ss:word/with#chars", host="dbhost", database="dbname")
  engine = create_engine(url)
  This bypasses string parsing and accepts raw passwords. [1]
- For DB-API drivers (psycopg2/psycopg) you can also pass credentials as keyword args (connect parameters) instead of a URI to avoid encoding/escaping. [2][4]
References:
- SQLAlchemy docs — escaping and URL.create. [1]
- Examples / psycopg2 advice re: percent-encoding or passing params. [2][3]
If you want, tell me your DB+driver and I’ll give the exact one-line DSN or code snippet.
🏁 Script executed:
#!/bin/bash
# Check SQLAlchemy version requirement in the project
if [ -f "requirements.txt" ]; then
rg "sqlalchemy|SQLAlchemy" requirements.txt
fi
if [ -f "setup.py" ]; then
rg "sqlalchemy|SQLAlchemy" setup.py
fi
if [ -f "pyproject.toml" ]; then
rg "sqlalchemy|SQLAlchemy" pyproject.toml
fi
if [ -f "setup.cfg" ]; then
rg "sqlalchemy|SQLAlchemy" setup.cfg
fi
🏁 Script executed:
#!/bin/bash
# Verify the proposed fix syntax - check if URL.create exists in sqlalchemy.engine or sqlalchemy module
python3 <<'EOF'
try:
from sqlalchemy import URL
print("✓ URL.create() is available from sqlalchemy module")
import inspect
sig = inspect.signature(URL.create)
print(f"✓ URL.create signature: {sig}")
except ImportError as e:
print(f"✗ Import error: {e}")
except Exception as e:
print(f"✗ Error: {e}")
EOF
URL-encode credentials when building the SQLAlchemy DSN.
The f-string DSN breaks when username/password contains reserved URL characters (e.g., @, :). Use SQLAlchemy's URL.create() to safely construct the connection string.
🔧 Proposed fix
-from sqlalchemy import create_engine, text
-from sqlalchemy.engine import Engine
+from sqlalchemy import create_engine, text, URL
+from sqlalchemy.engine import Engine
@@
- dsn = f"mysql+pymysql://{user}{pw_part}@{host}:{port}/{name}?charset=utf8mb4"
- return create_engine(dsn, pool_pre_ping=True, future=True)
+ url = URL.create(
+ drivername="mysql+pymysql",
+ username=user,
+ password=password or None,
+ host=host,
+ port=int(port),
+ database=name,
+ query={"charset": "utf8mb4"},
+ )
+ return create_engine(url, pool_pre_ping=True, future=True)
def create_engine_from_env(db: dict) -> Engine:
    user = db["user"]
    password = db.get("password") or ""
    pw_part = f":{password}" if password else ""
    host = db["host"]
    port = db["port"]
    name = db["name"]
    dsn = f"mysql+pymysql://{user}{pw_part}@{host}:{port}/{name}?charset=utf8mb4"
    return create_engine(dsn, pool_pre_ping=True, future=True)

def create_engine_from_env(db: dict) -> Engine:
    user = db["user"]
    password = db.get("password") or ""
    pw_part = f":{password}" if password else ""
    host = db["host"]
    port = db["port"]
    name = db["name"]
    url = URL.create(
        drivername="mysql+pymysql",
        username=user,
        password=password or None,
        host=host,
        port=int(port),
        database=name,
        query={"charset": "utf8mb4"},
    )
    return create_engine(url, pool_pre_ping=True, future=True)
🤖 Prompt for AI Agents
In `@src/swarm-scripts/swarm-cli.py` around lines 69 - 77, The DSN built in
create_engine_from_env is vulnerable to breaking when user/password contain
URL-reserved characters; replace the f-string construction with SQLAlchemy's
URL.create to safely percent-encode credentials and other parts: import and call
sqlalchemy.engine.URL.create(drivername="mysql+pymysql", username=user,
password=password or None, host=host, port=port, database=name,
query={"charset":"utf8mb4"}) and pass that URL object into create_engine(...,
pool_pre_ping=True, future=True) instead of the manual dsn string.
* feat: Services downloader from gatekeeper
* feat: Update services downloader installation script permissions and execution
* fix: Fix test-gatekeeper app initialization
* feat: Enhance services downloader CLI with unpack option
* feat: Add script to register test-gatekeeper-service in SwarmDB
* feat: Add Swarm deployment instructions for sp-vm for copilot
* feat: download sp-swarm-services on startup
* feat(services-downloader): Delete files after download
* feat(services-downloader): Add service for downloading sp-swarm-services in Dockerfile
* fix: remove padding from keys and certs
* fix: extraction certs from yaml
* fix: redirect log output to stderr in download script
* feat(service-downloader): Support unpack with absolute path
* feat(download-sp-swarm-service): added logs to /var/log
* feat: deleted test-gatekeeper-service
* refactor: change services launch order
* fix: remove services ordering cycling
* feat: keep old files on unpack-with-absolute-path
* fix(service-downloader): comment fixes
* fix(download-sp-swarm-services): add timestamp to log messages
* feat(download-sp-swarm-services): add cloud mode detection and no-op behavior
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@src/Dockerfile`:
- Around line 74-86: Replace the GitHub auto-generated archive download in the
e2fsprogs_builder stage: stop using ADD
https://github.com/.../v${e2fsprogs_version}.tar.gz and instead fetch the
official upstream tarball URL for e2fsprogs using wget/curl in a RUN, add an ARG
e2fsprogs_sha256 (or hardcode the published checksum) and verify the downloaded
file with sha256sum (exit non‑zero on mismatch) before extracting to /tmp and
continuing the existing WORKDIR, configure, make, copy and strip steps; update
references to e2fsprogs_version and e2fsprogs_builder accordingly so the build
fails on checksum mismatch and no longer relies on GitHub’s /archive/refs/tags
tarball.
In
`@src/rootfs/files/configs/usr/local/lib/services-downloader/src/gatekeeper-client.js`:
- Around line 38-46: The error creation in gatekeeper-client.js currently embeds
the full response body into the Error message (the Error constructed with
`Gatekeeper request failed: ${res.statusCode} ${res.statusMessage} -
${body.toString('utf8')}`), which can leak secrets; change it so the Error
message only contains the statusCode and statusMessage and at most a short
sample/indicator (e.g., "response body omitted" or a truncated snippet) while
still attaching the full raw `body` on the error object (`error.body = body`)
and keeping `error.statusCode` and `error.headers` before calling
`reject(error)` so logs won’t contain the entire response payload.
In `@src/rootfs/files/configs/usr/local/lib/services-downloader/src/index.js`:
- Around line 69-118: In main(), the parsing for timeout and threads uses
Number() which can produce NaN or invalid/negative values; validate args.timeout
and args.threads after conversion (e.g., const timeout = args.timeout ?
Number(args.timeout) : 30000; const threads = args.threads ?
Number(args.threads) : undefined) to ensure they are finite integers and > 0 (or
for timeout > 0ms, for threads a positive integer), and throw clear errors like
'Invalid --timeout value' / 'Invalid --threads value' if validation fails before
calling getResourceFromGatekeeper or passing threads to the downloader; update
the validation checks near the existing timeout/threads declarations in main()
to perform Number.isFinite and integer/positivity checks.
In
`@src/rootfs/files/configs/usr/local/lib/services-downloader/src/unarchiver.js`:
- Around line 39-84: The tar extraction code (execTarExtract and
execTarExtractAbsolute, used from unpackTarGz after findTarLike) is vulnerable
to path-traversal and absolute-path extraction (-P to /) and must be hardened:
stop invoking system tar with -P and never extract to '/' with preserved
absolute paths; instead either use a safe JS tar library (e.g., npm "tar") that
supports a filter/transform hook, or pre-validate entries by listing the archive
and rejecting any member whose header.path is absolute or contains '..' or that
would resolve outside the intended unpackTarTo (and also reject symlinks
pointing outside). Replace execTarExtractAbsolute usage or remove it, ensure
execTarExtract only extracts into unpackTarTo with safe path checks, and surface
clear errors when a bad entry is found so extraction is aborted.
In `@src/rootfs/files/scripts/install_services_downloader.sh`:
- Around line 22-23: The smoke test currently masks failures by appending "||
true" to the chroot invocation that runs node
/usr/local/lib/services-downloader/src/index.js; remove the "|| true" so the
script exits non-zero on failure, or alternatively capture and log the error
before failing (e.g., run the chroot command, check its exit status, log a
descriptive message including the command and exit code, and exit non-zero).
Update the invocation that references OUTPUTDIR and the chroot/node call so test
failures will fail the build instead of being ignored.
🧹 Nitpick comments (1)
src/rootfs/files/configs/usr/local/bin/download-sp-swarm-services.sh (1)
88-147: Track the TODO and keep the PKCS#8-only key check strict. Please consider tracking the TODO ("temporary solution") in an issue to avoid it getting lost. Also, keep the `BEGIN PRIVATE KEY` check strict so only PKCS#8 keys are accepted. Based on learnings, keep the PKCS#8-only header check unchanged.
### Start e2fsprogs ###
FROM ubuntu:24.04 AS e2fsprogs_builder
RUN apt-get update && apt-get install -y wget gcc make build-essential
ARG e2fsprogs_version=1.47.3
ADD https://github.com/tytso/e2fsprogs/archive/refs/tags/v${e2fsprogs_version}.tar.gz /tmp/e2fsprogs-${e2fsprogs_version}.tar.gz
RUN tar -C /tmp -xvf/tmp/e2fsprogs-${e2fsprogs_version}.tar.gz
WORKDIR /tmp/e2fsprogs-${e2fsprogs_version}
RUN ./configure CFLAGS='-g -O2 -static' LDFLAGS='-static'
RUN make -j
RUN mkdir -p /initramfs-root/sbin
RUN cp /tmp/e2fsprogs-${e2fsprogs_version}/misc/mke2fs /initramfs-root/sbin/mkfs.ext4
RUN strip /initramfs-root/sbin/mkfs.ext4
### End e2fsprogs ###
🧩 Analysis chain
🌐 Web query:
e2fsprogs v1.47.3 SHA256 checksum github release
💡 Result:
Here are the SHA-256 checksums for e2fsprogs v1.47.3:
- e2fsprogs-1.47.3.tar.gz: 2f5164e64dd7d91eadd1e0e8a77d92c06dd7837bb19f1d9189ce1939b363d2b4. [1]
- e2fsprogs-1.47.3.tar.xz: 857e6ef800feaa2bb4578fbc810214be5d3c88b072ea53c5384733a965737329. [2]
Sources: fossies (tar.gz). [1] Buildroot/kernel mirror reference for tar.xz SHA256. [2]
References:
[1] https://fossies.org/linux/misc/e2fsprogs-1.47.3.tar.gz/index_o.html
[2] Buildroot patch referencing kernel mirror (sha256 for e2fsprogs-1.47.3.tar.xz).
🌐 Web query:
e2fsprogs 1.47.3 tarball integrity verification official release
💡 Result:
Short answer — yes. Official e2fsprogs 1.47.3 tarball is distributed with a detached signature and sha256 sums; verify it by checking the SHA-256 and the PGP signature.
Files found on upstream mirrors:
- e2fsprogs-1.47.3.tar.gz and e2fsprogs-1.47.3.tar.sign (or .tar.sign/.tar.gz.asc) and sha256sums.asc. [1][2][3]
Typical verification steps (run in the directory with the downloaded files):
- Verify SHA‑256:
- sha256sum -c sha256sums.asc
(or: sha256sum e2fsprogs-1.47.3.tar.gz and compare output to the published sha256)
- Verify PGP signature (signature names may be .tar.sign or .tar.gz.asc):
- gpg --verify e2fsprogs-1.47.3.tar.sign e2fsprogs-1.47.3.tar.gz
If gpg reports “No public key”, import the maintainer’s public key first (use a trusted source or WKD/Keyserver). Example using GnuPG’s locate-keys / WKD:
- gpg --locate-keys [email protected] (or import the key ID shown by gpg from a trusted source)
Then repeat the gpg --verify command. For general guidance on using WKD / gpg to verify tarball signatures see kernel.org’s verification instructions. [4]
Sources:
- upstream release files listing (ftp of Theodore Ts'o / mirrors) showing .tar.gz, .tar.sign and sha256sums.asc. [1][2][3]
- gpg/WKD verification guidance. [4]
🌐 Web query:
github archive tarball v1.47.3 e2fsprogs reproducible SHA256
💡 Result:
Short answer: GitHub's auto-generated source tarballs are not guaranteed reproducible; use the upstream release tarball instead. The official e2fsprogs 1.47.3 upstream tarball (e2fsprogs-1.47.3.tar.gz) SHA‑256 is:
2f5164e64dd7d91eadd1e0e8a77d92c06dd7837bb19f1d9189ce1939b363d2b4. [1][2]
Sources:
- SHA‑256 for e2fsprogs-1.47.3.tar.gz (Fossies / upstream copy). [1]
- Official SourceForge release page for e2fsprogs v1.47.3. [2]
- GitHub notes / community discussions explaining that GitHub-generated archives and their compression metadata/hashes can change and are not safe to assume reproducible. [3][4]
🌐 Web query:
github /archive/refs/tags download tarball checksum verification
💡 Result:
Short answer: GitHub does NOT publish a stable checksum for the auto-generated /archive/refs/tags/*.tar.gz files. You can compute one yourself (but it may change if GitHub regenerates the archive); for reproducible verification prefer signed release assets or an archive generated from a commit ID. [1][2]
How-to (commands)
- Compute SHA‑256 of the downloaded tarball bytes:
- curl -L -o repo-tag.tar.gz https://github.com/owner/repo/archive/refs/tags/vX.Y.Z.tar.gz
- sha256sum repo-tag.tar.gz
- Or compute on the fly:
- curl -sL https://github.com/owner/repo/archive/refs/tags/vX.Y.Z.tar.gz | sha256sum
Caveats / recommendations
- GitHub generates archives on demand and may re-generate them later; compression metadata can change even if extracted contents are identical, so a tarball’s checksum can change. If you need immutable verification use:
- Release assets that include a published checksum or signature (upload checksum.txt or signed tarball), or
- Fetch the archive for a specific commit ID (not a moving tag/branch) and/or use the Archives REST API with a commit SHA for better stability. [1][4]
- Many projects generate their own checksum files (or use CI to publish checksum.txt) for release artifacts—do that if you control the project. [2][3]
Sources
- GitHub Docs — Downloading source code archives (stability notes). [link].
- Community discussion re: archive hash stability and recommendation to prefer release assets or commit-based archives. [link].
- Example GitHub action / tools for creating checksum files for release assets. [link].
Use the official upstream tarball with checksum verification instead of GitHub's auto-generated archive.
The supply‑chain risk concern is valid, but the GitHub archive approach is problematic. GitHub's auto-generated /archive/refs/tags/ tarballs are not reproducible—the checksum can change if GitHub regenerates them, even when the extracted contents are identical. Instead, download the official upstream tarball with a published checksum:
✅ Recommended hardening
ARG e2fsprogs_version=1.47.3
-ADD https://github.com/tytso/e2fsprogs/archive/refs/tags/v${e2fsprogs_version}.tar.gz /tmp/e2fsprogs-${e2fsprogs_version}.tar.gz
+ARG E2FSPROGS_SHA256=2f5164e64dd7d91eadd1e0e8a77d92c06dd7837bb19f1d9189ce1939b363d2b4
+RUN wget -O /tmp/e2fsprogs-${e2fsprogs_version}.tar.gz \
+    "https://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/v${e2fsprogs_version}/e2fsprogs-${e2fsprogs_version}.tar.gz" \
+    && echo "${E2FSPROGS_SHA256}  /tmp/e2fsprogs-${e2fsprogs_version}.tar.gz" | sha256sum -c -
🤖 Prompt for AI Agents
In `@src/Dockerfile` around lines 74 - 86, Replace the GitHub auto-generated
archive download in the e2fsprogs_builder stage: stop using ADD
https://github.com/.../v${e2fsprogs_version}.tar.gz and instead fetch the
official upstream tarball URL for e2fsprogs using wget/curl in a RUN, add an ARG
e2fsprogs_sha256 (or hardcode the published checksum) and verify the downloaded
file with sha256sum (exit non‑zero on mismatch) before extracting to /tmp and
continuing the existing WORKDIR, configure, make, copy and strip steps; update
references to e2fsprogs_version and e2fsprogs_builder accordingly so the build
fails on checksum mismatch and no longer relies on GitHub’s /archive/refs/tags
tarball.
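The checksum gate in the suggested hardening can be mirrored in a few lines of Python (hashlib is stdlib; the digest below is for the literal bytes b'hello\n' used as a stand-in artifact, not for any e2fsprogs release):

```python
import hashlib

def verify_sha256(path: str, expected_hex: str) -> None:
    """Hash the file in chunks and fail loudly on mismatch, like `sha256sum -c`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    if h.hexdigest() != expected_hex:
        raise SystemExit(f"checksum mismatch for {path}: got {h.hexdigest()}")

# Demo with a stand-in file; a real build would pass the downloaded tarball
# and the published upstream checksum instead.
with open("/tmp/artifact.bin", "wb") as f:
    f.write(b"hello\n")
verify_sha256(
    "/tmp/artifact.bin",
    "5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03",
)
print("checksum ok")
```

Either form (shell or Python) makes the build fail closed when the downloaded bytes don't match the pinned digest.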
const error = new Error(
  `Gatekeeper request failed: ${res.statusCode} ${
    res.statusMessage
  } - ${body.toString('utf8')}`,
);
error.statusCode = res.statusCode;
error.headers = res.headers;
error.body = body;
reject(error);
Avoid embedding full Gatekeeper response bodies in error messages.
Non‑2xx errors include the full response body in the error message; if the body ever contains secrets (e.g., encryption payloads), it can leak into logs. Prefer a short sample (or omit entirely) while keeping the raw body on the error object.
🔒 Proposed fix
-      const error = new Error(
-        `Gatekeeper request failed: ${res.statusCode} ${
-          res.statusMessage
-        } - ${body.toString('utf8')}`,
-      );
+      const bodyText = body.toString('utf8');
+      const sample = bodyText.slice(0, 256);
+      const error = new Error(
+        `Gatekeeper request failed: ${res.statusCode} ${res.statusMessage}${
+          sample ? ` - ${sample}` : ''
+        }`,
+      );
📝 Committable suggestion
+ );📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const error = new Error(
  `Gatekeeper request failed: ${res.statusCode} ${
    res.statusMessage
  } - ${body.toString('utf8')}`,
);
error.statusCode = res.statusCode;
error.headers = res.headers;
error.body = body;
reject(error);

const bodyText = body.toString('utf8');
const sample = bodyText.slice(0, 256);
const error = new Error(
  `Gatekeeper request failed: ${res.statusCode} ${res.statusMessage}${
    sample ? ` - ${sample}` : ''
  }`,
);
error.statusCode = res.statusCode;
error.headers = res.headers;
error.body = body;
reject(error);
🤖 Prompt for AI Agents
In
`@src/rootfs/files/configs/usr/local/lib/services-downloader/src/gatekeeper-client.js`
around lines 38 - 46, The error creation in gatekeeper-client.js currently
embeds the full response body into the Error message (the Error constructed with
`Gatekeeper request failed: ${res.statusCode} ${res.statusMessage} -
${body.toString('utf8')}`), which can leak secrets; change it so the Error
message only contains the statusCode and statusMessage and at most a short
sample/indicator (e.g., "response body omitted" or a truncated snippet) while
still attaching the full raw `body` on the error object (`error.body = body`)
and keeping `error.statusCode` and `error.headers` before calling
`reject(error)` so logs won’t contain the entire response payload.
async function main() {
  const args = parseArgs(process.argv.slice(2));
  if (args.help) {
    printHelp();
    return;
  }
  try {
    const resourceName = args['resource-name'];
    const branchName = args['branch-name'];
    const downloadTo = args['download-to'];
    const unpackTo = args['unpack-to'];
    const sslCertPath = args['ssl-cert-path'];
    const sslKeyPath = args['ssl-key-path'];
    const environment = args.environment || 'mainnet';
    const timeout = args.timeout ? Number(args.timeout) : 30000;

    const unpackWithAbs = !!args['unpack-with-absolute-path'];
    if (!resourceName || !branchName || !sslCertPath || !sslKeyPath) {
      throw new Error('Missing required arguments. See --help');
    }
    // Mode selection: exactly one of the mode flags must be provided
    const modeCount = [!!downloadTo, !!unpackTo, unpackWithAbs].filter(Boolean).length;
    if (modeCount !== 1) {
      throw new Error(
        'Specify exactly one mode: --download-to <dir> | --unpack-to <dir> | --unpack-with-absolute-path',
      );
    }
    if (downloadTo && typeof downloadTo !== 'string') {
      throw new Error('Invalid --download-to value');
    }
    if (unpackTo && typeof unpackTo !== 'string') {
      throw new Error('Invalid --unpack-to value');
    }

    const [sslCertPem, sslKeyPem] = await Promise.all([
      fs.readFile(sslCertPath, 'utf8'),
      fs.readFile(sslKeyPath, 'utf8'),
    ]);

    console.info(`[INFO] fetching resource ${resourceName}@${branchName} env=${environment}`);
    const { resource, encryption } = await getResourceFromGatekeeper({
      resourceName,
      branchName,
      sslKeyPem,
      sslCertPem,
      environment,
      timeout,
    });

    const threads = args.threads ? Number(args.threads) : undefined;
Validate numeric inputs for --timeout and --threads.
Number() can yield NaN/negative values, which may silently propagate to Gatekeeper/downloader. Treat invalid values as errors.
✅ Suggested validation
-    const timeout = args.timeout ? Number(args.timeout) : 30000;
+    const timeout = args.timeout ? Number(args.timeout) : 30000;
+    if (!Number.isFinite(timeout) || timeout <= 0) {
+      throw new Error('Invalid --timeout value');
+    }
...
-    const threads = args.threads ? Number(args.threads) : undefined;
+    const threads = args.threads ? Number(args.threads) : undefined;
+    if (threads !== undefined && (!Number.isInteger(threads) || threads <= 0)) {
+      throw new Error('Invalid --threads value');
+    }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async function main() {
  const args = parseArgs(process.argv.slice(2));
  if (args.help) {
    printHelp();
    return;
  }
  try {
    const resourceName = args['resource-name'];
    const branchName = args['branch-name'];
    const downloadTo = args['download-to'];
    const unpackTo = args['unpack-to'];
    const sslCertPath = args['ssl-cert-path'];
    const sslKeyPath = args['ssl-key-path'];
    const environment = args.environment || 'mainnet';
    const timeout = args.timeout ? Number(args.timeout) : 30000;
    const unpackWithAbs = !!args['unpack-with-absolute-path'];
    if (!resourceName || !branchName || !sslCertPath || !sslKeyPath) {
      throw new Error('Missing required arguments. See --help');
    }
    // Mode selection: exactly one of the mode flags must be provided
    const modeCount = [!!downloadTo, !!unpackTo, unpackWithAbs].filter(Boolean).length;
    if (modeCount !== 1) {
      throw new Error(
        'Specify exactly one mode: --download-to <dir> | --unpack-to <dir> | --unpack-with-absolute-path',
      );
    }
    if (downloadTo && typeof downloadTo !== 'string') {
      throw new Error('Invalid --download-to value');
    }
    if (unpackTo && typeof unpackTo !== 'string') {
      throw new Error('Invalid --unpack-to value');
    }
    const [sslCertPem, sslKeyPem] = await Promise.all([
      fs.readFile(sslCertPath, 'utf8'),
      fs.readFile(sslKeyPath, 'utf8'),
    ]);
    console.info(`[INFO] fetching resource ${resourceName}@${branchName} env=${environment}`);
    const { resource, encryption } = await getResourceFromGatekeeper({
      resourceName,
      branchName,
      sslKeyPem,
      sslCertPem,
      environment,
      timeout,
    });
    const threads = args.threads ? Number(args.threads) : undefined;

async function main() {
  const args = parseArgs(process.argv.slice(2));
  if (args.help) {
    printHelp();
    return;
  }
  try {
    const resourceName = args['resource-name'];
    const branchName = args['branch-name'];
    const downloadTo = args['download-to'];
    const unpackTo = args['unpack-to'];
    const sslCertPath = args['ssl-cert-path'];
    const sslKeyPath = args['ssl-key-path'];
    const environment = args.environment || 'mainnet';
    const timeout = args.timeout ? Number(args.timeout) : 30000;
    if (!Number.isFinite(timeout) || timeout <= 0) {
      throw new Error('Invalid --timeout value');
    }
    const unpackWithAbs = !!args['unpack-with-absolute-path'];
    if (!resourceName || !branchName || !sslCertPath || !sslKeyPath) {
      throw new Error('Missing required arguments. See --help');
    }
    // Mode selection: exactly one of the mode flags must be provided
    const modeCount = [!!downloadTo, !!unpackTo, unpackWithAbs].filter(Boolean).length;
    if (modeCount !== 1) {
      throw new Error(
        'Specify exactly one mode: --download-to <dir> | --unpack-to <dir> | --unpack-with-absolute-path',
      );
    }
    if (downloadTo && typeof downloadTo !== 'string') {
      throw new Error('Invalid --download-to value');
    }
    if (unpackTo && typeof unpackTo !== 'string') {
      throw new Error('Invalid --unpack-to value');
    }
    const [sslCertPem, sslKeyPem] = await Promise.all([
      fs.readFile(sslCertPath, 'utf8'),
      fs.readFile(sslKeyPath, 'utf8'),
    ]);
    console.info(`[INFO] fetching resource ${resourceName}@${branchName} env=${environment}`);
    const { resource, encryption } = await getResourceFromGatekeeper({
      resourceName,
      branchName,
      sslKeyPem,
      sslCertPem,
      environment,
      timeout,
    });
    const threads = args.threads ? Number(args.threads) : undefined;
    if (threads !== undefined && (!Number.isInteger(threads) || threads <= 0)) {
      throw new Error('Invalid --threads value');
    }
🤖 Prompt for AI Agents
In `@src/rootfs/files/configs/usr/local/lib/services-downloader/src/index.js`
around lines 69 - 118, In main(), the parsing for timeout and threads uses
Number() which can produce NaN or invalid/negative values; validate args.timeout
and args.threads after conversion (e.g., const timeout = args.timeout ?
Number(args.timeout) : 30000; const threads = args.threads ?
Number(args.threads) : undefined) to ensure they are finite integers and > 0 (or
for timeout > 0ms, for threads a positive integer), and throw clear errors like
'Invalid --timeout value' / 'Invalid --threads value' if validation fails before
calling getResourceFromGatekeeper or passing threads to the downloader; update
the validation checks near the existing timeout/threads declarations in main()
to perform Number.isFinite and integer/positivity checks.
function execTarExtract(tarFile, destDir) {
  return new Promise((resolve, reject) => {
    const lower = tarFile.toLowerCase();
    const args =
      lower.endsWith('.tar.gz') || lower.endsWith('.tgz')
        ? ['-xzf', tarFile, '-C', destDir]
        : ['-xf', tarFile, '-C', destDir];
    execFile('tar', args, (err) => {
      if (err) return reject(err);
      resolve();
    });
  });
}

function execTarExtractAbsolute(tarFile) {
  return new Promise((resolve, reject) => {
    const lower = tarFile.toLowerCase();
    const args =
      lower.endsWith('.tar.gz') || lower.endsWith('.tgz')
        ? ['-xzf', tarFile, '-C', '/', '-p', '-P', '-k']
        : ['-xf', tarFile, '-C', '/', '-p', '-P', '-k'];
    execFile('tar', args, (err) => {
      if (err) return reject(err);
      resolve();
    });
  });
}

async function unpackTarGz(targetDir, unpackTarTo) {
  await ensureDir(unpackTarTo);
  const destHasFiles = await hasFiles(unpackTarTo);
  if (destHasFiles) {
    console.info(`[INFO] unpack skip: destination not empty: ${unpackTarTo}`);
    return false;
  }

  const tarFile = await findTarLike(targetDir);
  if (!tarFile) {
    console.info(`[INFO] unpack skip: no archive found under ${targetDir}`);
    return false;
  }

  await execTarExtract(tarFile, unpackTarTo);
  console.info(`[INFO] unpacked ${path.basename(tarFile)} to ${unpackTarTo}`);
  return true;
}
🧩 Analysis chain
🏁 Script executed:
cat -n src/rootfs/files/configs/usr/local/lib/services-downloader/src/unarchiver.js | head -100
Length of output: 3534
🌐 Web query:
tar -C directory path traversal vulnerability ../
💡 Result:
Short answer: yes — extracting untrusted TARs with tar -C directory can be abused to write outside that directory using ../, absolute names, or symlink tricks; this is a longstanding class of "path traversal"/link-following vulnerabilities (examples: CVE-2002-0399, recent tar/tar‑fs advisories). [1][2][3]
Why it happens (brief):
- TAR member names may contain "../" or leading slashes or GNUTYPE_* entries (symlinks) that, when processed in archive order, let later members end up outside the intended extraction directory. Attackers can craft archives that create a symlink inside the archive then place files that follow the symlink, causing writes outside -C. [2][3]
Concrete mitigations:
- Don’t extract untrusted archives directly into important locations; extract only into a new empty directory you create first. [1]
- Use GNU tar’s --one-top-level (or extract into an explicitly created directory with mkdir + tar -C that-dir) so members are guaranteed to be placed under a single subdirectory. [1][4]
- Use --strip-components=N to remove path prefixes when appropriate (carefully). [1]
- Keep tar/tar libraries up to date; apply vendor/security patches for known CVEs (e.g., tar/tar-fs advisories). [2][3]
- Where possible, use sandboxing (unprivileged container, separate UID, or chroot), or validate/archive-list before extraction. For programmatic extraction, prefer libraries that explicitly sanitize paths or perform safe extraction logic. [2][3]
References:
- GNU tar manual: --one-top-level and extraction options. [1]
- tar-fs / npm advisory (path-traversal & link-following vulnerability, CVE-2024-12905). [2]
- Recent GNU tar path-traversal coverage (CVE-2025-45582 / historical advisories). [3]
If you want, tell me your exact extraction command and tar/version and I’ll give the safest command/flags for your case.
Block path-traversal in archive extraction.
tar -x* -C <dir> does not prevent ../ entries or symlink tricks from escaping the destination. If a malicious archive is supplied, this can overwrite arbitrary files on the system. The execTarExtractAbsolute function is particularly dangerous as it uses -P (preserve absolute paths) when extracting to /, allowing complete filesystem compromise. Validate tar entries before extraction or switch to a library that enforces safe paths.
🛡️ Suggested safeguard (validate tar entries)
const { execFile } = require('child_process');
+function listTarEntries(tarFile) {
+  return new Promise((resolve, reject) => {
+    execFile('tar', ['-tf', tarFile], (err, stdout) => {
+      if (err) return reject(err);
+      resolve(stdout.split('\n').filter(Boolean));
+    });
+  });
+}
+
+function isSafeEntry(p) {
+  const normalized = path.normalize(p);
+  return !path.isAbsolute(normalized) &&
+    normalized !== '..' &&
+    !normalized.startsWith(`..${path.sep}`);
+}
+
+async function assertSafeTarEntries(tarFile) {
+  const entries = await listTarEntries(tarFile);
+  const bad = entries.find((e) => !isSafeEntry(e));
+  if (bad) throw new Error(`Unsafe path in archive: ${bad}`);
+}
+
 function execTarExtract(tarFile, destDir) {
   return new Promise((resolve, reject) => {
     const lower = tarFile.toLowerCase();
@@
 async function unpackTarGz(targetDir, unpackTarTo) {
@@
   const tarFile = await findTarLike(targetDir);
   if (!tarFile) {
@@
   }
+  await assertSafeTarEntries(tarFile);
   await execTarExtract(tarFile, unpackTarTo);
🤖 Prompt for AI Agents
In `@src/rootfs/files/configs/usr/local/lib/services-downloader/src/unarchiver.js`
around lines 39 - 84, The tar extraction code (execTarExtract and
execTarExtractAbsolute, used from unpackTarGz after findTarLike) is vulnerable
to path-traversal and absolute-path extraction (-P to /) and must be hardened:
stop invoking system tar with -P and never extract to '/' with preserved
absolute paths; instead either use a safe JS tar library (e.g., npm "tar") that
supports a filter/transform hook, or pre-validate entries by listing the archive
and rejecting any member whose header.path is absolute or contains '..' or that
would resolve outside the intended unpackTarTo (and also reject symlinks
pointing outside). Replace execTarExtractAbsolute usage or remove it, ensure
execTarExtract only extracts into unpackTarTo with safe path checks, and surface
clear errors when a bad entry is found so extraction is aborted.
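The entry check suggested above is language-agnostic; for illustration, the same rule sketched with Python's stdlib path handling (the function name and sample paths are invented for the demo — the JS file under review would use the diff shown earlier):

```python
import os.path

def is_safe_entry(name: str) -> bool:
    """Reject archive members that are absolute or would escape the extraction root."""
    normalized = os.path.normpath(name)
    return not (
        os.path.isabs(normalized)
        or normalized == ".."
        or normalized.startswith(".." + os.sep)
    )

# Members a malicious archive might carry:
assert is_safe_entry("etc/config.yaml")          # plain relative path: ok
assert not is_safe_entry("/etc/passwd")          # absolute path: rejected
assert not is_safe_entry("../../etc/cron.d/x")   # traversal: rejected
assert not is_safe_entry("a/../../b")            # normalizes to ../b: rejected
print("all entry checks passed")
```

Note that normalizing before the check matters: a name like `a/../../b` contains no leading `..` yet still escapes the destination once resolved. A full defense would also reject symlink members whose targets resolve outside the root.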
# quick smoke test prints help via node directly
chroot "${OUTPUTDIR}" /bin/bash -c 'node /usr/local/lib/services-downloader/src/index.js --help >/dev/null || true';
Smoke test silently ignores failures.
The || true suppresses all errors, defeating the purpose of a smoke test. If npm ci installed broken dependencies or the script has issues, this won't catch it. Consider failing the build on smoke test failure, or at minimum log the failure.
Suggested improvement
 # quick smoke test prints help via node directly
-chroot "${OUTPUTDIR}" /bin/bash -c 'node /usr/local/lib/services-downloader/src/index.js --help >/dev/null || true';
+if ! chroot "${OUTPUTDIR}" /bin/bash -c 'node /usr/local/lib/services-downloader/src/index.js --help >/dev/null'; then
+  log_warn "smoke test for services-downloader failed";
+fi

Or if the smoke test should be mandatory:

 # quick smoke test prints help via node directly
-chroot "${OUTPUTDIR}" /bin/bash -c 'node /usr/local/lib/services-downloader/src/index.js --help >/dev/null || true';
+log_info "running smoke test for services-downloader";
+chroot "${OUTPUTDIR}" /bin/bash -c 'node /usr/local/lib/services-downloader/src/index.js --help >/dev/null';
🤖 Prompt for AI Agents
In `@src/rootfs/files/scripts/install_services_downloader.sh` around lines 22 -
23, The smoke test currently masks failures by appending "|| true" to the chroot
invocation that runs node /usr/local/lib/services-downloader/src/index.js;
remove the "|| true" so the script exits non-zero on failure, or alternatively
capture and log the error before failing (e.g., run the chroot command, check
its exit status, log a descriptive message including the command and exit code,
and exit non-zero). Update the invocation that references OUTPUTDIR and the
chroot/node call so test failures will fail the build instead of being ignored.
Summary by CodeRabbit
New Features
Documentation
Chores