
Release: v0.2.0 - Alpha to Production #5

Open
jmeyer1980 wants to merge 45 commits into main from release

Conversation

jmeyer1980 (Collaborator) commented Oct 29, 2025

User description

This PR brings 0.2.0 out of alpha status for production release.

What's Included

  • All test infrastructure and CI/CD workflows
  • Complete documentation and wiki
  • 1B tier (Fast) models ready for local testing
  • 4B-15B tier (Edit/QA/Plan) models configured for CI/CD

Testing

  • Automated workflows will run on this PR
  • Manual release workflows to be triggered after merge for full test coverage across all tiers
  • All tests must pass before merge

Release Process

  1. ✅ Create PR (this)
  2. ✅ Run automated tests via CI workflow
  3. ⏳ Merge PR
  4. ⏳ Push v0.2.0 tag (triggers release workflow)
  5. ⏳ Create GitHub Release with changelog

PR Type

Enhancement, Tests, Documentation


Description

  • Production-ready release v0.2.0 with comprehensive end-to-end testing infrastructure and complete documentation

  • E2E testing framework using Playwright (TypeScript) with 8 test scenarios for 1B tier models covering download, bootstrap, launch, and chat workflows

  • Cross-platform test runners for Linux/macOS (Bash) and Windows (PowerShell) with automated model launching and HTML report generation

  • GPU optimization and fallback server in bootstrap script with separate VRAM handling (6GB/8GB/12GB+) and CPU-only fallback for low-memory systems

  • Setup verification scripts (Bash and PowerShell) to validate post-bootstrap installation with 7 comprehensive checks and troubleshooting guidance

  • Open-access models only, replacing proprietary and gated alternatives with Qwen, TinyLlama, SmolLM2, DialoGPT, Gemma, GPT-Neo, and GPT-J

  • Complete documentation suite including installation guide, CLI usage, model configuration, troubleshooting, FAQ, and testing infrastructure documentation

  • Comprehensive wiki with 8 core documents covering getting started, installation, model configuration, CLI usage, testing, troubleshooting, and FAQ

  • CI/CD testing guide with GPU runner configuration options and multi-tier test coverage

  • Updated README with simplified messaging focused on CLI chat functionality and clear "what works" vs "not yet implemented" sections

  • Fallback OpenAI-compatible server for CPU environments with health checks, model listing, and basic chat completions


Diagram Walkthrough

flowchart LR
  A["Bootstrap Script<br/>GPU Optimization<br/>Fallback Server"] --> B["Setup Verification<br/>7 Validation Checks"]
  B --> C["E2E Tests<br/>Playwright<br/>8 Scenarios"]
  C --> D["Test Runners<br/>Bash/PowerShell<br/>HTML Reports"]
  E["Open-Access<br/>Models Only"] --> A
  F["Complete<br/>Documentation<br/>Wiki + Guides"] --> G["Production<br/>Release v0.2.0"]
  D --> G
  B --> G

File Walkthrough

Relevant files
Tests
7 files
cli-chat-1b.spec.ts
Add 1B tier CLI chat validation test suite                             

tests/e2e/cli-chat-1b.spec.ts

  • New comprehensive end-to-end test suite for 1B tier models using
    Playwright
  • Tests validate core customer experience: download → bootstrap → launch
    → chat workflow
  • Includes 8 test scenarios covering health checks, model listing,
    greeting, math, code generation, multi-turn conversations, concurrent
    requests, and JSON response validation
  • Helper functions for model launching, port readiness checking, and
    chat completion via curl
+331/-0 
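The test suite's port-readiness helper polls a health endpoint before chatting; a minimal language-agnostic analogue is sketched below in Python (the function name `wait_for_health` and the polling interval are illustrative, not the suite's actual helper):

```python
import time
import urllib.request
import urllib.error

def wait_for_health(url: str, timeout: float = 120.0, interval: float = 0.5) -> bool:
    """Poll a health endpoint until it answers 200 or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep retrying
        time.sleep(interval)
    return False
```

The real spec performs the same loop before the eight chat scenarios run, so a model that never comes up fails fast with a clear message instead of timing out per-test.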
run-1b-tests-local.sh
Add Linux/macOS test runner script for 1B tier                     

tests/run-1b-tests-local.sh

  • New bash script for running 1B tier tests locally on Linux/macOS
  • Provides prerequisite checking (Node.js, npm, Python), dependency
    installation, model launching, and test execution
  • Supports --no-model flag to skip model launch and --cleanup flag to
    stop model after tests
  • Generates HTML and JSON test reports with automatic browser opening
  • Includes colored output and detailed progress indicators
+320/-0 
run-1b-tests-local.ps1
Add Windows PowerShell test runner script for 1B tier       

tests/run-1b-tests-local.ps1

  • New PowerShell script for running 1B tier tests on Windows via WSL
  • Mirrors bash script functionality with Windows-specific
    implementations (WSL integration, PowerShell cmdlets)
  • Supports -NoModel and -Cleanup parameters for flexible test execution
  • Includes prerequisite checking, model launching via WSL, and HTML
    report generation
  • Provides colored output and automatic browser opening on Windows
+336/-0 
E2E-TESTING-COMPLETE.md
E2E Testing Infrastructure Documentation and Status           

E2E-TESTING-COMPLETE.md

  • Documents complete End-to-End testing infrastructure with Playwright
    (TypeScript) framework
  • Covers local testing for 1B tier on consumer hardware and CI/CD
    workflows for all tiers
  • Includes quick start commands, npm scripts, test coverage details, and
    troubleshooting guide
  • Provides configuration details, test structure examples, and success
    criteria checklist
+454/-0 
TESTING-IMPLEMENTATION-SUMMARY.md
Testing infrastructure implementation summary and handoff

docs-archive/2025-10-29_114041/root-docs/TESTING-IMPLEMENTATION-SUMMARY.md

  • Handoff document for E2E testing infrastructure using Playwright
    (TypeScript v1.44.0+)
  • Details six test suites covering CLI chat, API validation,
    configuration, IDE integration, user journey, and Rider integration
  • Documents local test runners for Windows (PowerShell) and Linux/macOS
    (Bash) with npm scripts
  • Includes CI/CD GitHub Actions workflows, framework configuration, and
    next steps for users
+418/-0 
TESTING-IMPLEMENTATION-SUMMARY.md
Testing infrastructure implementation summary and handoff

TESTING-IMPLEMENTATION-SUMMARY.md

  • Duplicate of testing implementation summary providing complete E2E
    testing infrastructure documentation
  • Covers Playwright framework setup, test coverage by tier (1B local,
    all tiers CI/CD), and execution flow
  • Lists all created/modified test files, npm scripts, and verification
    checklist
  • Provides quick reference for setup, running tests, viewing results,
    and troubleshooting
+418/-0 
TEST-INFRASTRUCTURE-STATUS.md
Test infrastructure status and implementation details       

docs-archive/2025-10-29_114041/root-docs/TEST-INFRASTRUCTURE-STATUS.md

  • Detailed status report on E2E testing infrastructure implementation
    with Playwright framework
  • Documents test files created, npm scripts available, and quick start
    instructions for local/CI testing
  • Provides test coverage breakdown by tier (1B local ready, all tiers
    CI/CD ready, extended platforms template-ready)
  • Includes configuration details, GitHub Actions workflows, report
    generation, and technical decisions
+374/-0 
Configuration changes
3 files
playwright.config.ts
Disable automatic webServer startup in Playwright config 

playwright.config.ts

  • Commented out webServer configuration block that was attempting to
    start vLLM models automatically
  • Models are now expected to be started manually before running tests
  • Maintains test timeout of 120000ms (2 minutes per test)
+6/-6     
chat-templates.conf
Update to open-access models only with new templates         

chat-templates.conf

  • Updated all model references to use open-access alternatives only
  • Removed proprietary/gated models (Llama-3.2-1B, Phi-3.5-mini,
    Gemma-3-4b, WizardLM-2, Codestral, Apriel)
  • Added open-access models: TinyLlama-1.1B, DialoGPT-medium/large,
    GPT-Neo-2.7B, GPT-J-6B
  • Updated template mappings to match new model selections
  • Added clarifying comments indicating "Open-Access Models" for each
    tier
+11/-11 
ports.conf
Update doctrine version number                                                     

ports.conf

  • Updated doctrine-version from 2025.10.10 to 2025.10.12
+1/-1     
Enhancement
4 files
initial-bootstrap.sh
Add GPU optimization, fallback server, and open-access models

scripts/initial-bootstrap.sh

  • Enhanced system dependency checking with command_exists() function to
    detect missing packages before installation
  • Improved GPU memory detection with separate handling for 6GB, 8GB, and
    12GB+ VRAM cards with optimized utilization percentages
  • Added fallback server system (fallback-openai-server.py) for CPU-only
    or low-memory systems
  • Implemented vLLM readiness probing with health endpoint checks and
    graceful fallback to lightweight server on startup failure
  • Updated model configurations to use open-access models only (Qwen,
    TinyLlama, SmolLM2, DialoGPT, Gemma, etc.)
  • Revised chat template mappings for all 12 models with open-access
    alternatives
+231/-35
verify-setup.sh
Add setup verification script for post-bootstrap validation

scripts/verify-setup.sh

  • New bash script to validate that initial-bootstrap.sh completed
    successfully
  • Performs 7 comprehensive checks: Python environment, config files,
    helper scripts, HuggingFace auth, GPU/CUDA support, PyTorch, and vLLM
    installation
  • Provides detailed output with color-coded results and actionable
    troubleshooting guidance
  • Supports --verbose flag for detailed per-check information
+234/-0 
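The command-presence portion of such verification checks can be sketched language-neutrally; the helper names below are hypothetical, and the real script also inspects config files, HuggingFace auth, CUDA, PyTorch, and vLLM:

```python
import shutil

def check_commands(commands: list[str]) -> dict[str, bool]:
    """Report which required executables are present on PATH."""
    return {cmd: shutil.which(cmd) is not None for cmd in commands}

def all_checks_passed(results: dict[str, bool]) -> bool:
    """Overall pass/fail, as the script reports after its 7 checks."""
    return all(results.values())
```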
verify-setup.ps1
Add Windows PowerShell setup verification script                 

scripts/verify-setup.ps1

  • New PowerShell script for Windows users to verify setup completion in
    WSL
  • Performs 7 checks similar to bash version: Python environment, config
    files, repository structure, WSL installation, documentation, and test
    infrastructure
  • Provides Windows-specific guidance and WSL command integration
  • Supports -Verbose switch for detailed output
+225/-0 
fallback-openai-server.py
Fallback OpenAI Server Implementation for CPU Environments

fallback-openai-server.py

  • Added complete Python HTTP server implementation for OpenAI-compatible
    API fallback
  • Implements health check endpoint (/health), models listing
    (/v1/models), and chat completions (/v1/chat/completions)
  • Includes basic response generation with pattern matching for common
    queries (greetings, math, code)
  • Provides lightweight fallback server for CPU-only environments without
    vLLM dependency
[link]   
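The three response shapes such a fallback must emit can be sketched as pure builders; the fields follow the OpenAI chat-completions format, but the `id` scheme below is made up for illustration:

```python
import time

def health_response() -> dict:
    return {"status": "ok"}

def models_response(model: str) -> dict:
    """Body for GET /v1/models."""
    return {
        "object": "list",
        "data": [{"id": model, "object": "model", "created": int(time.time())}],
    }

def chat_completion_response(model: str, content: str) -> dict:
    """Minimal OpenAI-compatible body for POST /v1/chat/completions."""
    return {
        "id": "chatcmpl-fallback",  # made-up id scheme
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
    }
```

Because the shapes match vLLM's, test clients and IDE plugins need no changes when the fallback is serving.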
Formatting
1 file
run-comprehensive-tests.ps1
Minor formatting update to test report template                   

tests/run-comprehensive-tests.ps1

  • Minor formatting changes to environment information section in test
    report generation
  • Changed markdown-style headers to plain text format (removed ** and -
    prefixes)
+6/-6     
Documentation
13 files
CHANGELOG.md
Add v0.2.0-alpha release notes with validation results     

CHANGELOG.md

  • Added new v0.2.0-alpha release section documenting production-ready
    status with real-world validation by Sweep AI
  • Documented successful 1B model deployment and chat functionality
    testing
  • Updated version numbering and release criteria met
  • Preserved all previous changelog entries for v0.1.0-alpha and
    development history
+463/-395
README.md
Restructure README for clarity and CLI-focused messaging 

README.md

  • Completely restructured README with simplified messaging focused on
    CLI chat functionality
  • Removed alpha status warnings and replaced with clear "what works" vs
    "not yet implemented" sections
  • Updated quick start guide to focus on CLI curl-based chat instead of
    IDE integration
  • Simplified model tier table and added comparison with alternatives
    (Ollama, LM Studio)
  • Updated version to 0.2.0-alpha and added wiki documentation links
  • Removed extensive troubleshooting and detailed configuration sections
    in favor of wiki links
+245/-437
complete-setup.md
Complete Setup Guide for Local LLM Deployment                       

docs-archive/2025-10-29_114041/docs-directory/guides/complete-setup.md

  • Comprehensive 700+ line setup guide titled "Local LLM Ritual —
    Four-Scroll Doctrine"
  • Covers complete installation workflow from WSL setup through Rider IDE
    integration
  • Includes detailed troubleshooting section, advanced configuration for
    multiple models, and API usage examples
  • Provides step-by-step instructions for HuggingFace authentication,
    model launching, and tmux background execution
+707/-0 
CHANGELOG.md
Release Changelog with v0.2.0-alpha Production Validation

docs-archive/2025-10-29_114041/root-docs/CHANGELOG.md

  • Comprehensive changelog documenting v0.2.0-alpha release with
    production-ready status
  • Details real-world validation by Sweep AI, including 1B model
    deployment and chat completion testing
  • Documents all features, changes, and known limitations across multiple
    release versions
  • Includes migration guide, roadmap, and testing status for community
    feedback
+463/-0 
E2E-TESTING-COMPLETE.md
Archived E2E Testing Infrastructure Documentation               

docs-archive/2025-10-29_114041/root-docs/E2E-TESTING-COMPLETE.md

  • Duplicate of main E2E testing documentation archived in docs-archive
    directory
  • Contains identical content covering Playwright testing framework,
    local and CI/CD testing setup
  • Includes test coverage by tier, configuration details, and
    troubleshooting resources
  • Maintains consistency with root-level testing documentation
+454/-0 
Installation-Guide.md
Comprehensive Installation Guide for vLLM-Bootstrap           

wiki/Installation-Guide.md

  • Detailed 600+ line installation guide covering WSL setup, HuggingFace
    authentication, and vLLM-Bootstrap installation
  • Provides step-by-step instructions for hardware/software prerequisites
    and post-installation verification
  • Includes comprehensive troubleshooting section for common installation
    issues
  • Documents directory structure, updating procedures, and uninstallation
    steps
+624/-0 
CI-TESTING-GUIDE.md
CI/CD Testing Guide with GPU Runner Configuration               

.github/CI-TESTING-GUIDE.md

  • Comprehensive CI/CD testing guide covering local testing on consumer
    hardware and GitHub Actions workflows
  • Details GPU runner configuration options (self-hosted, Lambda Labs,
    Paperspace) and test structure
  • Includes environment variables, performance expectations, and multi-OS
    support roadmap
  • Provides troubleshooting section and guidelines for contributing new
    tests
+335/-0 
README.md
Complete project README with setup and usage documentation

docs-archive/2025-10-29_114041/root-docs/README.md

  • Comprehensive project README with feature overview, quick start guide,
    and architecture documentation
  • Includes system requirements, installation methods, usage instructions
    for four model tiers (fast/edit/qa/plan)
  • Documents Rider IDE integration setup and troubleshooting guide with
    common issues and solutions
  • Provides acknowledgments, support channels, and contribution
    guidelines for the vLLM-Doctrine project
+437/-0 
Troubleshooting.md
Detailed troubleshooting guide for common issues                 

wiki/Troubleshooting.md

  • Comprehensive troubleshooting guide covering model loading failures,
    connection issues, and CUDA problems
  • Includes quick diagnosis section with health checks and environment
    validation commands
  • Documents VRAM management, WSL networking issues, test failures, and
    performance optimization strategies
  • Provides step-by-step solutions for common errors with expected
    outputs and verification methods
+623/-0 
Model-Configuration.md
Model tier configuration and management guide                       

wiki/Model-Configuration.md

  • Documents four model tiers (1B/4B/7B/15B) with specifications, VRAM
    requirements, and port assignments
  • Explains how to launch different tiers, switch models, and manage VRAM
    for single/multiple model scenarios
  • Covers model storage, chat templates, preloading, and performance
    tuning recommendations
  • Includes configuration file formats for models.conf, ports.conf, and
    chat-templates.conf
+495/-0 
CLI-Usage.md
CLI usage guide for chatting with local models                     

wiki/CLI-Usage.md

  • Guide for launching models by tier and chatting via curl commands with
    OpenAI-compatible API
  • Demonstrates simple chat, multi-turn conversations, response length
    control, and temperature adjustment
  • Covers health checks, model listing, managing multiple models, and
    background running with tmux
  • Includes advanced patterns for code generation, explanation,
    debugging, and batch processing
+457/-0 
FAQ.md
Frequently asked questions and quick answers                         

wiki/FAQ.md

  • Comprehensive FAQ covering general questions, hardware requirements,
    installation, models, and usage
  • Addresses performance questions, technical details about OpenAI API
    compatibility and chat templates
  • Includes troubleshooting references, IDE integration status, licensing
    information, and contribution guidelines
  • Compares vLLM-Bootstrap with alternatives (Ollama, LM Studio,
    text-generation-webui)
+446/-0 
DOCUMENTATION-PLAN.md
Documentation strategy and implementation roadmap               

DOCUMENTATION-PLAN.md

  • Strategic plan for customer-facing documentation aligned with
    Scrollkeeper Doctrine principles
  • Outlines GitHub Wiki structure with eight core documents (Home,
    Getting Started, Installation, Model Configuration, CLI Usage,
    Testing, Troubleshooting, FAQ)
  • Specifies content guidelines, implementation phases, success criteria,
    and maintenance procedures
  • Clarifies documentation scope: CLI chat only (no IDE integration yet),
    with emphasis on verified features only
+266/-0 
Additional files
24 files
test-all-tiers.yml +241/-0 
test-linux-practical.yml +186/-0 
DOCUMENTATION-ACCURACY-FIXES.md +296/-0 
DOCUMENTATION-IMPLEMENTATION-COMPLETE.md +439/-0 
DOCUMENTATION-INDEX.md +269/-0 
MILESTONES.md +161/-154
ROADMAP.md +90/-37 
TEST-INFRASTRUCTURE-STATUS.md +374/-0 
ARCHIVE-INDEX.md +50/-0   
README.md +35/-0   
known-issues.md +295/-0 
testing.md +279/-0 
CONTRIBUTING.md +310/-0 
MILESTONES.md +162/-0 
QUICK-START-TESTING.md +361/-0 
ROADMAP.md +266/-0 
package.json +12/-10 
VERIFY-SETUP.md +240/-0 
index.html +85/-0   
results.json +81/-0   
Getting-Started.md +389/-0 
Home.md +140/-0 
README.md +177/-0 
Testing-Guide.md +474/-0 

- Sweep AI successfully tested 1B model deployment
- All exit criteria met and documented
- Standardized version numbering to 0.2.0-alpha
- Fixed package.json UTF-8 encoding
- Updated status badges to reflect alpha-validated status
v0.2.0-alpha successfully validated CLI chat functionality through terminal-based testing.
# Conflicts:
#	scripts/verify-setup.ps1
#	scripts/verify-setup.sh
jmeyer1980 self-assigned this Oct 29, 2025
jmeyer1980 added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels Oct 29, 2025

qodo-code-review bot commented Oct 29, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Privileged package install

Description: The script auto-installs missing system packages via 'sudo apt install' using a
dynamically constructed package list from environment checks, which could be risky if EXEC
context is untrusted and may lead to privilege escalation or unintended package
installation.
initial-bootstrap.sh [52-61]

Referred Code
if [ -n "$MISSING_DEPS" ]; then
  echo "⚠️  Missing dependencies:$MISSING_DEPS"
  echo "🔄 Installing missing system dependencies..."
  echo "📝 This may require sudo privileges. Please enter your password if prompted."
  sudo apt update
  sudo apt install -y$MISSING_DEPS || {
    echo "❌ Failed to install dependencies. Please install manually:"
    echo "   sudo apt update && sudo apt install -y$MISSING_DEPS"
    exit 1
  }
Command injection risk

Description: The test constructs a curl command with JSON payload embedded in single quotes without
escaping, which can break or allow command injection if prompt content ever becomes
dynamic or contains quotes; ensure strict escaping or use HTTP client library.
cli-chat-1b.spec.ts [33-48]

Referred Code
  const command = `curl -s -X POST http://localhost:${FAST_TIER_PORT}/v1/chat/completions \\
    -H "Content-Type: application/json" \\
    -d '${JSON.stringify(payload)}'`;

  const { stdout, stderr } = await execAsync(command, { timeout: CHAT_TIMEOUT });

  if (stderr) {
    console.warn(`Chat stderr: ${stderr}`);
  }

  const response = JSON.parse(stdout);
  return response.choices?.[0]?.message?.content || '';
} catch (error) {
  const msg = error instanceof Error ? error.message : String(error);
  throw new Error(`Chat failed: ${msg}`);
}
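The risk above comes from splicing `JSON.stringify(payload)` into a single-quoted shell string: an apostrophe in any prompt terminates the quote. One mitigation, sketched in Python for brevity, is to shell-quote the serialized payload (or better, avoid the shell entirely by passing an argument list to the process API):

```python
import json
import shlex

def curl_chat_command(port: int, payload: dict) -> str:
    """Build a curl command with the JSON body safely shell-quoted."""
    body = json.dumps(payload)
    return (
        f"curl -s -X POST http://localhost:{port}/v1/chat/completions "
        f"-H 'Content-Type: application/json' "
        f"-d {shlex.quote(body)}"  # quoting survives apostrophes in prompts
    )
```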
Unauthenticated HTTP server

Description: The fallback HTTP server is launched via nohup without authentication and listens on
0.0.0.0, exposing an OpenAI-compatible endpoint publicly if firewall allows; this can leak
access and enable misuse.
initial-bootstrap.sh [269-316]

Referred Code
if [ "$FORCE_FALLBACK" -eq 1 ]; then
  echo "⚠️  Low-memory CPU-only system detected. Starting lightweight fallback server instead of vLLM."
  nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" >> "$LOG_FILE" 2>&1 &
  FALLBACK_PID=$!
  echo "ℹ️  Fallback server PID: $FALLBACK_PID"
  wait $FALLBACK_PID
  exit $?
fi

# Try to launch vLLM first in the background and probe readiness
nohup python3 -m vllm.entrypoints.openai.api_server \
  --model "$MODEL" \
  --port "$PORT" \
  --gpu-memory-utilization "$UTIL" \
  $CHAT_TEMPLATE \
  >> "$LOG_FILE" 2>&1 &
VLLM_PID=$!

echo "⏳ Waiting for vLLM to become ready (up to 180s)..."
READY=0
for i in $(seq 1 180); do


 ... (clipped 27 lines)
Open server exposure

Description: The embedded fallback-openai-server.py responds on all interfaces (0.0.0.0) without auth
and echoes parts of user input, which may expose the service on the network and enable
SSRF-like misuse; restrict binding or add auth.
initial-bootstrap.sh [438-527]

Referred Code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import time
import argparse
from http.server import BaseHTTPRequestHandler, HTTPServer

def now():
    return int(time.time())

class Handler(BaseHTTPRequestHandler):
    server_version = "FallbackOpenAI/0.1"

    def _set_headers(self, status=200, content_type="application/json"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.end_headers()

    def do_GET(self):
        if self.path == "/health":


 ... (clipped 69 lines)
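A minimal mitigation for the exposure called out above is binding the fallback server to loopback rather than all interfaces; a sketch on the same `http.server` stack (the handler class here is a stand-in for the real one):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class _StubHandler(BaseHTTPRequestHandler):
    """Placeholder; the real fallback server's Handler goes here."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()

# Bind to 127.0.0.1 (loopback) instead of 0.0.0.0; port 0 lets the OS pick.
server = HTTPServer(("127.0.0.1", 0), _StubHandler)
host, port = server.server_address
server.server_close()
```

Loopback binding keeps the unauthenticated endpoint reachable only from the host itself; remote exposure would then require a deliberate reverse proxy with its own auth.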
No API auth

Description: The script launches vLLM with a hardcoded model and logs to a predictable path, then tails
logs on failure; while convenient, there is no auth and service binds to localhost but
combined with other tools could be exposed—consider explicit bind and auth flags.
run-1b-tests-local.sh [172-201]

Referred Code
print_info "Launching $MODEL_ROLE model on port $PORT..."
cd "$HOME/.config/llm-doctrine"

source ~/torch-env/bin/activate
nohup python3 -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --port $PORT \
  --gpu-memory-utilization 0.7 \
  > logs/fast_${PORT}.log 2>&1 &

MODEL_PID=$!
print_info "Model process started (PID: $MODEL_PID)"

# Wait for model to be ready
print_info "Waiting up to ${TIMEOUT}s for model to become ready..."
ELAPSED=0
while [ $ELAPSED -lt $TIMEOUT ]; do
  if check_port_available; then
    print_success "Model is ready on port $PORT!"
    return 0
  fi


 ... (clipped 9 lines)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: New bootstrap and fallback launch logic performs critical actions (dependency installs,
model start/stop, fallback server) without writing structured audit logs with user,
timestamp, action, and outcome.

Referred Code
if [ -n "$MISSING_DEPS" ]; then
  echo "⚠️  Missing dependencies:$MISSING_DEPS"
  echo "🔄 Installing missing system dependencies..."
  echo "📝 This may require sudo privileges. Please enter your password if prompted."
  sudo apt update
  sudo apt install -y$MISSING_DEPS || {
    echo "❌ Failed to install dependencies. Please install manually:"
    echo "   sudo apt update && sudo apt install -y$MISSING_DEPS"
    exit 1
  }
else
  echo "✅ All system dependencies found"
fi

# --- Virtual environment ---
if [ ! -d ~/torch-env ]; then
  echo "🐍 Creating Python virtual environment..."
  python3 -m venv ~/torch-env
fi
source ~/torch-env/bin/activate
pip install --upgrade pip


 ... (clipped 244 lines)
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Weak error context: Network/process exec failures are surfaced with generic messages and limited context
(e.g., curl stdout parsing) which may hinder debugging of edge cases.

Referred Code
  const { stdout, stderr } = await execAsync(command, { timeout: CHAT_TIMEOUT });

  if (stderr) {
    console.warn(`Chat stderr: ${stderr}`);
  }

  const response = JSON.parse(stdout);
  return response.choices?.[0]?.message?.content || '';
} catch (error) {
  const msg = error instanceof Error ? error.message : String(error);
  throw new Error(`Chat failed: ${msg}`);
}
Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Unstructured logs: Long-running servers redirect to plain text log files without structured format or
safeguards, risking inclusion of sensitive content from model outputs or errors.

Referred Code
  >> "$LOG_FILE" 2>&1 &
VLLM_PID=$!

echo "⏳ Waiting for vLLM to become ready (up to 180s)..."
READY=0
for i in $(seq 1 180); do
  if curl -s "http://localhost:$PORT/health" > /dev/null 2>&1; then
    READY=1
    break
  fi
  if ! kill -0 "$VLLM_PID" 2>/dev/null; then
    # vLLM process exited early
    break
  fi
  sleep 1
done

if [ "$READY" -eq 1 ] && kill -0 "$VLLM_PID" 2>/dev/null; then
  echo "✅ vLLM is ready on port $PORT (PID: $VLLM_PID). Attaching..."
  wait "$VLLM_PID"
  exit $?


 ... (clipped 11 lines)
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Fallback server risks: The new fallback-openai-server is started without authentication or rate limits on
localhost and may echo user input to logs; input validation and exposure controls are
unclear from the diff.

Referred Code
if [ "$FORCE_FALLBACK" -eq 1 ]; then
  echo "⚠️  Low-memory CPU-only system detected. Starting lightweight fallback server instead of vLLM."
  nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" >> "$LOG_FILE" 2>&1 &
  FALLBACK_PID=$!
  echo "ℹ️  Fallback server PID: $FALLBACK_PID"
  wait $FALLBACK_PID
  exit $?
fi

# Try to launch vLLM first in the background and probe readiness
nohup python3 -m vllm.entrypoints.openai.api_server \
  --model "$MODEL" \
  --port "$PORT" \
  --gpu-memory-utilization "$UTIL" \
  $CHAT_TEMPLATE \
  >> "$LOG_FILE" 2>&1 &
VLLM_PID=$!

echo "⏳ Waiting for vLLM to become ready (up to 180s)..."
READY=0
for i in $(seq 1 180); do


 ... (clipped 242 lines)
Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

jmeyer1980 modified the milestones: v0.3.0-beta, v0.2.0-alpha Oct 29, 2025

qodo-code-review bot commented Oct 29, 2025

PR Code Suggestions ✨

Latest suggestions up to 1981f34

Category | Suggestion | Impact
Security
Enforce safe allowlisted package installs
Suggestion Impact: The commit implements collecting validated packages into a VALIDATED_DEPS array, checks its length, logs accordingly, and installs using apt-get with "${VALIDATED_DEPS[@]}", preventing word-splitting/injection. It also updates related logging and error handling.

code diff:

-  # Validate each package against allowlist
+  # Collect validated packages into array to prevent word-splitting/injection
+  declare -a VALIDATED_DEPS=()
   for pkg in $MISSING_DEPS; do
-    if ! validate_package "$pkg"; then
+    if validate_package "$pkg"; then
+      VALIDATED_DEPS+=("$pkg")
+    else
       echo "❌ SECURITY ERROR: Package '$pkg' not in approved installation list"
       audit_log "PACKAGE_INSTALL" "BLOCKED" "Attempted to install non-allowlisted package: $pkg"
       exit 1
     fi
   done
   
-  echo "🔄 Installing system dependencies (validated)..."
-  echo "📝 This may require sudo privileges. Please enter your password if prompted."
-  audit_log "PACKAGE_INSTALL" "STARTED" "Packages: $MISSING_DEPS"
-  
-  # Install with proper quoting to prevent injection
-  if sudo apt-get update 2>/dev/null; then
-    if sudo apt-get install -y $MISSING_DEPS 2>/dev/null; then
-      echo "✅ Dependencies installed successfully"
-      audit_log "PACKAGE_INSTALL" "SUCCESS" "Packages installed: $MISSING_DEPS"
+  if [ ${#VALIDATED_DEPS[@]} -eq 0 ]; then
+    echo "ℹ️ No validated dependencies to install."
+  else
+    echo "🔄 Installing system dependencies (validated): ${VALIDATED_DEPS[*]}"
+    echo "📝 This may require sudo privileges. Please enter your password if prompted."
+    audit_log "PACKAGE_INSTALL" "STARTED" "Packages: ${VALIDATED_DEPS[*]}"
+    
+    # Install with proper array expansion to prevent injection
+    if sudo apt-get update 2>/dev/null; then
+      if sudo apt-get install -y "${VALIDATED_DEPS[@]}" 2>/dev/null; then
+        echo "✅ Dependencies installed successfully"
+        audit_log "PACKAGE_INSTALL" "SUCCESS" "Packages installed: ${VALIDATED_DEPS[*]}"
+      else
+        echo "❌ Failed to install dependencies. Please install manually:"
+        echo "   sudo apt-get update && sudo apt-get install -y ${VALIDATED_DEPS[*]}"
+        audit_log "PACKAGE_INSTALL" "FAILED" "apt-get install failed for: ${VALIDATED_DEPS[*]}"
+        exit 1
+      fi
     else
-      echo "❌ Failed to install dependencies. Please install manually:"
-      echo "   sudo apt-get update && sudo apt-get install -y $MISSING_DEPS"
-      audit_log "PACKAGE_INSTALL" "FAILED" "apt-get install failed for: $MISSING_DEPS"
+      echo "❌ Failed to update package list"
+      audit_log "PACKAGE_INSTALL" "FAILED" "apt-get update failed"
       exit 1
     fi
-  else
-    echo "❌ Failed to update package list"
-    audit_log "PACKAGE_INSTALL" "FAILED" "apt-get update failed"
-    exit 1
   fi

Fix a security vulnerability by using a bash array for the list of packages to
install, preventing command injection via word-splitting of the $MISSING_DEPS
variable.

scripts/initial-bootstrap.sh [70-121]

-# Function to validate package name against allowlist
-validate_package() {
-  local pkg="$1"
-  local allowed
-  for allowed in "${ALLOWED_PACKAGES[@]}"; do
-    if [ "$pkg" = "$allowed" ]; then
-      return 0
+# Collect validated packages into an array to avoid word-splitting/injection
+declare -a VALIDATED_DEPS=()
+if [ -n "$MISSING_DEPS" ]; then
+  echo "⚠️  Missing dependencies:$MISSING_CMDS"
+  for pkg in $MISSING_DEPS; do
+    if validate_package "$pkg"; then
+      VALIDATED_DEPS+=("$pkg")
+    else
+      echo "❌ SECURITY ERROR: Package '$pkg' not in approved installation list"
+      audit_log "PACKAGE_INSTALL" "BLOCKED" "Attempted to install non-allowlisted package: $pkg"
+      exit 1
     fi
   done
-  return 1
-}
-...
-if [ -n "$MISSING_DEPS" ]; then
-  echo "⚠️  Missing dependencies:$MISSING_CMDS"
-  ...
-  if sudo apt-get update 2>/dev/null; then
-    if sudo apt-get install -y $MISSING_DEPS 2>/dev/null; then
-      echo "✅ Dependencies installed successfully"
-      ...
 
+  if [ ${#VALIDATED_DEPS[@]} -eq 0 ]; then
+    echo "ℹ️ No validated dependencies to install."
+  else
+    echo "🔄 Installing system dependencies (validated): ${VALIDATED_DEPS[*]}"
+    echo "📝 This may require sudo privileges. Please enter your password if prompted."
+    audit_log "PACKAGE_INSTALL" "STARTED" "Packages: ${VALIDATED_DEPS[*]}"
+    if sudo apt-get update 2>/dev/null; then
+      if sudo apt-get install -y "${VALIDATED_DEPS[@]}" 2>/dev/null; then
+        echo "✅ Dependencies installed successfully"
+        audit_log "PACKAGE_INSTALL" "SUCCESS" "Packages installed: ${VALIDATED_DEPS[*]}"
+      else
+        echo "❌ Failed to install dependencies. Please install manually:"
+        echo "   sudo apt-get update && sudo apt-get install -y ${VALIDATED_DEPS[*]}"
+        audit_log "PACKAGE_INSTALL" "FAILED" "apt-get install failed for: ${VALIDATED_DEPS[*]}"
+        exit 1
+      fi
+    else
+      echo "❌ Failed to update package list"
+      audit_log "PACKAGE_INSTALL" "FAILED" "apt-get update failed"
+      exit 1
+    fi
+  fi
+else
+  echo "✅ All system dependencies found"
+  audit_log "DEPENDENCIES_CHECK" "SUCCESS" "All required dependencies present"
+fi
+

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: This suggestion identifies a critical security vulnerability where the package allowlist can be bypassed due to unquoted variable expansion, and provides a correct fix using an array to prevent command injection.

High
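The injection mechanics behind this suggestion are worth seeing in isolation. Below is a small standalone sketch (hypothetical values, not from the repo) showing how an unquoted scalar splits into extra arguments that apt-get would interpret as options, while a quoted array expansion yields exactly one argument per validated package:

```shell
# Hypothetical demo of the word-splitting risk the suggestion fixes.
count_args() { echo "$#"; }

# A crafted value smuggles an extra apt-get option past a string-level check.
MISSING_DEPS='curl -o Dir::Bin::apt=/tmp/evil'

count_args $MISSING_DEPS              # unquoted: splits into 3 arguments
count_args "$MISSING_DEPS"            # quoted scalar: 1 argument (wrong for >1 package)

VALIDATED_DEPS=(curl jq)
count_args "${VALIDATED_DEPS[@]}"     # quoted array: one argument per package -> 2
```

This is why the fix collects packages into `VALIDATED_DEPS` and expands with `"${VALIDATED_DEPS[@]}"` rather than re-splitting a string.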
Centralize fallback token via env var

Centralize the hardcoded fallback authentication token by using an environment
variable to avoid duplication and improve maintainability.

scripts/initial-bootstrap.sh [11]

-AUTH_TOKEN = 'fallback-token-12345'; // Match fallback server default
+# In scripts/initial-bootstrap.sh (set a default only if not defined externally)
+FALLBACK_AUTH_TOKEN="${FALLBACK_AUTH_TOKEN:-fallback-token-12345}"
 
+# ... when starting fallback server:
+nohup python3 ./fallback-openai-server.py \
+  --port "$PORT" \
+  --model "$MODEL" \
+  --token "$FALLBACK_AUTH_TOKEN" >> "$LOG_FILE" 2>&1 &
+
+# In tests/setup/global-setup.ts (read from env with default)
+// const AUTH_TOKEN = 'fallback-token-12345';
+const AUTH_TOKEN = process.env.FALLBACK_AUTH_TOKEN || 'fallback-token-12345';
+

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that a hardcoded token is duplicated and proposes centralizing it via an environment variable, which improves maintainability and is good practice.

Low
Remove auth from health check

Remove the Authorization header from the fetch request inside the
checkPortHealth function to prevent potential authentication-related failures
during health checks.

tests/utils/model-utils.ts [83-94]

 async function checkPortHealth(port: number): Promise<boolean> {
   try {
     const response = await fetch(`http://localhost:${port}/health`, {
       signal: AbortSignal.timeout(5000),
-      headers: {
-        Authorization: `Bearer ${AUTH_TOKEN}`,
-      },
     });
     return response.ok;
-  } catch (error) {
+  } catch {
     return false;
   }
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly points out that health check endpoints typically do not require authentication, and sending an Authorization header could cause false negatives, improving test reliability.

Medium
Remove hardcoded auth token
Suggestion Impact:The commit replaced the hardcoded "fallback-token-12345" with a token read from the FALLBACK_AUTH_TOKEN environment variable (with a default), and used it when starting the fallback server.

code diff:

@@ -340,7 +347,9 @@
 
     if [ "$FORCE_FALLBACK" -eq 1 ]; then
       echo "⚠️  Low-memory CPU-only system detected. Starting lightweight fallback server instead of vLLM."
-      nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" --token "fallback-token-12345" >> "$LOG_FILE" 2>&1 &
+      # Use environment variable for token, fallback to default if not set
+      FALLBACK_TOKEN="${FALLBACK_AUTH_TOKEN:-fallback-token-12345}"
+      nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" --token "$FALLBACK_TOKEN" >> "$LOG_FILE" 2>&1 &
       FALLBACK_PID=$!
       echo "ℹ️  Fallback server PID: $FALLBACK_PID"
       wait $FALLBACK_PID
@@ -380,7 +389,9 @@
       if kill -0 "$VLLM_PID" 2>/dev/null; then
         kill "$VLLM_PID" 2>/dev/null || true
       fi
-      nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" --token "fallback-token-12345" >> "$LOG_FILE" 2>&1 &
+      # Use environment variable for token, fallback to default if not set
+      FALLBACK_TOKEN="${FALLBACK_AUTH_TOKEN:-fallback-token-12345}"
+      nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" --token "$FALLBACK_TOKEN" >> "$LOG_FILE" 2>&1 &
       FALLBACK_PID=$!
       echo "ℹ️  Fallback server PID: $FALLBACK_PID"
       wait $FALLBACK_PID

Avoid hardcoding the authentication token. Instead, read the token from an
environment variable to prevent security risks.

scripts/initial-bootstrap.sh [343]

-AUTH_TOKEN = 'fallback-token-12345'; // Match fallback server default
+# Read auth token from environment (do not hardcode)
+FALLBACK_AUTH_TOKEN="${FALLBACK_AUTH_TOKEN:-}"
+if [ -z "$FALLBACK_AUTH_TOKEN" ]; then
+  echo "ℹ️ No FALLBACK_AUTH_TOKEN set; secure endpoints may require auth."
+  # For truly local/dev-only fallback you may uncomment the next line,
+  # but avoid committing real tokens.
+  # FALLBACK_AUTH_TOKEN="fallback-token-12345"
+fi
 
+...
+
+if [ "$FORCE_FALLBACK" -eq 1 ]; then
+  echo "⚠️  Low-memory CPU-only system detected. Starting lightweight fallback server instead of vLLM."
+  nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" ${FALLBACK_AUTH_TOKEN:+--token "$FALLBACK_AUTH_TOKEN"} >> "$LOG_FILE" 2>&1 &
+  FALLBACK_PID=$!
+  echo "ℹ️  Fallback server PID: $FALLBACK_PID"
+  wait $FALLBACK_PID
+  exit $?
+fi
+
+...
+
+if [ "$READY" -eq 1 ] && kill -0 "$VLLM_PID" 2>/dev/null; then
+  echo "✅ vLLM is ready on port $PORT (PID: $VLLM_PID). Attaching..."
+  wait "$VLLM_PID"
+  exit $?
+else
+  echo "⚠️  vLLM failed to start or respond in time. Falling back to lightweight server."
+  if kill -0 "$VLLM_PID" 2>/dev/null; then
+    kill "$VLLM_PID" 2>/dev/null || true
+  fi
+  nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" ${FALLBACK_AUTH_TOKEN:+--token "$FALLBACK_AUTH_TOKEN"} >> "$LOG_FILE" 2>&1 &
+  FALLBACK_PID=$!
+  echo "ℹ️  Fallback server PID: $FALLBACK_PID"
+  wait $FALLBACK_PID
+  exit $?
+fi
+

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a security risk by pointing out the hardcoded fallback-token-12345 and proposes a best-practice solution using environment variables, which improves security and flexibility.

Medium
Externalize auth token
Suggestion Impact:The hardcoded token was replaced with reading from process.env.FALLBACK_AUTH_TOKEN with a fallback to the original value.

code diff:

 // Authentication token for fallback server (matches fallback server config)
-const AUTH_TOKEN = 'fallback-token-12345';
+const AUTH_TOKEN = process.env.FALLBACK_AUTH_TOKEN ?? 'fallback-token-12345';

Replace the hardcoded AUTH_TOKEN with an environment variable
(process.env.FALLBACK_AUTH_TOKEN), using the current value as a fallback to
avoid leaking secrets and improve configuration.

tests/e2e/ide-integration.spec.ts [13-14]

 // Authentication token for fallback server (matches fallback server config)
-const AUTH_TOKEN = 'fallback-token-12345';
+const AUTH_TOKEN = process.env.FALLBACK_AUTH_TOKEN ?? 'fallback-token-12345';

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a hardcoded token and proposes using an environment variable, which is a security and configuration best practice, making the tests more portable.

Medium
Fix unsafe arg expansion
Suggestion Impact:The commit changed installation to use a validated array (VALIDATED_DEPS) with proper array expansion ("${VALIDATED_DEPS[@]}"), addressing unsafe arg expansion. However, it still suppresses stderr with 2>/dev/null, so only part of the suggestion was implemented.

code diff:

+  # Collect validated packages into array to prevent word-splitting/injection
+  declare -a VALIDATED_DEPS=()
   for pkg in $MISSING_DEPS; do
-    if ! validate_package "$pkg"; then
+    if validate_package "$pkg"; then
+      VALIDATED_DEPS+=("$pkg")
+    else
       echo "❌ SECURITY ERROR: Package '$pkg' not in approved installation list"
       audit_log "PACKAGE_INSTALL" "BLOCKED" "Attempted to install non-allowlisted package: $pkg"
       exit 1
     fi
   done
   
-  echo "🔄 Installing system dependencies (validated)..."
-  echo "📝 This may require sudo privileges. Please enter your password if prompted."
-  audit_log "PACKAGE_INSTALL" "STARTED" "Packages: $MISSING_DEPS"
-  
-  # Install with proper quoting to prevent injection
-  if sudo apt-get update 2>/dev/null; then
-    if sudo apt-get install -y $MISSING_DEPS 2>/dev/null; then
-      echo "✅ Dependencies installed successfully"
-      audit_log "PACKAGE_INSTALL" "SUCCESS" "Packages installed: $MISSING_DEPS"
+  if [ ${#VALIDATED_DEPS[@]} -eq 0 ]; then
+    echo "ℹ️ No validated dependencies to install."
+  else
+    echo "🔄 Installing system dependencies (validated): ${VALIDATED_DEPS[*]}"
+    echo "📝 This may require sudo privileges. Please enter your password if prompted."
+    audit_log "PACKAGE_INSTALL" "STARTED" "Packages: ${VALIDATED_DEPS[*]}"
+    
+    # Install with proper array expansion to prevent injection
+    if sudo apt-get update 2>/dev/null; then
+      if sudo apt-get install -y "${VALIDATED_DEPS[@]}" 2>/dev/null; then
+        echo "✅ Dependencies installed successfully"
+        audit_log "PACKAGE_INSTALL" "SUCCESS" "Packages installed: ${VALIDATED_DEPS[*]}"
+      else
+        echo "❌ Failed to install dependencies. Please install manually:"
+        echo "   sudo apt-get update && sudo apt-get install -y ${VALIDATED_DEPS[*]}"
+        audit_log "PACKAGE_INSTALL" "FAILED" "apt-get install failed for: ${VALIDATED_DEPS[*]}"
+        exit 1
+      fi
     else
-      echo "❌ Failed to install dependencies. Please install manually:"
-      echo "   sudo apt-get update && sudo apt-get install -y $MISSING_DEPS"
-      audit_log "PACKAGE_INSTALL" "FAILED" "apt-get install failed for: $MISSING_DEPS"
+      echo "❌ Failed to update package list"
+      audit_log "PACKAGE_INSTALL" "FAILED" "apt-get update failed"

Fix an unsafe argument expansion by quoting the $MISSING_DEPS variable to
prevent word splitting and potential errors during package installation. Also,
avoid suppressing stderr to provide users with actionable error messages.

scripts/initial-bootstrap.sh [118-132]

-if sudo apt-get update 2>/dev/null; then
-  if sudo apt-get install -y $MISSING_DEPS 2>/dev/null; then
+# Convert missing deps string to array safely
+read -r -a MISSING_DEPS_ARR <<< "$MISSING_DEPS"
+
+echo "🔄 Installing system dependencies (validated): ${MISSING_DEPS_ARR[*]}"
+echo "📝 This may require sudo privileges. Please enter your password if prompted."
+audit_log "PACKAGE_INSTALL" "STARTED" "Packages: ${MISSING_DEPS_ARR[*]}"
+
+if sudo apt-get update; then
+  if sudo apt-get install -y "${MISSING_DEPS_ARR[@]}"; then
     echo "✅ Dependencies installed successfully"
-    audit_log "PACKAGE_INSTALL" "SUCCESS" "Packages installed: $MISSING_DEPS"
-  else
+    audit_log "PACKAGE_INSTALL" "SUCCESS" "Packages installed: ${MISSING_DEPS_ARR[*]}"
+  else
     echo "❌ Failed to install dependencies. Please install manually:"
-    echo "   sudo apt-get update && sudo apt-get install -y $MISSING_DEPS"
-    audit_log "PACKAGE_INSTALL" "FAILED" "apt-get install failed for: $MISSING_DEPS"
+    echo "   sudo apt-get update && sudo apt-get install -y ${MISSING_DEPS_ARR[*]}"
+    audit_log "PACKAGE_INSTALL" "FAILED" "apt-get install failed for: ${MISSING_DEPS_ARR[*]}"
     exit 1
   fi
 else
   echo "❌ Failed to update package list"
   audit_log "PACKAGE_INSTALL" "FAILED" "apt-get update failed"
   exit 1
 fi

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that using an unquoted variable $MISSING_DEPS for apt-get install is unsafe and can lead to word-splitting issues. It also correctly points out that redirecting stderr to /dev/null hides important error messages from the user.

Medium
General
Probe full API readiness before tests

Enhance the isPortReady function to validate not only the /health endpoint but
also the /v1/models and /v1/chat/completions endpoints to ensure the server is
fully operational before tests run.

tests/e2e/cli-chat-1b.spec.ts [102-122]

 async function isPortReady(port: number, timeout = 5000): Promise<boolean> {
   const startTime = Date.now();
+  let delay = 500;
+
+  async function ok(url: string, body?: unknown): Promise<boolean> {
+    try {
+      // Build headers conditionally so fetch never receives an undefined header value
+      const headers: Record<string, string> = { Authorization: `Bearer ${AUTH_TOKEN}` };
+      if (body) headers['Content-Type'] = 'application/json';
+      const resp = await fetch(url, {
+        method: body ? 'POST' : 'GET',
+        headers,
+        body: body ? JSON.stringify(body) : undefined,
+        signal: AbortSignal.timeout(1500),
+      });
+      if (!resp.ok) return false;
+      if (url.endsWith('/v1/models')) {
+        const data = (await resp.json()) as unknown;
+        return typeof data === 'object' && data !== null && 'data' in data;
+      }
+      if (url.endsWith('/v1/chat/completions')) {
+        const data = (await resp.json()) as unknown;
+        return (
+          typeof data === 'object' &&
+          data !== null &&
+          'choices' in data &&
+          Array.isArray((data as any).choices) &&
+          (data as any).choices.length > 0
+        );
+      }
+      return true; // health
+    } catch {
+      return false;
+    }
+  }
+
   while (Date.now() - startTime < timeout) {
-    try {
-      const response = await fetch(`http://localhost:${port}/health`, {
-        method: 'GET',
-        headers: {
-          Authorization: `Bearer ${AUTH_TOKEN}`,
-        },
-        signal: AbortSignal.timeout(2000),
-      });
-      if (response.ok) {
-        return true;
-      }
-    } catch {
-      // Connection failed, retry
+    const base = `http://localhost:${port}`;
+    const healthOk = await ok(`${base}/health`);
+    if (!healthOk) {
+      await new Promise((r) => setTimeout(r, delay));
+      delay = Math.min(delay * 2, 3000);
+      continue;
     }
-    await new Promise((resolve) => setTimeout(resolve, 1000));
+
+    const modelsOk = await ok(`${base}/v1/models`);
+    if (!modelsOk) {
+      await new Promise((r) => setTimeout(r, delay));
+      delay = Math.min(delay * 2, 3000);
+      continue;
+    }
+
+    const chatOk = await ok(`${base}/v1/chat/completions`, {
+      model: 'default',
+      messages: [{ role: 'user', content: 'ping' }],
+      max_tokens: 1,
+      temperature: 0.1,
+    });
+    if (chatOk) return true;
+
+    await new Promise((r) => setTimeout(r, delay));
+    delay = Math.min(delay * 2, 3000);
   }
   return false;
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: This is a valuable suggestion that improves test stability by ensuring the API is fully ready, not just the health endpoint. Checking the /v1/models and /v1/chat/completions endpoints before proceeding prevents race conditions and makes the tests more reliable, which is a critical improvement.

Medium
Possible issue
Ensure robust port/process cleanup

Improve the cleanupExistingProcesses function to be more robust by attempting a
graceful shutdown, killing processes by name, and finally killing any processes
listening on the test port ranges using lsof or ss.

tests/utils/model-utils.ts [118-127]

 async function cleanupExistingProcesses(): Promise<void> {
   try {
-    execSync('pkill -f "vllm.entrypoints.openai.api_server" || true', {
-      stdio: 'pipe',
-    });
+    // Attempt graceful shutdown via known port ranges
+    const portRanges = [
+      [8100, 8299],
+      [8300, 8499],
+      [8500, 8699],
+      [8700, 8899],
+    ];
+
+    for (const [start, end] of portRanges) {
+      for (let port = start; port <= end; port++) {
+        try {
+          // Try calling shutdown if supported (ignore errors)
+          await fetch(`http://localhost:${port}/shutdown`, {
+            method: 'POST',
+            headers: { Authorization: `Bearer ${AUTH_TOKEN}` },
+            signal: AbortSignal.timeout(1000),
+          }).catch(() => {});
+        } catch {
+          // ignore
+        }
+      }
+    }
+
+    // Kill known vLLM processes
+    execSync('pkill -f "vllm.entrypoints.openai.api_server" || true', { stdio: 'pipe' });
+
+    // As a last resort, find and kill pids listening on 8100-8899
+    try {
+      // Linux/macOS using lsof
+      execSync(
+        'lsof -iTCP -sTCP:LISTEN -nP | egrep ":81[0-9]{2}|:82[0-9]{2}|:83[0-9]{2}|:84[0-9]{2}|:85[0-9]{2}|:86[0-9]{2}|:87[0-9]{2}|:88[0-9]{2}" | awk \'{print $2}\' | xargs -r kill -9',
+        { stdio: 'pipe', shell: '/bin/bash' }
+      );
+    } catch {
+      // Fallback to ss if lsof unavailable
+      try {
+        execSync(
+          'ss -ltnp | egrep ":81[0-9]{2}|:82[0-9]{2}|:83[0-9]{2}|:84[0-9]{2}|:85[0-9]{2}|:86[0-9]{2}|:87[0-9]{2}|:88[0-9]{2}" | sed -E \'s/.*pid=([0-9]+).*/\\1/\' | xargs -r kill -9',
+          { stdio: 'pipe', shell: '/bin/bash' }
+        );
+      } catch {
+        // ignore if neither available
+      }
+    }
+
     await new Promise((resolve) => setTimeout(resolve, 2000));
+    console.log('✅ Cleanup completed');
   } catch (error) {
     const errorMessage = error instanceof Error ? error.message : String(error);
-    console.warn(`Failed to kill existing processes: ${errorMessage}`);
+    console.warn(`Failed to fully clean processes: ${errorMessage}`);
   }
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion improves the reliability of the test suite by making the process cleanup more robust. Killing processes by port in addition to the process name helps prevent flaky tests caused by lingering servers, which is a significant enhancement for test stability.

Medium
Ensure reliable vLLM termination before fallback

Improve process management by launching vLLM in a new process group for reliable
termination. Also, enhance the health check to validate for a specific HTTP 200
status code.

scripts/initial-bootstrap.sh [351-388]

-nohup python3 -m vllm.entrypoints.openai.api_server \
+# Launch vLLM in its own process group so we can terminate reliably
+nohup setsid python3 -m vllm.entrypoints.openai.api_server \
   --model "$MODEL" \
   --port "$PORT" \
   --gpu-memory-utilization "$UTIL" \
   $CHAT_TEMPLATE \
   >> "$LOG_FILE" 2>&1 &
 VLLM_PID=$!
+VLLM_PGID=$(ps -o pgid= -p "$VLLM_PID" 2>/dev/null | tr -d ' ')
 
 echo "⏳ Waiting for vLLM to become ready (up to 180s)..."
 READY=0
 for i in $(seq 1 180); do
-  if curl -s "http://localhost:$PORT/health" > /dev/null 2>&1; then
+  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:$PORT/health" || true)
+  if [ "$HTTP_CODE" = "200" ]; then
     READY=1
     break
   fi
   if ! kill -0 "$VLLM_PID" 2>/dev/null; then
     # vLLM process exited early
     break
   fi
   sleep 1
 done
 
 if [ "$READY" -eq 1 ] && kill -0 "$VLLM_PID" 2>/dev/null; then
   echo "✅ vLLM is ready on port $PORT (PID: $VLLM_PID). Attaching..."
   wait "$VLLM_PID"
   exit $?
 else
   echo "⚠️  vLLM failed to start or respond in time. Falling back to lightweight server."
-  # Ensure any lingering vLLM process is terminated
-  if kill -0 "$VLLM_PID" 2>/dev/null; then
+  # Ensure any lingering vLLM process tree is terminated
+  if [ -n "$VLLM_PGID" ] && [ "$VLLM_PGID" -gt 1 ] 2>/dev/null; then
+    kill -TERM -"${VLLM_PGID}" 2>/dev/null || true
+    sleep 2
+    # Force kill if still alive
+    if kill -0 "$VLLM_PID" 2>/dev/null; then
+      kill -KILL -"${VLLM_PGID}" 2>/dev/null || true
+    fi
+  elif kill -0 "$VLLM_PID" 2>/dev/null; then
     kill "$VLLM_PID" 2>/dev/null || true
   fi
+
   nohup python3 ./fallback-openai-server.py --port "$PORT" --model "$MODEL" --token "fallback-token-12345" >> "$LOG_FILE" 2>&1 &
   FALLBACK_PID=$!
   echo "ℹ️  Fallback server PID: $FALLBACK_PID"
   wait $FALLBACK_PID
   exit $?
 fi

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a risk of orphaned processes and proposes a robust solution using process groups (setsid) and a more reliable termination sequence (SIGTERM/SIGKILL), significantly improving the script's reliability.

Medium
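For context on the setsid approach, here is a minimal, self-contained sketch (assuming Linux with `setsid` and `ps` available, and `sleep` standing in for the vLLM process) of running a child in its own process group and signalling the whole group so grandchildren die too:

```shell
# Sketch: start a child in a new session/process group, then kill the group.
setsid sleep 30 &
pid=$!
pgid=$(ps -o pgid= -p "$pid" | tr -d ' ')

# A negative PID argument to kill targets the entire process group.
kill -TERM -"$pgid"
wait "$pid" 2>/dev/null
echo "group terminated"
```

The same pattern generalizes to the SIGTERM-then-SIGKILL escalation shown in the suggestion.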
Correct SSE streaming detection and parsing

Update the streaming response check to look for the correct text/event-stream
content type and implement more robust parsing of Server-Sent Events (SSE) data
lines.

tests/e2e/ide-integration.spec.ts [409-457]

-const contentType = response.headers.get('content-type');
+const contentType = response.headers.get('content-type')?.toLowerCase() ?? '';
 
-if (contentType && contentType.includes('text/plain')) {
-  // Handle streaming response
+if (contentType.includes('text/event-stream')) {
+  // Properly handle SSE streaming response
   const reader = response.body?.getReader();
-  let chunks = 0;
-  
+  let sseBuffer = '';
+  let dataLineCount = 0;
+  let sawDone = false;
+
   if (reader) {
-    while (chunks < 5) { // Read first few chunks
-      const { done } = await reader.read();
+    // Read a few chunks or until DONE
+    for (let i = 0; i < 10; i++) {
+      const { done, value } = await reader.read();
       if (done) break;
-      chunks++;
+      sseBuffer += new TextDecoder().decode(value);
+
+      const lines = sseBuffer.split('\n');
+      // keep last partial line in buffer
+      sseBuffer = lines.pop() || '';
+
+      for (const line of lines) {
+        const trimmed = line.trim();
+        if (!trimmed) continue;
+        if (trimmed === 'data: [DONE]') {
+          sawDone = true;
+          break;
+        }
+        if (trimmed.startsWith('data: ')) {
+          dataLineCount++;
+        }
+      }
+      if (sawDone) break;
     }
     reader.releaseLock();
   }
-  
-  expect(chunks, 'No streaming chunks received').toBeGreaterThan(0);
+
+  expect(dataLineCount, 'No SSE data lines received').toBeGreaterThan(0);
+  // We don't hard-require [DONE] since some servers truncate the stream;
+  // reaching this point with at least one counted data line is sufficient.
 } else {
   // Fall back to regular response validation
   const data = (await response.json()) as unknown;
+  // Type-safe validation...
   ...
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly points out that the standard content-type for Server-Sent Events is text/event-stream, not text/plain. It also proposes a more robust parsing logic for the stream, which improves the test's accuracy and reliability.

Low
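For reference, an OpenAI-compatible streaming endpoint emits `text/event-stream` payloads shaped roughly like the sample below (field names follow the OpenAI wire format; this sketch has not been verified against this repo's fallback server). Counting `data:` lines before the `[DONE]` sentinel mirrors the `dataLineCount` logic in the suggested test:

```shell
# Hypothetical SSE sample in the OpenAI streaming shape.
sse_sample='data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]'

# Count JSON payload lines, excluding the [DONE] sentinel.
printf '%s\n' "$sse_sample" | grep -c '^data: {'
```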

Previous suggestions

✅ Suggestions up to commit fb2890d
Category | Suggestion | Impact
Security
Prevent shell injection in curl
Suggestion Impact:The code was changed to use spawn with argument array and stdin (@-) for the JSON payload instead of constructing a shell command string, mitigating shell injection risk.

code diff:

+    const command = `curl`;
+    const args = [
+      '-s',
+      '-X',
+      'POST',
+      `http://localhost:${FAST_TIER_PORT}/v1/chat/completions`,
+      '-H',
+      'Content-Type: application/json',
+      '-d',
+      '@-',
+    ];
+
+    const { stdout, stderr } = await new Promise<{ stdout: string; stderr: string }>((resolve, reject) => {
+      const child = spawn(command, args, { timeout: CHAT_TIMEOUT });
+      let stdout = '';
+      let stderr = '';
+      child.stdout.on('data', (data) => (stdout += data));
+      child.stderr.on('data', (data) => (stderr += data));
+      child.on('close', (code) => {
+        if (code === 0) {
+          resolve({ stdout, stderr });
+        } else {
+          reject(new Error(`Process exited with code ${code}: ${stderr}`));
+        }
+      });
+      child.on('error', (err) => reject(err));
+      child.stdin.write(JSON.stringify(payload));
+      child.stdin.end();
+    });

Refactor the curl command to use spawn and pass the JSON payload via stdin to
prevent potential shell injection vulnerabilities.

tests/e2e/cli-chat-1b.spec.ts [33-37]

-const command = `curl -s -X POST http://localhost:${FAST_TIER_PORT}/v1/chat/completions \\
-  -H "Content-Type: application/json" \\
-  -d '${JSON.stringify(payload)}'`;
+const command = `curl`;
+const args = [
+  '-s',
+  '-X',
+  'POST',
+  `http://localhost:${FAST_TIER_PORT}/v1/chat/completions`,
+  '-H',
+  'Content-Type: application/json',
+  '-d',
+  '@-',
+];
 
-const { stdout, stderr } = await execAsync(command, { timeout: CHAT_TIMEOUT });
+const { stdout, stderr } = await new Promise<{ stdout: string; stderr: string }>((resolve, reject) => {
+  const child = spawn(command, args, { timeout: CHAT_TIMEOUT });
+  let stdout = '';
+  let stderr = '';
+  child.stdout.on('data', (data) => (stdout += data));
+  child.stderr.on('data', (data) => (stderr += data));
+  child.on('close', (code) => {
+    if (code === 0) {
+      resolve({ stdout, stderr });
+    } else {
+      reject(new Error(`Process exited with code ${code}: ${stderr}`));
+    }
+  });
+  child.on('error', (err) => reject(err));
+  child.stdin.write(JSON.stringify(payload));
+  child.stdin.end();
+});

[Suggestion processed]

Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a shell injection vulnerability and proposes a robust, standard fix using spawn and stdin, which is a critical security best practice.

High
High-level
Avoid embedding Python code in shell scripts

Instead of embedding the fallback-openai-server.py script within
initial-bootstrap.sh using a heredoc, create it as a standalone file. The
bootstrap script should then copy this file to the target directory, improving
maintainability.

Examples:

scripts/initial-bootstrap.sh [436-530]
# --- fallback-openai-server.py ---
FALLBACK_SERVER_CONTENT=$(cat <<'EOF'
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import time
import argparse
from http.server import BaseHTTPRequestHandler, HTTPServer


 ... (clipped 85 lines)

Solution Walkthrough:

Before:

# scripts/initial-bootstrap.sh

# ... other heredocs ...

# --- fallback-openai-server.py ---
FALLBACK_SERVER_CONTENT=$(cat <<'EOF'
#!/usr/bin/env python3
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    # ... ~80 lines of Python code for a web server ...

def main():
    # ...

if __name__ == "__main__":
    main()
EOF
)
write_if_missing_or_outdated "./fallback-openai-server.py" "$FALLBACK_SERVER_CONTENT"
chmod +x "./fallback-openai-server.py"

After:

# scripts/initial-bootstrap.sh

# ... other heredocs ...

# --- fallback-openai-server.py ---
# The Python script is now a separate file in the repository.
# The bootstrap script just copies it to the destination.
cp ./scripts/fallback-openai-server.py ./fallback-openai-server.py
chmod +x "./fallback-openai-server.py"

# --- new file: scripts/fallback-openai-server.py ---
#!/usr/bin/env python3
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    # ... ~80 lines of Python code for a web server ...
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that embedding a non-trivial Python script inside a shell script is poor design, negatively impacting maintainability, readability, and tooling support.

Medium
Possible issue
Fix incorrect script paths in documentation
Suggestion Impact:The README commands were updated to use scripts/*.sh and ./scripts/... paths for initial setup, launching a model, and testing connection.

code diff:

 # 4. Run initial setup
-chmod +x *.sh
-./initial-bootstrap.sh
+chmod +x scripts/*.sh
+./scripts/initial-bootstrap.sh
 
 # 5. Launch a model
 source ~/torch-env/bin/activate
-./daily-bootstrap.sh qa
+./scripts/daily-bootstrap.sh qa
 
 # 6. Test connection
-./test-connection.sh 8500
+./scripts/test-connection.sh 8500

Correct the script paths in the README.md Quick Start guide to point to the
scripts/ subdirectory, preventing "file not found" errors.

docs-archive/2025-10-29_114041/root-docs/README.md [50-59]

 # 4. Run initial setup
-chmod +x *.sh
-./initial-bootstrap.sh
+chmod +x scripts/*.sh
+./scripts/initial-bootstrap.sh
 
 # 5. Launch a model
 source ~/torch-env/bin/activate
-./daily-bootstrap.sh qa
+./scripts/daily-bootstrap.sh qa
 
 # 6. Test connection
-./test-connection.sh 8500
+./scripts/test-connection.sh 8500

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the script paths in the Quick Start guide are wrong, which would cause commands to fail for new users, and provides the correct paths.

Medium
Remove incorrect command from documentation
Suggestion Impact:The commit removed the invalid npm command from the "Model Won't Launch" troubleshooting section, leaving only the PowerShell local runner with the -NoModel flag.

code diff:

 ### Model Won't Launch
+
 ```bash
 # Check if port already has model
 curl http://localhost:8100/health
@@ -365,6 +388,7 @@

Remove the incorrect npm run test:1b -- --no-model command from the troubleshooting guide, as the --no-model flag is not supported by the underlying playwright command.

docs-archive/2025-10-29_114041/root-docs/E2E-TESTING-COMPLETE.md [357-365]

 ### Model Won't Launch
 ```bash
 # Check if port already has model
 curl http://localhost:8100/health
 
-# Use --no-model flag to skip launch
+# Use the -NoModel flag with the local runner to skip launch
 .\tests\run-1b-tests-local.ps1 -NoModel
-npm run test:1b -- --no-model


[Suggestion processed]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies an invalid command in the troubleshooting documentation and proposes removing it, which prevents user confusion and errors.

Low

</td></tr><tr><td rowspan=2>General</td>
<td>



<details><summary>Correct inaccurate model size in configuration</summary>

___

**Correct the <code>[15B]</code> model tier in <code>models.conf</code> by replacing the inaccurately sized <br><code>deepseek-ai/DeepSeek-Coder-V2</code> (16B) with a true 15B model to avoid user <br>confusion and potential VRAM issues.**

[wiki/Model-Configuration.md [132-135]](https://github.com/Tiny-Walnut-Games/vLLM-Bootstrap/pull/5/files#diff-72468f7d380089637fd4353fde6941d92d0500a8d15b32016fa69147c191eb3cR132-R135)

```diff
 [15B]
 default = bigcode/starcoder2-15b
-alt1 = deepseek-ai/DeepSeek-Coder-V2
+alt1 = mistralai/Codestral-22B
 alt2 = mistralai/Codestral-15B
```
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly points out an inaccurate model size in the configuration documentation, which could mislead users about VRAM requirements, and proposes a valid correction.

Low
**Use array for dependency management**

Suggestion Impact: The commit changed dependency handling to collect validated packages into a bash array (VALIDATED_DEPS) and installed them using array expansion with apt-get, addressing the robustness concern of word splitting/injection.

code diff:

```diff
+  echo "⚠️  Missing dependencies:$MISSING_CMDS"
+  
+  # Collect validated packages into array to prevent word-splitting/injection
+  declare -a VALIDATED_DEPS=()
+  for pkg in $MISSING_DEPS; do
+    if validate_package "$pkg"; then
+      VALIDATED_DEPS+=("$pkg")
+    else
+      echo "❌ SECURITY ERROR: Package '$pkg' not in approved installation list"
+      audit_log "PACKAGE_INSTALL" "BLOCKED" "Attempted to install non-allowlisted package: $pkg"
+      exit 1
+    fi
+  done
+  
+  if [ ${#VALIDATED_DEPS[@]} -eq 0 ]; then
+    echo "ℹ️ No validated dependencies to install."
+  else
+    echo "🔄 Installing system dependencies (validated): ${VALIDATED_DEPS[*]}"
+    echo "📝 This may require sudo privileges. Please enter your password if prompted."
+    audit_log "PACKAGE_INSTALL" "STARTED" "Packages: ${VALIDATED_DEPS[*]}"
+    
+    # Install with proper array expansion to prevent injection
+    if sudo apt-get update 2>/dev/null; then
+      if sudo apt-get install -y "${VALIDATED_DEPS[@]}" 2>/dev/null; then
+        echo "✅ Dependencies installed successfully"
+        audit_log "PACKAGE_INSTALL" "SUCCESS" "Packages installed: ${VALIDATED_DEPS[*]}"
+      else
+        echo "❌ Failed to install dependencies. Please install manually:"
+        echo "   sudo apt-get update && sudo apt-get install -y ${VALIDATED_DEPS[*]}"
+        audit_log "PACKAGE_INSTALL" "FAILED" "apt-get install failed for: ${VALIDATED_DEPS[*]}"
+        exit 1
+      fi
+    else
+      echo "❌ Failed to update package list"
+      audit_log "PACKAGE_INSTALL" "FAILED" "apt-get update failed"
+      exit 1
+    fi
+  fi
```

**Refactor the dependency management logic to use a bash array instead of a space-separated string for improved robustness and clarity.**

scripts/initial-bootstrap.sh [52-64]

```diff
-if [ -n "$MISSING_DEPS" ]; then
-  echo "⚠️  Missing dependencies:$MISSING_DEPS"
+if [ ${#MISSING_DEPS[@]} -gt 0 ]; then
+  echo "⚠️  Missing dependencies: ${MISSING_DEPS[*]}"
   echo "🔄 Installing missing system dependencies..."
   echo "📝 This may require sudo privileges. Please enter your password if prompted."
   sudo apt update
-  sudo apt install -y$MISSING_DEPS || {
+  sudo apt install -y "${MISSING_DEPS[@]}" || {
     echo "❌ Failed to install dependencies. Please install manually:"
-    echo "   sudo apt update && sudo apt install -y$MISSING_DEPS"
+    echo "   sudo apt update && sudo apt install -y ${MISSING_DEPS[*]}"
     exit 1
   }
 else
   echo "✅ All system dependencies found"
 fi
```
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly points out that using an array for the dependency list is more robust and a better practice than relying on shell word splitting of a string, improving code clarity and maintainability.

Low
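The array approach the suggestion advocates can be shown in a self-contained sketch; the allowlist and package names are illustrative, and nothing is actually installed:

```shell
# Sketch: accumulate validated packages in a bash array rather than a
# space-separated string. Allowlist and package names are illustrative.
ALLOWLIST=" git curl jq "
declare -a VALIDATED_DEPS=()
for pkg in git curl "bad;pkg"; do
  case "$ALLOWLIST" in
    *" $pkg "*) VALIDATED_DEPS+=("$pkg") ;;
    *)          echo "blocked: $pkg" ;;
  esac
done
# Quoted expansion "${VALIDATED_DEPS[@]}" hands each name to a command as a
# single argument — no word splitting, no glob expansion, no injection.
echo "count=${#VALIDATED_DEPS[@]} packages=${VALIDATED_DEPS[*]}"
```

The key difference from the string version is the final expansion: `"${VALIDATED_DEPS[@]}"` preserves each element exactly, whereas an unquoted `$MISSING_DEPS` is re-split by the shell and subject to glob expansion.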

jmeyer1980 and others added 2 commits October 29, 2025 19:55
…in permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…in permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
jmeyer1980 and others added 3 commits October 30, 2025 00:40
…` process

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@qodo-code-review

CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: Test Summary

Failed stage: Comment on PR with results [❌]

Failure summary:

The action failed when actions/github-script attempted to call github.rest.issues.createComment to post a comment to issue #5:

- GitHub API returned HttpError: Resource not accessible by integration (lines 71, 126).
- Endpoint: POST https://api.github.com/repos/Tiny-Walnut-Games/vLLM-Bootstrap/issues/5/comments (lines 111–118).
- Cause: The GitHub token used by the workflow lacks sufficient permissions, or the event context does not grant the required access (e.g., using GITHUB_TOKEN on a pull_request event from a fork, without `permissions: issues: write`, or without `pull_request_target`), so the integration cannot create an issue comment.

Relevant error logs:
```
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

56:  with:
57:  script: github.rest.issues.createComment({
58:    issue_number: context.issue.number,
59:    owner: context.repo.owner,
60:    repo: context.repo.repo,
61:    body: '✅ E2E tests completed for all 4 model tiers on Linux. Check the test summary above.'
62:  })
63:  
64:  github-token: ***
65:  debug: false
66:  user-agent: actions/github-script
67:  result-encoding: json
68:  retries: 0
69:  retry-exempt-status-codes: 400,401,403,404,422
70:  ##[endgroup]
71:  RequestError [HttpError]: Resource not accessible by integration
72:  at /home/runner/work/_actions/actions/github-script/v7/dist/index.js:9537:21
...

111:  url: 'https://api.github.com/repos/Tiny-Walnut-Games/vLLM-Bootstrap/issues/5/comments',
112:  headers: {
113:  accept: 'application/vnd.github.v3+json',
114:  'user-agent': 'actions/github-script octokit-core.js/5.0.1 Node.js/20.19.5 (linux; x64)',
115:  authorization: 'token [REDACTED]',
116:  'content-type': 'application/json; charset=utf-8'
117:  },
118:  body: '{"body":"✅ E2E tests completed for all 4 model tiers on Linux. Check the test summary above."}',
119:  request: {
120:  agent: [Agent],
121:  fetch: [Function: proxyFetch],
122:  hook: [Function: bound bound register]
123:  }
124:  }
125:  }
126:  ##[error]Unhandled error: HttpError: Resource not accessible by integration
127:  Cleaning up orphan processes
```
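A hedged sketch of the missing permissions block, assuming the comment step lives in the same workflow file (job and step names here are illustrative, not the repo's actual workflow):

```yaml
# Illustrative workflow excerpt — grants the default GITHUB_TOKEN the scopes
# needed for github.rest.issues.createComment. Job/step names are assumptions.
permissions:
  contents: read
  issues: write          # required to create issue/PR comments
  pull-requests: write

jobs:
  comment-results:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '✅ E2E tests completed for all 4 model tiers on Linux.'
            })
```

Note that for pull requests opened from forks, GITHUB_TOKEN stays read-only even with this block; running the comment step under `pull_request_target` or a separate `workflow_run` job is the usual workaround, consistent with the cause identified above.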


Labels

- documentation — Improvements or additions to documentation
- enhancement — New feature or request
- Review effort 4/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant