CRITICAL: Sidekiq worker crashes continuously - no jobs being processed #1944

@rileyseaburg

Problem

The touchpoints-production-sidekiq-worker app is crashing repeatedly (every ~16 minutes) and showing 0/1 instances running. This prevents ALL background jobs from processing, including:

  • Export jobs (form responses, events, versions, digital service accounts)
  • Email notifications
  • Scheduled tasks

Impact

  • No background jobs are being processed in production
  • Users requesting data exports never receive their files
  • Scheduled jobs via cf run-task may also be affected
  • Cloud Foundry is sending continuous "sidekiq worker failed" emails

Root Cause

The sidekiq worker app is configured to run bin/rails server (Rails web server) instead of bundle exec sidekiq.

Evidence

$ cf app touchpoints-production-sidekiq-worker
# instances: 0/1  # Worker is DOWN

$ cf events touchpoints-production-sidekiq-worker | head -5
# Shows continuous crashes with "APP/PROC/WEB: Exited with status 1"

$ cf curl /v3/apps/$(cf app touchpoints-production-sidekiq-worker --guid)/droplets/current
# Shows process type: "web":"bin/rails server -b 0.0.0.0 -p $PORT -e $RAILS_ENV"
# Should be: "worker":"bundle exec sidekiq"
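
The droplet check above can be turned into a pass/fail test. This is a minimal sketch, not code from the repository: the function name start_command_is_sidekiq is hypothetical, and it assumes the JSON contains "name":"command" pairs like the one quoted above. Note that a -c override on cf push is expected to show up as the "command" field of the process resource (for example /v3/apps/GUID/processes/web) rather than in the droplet, so the helper accepts that key as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CF API JSON on stdin and succeeds only if a
# start command (keyed "web", "worker", or "command") mentions sidekiq.
start_command_is_sidekiq() {
  grep -Eq '"(web|worker|command)"[[:space:]]*:[[:space:]]*"[^"]*sidekiq'
}

# Usage against the real app (requires a logged-in cf CLI):
#   guid=$(cf app touchpoints-production-sidekiq-worker --guid)
#   cf curl "/v3/apps/${guid}/droplets/current" | start_command_is_sidekiq \
#     && echo "start command runs sidekiq" \
#     || echo "start command does NOT run sidekiq"
```

Run against the current production droplet, this should report that the start command does not run sidekiq, matching the evidence above.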

Why This Causes Crashes

  1. With no manifest or command override, the Ruby buildpack defaults to starting rails server as the web process
  2. The web server starts and expects HTTP traffic
  3. No route exists to send traffic to the sidekiq worker app
  4. Without traffic, the process appears unhealthy to Cloud Foundry's health check
  5. CF kills and restarts it repeatedly, and since Sidekiq itself never starts, no jobs are processed

Proposed Solution

Fix 1: Update Deploy Script (Immediate Fix - Recommended)

Modify .circleci/deploy-sidekiq.sh to pass the correct command:

# Lines 131-133: add the -c flag with the sidekiq command
if cf push "$app_name" \
  -t 180 \
  -c "bundle exec sidekiq -C config/sidekiq.yml" \
  --health-check-type process; then

This is the fastest fix with the smallest change surface.

Fix 2: Create Separate Sidekiq Manifests (Alternative)

Create manifest files for each environment:

  • touchpoints-production-sidekiq.yml
  • touchpoints-staging-sidekiq.yml
  • touchpoints-demo-sidekiq.yml

Each with:

applications:
  - name: touchpoints-production-sidekiq-worker
    command: bundle exec sidekiq -C config/sidekiq.yml
    health-check-type: process  # same as Fix 1; sidekiq does not listen on a port
    memory: 4G
    # ... other configs

Fix 3: Create Procfile (Best Practice - Most Change)

Create a Procfile that declares both process types:

web: bundle exec rails s -b 0.0.0.0 -p $PORT -e $RAILS_ENV
worker: bundle exec sidekiq -C config/sidekiq.yml

Then update the manifests so the web app runs the web process type and the sidekiq worker app runs the worker process type.
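
Before relying on the Procfile, it may be worth confirming that it declares the worker command you expect. A small sketch follows; procfile_command is a hypothetical helper, not something in the repository, and it assumes the simple "type: command" line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command a Procfile declares for a process type.
# Usage: procfile_command <process-type> [path-to-Procfile]
procfile_command() {
  local type="$1" file="${2:-Procfile}"
  # Procfile lines have the form "<type>: <command>"; print only the command.
  sed -n "s/^${type}:[[:space:]]*//p" "$file"
}

# Example:
#   procfile_command worker Procfile
```

Cloud Foundry typically starts non-web process types with zero instances, so after pushing, the worker process may need to be scaled up explicitly, for example with cf scale APP_NAME --process worker -i 1 (cf CLI v7 or later).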

Implementation Plan

Phase 1: Fix Production (URGENT)

  1. Update .circleci/deploy-sidekiq.sh to include sidekiq command
  2. Deploy to production
  3. Verify worker starts: cf app touchpoints-production-sidekiq-worker (should show 1/1)
  4. Check logs: cf logs touchpoints-production-sidekiq-worker --recent
  5. Verify job processing in Sidekiq Web UI at /admin/sidekiq
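
Step 3 can be scripted so a deploy job fails when the worker is not up. This is only a sketch: all_instances_running is a hypothetical helper, and it assumes cf app prints a line of the form "instances: 1/1" as shown in the Evidence section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `cf app` output on stdin and succeeds only if
# every "instances:" line reports running == requested, with at least one
# line requesting more than zero instances.
all_instances_running() {
  awk '/^instances:/ {
         split($2, n, "/")
         if (n[1] != n[2]) bad = 1
         if (n[2] > 0) found = 1
       }
       END { exit (found && !bad) ? 0 : 1 }'
}

# Usage (requires a logged-in cf CLI):
#   cf app touchpoints-production-sidekiq-worker | all_instances_running \
#     || echo "worker is not fully running"
```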

Phase 2: Fix Staging and Demo

  1. Same deployment script update applies to all environments
  2. Deploy to staging: touchpoints-staging-sidekiq-worker
  3. Deploy to demo: touchpoints-demo-sidekiq-worker
  4. Verify each environment
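
The per-environment check can be looped over the three worker apps. The sketch below uses hypothetical helper names and assumes all three apps are visible from the currently targeted org and space; if each environment lives in its own space, a cf target call would be needed before each check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the worker app name for an environment,
# following the naming used in this issue.
sidekiq_app_name() {
  echo "touchpoints-$1-sidekiq-worker"
}

# Print "<app>: <running>/<requested>" for each environment.
# Requires a logged-in cf CLI; assumes `cf app` prints an "instances:" line.
check_all_workers() {
  local env app
  for env in production staging demo; do
    app=$(sidekiq_app_name "$env")
    printf '%s: ' "$app"
    cf app "$app" | awk '/^instances:/ { print $2 }'
  done
}

# Usage:
#   check_all_workers
```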

Phase 3: Improvements (Follow-up)

  • Add error handling and retry policies to export jobs
  • Configure monitoring for job failures
  • Add user-facing error notifications
  • Consider increasing Sidekiq concurrency (currently 1 in config/sidekiq.yml) if needed

Verification Checklist

  • Sidekiq worker shows instances: 1/1 (not 0/1)
  • Worker logs show a Sidekiq startup message, not a Rails server startup
  • Export jobs complete successfully
  • No more "sidekiq worker failed" emails
  • Sidekiq Web UI shows active workers processing jobs
  • Staging and demo workers also fixed

Current Status

  • Production: BROKEN (0/1 instances, continuous crashes)
  • Staging: Likely broken (same deploy script)
  • Demo: Likely broken (same deploy script)

Related Files

  • .circleci/deploy-sidekiq.sh - deployment script (needs -c flag)
  • config/sidekiq.yml - sidekiq configuration (concurrency: 1, queues: default, mailers)
  • app/jobs/ - all background jobs currently not processing

References

  • manifest.sample.yml line 12: Contains commented example of correct sidekiq command
  • config/initializers/vcap_services.rb - Sets up Redis connection from CF services
  • config/initializers/sidekiq.rb - Configures Sidekiq Redis connection
