Memory Exhaustion in get_feed_data Causes Application Crashes (Claude Code Review) #1941

@SueValente

Description

P0 Critical: Memory Exhaustion in get_feed_data Causes Application Crashes

Summary

The get_feed_data method in Admin::SubmissionsController loads all forms, submissions, and questions into memory simultaneously, causing out-of-memory (OOM) crashes during feed exports.

Priority: P0 - Critical
Component: app/controllers/admin/submissions_controller.rb
Lines: 238-267
Affected Endpoints: /admin/submissions/feed, /admin/submissions/export_feed


Problem Description

When users or scheduled jobs trigger the feed export functionality, the application attempts to load the entire dataset into memory before processing. This causes:

  • Application memory spikes of several GB
  • OOM kills in production
  • Sidekiq worker crashes during background exports
  • Degraded performance for all users while an export is running

Reproduction Steps

  1. Navigate to Admin > Submissions > Feed
  2. Set days_limit to a large value (e.g., 30+ days)
  3. Click Export
  4. Observe memory spike and potential timeout/crash
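
The memory growth is straightforward to observe from a Rails console before any fix lands. The sketch below is illustrative only; it assumes console access to a production-like dataset and relies on get_feed_data reading nothing but its days_limit argument, so the controller can be instantiated outside a request.

# Rails console sketch (hypothetical instrumentation of the current implementation)
GC.start
before = GC.stat(:heap_live_slots)

controller = Admin::SubmissionsController.new
rows = controller.send(:get_feed_data, 30)    # private method, 30-day window

GC.start(full_mark: true, immediate_sweep: true)
after = GC.stat(:heap_live_slots)
puts "built #{rows.size} hashes; live heap slots grew by #{after - before}"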

Root Cause Analysis

Current Implementation

# app/controllers/admin/submissions_controller.rb:238-267

def get_feed_data(days_limit)
  all_question_responses = []

  Form.all.each do |form|                              # Problem 1: Loads ALL forms into memory
    submissions = form.submissions.ordered             # Problem 2: N+1 query per form
    submissions = submissions.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?
    submissions.each do |submission|                   # Problem 3: Loads ALL submissions per form
      form.ordered_questions.each do |question|        # Problem 4: N+1 query per submission
        question_text = question.text.to_s
        answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)
        @hash = {
          organization_id: form.organization_id,
          organization_name: form.organization.name,   # Problem 5: N+1 for organization
          form_id: form.id,
          form_name: form.name,
          submission_id: submission.id,
          question_id: question.id,
          user_id: submission.user_id,
          question_text:,
          response_text: answer_text,
          question_with_response_text: "#{question_text}: #{answer_text}",
          created_at: submission.created_at,
        }
        all_question_responses << @hash                # Problem 6: Unbounded array growth
      end
    end
  end

  all_question_responses                               # Problem 7: Returns massive array
end

Memory Impact Calculation

Metric                  Typical Value   Memory Per Item   Total
Forms                   500             ~2 KB             ~1 MB
Submissions (30 days)   50,000          ~1 KB             ~50 MB
Questions               5,000           ~0.5 KB           ~2.5 MB
Result Hashes           500,000         ~0.5 KB           ~250 MB

Each submission is paired with every question on its own form (roughly 10 per form), so 50,000 submissions yield about 500,000 result hashes, or ~250 MB of hash data alone. Add the ActiveRecord objects, duplicated strings, and the fact that every reference is retained until the method returns, and peak process memory climbs into the multi-GB range observed in production. A heavier month (100 forms averaging 1,000 submissions each, 10 questions per form) pushes the array to 1,000,000 hash objects in a single request.

Issues Identified

  1. Form.all.each - Loads entire forms table into memory
  2. Triple-nested loops - O(forms × submissions × questions) complexity
  3. No batching - All records loaded before any processing
  4. N+1 queries - Missing eager loading for organization and questions (a brief sketch follows this list)
  5. Unbounded array - all_question_responses grows without limit
  6. Synchronous processing - Blocks request thread during entire operation
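
Issue 4 in particular is easy to see in isolation. A minimal sketch of the difference, using only associations named in this issue:

# N+1 vs. eager loading (sketch)
Form.all.each { |form| form.organization.name }
# => 1 query for forms, plus 1 organization query per form

Form.includes(:organization).find_each(batch_size: 100) { |form| form.organization.name }
# => forms loaded in batches, organizations preloaded once per batch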

Proposed Solution

Option A: Batched Processing with find_each (Recommended)

# app/controllers/admin/submissions_controller.rb

def get_feed_data(days_limit)
  Enumerator.new do |yielder|
    # Batch forms with eager loading
    Form.includes(:organization, :questions)
        .find_each(batch_size: 100) do |form|

      # Build submissions query with date filter
      submissions_scope = form.submissions
      submissions_scope = submissions_scope.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?

      # Batch submissions
      submissions_scope.find_each(batch_size: 1000) do |submission|
        # Questions already eager loaded; keep the ordered_questions ordering here if the feed depends on it
        form.questions.each do |question|
          question_text = question.text.to_s
          answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)

          yielder << {
            organization_id: form.organization_id,
            organization_name: form.organization.name,
            form_id: form.id,
            form_name: form.name,
            submission_id: submission.id,
            question_id: question.id,
            user_id: submission.user_id,
            question_text: question_text,
            response_text: answer_text,
            question_with_response_text: "#{question_text}: #{answer_text}",
            created_at: submission.created_at,
          }
        end
      end
    end
  end
end

# Update export_feed to stream the response
def export_feed
  @days_limit = (params[:days_limit].present? ? params[:days_limit].to_i : 1)

  respond_to do |format|
    format.csv do
      headers['Content-Type'] = 'text/csv; charset=utf-8'
      headers['Content-Disposition'] = "attachment; filename=touchpoints-feed-#{Date.today}.csv"
      headers['X-Accel-Buffering'] = 'no'  # Disable nginx/proxy buffering
      headers['Cache-Control'] = 'no-cache'

      self.response_body = StreamingCsvExporter.new(get_feed_data(@days_limit))
    end

    format.json do
      # For JSON, consider pagination or background job for large datasets
      render json: get_feed_data(@days_limit).take(10_000).to_a
    end
  end
end

Supporting Class: StreamingCsvExporter

# app/services/streaming_csv_exporter.rb

require 'csv' # stdlib, not always auto-required in a Rails app

class StreamingCsvExporter
  HEADERS = %w[
    organization_id organization_name form_id form_name submission_id
    question_id user_id question_text response_text
    question_with_response_text created_at
  ].freeze

  def initialize(enumerator)
    @enumerator = enumerator
  end

  def each
    yield CSV.generate_line(HEADERS)

    @enumerator.each do |row|
      yield CSV.generate_line(HEADERS.map { |h| row[h.to_sym] })
    end
  end
end
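
The exporter can also be exercised outside the controller, which is a quick way to confirm that memory stays flat while rows stream. This is a console sketch; the file path and 7-day window are illustrative, and it reuses the private get_feed_data method via send.

# Console sketch: stream the feed to a local file
controller = Admin::SubmissionsController.new
exporter = StreamingCsvExporter.new(controller.send(:get_feed_data, 7))

File.open('tmp/feed-sample.csv', 'w') do |file|
  # each yields one CSV line at a time, so memory use does not grow with row count
  exporter.each { |line| file.write(line) }
end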

Option B: Background Job for Large Exports

For very large datasets, move to async processing:

# app/jobs/feed_export_job.rb

require 'csv'

class FeedExportJob < ApplicationJob
  queue_as :exports

  def perform(user_email, days_limit)
    file_path = Rails.root.join('tmp', "feed-export-#{SecureRandom.uuid}.csv")

    CSV.open(file_path, 'wb') do |csv|
      csv << StreamingCsvExporter::HEADERS

      Form.includes(:organization, :questions).find_each(batch_size: 100) do |form|
        # ... batched processing, write directly to file
      end
    end

    # Upload to S3 and email user
    url = S3Uploader.upload(file_path)
    UserMailer.export_ready(user_email, url).deliver_later
  ensure
    FileUtils.rm_f(file_path)
  end
end
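
Wiring Option B into the controller could look roughly like the sketch below. current_user, the route helper, and the flash copy are assumptions about the surrounding app; the job signature matches FeedExportJob above.

# app/controllers/admin/submissions_controller.rb (sketch)
def export_feed_async
  days_limit = params[:days_limit].present? ? params[:days_limit].to_i : 1

  # Enqueue the export and return immediately instead of streaming inline
  FeedExportJob.perform_later(current_user.email, days_limit)

  redirect_to admin_submissions_path,
              notice: 'Your export is being generated. You will receive an email when it is ready.'
end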

Expected Impact

Metric                 Before                   After              Improvement
Peak Memory            2-4 GB                   50-100 MB          ~95% reduction
Memory Growth          Unbounded                Roughly constant   Stable under load
N+1 Queries            O(forms × submissions)   O(batches)         ~99% fewer queries
Request Timeout Risk   High                     Low                Streaming prevents timeouts
OOM Crash Risk         High                     Minimal            Batching prevents memory spikes

Testing Checklist

Unit Tests

  • get_feed_data returns an Enumerator (not an Array); see the RSpec sketch after this list
  • Enumerator yields correct hash structure
  • Empty forms/submissions handled gracefully
  • days_limit = 0 returns all submissions
  • days_limit > 0 filters correctly
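
A minimal RSpec sketch for the first two items, assuming the private method can be called on a bare controller instance and that FactoryBot factories for forms and submissions exist (both are assumptions about the test suite):

# spec/controllers/admin/submissions_controller_spec.rb (sketch)
RSpec.describe Admin::SubmissionsController do
  describe '#get_feed_data' do
    it 'returns an Enumerator rather than an Array' do
      result = described_class.new.send(:get_feed_data, 1)
      expect(result).to be_an(Enumerator)
    end

    it 'yields hashes with the expected keys' do
      form = create(:form, :with_questions)   # hypothetical factories
      create(:submission, form: form)

      row = described_class.new.send(:get_feed_data, 1).first
      expect(row.keys).to include(:organization_id, :form_id, :submission_id,
                                  :question_text, :response_text)
    end
  end
end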

Integration Tests

  • CSV export streams without loading all data
  • Response headers set correctly for streaming
  • Large dataset (10,000+ submissions) completes without OOM
  • JSON endpoint respects pagination/limits

Performance Tests

  • Memory usage stays below 200 MB during export
  • Export of 50,000 submissions completes in < 60 seconds
  • No N+1 queries in logs (check with the Bullet gem; see the configuration sketch after this list)
  • Database connection pool not exhausted
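
If the app does not already run Bullet in tests, a setup along these lines flags N+1 regressions automatically. These are standard Bullet options; the placement in test.rb is an assumption about this app's configuration.

# config/environments/test.rb (sketch)
config.after_initialize do
  Bullet.enable        = true
  Bullet.bullet_logger = true
  Bullet.raise         = true   # raise in tests when an N+1 query is detected
end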

Manual QA

  • CSV file downloads correctly in browser
  • CSV file opens in Excel without corruption
  • All expected columns present
  • Data matches database records
  • Special characters (UTF-8) handled correctly

Rollout Plan

  1. Phase 1: Deploy behind a feature flag (see the flag check sketch after this list)
  2. Phase 2: Enable for admin users only
  3. Phase 3: Monitor memory metrics for 48 hours
  4. Phase 4: Enable for all users
  5. Phase 5: Remove old implementation
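
Phase 1 could gate the new path roughly as follows. The flag name is illustrative and Flipper is only an assumption about the feature-flag tooling; legacy_export_feed stands in for a wrapper around the current implementation.

# Inside export_feed (sketch; Flipper assumed as the flag system)
format.csv do
  if Flipper.enabled?(:streamed_feed_export, current_user)
    headers['Content-Type'] = 'text/csv; charset=utf-8'
    headers['Content-Disposition'] = "attachment; filename=touchpoints-feed-#{Date.today}.csv"
    self.response_body = StreamingCsvExporter.new(get_feed_data(@days_limit))
  else
    legacy_export_feed   # hypothetical wrapper around the existing implementation
  end
end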

Labels

priority:p0 type:bug area:performance area:memory component:submissions

001-memory-exhaustion-get-feed-data.md
002-stream-csv-exports.md
003-fix-n-plus-one-queries.md
004-batch-bulk-updates.md
005-cache-question-options.md
