P0 Critical: Memory Exhaustion in get_feed_data Causes Application Crashes
Summary
The get_feed_data method in Admin::SubmissionsController loads all forms, submissions, and questions into memory simultaneously, causing out-of-memory (OOM) crashes during feed exports.
Priority: P0 - Critical
Component: app/controllers/admin/submissions_controller.rb
Lines: 238-267
Affected Endpoints: /admin/submissions/feed, /admin/submissions/export_feed
Problem Description
When users or scheduled jobs trigger the feed export functionality, the application attempts to load the entire dataset into memory before processing. This causes:
- Application memory to spike to several GB
- OOM kills in production
- Sidekiq workers crashing during background exports
- Degraded performance for all users during export operations
Reproduction Steps
- Navigate to Admin > Submissions > Feed
- Set `days_limit` to a large value (e.g., 30+ days)
- Click Export
- Observe the memory spike and potential timeout/crash (profiling sketch below)
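To quantify the spike outside the browser, allocations can be measured with the `memory_profiler` gem (listed in the References below). This is a console sketch, not code from the repository; it assumes `get_feed_data` only touches models, so a bare controller instance is enough to invoke it:

```ruby
# Run in a Rails console against a realistically sized database.
require 'memory_profiler'

report = MemoryProfiler.report do
  # The method only reads models (Form, Submission, Question), so no request
  # context is needed; `send` reaches it even if it is private.
  Admin::SubmissionsController.new.send(:get_feed_data, 30)
end

# Total allocated/retained bytes make the unbounded array growth obvious.
report.pretty_print(scale_bytes: true)
```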
Root Cause Analysis
Current Implementation
```ruby
# app/controllers/admin/submissions_controller.rb:238-267
def get_feed_data(days_limit)
  all_question_responses = []
  Form.all.each do |form| # Problem 1: Loads ALL forms into memory
    submissions = form.submissions.ordered # Problem 2: N+1 query per form
    submissions = submissions.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?
    submissions.each do |submission| # Problem 3: Loads ALL submissions per form
      form.ordered_questions.each do |question| # Problem 4: N+1 query per submission
        question_text = question.text.to_s
        answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)
        @hash = {
          organization_id: form.organization_id,
          organization_name: form.organization.name, # Problem 5: N+1 for organization
          form_id: form.id,
          form_name: form.name,
          submission_id: submission.id,
          question_id: question.id,
          user_id: submission.user_id,
          question_text:,
          response_text: answer_text,
          question_with_response_text: "#{question_text}: #{answer_text}",
          created_at: submission.created_at,
        }
        all_question_responses << @hash # Problem 6: Unbounded array growth
      end
    end
  end
  all_question_responses # Problem 7: Returns massive array
end
```
Memory Impact Calculation
| Metric | Typical Value | Memory Per Item | Total |
|---|---|---|---|
| Forms | 500 | ~2 KB | 1 MB |
| Submissions (30 days) | 50,000 | ~1 KB | 50 MB |
| Questions | 5,000 | ~0.5 KB | 2.5 MB |
| Result Hashes | 50,000 × 10 questions = 500,000 | ~0.5 KB | ~250 MB |
With roughly 50,000 submissions and 10 questions per form, the method builds about 500,000 result hashes; at 100,000 submissions that becomes 1,000,000 hashes consuming hundreds of MB, all retained in a single array (on top of the loaded ActiveRecord objects) before a single row is written to the response.
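The per-hash estimate can be sanity-checked in a Ruby console with the `objspace` extension. The sketch below builds a hash shaped like one feed row with illustrative values; `memsize_of` reports only the shallow size of the hash, so the strings it references push the real per-row cost higher:

```ruby
require 'objspace'

# A hash shaped like a single feed row (values are illustrative).
row = {
  organization_id: 1,
  organization_name: 'Example Organization',
  form_id: 42,
  form_name: 'Customer Feedback Survey',
  submission_id: 123_456,
  question_id: 7,
  user_id: 99,
  question_text: 'How satisfied are you with this service?',
  response_text: 'Very satisfied',
  question_with_response_text: 'How satisfied are you with this service?: Very satisfied',
  created_at: Time.now,
}

# Shallow size of the hash itself, in bytes; referenced strings add more.
puts ObjectSpace.memsize_of(row)
```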
Issues Identified
- `Form.all.each` - Loads the entire forms table into memory
- Triple-nested loops - O(forms × submissions × questions) complexity
- No batching - All records loaded before any processing
- N+1 queries - Missing eager loading for `organization` and `questions` (query-count sketch below)
- Unbounded array - `all_question_responses` grows without limit
- Synchronous processing - Blocks the request thread for the entire operation
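One way to confirm the N+1 pattern, and later to verify the fix, is to count SQL statements issued around a call to `get_feed_data`. The helper below is a minimal sketch built on `ActiveSupport::Notifications`; it is not code that exists in this repository, and the usage shown in the comment assumes a spec where the controller method is reachable:

```ruby
# Count SQL statements executed inside a block (skip cached and schema queries).
def count_queries
  count = 0
  counter = ->(_name, _start, _finish, _id, payload) do
    count += 1 unless payload[:name] == 'SCHEMA' || payload[:cached]
  end
  ActiveSupport::Notifications.subscribed(counter, 'sql.active_record') { yield }
  count
end

# Before the fix the count grows with forms × submissions; after batching and
# eager loading it should grow only with the number of batches, e.g.:
#   count_queries { controller.send(:get_feed_data, 1).first(100) }
```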
Proposed Solution
Option A: Batched Processing with find_each (Recommended)
```ruby
# app/controllers/admin/submissions_controller.rb
def get_feed_data(days_limit)
  Enumerator.new do |yielder|
    # Batch forms with eager loading
    Form.includes(:organization, :questions)
        .find_each(batch_size: 100) do |form|
      # Build submissions query with date filter
      submissions_scope = form.submissions
      submissions_scope = submissions_scope.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?
      # Batch submissions
      submissions_scope.find_each(batch_size: 1000) do |submission|
        # Questions already eager loaded
        form.questions.each do |question|
          question_text = question.text.to_s
          answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)
          yielder << {
            organization_id: form.organization_id,
            organization_name: form.organization.name,
            form_id: form.id,
            form_name: form.name,
            submission_id: submission.id,
            question_id: question.id,
            user_id: submission.user_id,
            question_text: question_text,
            response_text: answer_text,
            question_with_response_text: "#{question_text}: #{answer_text}",
            created_at: submission.created_at,
          }
        end
      end
    end
  end
end

# Update export_feed to stream the response
def export_feed
  @days_limit = (params[:days_limit].present? ? params[:days_limit].to_i : 1)
  respond_to do |format|
    format.csv do
      headers['Content-Type'] = 'text/csv; charset=utf-8'
      headers['Content-Disposition'] = "attachment; filename=touchpoints-feed-#{Date.today}.csv"
      headers['X-Accel-Buffering'] = 'no' # Disable nginx/proxy buffering
      headers['Cache-Control'] = 'no-cache'
      self.response_body = StreamingCsvExporter.new(get_feed_data(@days_limit))
    end
    format.json do
      # For JSON, consider pagination or a background job for large datasets
      render json: get_feed_data(@days_limit).take(10_000).to_a
    end
  end
end
```
Supporting Class: StreamingCsvExporter
```ruby
# app/services/streaming_csv_exporter.rb
require 'csv'

class StreamingCsvExporter
  HEADERS = %w[
    organization_id organization_name form_id form_name submission_id
    question_id user_id question_text response_text
    question_with_response_text created_at
  ].freeze

  def initialize(enumerator)
    @enumerator = enumerator
  end

  # Rack streams any response body that responds to #each, one yielded chunk at a time.
  def each
    yield CSV.generate_line(HEADERS)
    @enumerator.each do |row|
      yield CSV.generate_line(HEADERS.map { |h| row[h.to_sym] })
    end
  end
end
```
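The exporter can be exercised outside a request cycle, for example during review or in a spec. The sketch below feeds it a small in-memory enumerator with a fabricated row and collects the streamed lines:

```ruby
# Feed the exporter a tiny enumerator and collect the streamed CSV lines.
rows = Enumerator.new do |y|
  y << { organization_id: 1, organization_name: 'Example Org', form_id: 2,
         form_name: 'Feedback', submission_id: 3, question_id: 4, user_id: nil,
         question_text: 'Q?', response_text: 'A', question_with_response_text: 'Q?: A',
         created_at: Time.now }
end

csv_text = +''
StreamingCsvExporter.new(rows).each { |line| csv_text << line }
puts csv_text # header line followed by one data row
```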
Option B: Background Job for Large Exports
For very large datasets, move processing to an asynchronous background job:
```ruby
# app/jobs/feed_export_job.rb
class FeedExportJob < ApplicationJob
  queue_as :exports

  def perform(user_email, days_limit)
    file_path = Rails.root.join('tmp', "feed-export-#{SecureRandom.uuid}.csv")
    CSV.open(file_path, 'wb') do |csv|
      csv << StreamingCsvExporter::HEADERS
      Form.includes(:organization, :questions).find_each(batch_size: 100) do |form|
        # ... batched processing, write directly to file
      end
    end
    # Upload to S3 and email user
    url = S3Uploader.upload(file_path)
    UserMailer.export_ready(user_email, url).deliver_later
  ensure
    FileUtils.rm_f(file_path)
  end
end
```
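Wiring the job into the controller could look like the sketch below. The action name, the `current_user` helper, the `admin_submissions_path` route, and the flash copy are illustrative assumptions, not part of the proposal above:

```ruby
# app/controllers/admin/submissions_controller.rb (sketch)
def export_feed_async
  days_limit = params[:days_limit].present? ? params[:days_limit].to_i : 1
  # Enqueue the export and return immediately instead of streaming in-request.
  FeedExportJob.perform_later(current_user.email, days_limit)
  redirect_to admin_submissions_path,
              notice: 'Your export is being generated; you will receive an email when it is ready.'
end
```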
Expected Impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| Peak Memory | 2-4 GB | 50-100 MB | ~90% reduction |
| Memory Growth | Unbounded | Constant | Stable under load |
| N+1 Queries | O(forms × submissions) | O(batches) | ~99% fewer queries |
| Request Timeout Risk | High | Low | Streaming prevents timeout |
| OOM Crash Risk | High | Minimal | Batching prevents spikes |
Testing Checklist
Unit Tests
- `get_feed_data` returns an Enumerator (not an Array) (spec sketch below)
- Enumerator yields the correct hash structure
- Empty forms/submissions handled gracefully
- `days_limit = 0` returns all submissions
- `days_limit > 0` filters correctly
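A minimal RSpec sketch for the first two items, assuming a controller spec, FactoryBot factories, and a hypothetical `:with_responses` trait (names are illustrative, not existing fixtures):

```ruby
# spec/controllers/admin/submissions_controller_spec.rb (sketch)
RSpec.describe Admin::SubmissionsController, type: :controller do
  describe '#get_feed_data' do
    let!(:form) { create(:form, :with_responses) } # hypothetical factory trait with at least one submission

    it 'returns an Enumerator rather than an Array' do
      result = controller.send(:get_feed_data, 1)
      expect(result).to be_a(Enumerator)
    end

    it 'yields hashes with the expected keys' do
      row = controller.send(:get_feed_data, 0).first
      expect(row.keys).to include(:organization_id, :form_id, :submission_id,
                                  :question_id, :response_text, :created_at)
    end
  end
end
```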
Integration Tests
- CSV export streams without loading all data
- Response headers set correctly for streaming
- Large dataset (10,000+ submissions) completes without OOM
- JSON endpoint respects pagination/limits
Performance Tests
- Memory usage stays below 200 MB during export
- Export of 50,000 submissions completes in < 60 seconds
- No N+1 queries in logs (check with the Bullet gem; configuration sketch below)
- Database connection pool not exhausted
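If the project does not already configure Bullet, a minimal setup for the test environment looks like this; the settings are standard Bullet options, but treat the exact placement as a suggestion:

```ruby
# config/environments/test.rb (sketch)
config.after_initialize do
  Bullet.enable        = true
  Bullet.bullet_logger = true
  Bullet.raise         = true # fail specs that trigger an N+1 query
end
```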
Manual QA
- CSV file downloads correctly in browser
- CSV file opens in Excel without corruption
- All expected columns present
- Data matches database records
- Special characters (UTF-8) handled correctly
Rollout Plan
- Phase 1: Deploy behind a feature flag (sketch below)
- Phase 2: Enable for admin users only
- Phase 3: Monitor memory metrics for 48 hours
- Phase 4: Enable for all users
- Phase 5: Remove old implementation
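The feature-flag mechanism is not specified in this issue. Assuming a flag library such as Flipper (purely an assumption about tooling) and a renamed legacy method (`legacy_get_feed_data`, hypothetical), the Phase 1 cutover could be gated like this:

```ruby
# app/controllers/admin/submissions_controller.rb (sketch, assuming Flipper)
def feed_rows(days_limit)
  if Flipper.enabled?(:streaming_feed_export, current_user)
    get_feed_data(days_limit)        # new Enumerator-based implementation
  else
    legacy_get_feed_data(days_limit) # old array-building implementation, removed in Phase 5
  end
end
```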
Related Issues
- #2 - Stream CSV exports in `Form#to_csv` methods
- #3 - Add `.includes()` to fix N+1 queries in the forms index
- #4 - Batch bulk update operations in the submissions controller
- #5 - Cache question options in A11 export methods
References
- Rails `find_each` documentation
- Streaming responses in Rails
- Memory profiling with the `memory_profiler` gem
Labels
priority:p0 type:bug area:performance area:memory component:submissions