Replace BulkProcessor with custom RoutedBulkIndexer #1548
- New RoutedBulkIndexer class with shard routing, retry with backoff, and per-indexer stats (docStats, queueStats, esBatchRequestStats)
- Removes 8 BulkProcessors, 8 builders, 8 clients, VCFJsonBulkIndexer
- Each batch routed to a random shard for sequential writes
- Proper 429/failure retry instead of silent data loss
- System.exit(-1) after max retries exhausted to ensure data integrity
- Indexing flag read from config; pipeline runs in both modes
- ~300 lines of boilerplate removed from SourceDocumentCreation
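The retry and routing behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual RoutedBulkIndexer: the class name `RetryWithBackoffSketch`, the `BulkCall` interface, and every method and parameter name here are hypothetical, the bulk request is reduced to an HTTP status code, and how a chosen shard index translates into an Elasticsearch routing value is not shown.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch; none of these names come from the PR.
final class RetryWithBackoffSketch {

    /** Stand-in for one bulk request; returns an HTTP-like status code. */
    interface BulkCall {
        int send() throws Exception;
    }

    /** Exponential backoff: base * 2^attempt (attempt is 0-based), capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMillis);
    }

    /** Picks a random shard index in [0, shardCount) for the next batch. */
    static int pickShard(int shardCount) {
        return ThreadLocalRandom.current().nextInt(shardCount);
    }

    /**
     * Sends the batch, retrying on any non-2xx status (including 429) or exception.
     * Returns the number of attempts used, or -1 once maxRetries retries are exhausted
     * or the thread is interrupted while waiting.
     */
    static int sendWithRetry(BulkCall call, int maxRetries, long baseMillis, long maxMillis) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            int status;
            try {
                status = call.send();
            } catch (Exception e) {
                status = -1; // treat transport failures like a retryable error
            }
            if (status >= 200 && status < 300) {
                return attempt + 1;
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated endpoint: rejects the first two requests with 429, then accepts.
        BulkCall flaky = () -> ++calls[0] < 3 ? 429 : 200;

        int shard = pickShard(8);
        int attempts = sendWithRetry(flaky, 5, 10, 1000);
        if (attempts < 0) {
            // Mirrors the PR's stated policy: stop rather than silently drop a batch.
            System.exit(-1);
        }
        System.out.println("shard=" + shard + " attempts=" + attempts);
    }
}
```

Returning a sentinel from `sendWithRetry` and leaving the `System.exit(-1)` decision to the caller keeps the retry loop testable; the PR may well structure this differently.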
Code Review
Summary: This PR replaces 8 hardcoded […]
The changes look correct. A few observations:
Shutdown sequence is sound
The […]
Minor observations (non-blocking)
No correctness issues, no security concerns, no breaking API changes. LGTM.
🤖 Generated with Claude Code
Remove unnecessary parens in RoutedBulkIndexer, set non-MGD species inactive in VariantFileSet.yaml for MGD-only run, and increase local ES max_content_length to 1g.
Summary
- RoutedBulkIndexer supporting shard routing, retry with backoff, and per-indexer stats
- shardCount * 4 indexer threads for maximum ES throughput
- System.exit(-1) after max retries
Test plan
- indexing=false -- full pipeline runs correctly
- indexing=true against local ES -- balanced BP distribution, ~17-20K r/s combined