The current single-threaded entity generation and persistence process becomes a performance bottleneck as data volume increases.
When testing with large-scale entity creation, the following results were observed:
| Instance Count | Elapsed Time | Result |
| --- | --- | --- |
| 10,000 | ~4 seconds | ✅ Successful |
| 100,000 | ~21 seconds | ✅ Successful |
| 1,000,000 | — | ❌ Application crashed due to memory exhaustion |
Problem
The existing implementation holds all generated entities in memory before persisting them, leading to excessive heap usage and an OutOfMemoryError at large scale.
This design cannot handle large datasets efficiently and provides no parallelism between entity creation and database writing.
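A minimal sketch of this accumulate-then-persist pattern is shown below; the Entity record and saveAll call are illustrative placeholders, not the project's actual types or persistence API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: Entity and saveAll stand in for the real entity type and
// persistence layer. The point is the structure, not the names.
public class SingleThreadedLoad {
    record Entity(long id, String payload) {}

    static void generateAndPersist(long count) {
        List<Entity> entities = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            entities.add(new Entity(i, "payload-" + i)); // every instance is retained on the heap
        }
        saveAll(entities); // persistence begins only after all generation has completed
    }

    static void saveAll(List<Entity> entities) {
        // stand-in for the real database write
    }
}
```

Because the full list must fit on the heap before the first row is written, memory usage grows linearly with the record count until the JVM runs out of memory.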
Expected Improvements
- Enable large-scale generation of 100 million or more records without memory exhaustion by using a streaming, producer–consumer architecture (a sketch follows this list).
- Achieve near-linear scalability by separating CPU (generation) and I/O (persistence) workloads.
- Reduce total processing time significantly by parallelizing entity creation and batch persistence.
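A minimal sketch of such a producer–consumer pipeline, assuming plain JDBC batch inserts; the Entity record, the table and column names, the batch size, and the queue capacity are illustrative assumptions rather than the project's actual design.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a bounded producer–consumer pipeline: one CPU-bound generator thread,
// one I/O-bound writer thread, and a bounded queue between them.
public class StreamingLoader {
    record Entity(long id, String payload) {}

    private static final Entity POISON = new Entity(-1, null); // signals end of stream
    private static final int BATCH_SIZE = 1_000;               // assumed batch size

    public static void run(long total, Connection connection) throws Exception {
        BlockingQueue<Entity> queue = new ArrayBlockingQueue<>(10_000); // caps in-flight entities

        // Producer: generates entities and blocks when the queue is full.
        Thread producer = new Thread(() -> {
            try {
                for (long i = 0; i < total; i++) {
                    queue.put(createEntity(i));
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer: drains the queue and flushes fixed-size JDBC batches.
        Thread consumer = new Thread(() -> {
            String sql = "INSERT INTO entity (id, payload) VALUES (?, ?)"; // assumed schema
            try (PreparedStatement stmt = connection.prepareStatement(sql)) {
                int pending = 0;
                while (true) {
                    Entity e = queue.take();
                    if (e == POISON) break;
                    stmt.setLong(1, e.id());
                    stmt.setString(2, e.payload());
                    stmt.addBatch();
                    if (++pending == BATCH_SIZE) {
                        stmt.executeBatch();
                        pending = 0;
                    }
                }
                if (pending > 0) stmt.executeBatch(); // flush the final partial batch
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }

    private static Entity createEntity(long i) {
        return new Entity(i, "payload-" + i); // placeholder generation logic
    }
}
```

Because the queue is bounded, heap usage stays roughly constant regardless of the total record count; the queue capacity and batch size would need tuning against the real entity size and database throughput.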