As an important milestone for seq-db, we plan to introduce compaction into the database.
We expect several benefits from compaction:
- It will allow us to reduce the fraction size to very small values (or even seal a fraction for each bulk), significantly lowering the memory footprint. We also anticipate an increase in ingestion throughput, since smaller fractions should reduce contention;
- We expect lower on-disk usage for fractions. For example, the tokens section of the .index file (which accounts for around 20% of a fraction’s .index size) often contains the same (field, value) tuples in multiple fractions. When fractions are merged, each duplicate tuple is stored only once, so the merged section takes less space than the original sections combined;
- Compaction will enable us to implement partitioning. We expect partitions (and parts) to be created upon sealing, which may produce many tiny sealed fractions — an inefficient outcome. Background compaction will address this by merging small parts into larger, more efficient ones;
- Also, if we stick with a time-based compaction strategy (such as TWCS or DTCS), we can easily implement time-based retention.
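
To illustrate the on-disk saving from merging tokens sections, here is a minimal sketch in Go. It assumes each fraction keeps its tokens as a sorted, duplicate-free list of (field, value) tuples. The `Token` type and the `mergeTokens` function are hypothetical names chosen for this example and do not reflect the actual seq-db .index layout.

```go
package main

// Token is a hypothetical (field, value) tuple from a fraction's tokens section.
type Token struct {
	Field string
	Value string
}

// less orders tokens by field first, then by value.
func less(a, b Token) bool {
	if a.Field != b.Field {
		return a.Field < b.Field
	}
	return a.Value < b.Value
}

// mergeTokens merges two sorted, duplicate-free token lists into one sorted
// list. A tuple present in both inputs is written to the output only once.
func mergeTokens(a, b []Token) []Token {
	out := make([]Token, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case less(a[i], b[j]):
			out = append(out, a[i])
			i++
		case less(b[j], a[i]):
			out = append(out, b[j])
			j++
		default: // same tuple in both fractions: keep a single copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	// At most one of these tails is non-empty.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}
```

For example, merging a fraction holding `(level, error)` and `(service, api)` with one holding `(level, error)`, `(level, info)` and `(service, api)` stores three tuples instead of five. Merging more than two fractions can be done by folding `mergeTokens` over the inputs, although a real implementation would more likely use a k-way merge.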