Compaction Strategy

Required reading: Write Buffer and Database Design

Why compaction?

StormDB appends every incoming record to the end of the WAL file. Because the WAL file can therefore contain multiple records for the same key, StormDB needs to compact the database periodically to keep the data from growing without bound and to put an upper bound on iteration time (with other things kept constant).
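As a minimal illustration of why an append-only log accumulates stale records, the sketch below replays a small log and keeps only the newest value per key. The `(key, value)` tuple layout and the `compact` helper are hypothetical and are not StormDB's on-disk format.

```python
def compact(records):
    """Keep only the latest value for each key from an append-only log."""
    latest = {}
    for key, value in records:
        # Records are in append order, so a later record overwrites an earlier one.
        latest[key] = value
    return latest


# Hypothetical log: keys 1 and 2 were each written twice.
wal = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
compacted = compact(wal)
```

Five appended records shrink to three live ones, which is the space and iteration saving that compaction is after.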

Strategy

StormDB compacts the current snapshot of the DB. It first flushes the in-memory buffer to the WAL file and creates a new WAL file that receives all new put requests. Since iteration is non-contentious (except at the start, before iteration begins), compaction simply iterates through the old WAL file and the old data file, writing the corresponding data in a buffered manner to the data.next file while updating indices and other bookkeeping. Once complete, it removes the old files and makes the WAL and data point to the new files.

Compaction procedure:

Flush the current WAL buffer
Create a new WAL file with an extension of .next
Direct all new writes to the newly created WAL file
Loop over the old WAL file and the data file to produce data.next
Move the old WAL file and the current data file to {}.del
Rename the new WAL file to the current WAL file (remove the .next extension)
Rename data.next to the data file
Delete the *.del files (these are the old WAL and data files)
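The file-level sequence of the procedure above can be sketched as follows. This is only an illustration under stated assumptions: the file names `wal` and `data`, the `key=value` line format, and the `compact_files` helper are hypothetical, and the buffer flush, write redirection, and index updates are left out.

```python
import os
import tempfile
from pathlib import Path


def compact_files(db_dir: Path) -> None:
    wal, data = db_dir / "wal", db_dir / "data"
    wal_next, data_next = db_dir / "wal.next", db_dir / "data.next"

    # Create the new WAL file; in a real store new writes would go here.
    wal_next.touch()

    # Merge the old data file and the old WAL into data.next.
    # The WAL is newer than the data file, so it is read last and wins.
    merged = {}
    for source in (data, wal):
        for line in source.read_text().splitlines():
            key, value = line.split("=", 1)
            merged[key] = value
    data_next.write_text("".join(f"{k}={v}\n" for k, v in merged.items()))

    # Move the old files aside with a .del extension.
    wal_del, data_del = db_dir / "wal.del", db_dir / "data.del"
    os.replace(wal, wal_del)
    os.replace(data, data_del)

    # Promote the .next files to the current names.
    os.replace(wal_next, wal)
    os.replace(data_next, data)

    # Delete the old files.
    wal_del.unlink()
    data_del.unlink()


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp)
    (db / "data").write_text("k1=old\nk2=keep\n")
    (db / "wal").write_text("k1=new\nk3=added\n")
    compact_files(db)
    result = (db / "data").read_text()
    remaining = sorted(p.name for p in db.iterdir())
```

Moving the old files to `.del` before promoting the `.next` files means a leftover `.del` or `.next` file after a crash shows how far the compaction got.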

Impact during compaction

An important consideration in designing the compaction strategy is to allow reads, writes, and iteration to proceed in parallel with compaction without affecting their performance.

Writes

StormDB creates a new WAL file during compaction, so writes proceed exactly as before and see the same performance as they do outside compaction.

Iterate

While compaction is in progress, StormDB iterates through the new WAL file first, then through the old WAL file and the data file as usual. Overall there is no significant impact on performance, because the total amount of data iterated stays the same.

Reads

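During compaction, the record for a read request can therefore live in any one of four files: the new WAL file, the new data file, the old WAL file, or the old data file (in this order of precedence for the latest data). To avoid probing each file in turn, StormDB maintains a bitset during compaction that identifies the correct file, so a read still needs only one random file lookup.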
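One way to picture the bitset lookup is the sketch below, which packs a two-bit file tag per integer key into a single Python integer. The two-bit encoding, the `Source` enum, the `Locator` class, and the in-memory dictionaries standing in for files are all assumptions made for illustration; the source only states that a bitset is used, not how it is laid out.

```python
from enum import IntEnum


class Source(IntEnum):
    # Listed in the precedence order described above.
    NEW_WAL = 0
    NEW_DATA = 1
    OLD_WAL = 2
    OLD_DATA = 3


class Locator:
    """Hypothetical bitset: two bits per integer key, packed into one int."""

    def __init__(self):
        self.bits = 0

    def set(self, key, source):
        shift = 2 * key
        # Clear the key's two bits, then store the new file tag.
        self.bits = (self.bits & ~(0b11 << shift)) | (int(source) << shift)

    def get(self, key):
        return Source((self.bits >> (2 * key)) & 0b11)


# In-memory dictionaries stand in for the four files.
files = {source: {} for source in Source}
locator = Locator()


def put(source, key, value):
    files[source][key] = value
    locator.set(key, source)


def read(key):
    # One probe: the locator names the file, so no fallback chain is needed.
    return files[locator.get(key)].get(key)


put(Source.OLD_DATA, 5, "v1")
put(Source.OLD_WAL, 7, "v2")
put(Source.NEW_WAL, 5, "v3")  # newer write for key 5 during compaction
```

Here `read(5)` goes straight to the new WAL even though a stale copy of key 5 still sits in the old data file.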

Auto compaction strategy

StormDB supports an auto-compaction strategy that can optionally be turned off. When it is on, StormDB triggers a compaction automatically once the WAL file size exceeds 10% of the data file size. To avoid overly frequent auto-compaction of smaller DBs, a second threshold of 8 times the write buffer size (8 x ~4 MB = ~32 MB) is also applied. In essence, StormDB strives to keep no more than 10% of the data in the WAL.
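The trigger described above could be expressed roughly as follows. The function and constant names are hypothetical, the write buffer is taken as exactly 4 MiB for simplicity, and the sketch assumes the 32 MB threshold acts as a minimum WAL size that must be exceeded in addition to the 10% ratio.

```python
WRITE_BUFFER_BYTES = 4 * 1024 * 1024      # assumed ~4 MB write buffer
MIN_WAL_BYTES = 8 * WRITE_BUFFER_BYTES    # ~32 MB floor for small DBs


def should_auto_compact(wal_bytes, data_bytes, enabled=True):
    """Return True when both auto-compaction thresholds are exceeded."""
    if not enabled:
        return False
    exceeds_floor = wal_bytes > MIN_WAL_BYTES
    # WAL larger than 10% of the data file, written with integer arithmetic.
    exceeds_ratio = wal_bytes * 10 > data_bytes
    return exceeds_floor and exceeds_ratio


MB = 1024 * 1024
small_db = should_auto_compact(5 * MB, 10 * MB)        # below the ~32 MB floor
large_wal = should_auto_compact(40 * MB, 100 * MB)     # above both thresholds
large_data = should_auto_compact(40 * MB, 1000 * MB)   # WAL under 10% of data
disabled = should_auto_compact(40 * MB, 100 * MB, enabled=False)
```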
