Compaction Strategy
Required reading: Write Buffer and Database Design
StormDB appends all incoming records to the end of the WAL file. Since the WAL file may contain multiple records for the same key, StormDB needs to compact the database periodically to keep the data from growing without bound and to put an upper bound on iteration time (with other things kept constant).
StormDB compacts the current snapshot of the DB. It first flushes the in-memory buffer to the WAL file and creates a new WAL file, to which new put requests are written. Since iteration is non-contentious (except at the start, before iteration begins), compaction simply iterates through the old WAL file and the old data file. It pushes the corresponding data, in a buffered manner, to the `data.next` file while updating indices and other bookkeeping. Once complete, it gets rid of the old files and makes the WAL and data paths point to the new files.
- Flush the current WAL buffer
- Create a new WAL file with an extension of `.next`
- Point all new writes to go to the new WAL file created
- Loop over the old WAL file and the data file to produce `data.next`
- Move the old WAL and current data file to `{}.del`
- Rename the new WAL file to the current WAL file (remove the `.next` extension)
- Rename `data.next` to the data file
- Delete the `*.del` (these represent the old WAL and data files)
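The final file-level steps of this list (move the old files aside, rename the `.next` files, delete the `.del` files) can be sketched with `java.nio.file.Files`. This is only an illustration, not StormDB's actual code: the class name `CompactionFileSwap` and the base file names `wal` and `data` are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of StormDB. File names "wal" and "data" are assumed.
final class CompactionFileSwap {

    static void swap(Path dir) throws IOException {
        Path wal = dir.resolve("wal");
        Path data = dir.resolve("data");
        Path walNext = dir.resolve("wal.next");
        Path dataNext = dir.resolve("data.next");
        Path walDel = dir.resolve("wal.del");
        Path dataDel = dir.resolve("data.del");

        // Move the old WAL and the current data file out of the way.
        Files.move(wal, walDel);
        Files.move(data, dataDel);

        // Promote the files produced during compaction.
        Files.move(walNext, wal);
        Files.move(dataNext, data);

        // The *.del files now hold only superseded data and can be removed.
        Files.deleteIfExists(walDel);
        Files.deleteIfExists(dataDel);
    }
}
```

Parking the old files under a `.del` name before promoting the new ones means a leftover `.del` or `.next` file after a crash shows how far the swap got. How StormDB actually recovers from such a state is not covered here.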
One important consideration in designing the compaction strategy is to allow parallel reads, writes and iteration without affecting performance.
StormDB keeps write performance unchanged by creating a new WAL file during compaction, so writes proceed at the same speed as before.
While compaction is in progress, StormDB iterates through the new WAL file first, and then through the old WAL file and the data file as usual. Overall, there is no significant impact on performance, since the total amount of data iterated stays the same.
During compaction, a read request may therefore be served from any one of the new WAL file, the new data file, the old WAL file and the old data file (checked in this order to get the latest data). StormDB maintains a bitset during compaction to identify the correct file, so each read still needs only one random lookup in a file.
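One way to picture the bitset bookkeeping is the sketch below, built on `java.util.BitSet`. It is a guess at the general shape, not StormDB's implementation: the class `CompactionFileLocator`, its method names, the use of one bitset per file, and the assumption that keys are non-negative integers are all hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch: one bit per key for each file that may hold the latest record.
// Assumes keys are non-negative ints, since BitSet indices cannot be negative.
final class CompactionFileLocator {

    enum Source { NEW_WAL, NEW_DATA, OLD_WAL, OLD_DATA }

    private final BitSet inNewWal = new BitSet();
    private final BitSet inNewData = new BitSet();
    private final BitSet inOldWal = new BitSet();

    // Called for a put that arrives while compaction is running.
    void markWrittenDuringCompaction(int key) { inNewWal.set(key); }

    // Called once compaction has copied the key's record into data.next.
    void markCompacted(int key) { inNewData.set(key); }

    // Called for keys whose latest pre-compaction record is in the old WAL.
    void markInOldWal(int key) { inOldWal.set(key); }

    // Checks the files from newest to oldest and returns the first match,
    // so the caller performs a single random read in exactly one file.
    Source locate(int key) {
        if (inNewWal.get(key)) return Source.NEW_WAL;
        if (inNewData.get(key)) return Source.NEW_DATA;
        if (inOldWal.get(key)) return Source.OLD_WAL;
        return Source.OLD_DATA;
    }
}
```

The bit tests are in-memory operations, so choosing the file adds no disk access. The only random I/O is the read in the selected file.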
StormDB supports an auto-compaction strategy that can optionally be turned off. When turned on, it triggers an automatic compaction once the WAL file size exceeds 10% of the data file size. To avoid overly frequent auto-compaction for smaller DBs, another threshold of 8 times the write buffer size (8 × ~4 MB = ~32 MB) is also applied. In essence, StormDB strives to keep no more than 10% of the data in the WAL.
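The two thresholds can be combined into a single check, as in the sketch below. It assumes that both conditions must hold before compaction is triggered, which is one reading of the paragraph above. The class `AutoCompactionPolicy`, its constants and the exact 4 MB buffer size are illustrative and not taken from StormDB's source.

```java
// Hypothetical policy class; the values mirror the figures quoted in the text.
final class AutoCompactionPolicy {

    // Assumed write buffer size of 4 MB (the text says "~4MB").
    static final long WRITE_BUFFER_BYTES = 4L * 1024 * 1024;

    // Floor that stops small databases from compacting too often: 8 x buffer = 32 MB.
    static final long MIN_WAL_BYTES = 8 * WRITE_BUFFER_BYTES;

    // Returns true when the WAL exceeds both the 32 MB floor and 10% of the data file.
    static boolean shouldCompact(long walBytes, long dataBytes) {
        boolean aboveFloor = walBytes > MIN_WAL_BYTES;
        // walBytes > dataBytes / 10, written with integers to avoid rounding.
        boolean aboveRatio = walBytes * 10 > dataBytes;
        return aboveFloor && aboveRatio;
    }
}
```

With these numbers, a 40 MB WAL next to a 100 MB data file would trigger compaction. The same WAL next to a 1000 MB data file would not, because it is still below 10% of the data size.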