99 commits
dbbf65d
Basic implementation
Sep 3, 2017
0bb3495
Add dirt simple bench
Sep 4, 2017
e37e27e
Add dirt simple query test
Sep 7, 2017
b0b6645
Move to single public header
Sep 9, 2017
06e6f64
Switch NAME_MAX to PATH_MAX
Sep 9, 2017
cdc1b3a
Improve load generator
Sep 11, 2017
737c99b
Compress store files
Sep 9, 2017
fbb1a1e
Changed stuff
Sep 13, 2017
2bdd24f
Merge pull request #1 from RAttab/store-compression
RAttab Sep 14, 2017
e241731
Fix hourly rotation collision
Sep 15, 2017
bc86a1d
Add acc and rotate
Sep 17, 2017
a9cdc09
Fixed a bunch of stuff
Sep 17, 2017
08ecfc2
Add weekly rotations
Sep 17, 2017
ca14319
Print to stdout
Sep 17, 2017
25c4f47
Add some reserved fields in the store header
Sep 18, 2017
7b65434
Make query functions const
Sep 18, 2017
34356bb
Avoid dirent.d_type in rill_scan_dir
Sep 20, 2017
3e56ab6
Add rill_rotate util
Sep 20, 2017
71b66ed
Add iterator to store
Sep 23, 2017
5b4fabf
Add extra dump utility
Sep 23, 2017
5032779
Add val dump functions to store
Oct 1, 2017
fb40681
Add write stamp in store
Oct 1, 2017
fdf244f
Improve error handling
Oct 1, 2017
2035a7c
Add index to store file
Sep 24, 2017
a6298c8
Add rill_query utils
Oct 2, 2017
a610346
Fix various potential leaks
Oct 9, 2017
63fbb40
Add MAP_POPULATE to acc's mmap
Oct 10, 2017
1313c73
Remove last use of readdir_r
Oct 10, 2017
04504cd
Fix interpolation search
pjhades Oct 10, 2017
e074d33
Merge pull request #2 from pjhades/fix/interpolation-search
RAttab Oct 12, 2017
95ce749
Make database globally readable
Oct 13, 2017
b65abbc
Add more flexible compile options
psyomn Oct 10, 2017
e3be08e
Fix gcc-7 warnings
psyomn Oct 10, 2017
d1eb287
Remove use of readdir_r in tests
psyomn Oct 11, 2017
a75c7c8
Use set intersection to improve lookup speed
psyomn Oct 11, 2017
2c9e85d
Merge pull request #4 from psyomn/perf/sort-keyvals
RAttab Oct 19, 2017
19bf217
Fix memcpy in rill_query_vals
Oct 19, 2017
4636ca1
Add store tests
Oct 22, 2017
328d1e5
Pre-check vals before store decode
Oct 24, 2017
cf14210
Add all function to query
Nov 2, 2017
0dfb73a
Add invert utility
Nov 3, 2017
7a980c5
Switch to boring binary search
Nov 3, 2017
49e2c68
Improve invert utility
Nov 3, 2017
328d8f3
Add missing encoder error messages
Nov 7, 2017
71d69e1
Fix cap calculation for store merge
Nov 7, 2017
aad6716
Fix test rm function to no longer abort
Nov 7, 2017
e14846d
Bump expiration to 15 months
Nov 7, 2017
09e7ca6
Fix merge cap calculation (again)
Nov 7, 2017
409d7d4
Dedup filenames when rotating
Nov 8, 2017
e73f69d
Adjust capacity estimate based on number of values
Nov 21, 2017
b248551
Start of a README
Nov 21, 2017
48a5c51
Tweak the readme
Nov 23, 2017
153fbd3
Add an ingest utility
Nov 21, 2017
6b828b1
Improve dump output
Nov 21, 2017
0bc059b
Add merge utility
Nov 23, 2017
9c9d06b
Improve merge
Nov 26, 2017
5ec03fc
Improve the rill_dump util
Nov 27, 2017
0ad07b0
Grab flock on rotation folder
Dec 7, 2017
b7b9821
Split acc out of rotation
Dec 7, 2017
c83031c
Use substring search for rill extension
Dec 11, 2017
584f684
Add size asserts on index write
Dec 11, 2017
4a452a7
Fix coder cap estimates in edge cases
Dec 11, 2017
e84b360
Remove stamp version check
Dec 11, 2017
802707b
Terminate pairs iteration in dump
Jan 8, 2018
5735d41
Add reverse lookup to rill
psyomn Jan 23, 2018
455b15e
Merge pull request #6 from psyomn/feature/the-upside-down
RAttab Apr 18, 2018
bbeba0b
Readd rill query all
psyomn Apr 18, 2018
7a63aab
Merge pull request #7 from psyomn/readd-rill-query-all
RAttab Apr 18, 2018
a06e28e
Remove the indexer
Apr 27, 2018
9c8947d
Merge pull request #8 from RAttab/remove-indexer
RAttab Apr 29, 2018
04d45bd
Add rill_count written by @RAttab
psyomn Apr 13, 2018
7e2d9e5
Merge pull request #9 from psyomn/remis-rill-count
RAttab Jun 14, 2018
3580e4f
Mass rename
Jun 14, 2018
dc20b2b
rework rill_rows interface
Jun 14, 2018
5b15725
fix store
Jun 14, 2018
3c02a23
clean up acc and query
Jul 6, 2018
a7e9429
Fix up coder tests
Jul 6, 2018
6bb5bcf
fix rotate test
Jul 6, 2018
ec57934
fix store test
Jul 6, 2018
47be1e9
remove query test
Jul 6, 2018
f7b1ea4
Move rill_generate to src folder
Jul 6, 2018
519dc8f
rename indexer_test
Jul 6, 2018
f9fd556
Fix compilation errors
Jul 6, 2018
a1757d6
fix rill_dump
Jul 6, 2018
e6bf3c0
fix rill_query
Jul 6, 2018
e1a1de7
Centralized a,b args handling
Jul 6, 2018
c61988f
fix rill_ingest
Jul 6, 2018
f7cb23d
Fix rill_count
Jul 6, 2018
c221f35
Remove generate for now
Jul 6, 2018
0e73800
compile tests
Jul 6, 2018
740f0a9
dumb bugs
Jul 6, 2018
e9104bb
Fix store
Aug 10, 2018
b976d79
Tweak compile script
RAttab Oct 6, 2018
8000b65
Fix store test
RAttab Oct 6, 2018
a1a001a
Fix merge typo
RAttab Oct 6, 2018
cfd68e0
Tweak rill_dump a bit
RAttab Oct 6, 2018
cce2168
Singled out rotate_test in build
RAttab Oct 6, 2018
d31de7f
Add merge test to store tests
RAttab Oct 6, 2018
63148c5
Merge pull request #10 from RAttab/interface-cleanup
RAttab Oct 6, 2018
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
build

# Prerequisites
*.d

126 changes: 126 additions & 0 deletions README.md
@@ -0,0 +1,126 @@
# Rill

A one-off specialized database for a 2 column schema that focuses on compressed
storage and read-only mmap files.

## Building

```shell
$ mkdir build && cd build
$ PREFIX=.. ../compile.sh
```

Rill is currently only used in the `rill-rs` project, so no install target is
provided. The build artifacts are:
- `src/rill.h`
- `build/librill.a`

## Design Space

- A pair is composed of a `u64` key and a `u64` value
- Key cardinality is in the order of 1 million
- Value cardinality is in the order of 100 million
- Infrequent batch query of keys over entire dataset
- Batch queries must finish within 5 minutes
- Pair ingestion must happen in real-time (order of 100k/sec)
- Duplicate pairs are very common
- Expire entire months of data older than 15 months
- Expect around 50 billion unique pairs in a single month
- Servers have around 250GB of RAM and 2TB of SSD disk space


## Architecture

Rill is split into the following major components:

- `acc`: real-time data ingestion
- `store`: file storage format
- `rotation`: progressive merging and expiration of store files

### Ingestion


### Storage

Basic design philosophy:

- Immutable
- Mutation through merge operation
- All pairs are sorted
- Memory mapped and queried directly.


#### Compression

The main goal for storage is to fit the entire dataset on the disks of a single
server:

50B pairs per month * 15 months * 16 bytes per pair = 12TB of disk space

Given our 2TB of available disk space, we need to do some compression to store
everything. A general sketch of the compression is as follows:

- Don't repeat keys
- Uniformize the namespace of the values
- Block encode (LEB128) the uniformized values

Implementation begins by extracting all the unique values in the dataset,
sorting them, and storing them in a table. Using this table, we can then encode
indexes into the table instead of the values themselves. This means our
compression depends on the cardinality of the value set and not so much on the
values themselves.
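
As a rough illustration of the value table, the sketch below sorts and
deduplicates an array of values, then maps a value back to its index with a
binary search. The names `vals_compact` and `vals_index` are hypothetical and
not part of rill's interface; rill's own implementation may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration only; not rill's API. */

static int cmp_u64(const void *l, const void *r)
{
    uint64_t a = *(const uint64_t *) l;
    uint64_t b = *(const uint64_t *) r;
    return (a > b) - (a < b);
}

/* Sorts vals in place, drops duplicates and returns the number of unique
 * values left at the front of the array. */
size_t vals_compact(uint64_t *vals, size_t len)
{
    assert(vals || !len);
    if (!len) return 0;

    qsort(vals, len, sizeof(*vals), cmp_u64);

    size_t unique = 1;
    for (size_t i = 1; i < len; ++i) {
        if (vals[i] != vals[unique - 1]) vals[unique++] = vals[i];
    }
    return unique;
}

/* Returns the index of val in the sorted table, or len if it is absent. */
size_t vals_index(const uint64_t *table, size_t len, uint64_t val)
{
    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid] < val) lo = mid + 1;
        else hi = mid;
    }
    return (lo < len && table[lo] == val) ? lo : len;
}
```

The index returned by `vals_index` is what would be written in place of the raw
`u64` value, so its size depends on the table length rather than on the values.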

Encoding the pairs is a simple scheme of writing the key in full, followed by a
list of all the values associated with that key. The list of values is a
block-encoding (LEB128) of the indexes into the value table. In other words, the
smaller the cardinality of the value set, the fewer bytes we'll use on average
to write a value.
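
To make the byte savings concrete, here is a minimal sketch of standard
unsigned LEB128, which stores 7 bits of payload per byte and uses the high bit
as a continuation flag. The function names are hypothetical and rill's actual
coder may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes val as unsigned LEB128 into out, which must have room for up to
 * 10 bytes. Returns the number of bytes written. */
size_t leb128_encode(uint8_t *out, uint64_t val)
{
    assert(out);

    size_t len = 0;
    do {
        uint8_t byte = val & 0x7F;   /* low 7 bits of what is left */
        val >>= 7;
        if (val) byte |= 0x80;       /* more bytes follow */
        out[len++] = byte;
    } while (val);
    return len;
}

/* Reads one unsigned LEB128 value from in into *val and returns the number
 * of bytes consumed. The caller must guarantee that the buffer holds a
 * complete value; at most 10 bytes are read. */
size_t leb128_decode(const uint8_t *in, uint64_t *val)
{
    assert(in && val);

    uint64_t result = 0;
    size_t len = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = in[len++];
        result |= (uint64_t) (byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 64);

    *val = result;
    return len;
}
```

With this encoding, any index below 2^28 (about 268 million) fits in at most 4
bytes, so a value table on the order of 100 million entries costs at most half
of the 8 bytes a raw `u64` would take.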

Empirically, we are able to compress a single month of data down to less than
100GB, so 15 months take at most about 1.5TB and our dataset now sits
comfortably on our 2TB disks.


#### Index

We must also be able to quickly query a single key and extract all the
associated values for that key. Our compression requirements put a bound on the
size of our index. A general sketch of the index is as follows:

- Don't repeat keys
- Store the keys along with the offset of their value location in a table
- Search the table via tweaked binary search

Implementation starts by building a table of all the keys and filling in their
offsets as we encode the pairs. We also no longer store the keys with the pairs,
as we can simply recover the key for a given list of values via its implicit
index in the file. The index table is stored as is at the end of the file.

Searching is done via a tweaked binary search over the index table. Empirically
this has proven to be fast enough to meet our 5-minute batch query
requirement. Further optimizations are possible. We've also experimented with a
single-pass interpolation search followed by a vectorized linear scan, but
changes in the input data meant that the keys were no longer well distributed,
which made the approach unusable.
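
The lookup can be pictured with the sketch below, which runs a plain binary
search over a sorted table of key and offset entries. The `index_entry` layout
and the `index_find` name are assumptions made for illustration, and the tweaks
mentioned above are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one entry per unique key, sorted by key. */
struct index_entry
{
    uint64_t key;
    uint64_t off; /* offset of the key's encoded value list in the file */
};

/* Returns the entry for key, or NULL if the key is not in the table. */
const struct index_entry *index_find(
        const struct index_entry *table, size_t len, uint64_t key)
{
    assert(table || !len);

    size_t lo = 0, hi = len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* avoids overflow of lo + hi */
        if (table[mid].key < key) lo = mid + 1;
        else hi = mid;
    }
    return (lo < len && table[lo].key == key) ? &table[lo] : NULL;
}
```

Because the table is sorted and holds each key once, a lookup touches on the
order of log2(n) entries, roughly 20 for a million keys.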


#### Stamp

Safe persistence is accomplished via a pseudo-2-phase commit scheme that uses a
stamp to mark the file as complete. Steps are as follows:

- Write the entire file
- Flush to disk
- Write a magic stamp value in the header
- Flush to disk

This guarantees that if the stamp is found at the beginning of the file then the
file has been completely written and persisted to disk. Note that rill relies on
the underlying file system to detect file corruption as no checksums are
computed or maintained.
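
The four steps above can be sketched with POSIX calls as follows. The magic
value, the 8-byte stamp slot at offset 0 and both function names are
assumptions made for this example and do not describe rill's actual header.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical magic value; the stamp occupies the first 8 bytes. */
static const uint64_t stamp_magic = 0x52494C4C53544D50ULL;

/* Writes the body after a zeroed stamp slot, flushes it, and only then
 * writes the stamp and flushes again. Short writes count as failures. */
bool file_write_stamped(const char *path, const void *body, size_t len)
{
    assert(path && (body || !len));

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd == -1) return false;

    uint64_t zero = 0;
    bool ok =
        pwrite(fd, &zero, sizeof(zero), 0) == (ssize_t) sizeof(zero) &&
        pwrite(fd, body, len, sizeof(zero)) == (ssize_t) len &&
        fsync(fd) == 0 &&  /* the whole file is on disk, still unstamped */
        pwrite(fd, &stamp_magic, sizeof(stamp_magic), 0)
            == (ssize_t) sizeof(stamp_magic) &&
        fsync(fd) == 0;    /* the stamp itself is on disk */

    return (close(fd) == 0) && ok;
}

/* Returns true only if the file starts with the stamp. */
bool file_is_stamped(const char *path)
{
    assert(path);

    int fd = open(path, O_RDONLY);
    if (fd == -1) return false;

    uint64_t stamp = 0;
    bool ok = pread(fd, &stamp, sizeof(stamp), 0) == (ssize_t) sizeof(stamp);
    close(fd);
    return ok && stamp == stamp_magic;
}
```

A reader that finds `file_is_stamped` returning false can treat the file as an
interrupted write and ignore it, since the stamp is only ever written after the
body has been flushed.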

Note that rill files are frequently deleted after being merged, so the
stamping mechanism is critical to avoid deleting files that were not properly
merged.


### Rotation
64 changes: 64 additions & 0 deletions compile.sh
@@ -0,0 +1,64 @@
#! /usr/bin/env bash

set -o errexit -o nounset -o pipefail -o xtrace

: ${PREFIX:="."}

declare -a SRC
SRC=(htable rng utils rows store acc rotate query)

declare -a BIN
BIN=(load dump query rotate ingest merge count)

declare -a TEST
TEST=(index coder store)

CC=${OTHERC:-gcc}
LEAKCHECK_ENABLED=${LEAKCHECK_ENABLED:-}

CFLAGS="-ggdb -O3 -march=native -pipe -std=gnu11 -D_GNU_SOURCE"
CFLAGS="$CFLAGS -I${PREFIX}/src"

CFLAGS="$CFLAGS -Werror -Wall -Wextra"
CFLAGS="$CFLAGS -Wundef"
CFLAGS="$CFLAGS -Wcast-align"
CFLAGS="$CFLAGS -Wwrite-strings"
CFLAGS="$CFLAGS -Wunreachable-code"
CFLAGS="$CFLAGS -Wformat=2"
CFLAGS="$CFLAGS -Wswitch-enum"
CFLAGS="$CFLAGS -Wswitch-default"
CFLAGS="$CFLAGS -Winit-self"
CFLAGS="$CFLAGS -Wno-strict-aliasing"
CFLAGS="$CFLAGS -fno-strict-aliasing"
CFLAGS="$CFLAGS -Wno-implicit-fallthrough"

OBJ=""
for src in "${SRC[@]}"; do
    $CC -c -o "$src.o" "${PREFIX}/src/$src.c" $CFLAGS
    OBJ="$OBJ $src.o"
done
ar rcs librill.a $OBJ

for bin in "${BIN[@]}"; do
    $CC -o "rill_$bin" "${PREFIX}/src/rill_$bin.c" librill.a $CFLAGS
done

for test in "${TEST[@]}"; do
    $CC -o "test_$test" "${PREFIX}/test/${test}_test.c" librill.a $CFLAGS
    "./test_$test"
done

# this one takes a while so it's usually run manually
$CC -o "test_rotate" "${PREFIX}/test/rotate_test.c" librill.a $CFLAGS


if [ -n "$LEAKCHECK_ENABLED" ]; then
    for test in "${TEST[@]}"; do
        valgrind \
            --leak-check=full \
            --track-origins=yes \
            --trace-children=yes \
            --error-exitcode=1 \
            "./test_$test"
    done
fi