benchtop

Benchtop is a framework for storing large JSON documents as blobs directly on disk, with indexing provided by the key-value database PebbleDB.

Command line

Build:

make

Table Entries

Benchtop KV Store Key Structure

This document outlines the binary key structure used by the benchtop package for storing and indexing data in a key-value (KV) store like PebbleDB. The structure is designed for efficient lookups, scans, and indexing of tabular or graph-like data by leveraging key prefixes and a consistent binary layout.

Core Concepts

Key Prefixes

All keys begin with a single-byte prefix to denote the type of data they represent. This allows different types of data to coexist in the same keyspace and enables efficient prefix scans (e.g., "find all position keys").

T (TablePrefix): Keys related to table metadata.

P (PosPrefix): Keys that map a row ID to its physical location.

F (FieldPrefix): Keys that form a secondary index on specific field values.

R (RFieldPrefix): Keys that form a reverse index for efficient index deletion.
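The prefix scheme above can be sketched as follows. This is a minimal illustration, not the package's actual declarations: the constant names mirror the names used in this document, and `prefixUpperBound` is a hypothetical helper that computes the exclusive end of a prefix scan, which is the kind of bound a KV iterator typically takes.

```go
package main

import "fmt"

// Single-byte prefixes as described above. The names follow this document;
// the real declarations in the package may be typed differently.
const (
	TablePrefix  byte = 'T'
	PosPrefix    byte = 'P'
	FieldPrefix  byte = 'F'
	RFieldPrefix byte = 'R'
)

// prefixUpperBound is a hypothetical helper. It returns the smallest key that
// sorts after every key starting with prefix, or nil when no such key exists.
func prefixUpperBound(prefix []byte) []byte {
	end := append([]byte(nil), prefix...) // copy so the caller's slice is untouched
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] != 0xFF {
			end[i]++
			return end[:i+1]
		}
	}
	return nil // prefix is all 0xFF bytes: scan to the end of the keyspace
}

func main() {
	lower := []byte{PosPrefix}
	upper := prefixUpperBound(lower)
	// A scan over [lower, upper) visits every position key and nothing else.
	fmt.Printf("scan [%q, %q)\n", lower, upper)
}
```

Scanning the half-open range from the prefix to this bound visits all keys of one type without touching the others.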

Field Separator

A special byte separator, FieldSep (ASCII 0x1F - Unit Separator), is used as a delimiter within compound keys (like the field indexes). This character is chosen because it is a non-printable control character that is not expected to appear in standard string data, ensuring reliable splitting of key components.

Key Types

Table Keys

Purpose: To store metadata or identifiers for data tables.

Structure: T | TableId

T: The literal character 'T' (TablePrefix).

TableId: The unique byte slice identifier for the table.

Functions:

NewTableKey(id []byte): Creates a new table key.

ParseTableKey(key []byte): Extracts the TableId from a table key.
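A minimal sketch of the `T | TableId` layout, assuming the key is simply the prefix byte followed by the raw ID bytes. The return types and the guard in `ParseTableKey` are assumptions made for illustration; the package's own signatures may differ.

```go
package main

import "fmt"

const TablePrefix byte = 'T'

// NewTableKey builds T | TableId.
func NewTableKey(id []byte) []byte {
	key := make([]byte, 0, 1+len(id))
	key = append(key, TablePrefix)
	return append(key, id...)
}

// ParseTableKey returns the TableId portion of a table key.
// Assumption: a key without the 'T' prefix yields nil.
func ParseTableKey(key []byte) []byte {
	if len(key) == 0 || key[0] != TablePrefix {
		return nil
	}
	return key[1:]
}

func main() {
	key := NewTableKey([]byte("users"))
	fmt.Printf("%q -> %q\n", key, ParseTableKey(key))
}
```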

Position (Row Location) Keys

Purpose: These keys are the primary index, mapping a unique row/vertex ID to its physical location (offset and size) in a data file.

Structure: P | TableId | RowId

P: The literal character 'P' (PosPrefix).

TableId: A 2-byte uint16 (little-endian) identifying the table the row belongs to.

RowId: The unique byte slice identifier for the row/vertex.

Associated Value: The value stored for this key is an encoded RowLoc struct (see below).

Functions:

NewPosKey(table uint16, name []byte): Creates a new position key.

ParsePosKey(key []byte): Extracts the TableId and RowId from a key.

NewPosKeyPrefix(table uint16): Creates a key prefix for scanning all rows within a specific table.
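The `P | TableId | RowId` layout can be sketched like this. The function names come from the list above, but the bodies, the `ok` return value, and the length check are assumptions for illustration only.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const PosPrefix byte = 'P'

// NewPosKeyPrefix builds P | TableId, the 3-byte prefix shared by all rows
// of one table.
func NewPosKeyPrefix(table uint16) []byte {
	key := make([]byte, 3)
	key[0] = PosPrefix
	binary.LittleEndian.PutUint16(key[1:], table) // little-endian, as stated above
	return key
}

// NewPosKey builds P | TableId | RowId.
func NewPosKey(table uint16, name []byte) []byte {
	return append(NewPosKeyPrefix(table), name...)
}

// ParsePosKey splits a position key into its table ID and row ID.
// Assumption: ok is false when the key is too short or has the wrong prefix.
func ParsePosKey(key []byte) (table uint16, rowID []byte, ok bool) {
	if len(key) < 3 || key[0] != PosPrefix {
		return 0, nil, false
	}
	return binary.LittleEndian.Uint16(key[1:3]), key[3:], true
}

func main() {
	key := NewPosKey(1, []byte("row1"))
	table, rowID, ok := ParsePosKey(key)
	fmt.Println(table, string(rowID), ok)
}
```

Because the table ID is little-endian, all rows of one table still share a 3-byte prefix, but tables themselves do not sort in numeric order (table 256 sorts before table 1).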

Field Index Keys

Purpose: To create a secondary index on specific field values. This allows for fast lookups of all rows that have a certain value for a given field (e.g., find all users where city == 'New York').

Structure: F<sep>Field<sep>Label<sep>Value<sep>RowId

F: The literal character 'F' (FieldPrefix).

<sep>: The FieldSep byte.

Field: The name of the indexed field (e.g., "city").

Label: The label or type of the row (e.g., "user").

Value: The JSON-encoded value of the field (e.g., "New York").

RowId: The unique ID of the row that contains this field value.

Functions:

FieldKey(field, label string, value any, rowID []byte): Creates a full field index key.

FieldKeyParse(key []byte): Parses a field key back into its components.

FieldLabelKey(field, label string): Creates a key prefix for scanning all indexed values for a specific field and label.
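A minimal sketch of how such a key could be assembled and split, assuming the components are joined with FieldSep in the order listed above. The error returns and the trailing separator in `FieldLabelKey` are assumptions, not confirmed behavior of the package.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

var (
	FieldPrefix = []byte{'F'}
	FieldSep    = []byte{0x1F}
)

// FieldKey builds F<sep>Field<sep>Label<sep>Value<sep>RowId with a JSON-encoded value.
func FieldKey(field, label string, value any, rowID []byte) ([]byte, error) {
	v, err := json.Marshal(value)
	if err != nil {
		return nil, err
	}
	return bytes.Join([][]byte{FieldPrefix, []byte(field), []byte(label), v, rowID}, FieldSep), nil
}

// FieldLabelKey builds the scan prefix F<sep>Field<sep>Label<sep>.
// Assumption: the trailing separator keeps label "user" from matching "users".
func FieldLabelKey(field, label string) []byte {
	return bytes.Join([][]byte{FieldPrefix, []byte(field), []byte(label), nil}, FieldSep)
}

// FieldKeyParse splits a field key back into its components.
func FieldKeyParse(key []byte) (field, label string, value any, rowID []byte, err error) {
	// Limit to 5 parts so a separator byte inside RowId stays in RowId.
	parts := bytes.SplitN(key, FieldSep, 5)
	if len(parts) != 5 || !bytes.Equal(parts[0], FieldPrefix) {
		return "", "", nil, nil, errors.New("not a field index key")
	}
	if err = json.Unmarshal(parts[3], &value); err != nil {
		return "", "", nil, nil, err
	}
	return string(parts[1]), string(parts[2]), value, parts[4], nil
}

func main() {
	key, err := FieldKey("city", "user", "New York", []byte("u1"))
	if err != nil {
		panic(err)
	}
	field, label, value, rowID, err := FieldKeyParse(key)
	fmt.Println(field, label, value, string(rowID), err)
}
```

Go's JSON encoder escapes control characters, so the encoded value never contains a raw 0x1F byte. Field and label names are written as-is and must not contain the separator.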

Reverse Field Index Keys

Purpose: To enable the efficient deletion of a row's entries from the field indexes. When a row is deleted, this reverse index is used to quickly find all the Field Index Keys that point to it, without having to scan the entire index.

Structure: R<sep>Label<sep>Field<sep>RowId

R: The literal character 'R' (RFieldPrefix).

<sep>: The FieldSep byte.

Label: The label of the row.

Field: The name of the indexed field.

RowId: The unique ID of the row.

Functions:

RFieldKey(label, field, rowID string): Creates a new reverse field key.
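The deletion flow described above could look roughly like the sketch below. It rests on an assumption this document does not state: that the value stored under a reverse key is the JSON-encoded field value, which is what lets the matching field index key be rebuilt. `forwardKey` and `deleteRowIndex` are hypothetical helpers, and a plain map stands in for the KV store.

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	FieldPrefix  = []byte{'F'}
	RFieldPrefix = []byte{'R'}
	FieldSep     = []byte{0x1F}
)

// RFieldKey builds R<sep>Label<sep>Field<sep>RowId.
func RFieldKey(label, field, rowID string) []byte {
	return bytes.Join([][]byte{RFieldPrefix, []byte(label), []byte(field), []byte(rowID)}, FieldSep)
}

// forwardKey (hypothetical) rebuilds the field index key from an already
// JSON-encoded value.
func forwardKey(field, label string, jsonValue []byte, rowID string) []byte {
	return bytes.Join([][]byte{FieldPrefix, []byte(field), []byte(label), jsonValue, []byte(rowID)}, FieldSep)
}

// deleteRowIndex (hypothetical) removes both index entries for one
// (label, field, rowID) triple and reports whether a reverse entry existed.
// ASSUMPTION: the reverse key's value holds the JSON-encoded field value.
func deleteRowIndex(kv map[string][]byte, label, field, rowID string) bool {
	rk := RFieldKey(label, field, rowID)
	val, ok := kv[string(rk)]
	if !ok {
		return false
	}
	delete(kv, string(forwardKey(field, label, val, rowID)))
	delete(kv, string(rk))
	return true
}

func main() {
	kv := map[string][]byte{} // in-memory stand-in for the KV store
	val := []byte(`"New York"`)
	kv[string(forwardKey("city", "user", val, "u1"))] = nil
	kv[string(RFieldKey("user", "city", "u1"))] = val

	fmt.Println(deleteRowIndex(kv, "user", "city", "u1"), len(kv))
}
```

The point of the reverse entry is that deletion becomes a direct lookup per indexed field instead of a scan over every value of that field.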

Value Structures

RowLoc

Purpose: Represents the physical location of a data record, acting as a "pointer" to the full data object stored elsewhere. It is the value component for a Position Key.

Structure: A fixed 10-byte binary layout.

Section (Bytes 0-1): A uint16 identifying the file or section where the data is stored.

Offset (Bytes 2-5): A uint32 representing the starting byte offset within the section.

Size (Bytes 6-9): A uint32 representing the length of the data in bytes.

Functions:

EncodeRowLoc(loc *RowLoc): Encodes a RowLoc struct into a 10-byte slice.

DecodeRowLoc(v []byte): Decodes a 10-byte slice back into a RowLoc struct.
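The 10-byte layout can be sketched as below. The byte order is an assumption: this document only specifies little-endian for the TableId in position keys, and the sketch uses the same order here. The error return on `DecodeRowLoc` is also an assumption.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// RowLoc mirrors the three fields described above.
type RowLoc struct {
	Section uint16
	Offset  uint32
	Size    uint32
}

const rowLocSize = 10 // 2 + 4 + 4 bytes

// EncodeRowLoc packs a RowLoc into the fixed 10-byte layout.
// ASSUMPTION: little-endian byte order.
func EncodeRowLoc(loc *RowLoc) []byte {
	buf := make([]byte, rowLocSize)
	binary.LittleEndian.PutUint16(buf[0:2], loc.Section)
	binary.LittleEndian.PutUint32(buf[2:6], loc.Offset)
	binary.LittleEndian.PutUint32(buf[6:10], loc.Size)
	return buf
}

// DecodeRowLoc unpacks a 10-byte slice and rejects any other length.
func DecodeRowLoc(v []byte) (*RowLoc, error) {
	if len(v) != rowLocSize {
		return nil, errors.New("row location must be exactly 10 bytes")
	}
	return &RowLoc{
		Section: binary.LittleEndian.Uint16(v[0:2]),
		Offset:  binary.LittleEndian.Uint32(v[2:6]),
		Size:    binary.LittleEndian.Uint32(v[6:10]),
	}, nil
}

func main() {
	buf := EncodeRowLoc(&RowLoc{Section: 3, Offset: 4096, Size: 128})
	loc, err := DecodeRowLoc(buf)
	fmt.Println(len(buf), *loc, err)
}
```

With uint32 fields, a single section can address at most 4 GiB and a single record can be at most 4 GiB long.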
