Skip to content

Redis-like performance with SQLite-like simplicity and PostgreSQL-like schema validation!

License

Notifications You must be signed in to change notification settings

Nu11ified/scalerdb

Repository files navigation

ScalerDB

A modern C++23 database project with efficient serialization and multi-threading capabilities.

Features

  • C++23 Standard: Leverages the latest C++ features and improvements
  • High-Performance In-Memory Database: Complete CRUD operations with primary key indexing
  • Thread-Safe Operations: std::shared_mutex with shared read/exclusive write locking
  • Custom Thread Pool: High-performance thread pool using standard C++ concurrency primitives
  • Type-Safe Value System: std::variant-based value container supporting multiple data types
  • Schema Validation: Column constraints and type checking
  • JSON Persistence: High-speed JSON serialization with performance optimizations
  • Performance Monitoring: Comprehensive latency percentiles (P50/P95/P99) and profiling
  • Concurrent Access: Multi-threaded read/write operations with atomic counters
  • Comprehensive Testing: GoogleTest framework with 30+ unit tests including performance benchmarks
  • CMake Build System: Modern CMake setup with automatic dependency management

Performance Benchmarks

ScalerDB delivers exceptional performance with comprehensive benchmarking and profiling:

In-Memory Operations

  • Insert Speed: 2.37M ops/sec (100 records) to 22K ops/sec (50K records)
  • Read Latency: ~41μs average with P99 < 50μs
  • Write Latency: ~62μs average with P99 < 800μs
  • Concurrent Throughput: 20K+ ops/sec across 8 threads

Persistence Performance

Dataset Size Save Rate Load Rate P50 Latency P99 Latency
100 records 180 MB/s 145 MB/s 0.57ms 0.66ms
1K records 248 MB/s 158 MB/s 4.2ms 4.4ms
10K records 309 MB/s 74 MB/s 33.7ms 34.0ms
50K records 304 MB/s 20 MB/s 171ms 172ms

Threading Performance

  • Concurrent Reads: 15M+ ops/sec (excellent cache locality)
  • Random Access: 21M+ ops/sec (hash map efficiency)
  • Thread-Safe: Zero race conditions with std::shared_mutex
  • Lock Contention: Minimal overhead with shared read locks

Profiling Capabilities

  • Function-Level Timing: Microsecond precision profiling
  • Hot Spot Detection: Automatic identification of performance bottlenecks
  • Memory Allocation Tracking: Built-in memory usage monitoring
  • Latency Percentiles: P50/P95/P99/P99.9 measurements

Core Data Model

Architecture Overview

ScalerDB implements a layered architecture:

  1. Value: Type-safe container using std::variant for database values
  2. Column: Schema definition with metadata and validation constraints
  3. Row: Collection of values with efficient access by index and name
  4. Table: In-memory storage with CRUD operations and primary key indexing
  5. Database: Top-level container managing multiple tables

Key Classes

  • scalerdb::Value - Supports null, bool, int32, int64, double, and string types
  • scalerdb::Column - Column metadata with validation constraints
  • scalerdb::Row - Database row with schema-aware value access
  • scalerdb::Table - Table with primary key indexing and CRUD operations
  • scalerdb::Database - Database container with table management

CRUD Operations

// Create database and table
Database db("my_db");
auto* table = db.createTable("users", schema, "id");

// INSERT
table->insertRow({Value(1), Value("Alice"), Value(28)});

// SELECT
const Row* row = table->findRowByPK(Value(1));

// UPDATE  
table->updateRow(Value(1), {Value(1), Value("Alice Smith"), Value(29)});

// DELETE
table->deleteRow(Value(1));

Persistence & Serialization

// Save entire database to file
if (db.save("mydb.msgpack")) {
    std::cout << "Database saved successfully!" << std::endl;
}

// Load database from file
Database loaded_db;
if (loaded_db.load("mydb.msgpack")) {
    std::cout << "Database loaded successfully!" << std::endl;
    // All tables, schemas, and data are restored
}

Dependencies

All dependencies are automatically fetched and managed by CMake:

  • GoogleTest - Testing framework for comprehensive test coverage
  • nlohmann/json - High-performance JSON serialization library
  • Standard C++ Threading: Custom thread pool using std::thread, std::mutex, std::condition_variable

Performance Note: ScalerDB uses a custom thread pool implementation built with standard C++ concurrency primitives. This provides optimal performance while eliminating external dependencies and ensuring compatibility across all C++17+ environments.

Quick Start

Prerequisites

  • CMake 3.20 or higher
  • C++23 compatible compiler (GCC 11+, Clang 13+, or MSVC 2022+)
  • Git

Setup

  1. Initialize the environment:

    ./setup.sh

    This script will:

    • Check for required tools (CMake, C++23 compiler)
    • Create the build directory
    • Initialize git repository with appropriate .gitignore
  2. Build the project:

    ./make.sh
  3. Build and run tests:

    ./make.sh test
  4. Build in debug mode:

    ./make.sh debug
  5. Clean build:

    ./make.sh clean

Running

  • Main executable:

    ./build/scalerdb
  • Persistence demo:

    ./build/demo_persistence
  • Run tests:

    ./build/test_setup              # Basic setup tests
    ./build/test_core_data_model    # Core database tests  
    ./build/test_serialization     # Serialization tests
    ./build/test_threading          # Thread-safety tests
    ./build/test_performance        # Performance benchmarks
    # or run all tests with:
    cd build && ctest
  • Performance benchmarks:

    # Run specific performance tests
    ./build/test_performance --gtest_filter="*PersistencePerformance*"
    ./build/test_performance --gtest_filter="*ConcurrentOperationLatencies*"
    ./build/test_performance --gtest_filter="*CacheBehaviorTest*"

Build Options

The make.sh script supports several options:

  • debug - Build in Debug mode with debugging symbols
  • release - Build in Release mode (default, optimized)
  • test - Run tests after building
  • clean - Clean the build directory before building
  • help - Show help message

Examples:

./make.sh debug test    # Debug build + run tests
./make.sh clean release # Clean build + release mode

Project Structure

scalerdb/
├── src/core/               # Core database engine
│   ├── value.hpp           # Type-safe value container
│   ├── column.hpp          # Column schema and validation
│   ├── row.hpp             # Database row implementation
│   ├── table.hpp           # Thread-safe table with CRUD operations
│   ├── database.hpp        # Database management with statistics
│   ├── database.cpp        # Database save/load implementation
│   ├── threadpool.hpp      # Custom high-performance thread pool
│   ├── fast_json_loader.hpp     # High-speed JSON parsing (future)
│   ├── parallel_persistence.hpp # Parallel I/O operations (future)
│   └── msgpack_types.h     # Serialization structures
├── tests/                  # Test suites
│   ├── test_core_data_model.cpp    # Comprehensive CRUD tests
│   ├── test_serialization.cpp     # Serialization round-trip tests
│   ├── test_threading.cpp         # Thread-safety and concurrency tests
│   └── test_performance.cpp       # Performance benchmarks and profiling
├── demo_persistence.cpp    # Persistence demonstration
├── CMakeLists.txt          # CMake configuration
├── main.cpp                # Example application demonstrating core features
├── test_setup.cpp          # Basic setup verification tests
├── setup.sh                # Environment setup script
├── make.sh                 # Build script
├── README.md               # This file
├── .gitignore              # Git ignore rules
└── build/                  # Build artifacts (auto-generated)
    ├── scalerdb                # Main executable
    ├── demo_persistence        # Persistence demo
    ├── test_setup              # Setup tests
    ├── test_core_data_model    # Core database tests
    ├── test_serialization     # Serialization tests
    ├── test_threading          # Thread-safety tests
    └── test_performance        # Performance benchmarks

Development

Adding New Features

  1. Add source files to the project
  2. Update CMakeLists.txt if needed
  3. Add corresponding tests in the test files
  4. Build and test: ./make.sh test

Dependencies

To add new dependencies, modify the FetchContent_Declare sections in CMakeLists.txt. The build system will automatically download and configure them.

Development Status

Phase 1 Complete - Core In-Memory Data Model:

  • Value System: Type-safe container using std::variant supporting null, bool, int32, int64, double, string
  • Schema Definition: Column class with metadata, constraints, and validation
  • Row Management: Efficient data access by index and column name
  • Table Operations: Complete CRUD with primary key indexing and unique constraints
  • Database Management: Multi-table container with statistics and querying
  • Test Coverage: 18 comprehensive unit tests covering all CRUD operations
  • Performance: O(1) primary key lookups, O(n) sequential scans

Phase 2 Complete - Persistence & Serialization:

  • JSON Persistence: High-performance JSON serialization with round-trip fidelity
  • Complete Data Type Support: All Value types (null, bool, int32, int64, double, string)
  • Schema Preservation: Column metadata, constraints, defaults, and primary keys maintained
  • Error Handling: Robust error handling for file I/O and data corruption
  • Performance Testing: Tested with large datasets (50K+ rows, 53MB files)
  • Test Coverage: 6 comprehensive serialization tests covering all scenarios

Phase 3 Complete - Threading & Concurrency:

  • Thread-Safe Operations: std::shared_mutex with shared read/exclusive write locking
  • Custom Thread Pool: High-performance implementation using standard C++ concurrency
  • Concurrent CRUD: Multi-threaded insertRow() and findRowByPK() operations
  • Zero Race Conditions: Comprehensive testing with atomic counters and deadlock detection
  • Performance Monitoring: Real-time latency percentiles (P50/P95/P99) measurement
  • Profiling System: Function-level timing and hot spot detection
  • Test Coverage: 4 comprehensive threading tests with stress testing up to 8 threads

Phase 4 Complete - Performance Optimization:

  • Benchmarking Suite: Comprehensive performance testing across multiple data sizes
  • Memory Pre-allocation: Table capacity hints for bulk operations
  • Cache Optimization: Sequential vs random access pattern analysis
  • Latency Analysis: Microsecond-precision timing with statistical analysis
  • Throughput Measurement: Multi-threaded operation rates up to 2.37M ops/sec

Future Enhancements

Phase 5 - Advanced Persistence:

  • Binary format support (MessagePack/Protocol Buffers)
  • Memory-mapped file I/O for large datasets
  • Incremental persistence (save only changed data)
  • Compression algorithms for storage efficiency
  • Parallel I/O operations using thread pool

Phase 6 - Query Engine:

  • SQL-like query language parser
  • Query optimizer and execution engine
  • Secondary indexes (B-tree, hash indexes)
  • Advanced filtering and joins
  • Query result caching

Phase 7 - Network & Distribution:

  • Network interface (TCP/UDP protocols)
  • WAL (Write-Ahead Logging) for durability
  • Clustering and replication
  • Distributed query processing
  • Client-server architecture

Troubleshooting

Build Issues

  1. CMake version too old:

    • Ensure CMake 3.20+ is installed
    • On macOS: brew install cmake
    • On Ubuntu: sudo apt install cmake
  2. C++23 not supported:

    • Update your compiler to a recent version
    • GCC 11+, Clang 13+, or MSVC 2022+
  3. Clean build if issues persist:

    ./make.sh clean

Testing Issues

If tests fail, run them individually:

./build/test_setup --gtest_verbose          # Basic setup tests
./build/test_core_data_model --gtest_verbose # Core database tests
./build/test_serialization --gtest_verbose  # Serialization tests
./build/test_threading --gtest_verbose      # Thread-safety tests
./build/test_performance --gtest_verbose    # Performance benchmarks

Performance Testing

Run specific performance tests to analyze bottlenecks:

# Test persistence speeds across dataset sizes
./build/test_performance --gtest_filter="*PersistencePerformance*"

# Test concurrent operations with multiple threads  
./build/test_performance --gtest_filter="*ConcurrentOperationLatencies*"

# Test cache behavior patterns
./build/test_performance --gtest_filter="*CacheBehaviorTest*"

Performance results include:

  • Latency percentiles (P50, P95, P99, P99.9)
  • Throughput measurements (ops/sec, MB/s)
  • Function-level profiling with microsecond precision
  • Memory allocation patterns and hot spot detection

Persistence Features

ScalerDB provides high-performance JSON persistence with excellent scalability:

  • Full Round-Trip Fidelity: All data types, schemas, and constraints preserved
  • Multiple Tables: Save and load entire databases with multiple tables
  • Large-Scale Performance: Handles 50K+ rows (53MB files) with 300+ MB/s write speeds
  • Error Recovery: Graceful handling of corrupted files and I/O errors
  • Debug-Friendly: Human-readable JSON format for easy inspection
  • Performance Monitoring: Built-in latency tracking and throughput measurement

Technical Implementation: Uses nlohmann/json with optimized serialization structures. The implementation achieves excellent performance through memory pre-allocation, efficient data structures, and minimal copying. Future optimization plans include binary formats and parallel I/O for even higher throughput.

Example Saved Database File:

{
  "tables": [
    {
      "name": "users",
      "primary_key_column": "id",
      "columns": [
        {"name": "id", "type_index": 2, "nullable": false, "unique": true},
        {"name": "name", "type_index": 5, "nullable": false, "unique": false}
      ],
      "rows": [
        {"values": [{"type_index": 2, "numeric_data": 1}, {"type_index": 5, "string_data": "Alice"}]}
      ]
    }
  ]
}

License

This project is open source. See the LICENSE file for details.

About

Redis-like performance with SQLite-like simplicity and PostgreSQL-like schema validation!

Resources

License

Stars

Watchers

Forks