A modern C++23 database project with efficient serialization and multi-threading capabilities.
- C++23 Standard: Leverages the latest C++ features and improvements
- High-Performance In-Memory Database: Complete CRUD operations with primary key indexing
- Thread-Safe Operations:
std::shared_mutexwith shared read/exclusive write locking - Custom Thread Pool: High-performance thread pool using standard C++ concurrency primitives
- Type-Safe Value System: std::variant-based value container supporting multiple data types
- Schema Validation: Column constraints and type checking
- JSON Persistence: High-speed JSON serialization with performance optimizations
- Performance Monitoring: Comprehensive latency percentiles (P50/P95/P99) and profiling
- Concurrent Access: Multi-threaded read/write operations with atomic counters
- Comprehensive Testing: GoogleTest framework with 30+ unit tests including performance benchmarks
- CMake Build System: Modern CMake setup with automatic dependency management
ScalerDB delivers exceptional performance with comprehensive benchmarking and profiling:
- Insert Speed: 2.37M ops/sec (100 records) to 22K ops/sec (50K records)
- Read Latency: ~41μs average with P99 < 50μs
- Write Latency: ~62μs average with P99 < 800μs
- Concurrent Throughput: 20K+ ops/sec across 8 threads
| Dataset Size | Save Rate | Load Rate | P50 Latency | P99 Latency |
|---|---|---|---|---|
| 100 records | 180 MB/s | 145 MB/s | 0.57ms | 0.66ms |
| 1K records | 248 MB/s | 158 MB/s | 4.2ms | 4.4ms |
| 10K records | 309 MB/s | 74 MB/s | 33.7ms | 34.0ms |
| 50K records | 304 MB/s | 20 MB/s | 171ms | 172ms |
- Concurrent Reads: 15M+ ops/sec (excellent cache locality)
- Random Access: 21M+ ops/sec (hash map efficiency)
- Thread-Safe: Zero race conditions with
std::shared_mutex - Lock Contention: Minimal overhead with shared read locks
- Function-Level Timing: Microsecond precision profiling
- Hot Spot Detection: Automatic identification of performance bottlenecks
- Memory Allocation Tracking: Built-in memory usage monitoring
- Latency Percentiles: P50/P95/P99/P99.9 measurements
ScalerDB implements a layered architecture:
- Value: Type-safe container using
std::variantfor database values - Column: Schema definition with metadata and validation constraints
- Row: Collection of values with efficient access by index and name
- Table: In-memory storage with CRUD operations and primary key indexing
- Database: Top-level container managing multiple tables
scalerdb::Value- Supports null, bool, int32, int64, double, and string typesscalerdb::Column- Column metadata with validation constraintsscalerdb::Row- Database row with schema-aware value accessscalerdb::Table- Table with primary key indexing and CRUD operationsscalerdb::Database- Database container with table management
// Create database and table
Database db("my_db");
auto* table = db.createTable("users", schema, "id");
// INSERT
table->insertRow({Value(1), Value("Alice"), Value(28)});
// SELECT
const Row* row = table->findRowByPK(Value(1));
// UPDATE
table->updateRow(Value(1), {Value(1), Value("Alice Smith"), Value(29)});
// DELETE
table->deleteRow(Value(1));// Save entire database to file
if (db.save("mydb.msgpack")) {
std::cout << "Database saved successfully!" << std::endl;
}
// Load database from file
Database loaded_db;
if (loaded_db.load("mydb.msgpack")) {
std::cout << "Database loaded successfully!" << std::endl;
// All tables, schemas, and data are restored
}All dependencies are automatically fetched and managed by CMake:
- GoogleTest - Testing framework for comprehensive test coverage
- nlohmann/json - High-performance JSON serialization library
- Standard C++ Threading: Custom thread pool using
std::thread,std::mutex,std::condition_variable
Performance Note: ScalerDB uses a custom thread pool implementation built with standard C++ concurrency primitives. This provides optimal performance while eliminating external dependencies and ensuring compatibility across all C++17+ environments.
- CMake 3.20 or higher
- C++23 compatible compiler (GCC 11+, Clang 13+, or MSVC 2022+)
- Git
-
Initialize the environment:
./setup.sh
This script will:
- Check for required tools (CMake, C++23 compiler)
- Create the build directory
- Initialize git repository with appropriate .gitignore
-
Build the project:
./make.sh
-
Build and run tests:
./make.sh test -
Build in debug mode:
./make.sh debug
-
Clean build:
./make.sh clean
-
Main executable:
./build/scalerdb
-
Persistence demo:
./build/demo_persistence
-
Run tests:
./build/test_setup # Basic setup tests ./build/test_core_data_model # Core database tests ./build/test_serialization # Serialization tests ./build/test_threading # Thread-safety tests ./build/test_performance # Performance benchmarks # or run all tests with: cd build && ctest
-
Performance benchmarks:
# Run specific performance tests ./build/test_performance --gtest_filter="*PersistencePerformance*" ./build/test_performance --gtest_filter="*ConcurrentOperationLatencies*" ./build/test_performance --gtest_filter="*CacheBehaviorTest*"
The make.sh script supports several options:
debug- Build in Debug mode with debugging symbolsrelease- Build in Release mode (default, optimized)test- Run tests after buildingclean- Clean the build directory before buildinghelp- Show help message
Examples:
./make.sh debug test # Debug build + run tests
./make.sh clean release # Clean build + release modescalerdb/
├── src/core/ # Core database engine
│ ├── value.hpp # Type-safe value container
│ ├── column.hpp # Column schema and validation
│ ├── row.hpp # Database row implementation
│ ├── table.hpp # Thread-safe table with CRUD operations
│ ├── database.hpp # Database management with statistics
│ ├── database.cpp # Database save/load implementation
│ ├── threadpool.hpp # Custom high-performance thread pool
│ ├── fast_json_loader.hpp # High-speed JSON parsing (future)
│ ├── parallel_persistence.hpp # Parallel I/O operations (future)
│ └── msgpack_types.h # Serialization structures
├── tests/ # Test suites
│ ├── test_core_data_model.cpp # Comprehensive CRUD tests
│ ├── test_serialization.cpp # Serialization round-trip tests
│ ├── test_threading.cpp # Thread-safety and concurrency tests
│ └── test_performance.cpp # Performance benchmarks and profiling
├── demo_persistence.cpp # Persistence demonstration
├── CMakeLists.txt # CMake configuration
├── main.cpp # Example application demonstrating core features
├── test_setup.cpp # Basic setup verification tests
├── setup.sh # Environment setup script
├── make.sh # Build script
├── README.md # This file
├── .gitignore # Git ignore rules
└── build/ # Build artifacts (auto-generated)
├── scalerdb # Main executable
├── demo_persistence # Persistence demo
├── test_setup # Setup tests
├── test_core_data_model # Core database tests
├── test_serialization # Serialization tests
├── test_threading # Thread-safety tests
└── test_performance # Performance benchmarks
- Add source files to the project
- Update
CMakeLists.txtif needed - Add corresponding tests in the test files
- Build and test:
./make.sh test
To add new dependencies, modify the FetchContent_Declare sections in CMakeLists.txt. The build system will automatically download and configure them.
✅ Phase 1 Complete - Core In-Memory Data Model:
- Value System: Type-safe container using
std::variantsupporting null, bool, int32, int64, double, string - Schema Definition: Column class with metadata, constraints, and validation
- Row Management: Efficient data access by index and column name
- Table Operations: Complete CRUD with primary key indexing and unique constraints
- Database Management: Multi-table container with statistics and querying
- Test Coverage: 18 comprehensive unit tests covering all CRUD operations
- Performance: O(1) primary key lookups, O(n) sequential scans
✅ Phase 2 Complete - Persistence & Serialization:
- JSON Persistence: High-performance JSON serialization with round-trip fidelity
- Complete Data Type Support: All Value types (null, bool, int32, int64, double, string)
- Schema Preservation: Column metadata, constraints, defaults, and primary keys maintained
- Error Handling: Robust error handling for file I/O and data corruption
- Performance Testing: Tested with large datasets (50K+ rows, 53MB files)
- Test Coverage: 6 comprehensive serialization tests covering all scenarios
✅ Phase 3 Complete - Threading & Concurrency:
- Thread-Safe Operations:
std::shared_mutexwith shared read/exclusive write locking - Custom Thread Pool: High-performance implementation using standard C++ concurrency
- Concurrent CRUD: Multi-threaded insertRow() and findRowByPK() operations
- Zero Race Conditions: Comprehensive testing with atomic counters and deadlock detection
- Performance Monitoring: Real-time latency percentiles (P50/P95/P99) measurement
- Profiling System: Function-level timing and hot spot detection
- Test Coverage: 4 comprehensive threading tests with stress testing up to 8 threads
✅ Phase 4 Complete - Performance Optimization:
- Benchmarking Suite: Comprehensive performance testing across multiple data sizes
- Memory Pre-allocation: Table capacity hints for bulk operations
- Cache Optimization: Sequential vs random access pattern analysis
- Latency Analysis: Microsecond-precision timing with statistical analysis
- Throughput Measurement: Multi-threaded operation rates up to 2.37M ops/sec
Phase 5 - Advanced Persistence:
- Binary format support (MessagePack/Protocol Buffers)
- Memory-mapped file I/O for large datasets
- Incremental persistence (save only changed data)
- Compression algorithms for storage efficiency
- Parallel I/O operations using thread pool
Phase 6 - Query Engine:
- SQL-like query language parser
- Query optimizer and execution engine
- Secondary indexes (B-tree, hash indexes)
- Advanced filtering and joins
- Query result caching
Phase 7 - Network & Distribution:
- Network interface (TCP/UDP protocols)
- WAL (Write-Ahead Logging) for durability
- Clustering and replication
- Distributed query processing
- Client-server architecture
-
CMake version too old:
- Ensure CMake 3.20+ is installed
- On macOS:
brew install cmake - On Ubuntu:
sudo apt install cmake
-
C++23 not supported:
- Update your compiler to a recent version
- GCC 11+, Clang 13+, or MSVC 2022+
-
Clean build if issues persist:
./make.sh clean
If tests fail, run them individually:
./build/test_setup --gtest_verbose # Basic setup tests
./build/test_core_data_model --gtest_verbose # Core database tests
./build/test_serialization --gtest_verbose # Serialization tests
./build/test_threading --gtest_verbose # Thread-safety tests
./build/test_performance --gtest_verbose # Performance benchmarksRun specific performance tests to analyze bottlenecks:
# Test persistence speeds across dataset sizes
./build/test_performance --gtest_filter="*PersistencePerformance*"
# Test concurrent operations with multiple threads
./build/test_performance --gtest_filter="*ConcurrentOperationLatencies*"
# Test cache behavior patterns
./build/test_performance --gtest_filter="*CacheBehaviorTest*"Performance results include:
- Latency percentiles (P50, P95, P99, P99.9)
- Throughput measurements (ops/sec, MB/s)
- Function-level profiling with microsecond precision
- Memory allocation patterns and hot spot detection
ScalerDB provides high-performance JSON persistence with excellent scalability:
- Full Round-Trip Fidelity: All data types, schemas, and constraints preserved
- Multiple Tables: Save and load entire databases with multiple tables
- Large-Scale Performance: Handles 50K+ rows (53MB files) with 300+ MB/s write speeds
- Error Recovery: Graceful handling of corrupted files and I/O errors
- Debug-Friendly: Human-readable JSON format for easy inspection
- Performance Monitoring: Built-in latency tracking and throughput measurement
Technical Implementation: Uses nlohmann/json with optimized serialization structures. The implementation achieves excellent performance through memory pre-allocation, efficient data structures, and minimal copying. Future optimization plans include binary formats and parallel I/O for even higher throughput.
Example Saved Database File:
{
"tables": [
{
"name": "users",
"primary_key_column": "id",
"columns": [
{"name": "id", "type_index": 2, "nullable": false, "unique": true},
{"name": "name", "type_index": 5, "nullable": false, "unique": false}
],
"rows": [
{"values": [{"type_index": 2, "numeric_data": 1}, {"type_index": 5, "string_data": "Alice"}]}
]
}
]
}This project is open source. See the LICENSE file for details.