diff --git a/eloq_data_store_service/README.md b/eloq_data_store_service/README.md
index 6c7b616..fafbce1 100644
--- a/eloq_data_store_service/README.md
+++ b/eloq_data_store_service/README.md
@@ -131,7 +131,7 @@ To start the DataStore Service:
    ./dss_server --config=/path/to/s3_config.ini
    ```
 
-   Where s3_config.ini contains:
+   Where s3_config.ini contains (legacy format):
    ```ini
    [store]
    rocksdb_cloud_bucket_name = my-eloqdata-bucket
@@ -142,6 +142,26 @@ To start the DataStore Service:
    aws_secret_key = YOUR_SECRET_KEY
    ```
 
+   **New URL-based configuration (Recommended):**
+   ```ini
+   [store]
+   rocksdb_cloud_s3_url = s3://my-eloqdata-bucket/rocksdb_cloud
+   rocksdb_cloud_region = us-west-2
+   rocksdb_cloud_sst_file_cache_size = 40GB
+   aws_access_key_id = YOUR_ACCESS_KEY
+   aws_secret_key = YOUR_SECRET_KEY
+   ```
+
+   **For MinIO or S3-compatible storage:**
+   ```ini
+   [store]
+   rocksdb_cloud_s3_url = http://localhost:9000/my-bucket/rocksdb_data
+   rocksdb_cloud_region = us-east-1
+   rocksdb_cloud_sst_file_cache_size = 40GB
+   aws_access_key_id = minioadmin
+   aws_secret_key = minioadmin
+   ```
+
 8. **Run with logging to stderr**:
    ```bash
    ./dss_server --alsologtostderr --ip=192.168.1.100 --port=9200
@@ -176,6 +196,108 @@ For a production deployment with multiple nodes, you would typically:
    ./dss_server --eloq_dss_peer_node=192.168.1.100:9100 --ip=192.168.1.101 --port=9100 --data_path=/data/eloqdata/node2 --config=/etc/eloqdata/dss_prod.ini
    ```
 
+## RocksDB Cloud Configuration Migration Guide
+
+### Overview
+
+Starting from this version, we've introduced a simplified URL-based configuration for RocksDB Cloud (S3/GCS) to reduce configuration complexity. The new `rocksdb_cloud_s3_url` option consolidates multiple configuration parameters into a single URL.
+
+### Why Migrate?
+
+The legacy configuration required multiple separate parameters:
+- `rocksdb_cloud_bucket_name`
+- `rocksdb_cloud_bucket_prefix`
+- `rocksdb_cloud_object_path`
+- `rocksdb_cloud_s3_endpoint_url`
+
+The new URL-based configuration simplifies this to a single parameter that's easier to understand and manage.
+
+### Migration Examples
+
+#### Example 1: Standard AWS S3 Configuration
+
+**Legacy Configuration:**
+```ini
+[store]
+rocksdb_cloud_bucket_name = my-production-bucket
+rocksdb_cloud_bucket_prefix = eloqkv-
+rocksdb_cloud_object_path = rocksdb_data
+```
+
+**New URL-based Configuration:**
+```ini
+[store]
+rocksdb_cloud_s3_url = s3://eloqkv-my-production-bucket/rocksdb_data
+```
+
+**Note:** The `bucket_prefix` is not supported in URL-based configuration. The legacy settings above resolve to the bucket `eloqkv-my-production-bucket` (prefix followed by bucket name), so the URL must use that full bucket name.
+
+#### Example 2: MinIO or S3-compatible Storage
+
+**Legacy Configuration:**
+```ini
+[store]
+rocksdb_cloud_bucket_name = test-bucket
+rocksdb_cloud_object_path = eloqdata
+rocksdb_cloud_s3_endpoint_url = http://localhost:9000
+```
+
+**New URL-based Configuration:**
+```ini
+[store]
+rocksdb_cloud_s3_url = http://localhost:9000/test-bucket/eloqdata
+```
+
+#### Example 3: HTTPS S3-compatible Storage
+
+**Legacy Configuration:**
+```ini
+[store]
+rocksdb_cloud_bucket_name = my-bucket
+rocksdb_cloud_object_path = data/rocksdb
+rocksdb_cloud_s3_endpoint_url = https://s3.custom-provider.com
+```
+
+**New URL-based Configuration:**
+```ini
+[store]
+rocksdb_cloud_s3_url = https://s3.custom-provider.com/my-bucket/data/rocksdb
+```
+
+### TxLog (Log Service) Configuration
+
+The same migration applies to the transaction log service configuration, using the `txlog_` prefix:
+
+**Legacy:**
+```ini
+[local]
+txlog_rocksdb_cloud_bucket_name = txlog-bucket
+txlog_rocksdb_cloud_bucket_prefix = txlog-
+txlog_rocksdb_cloud_object_path = logs
+txlog_rocksdb_cloud_endpoint_url =
+```
+
+**New:**
+```ini
+[local]
+txlog_rocksdb_cloud_s3_url = s3://txlog-txlog-bucket/logs
+```
+
+### Important Notes
+
+1. **Precedence:** If both the new URL-based configuration and legacy configuration are provided, the URL-based configuration takes precedence and overrides the legacy settings.
+
+2. **Bucket Prefix Deprecation:** The `bucket_prefix` parameter is not supported in URL-based configuration. Legacy settings resolve the bucket as `{bucket_prefix}{bucket_name}` (and `rocksdb_cloud_bucket_prefix` defaults to `eloqkv-`), so when migrating, put that full, prefixed bucket name in the URL.
+
+3. **Backward Compatibility:** The legacy configuration options are still supported and will continue to work. However, we recommend migrating to the URL-based configuration for better maintainability.
+
+4. **Protocol Support:** The following protocols are supported:
+   - `s3://` - AWS S3 (default endpoint)
+   - `http://` - Custom S3-compatible endpoint (HTTP)
+   - `https://` - Custom S3-compatible endpoint (HTTPS)
+
+5. **Validation:** An invalid URL (an unsupported protocol, or a missing bucket name or object path) causes the application to fail at startup with a descriptive error message.
+
 ## Storage Backend Configuration
 
 The DataStore Service is compiled with support for specific backend storage technologies. The build defines determine which backend is used. Additional backend-specific configuration can be set in the configuration file.
diff --git a/eloq_data_store_service/RocksDB_Configuration_Flags.md b/eloq_data_store_service/RocksDB_Configuration_Flags.md
index c311ef8..1568ee1 100644
--- a/eloq_data_store_service/RocksDB_Configuration_Flags.md
+++ b/eloq_data_store_service/RocksDB_Configuration_Flags.md
@@ -87,13 +87,39 @@ These flags are only applicable when RocksDB Cloud is enabled with either S3 or
 
 ### Cloud Storage Configuration
 
+**New URL-based Configuration (Recommended):**
+
+| Flag Name | Required | Default Value | Format | Description |
+|-----------|----------|--------------|--------|-------------|
+| `rocksdb_cloud_s3_url` | No | `""` | S3 URL | Complete S3 URL (bucket, object path, and optional endpoint). Takes precedence over the legacy options if both are provided. |
+
+**URL Format Examples:**
+- AWS S3: `s3://my-bucket/my-object-path`
+- MinIO (HTTP): `http://localhost:9000/my-bucket/my-object-path`
+- S3-compatible (HTTPS): `https://s3.amazonaws.com/my-bucket/my-object-path`
+
+**URL Format Specification:**
+- S3: `s3://{bucket_name}/{object_path}`
+- HTTP/HTTPS: `http(s)://{host}[:{port}]/{bucket_name}/{object_path}`; the endpoint used is `http(s)://{host}[:{port}]`
+
+**Supported Protocols:** `s3://`, `http://`, `https://`
+
+**Legacy Configuration (Deprecated, use URL-based config instead):**
+
+| Flag Name | Required | Default Value | Format | Description |
+|-----------|----------|--------------|--------|-------------|
+| `rocksdb_cloud_bucket_name` | No | `"rocksdb-cloud-test"` | String | Cloud storage bucket name (deprecated) |
+| `rocksdb_cloud_bucket_prefix` | No | `"eloqkv-"` | String | Prefix prepended to the bucket name (deprecated, not supported in URL config) |
+| `rocksdb_cloud_object_path` | No | `"rocksdb_cloud"` | String | Path within the bucket (deprecated) |
+| `rocksdb_cloud_s3_endpoint_url` | No | `""` | String | S3-compatible object store endpoint URL (deprecated) |
+
+**Note:** If both `rocksdb_cloud_s3_url` and legacy options are provided, the URL-based configuration takes precedence and overrides the legacy settings.
+
+**Other Cloud Configuration:**
+
 | Flag Name | Required | Default Value | Format | Description |
 |-----------|----------|--------------|--------|-------------|
-| `rocksdb_cloud_bucket_name` | No | `"rocksdb-cloud-test"` | String | Cloud storage bucket name |
-| `rocksdb_cloud_bucket_prefix` | No | `"eloqkv-"` | String | Prefix for objects in the bucket |
-| `rocksdb_cloud_object_path` | No | `"rocksdb_cloud"` | String | Path within the bucket |
 | `rocksdb_cloud_region` | No | `"ap-northeast-1"` | String | Cloud region |
-| `rocksdb_cloud_s3_endpoint_url` | No | `""` | String | S3-compatible object store endpoint URL (for development) |
 
 ### Cloud Cache Configuration
 
diff --git a/eloq_data_store_service/rocksdb_cloud_data_store.cpp b/eloq_data_store_service/rocksdb_cloud_data_store.cpp
index 637ef14..b073c5b 100644
--- a/eloq_data_store_service/rocksdb_cloud_data_store.cpp
+++ b/eloq_data_store_service/rocksdb_cloud_data_store.cpp
@@ -239,13 +239,13 @@ rocksdb::S3ClientFactory RocksDBCloudDataStore::BuildS3ClientFactory(
     if (secured_url)
     {
         config.scheme = Aws::Http::Scheme::HTTPS;
+        // Disable SSL verification for test env if necessary
+        // config.verifySSL = false;
     }
     else
     {
         config.scheme = Aws::Http::Scheme::HTTP;
     }
-    // Disable SSL verification for HTTPS
-    config.verifySSL = false;
 
     // Create and return the S3 client
     if (credentialsProvider)
@@ -296,14 +296,51 @@ bool RocksDBCloudDataStore::StartDB()
     }
 #endif
 
-    cfs_options_.src_bucket.SetBucketName(cloud_config_.bucket_name_,
-                                          cloud_config_.bucket_prefix_);
+    // Determine effective bucket configuration
+    // S3 URL takes precedence over legacy configuration
+    std::string effective_bucket_name = cloud_config_.bucket_name_;
+    std::string effective_bucket_prefix = cloud_config_.bucket_prefix_;
+    std::string effective_object_path = cloud_config_.object_path_;
+    std::string effective_endpoint_url = cloud_config_.s3_endpoint_url_;
+
+    if (!cloud_config_.s3_url_.empty())
+    {
+        // Parse S3 URL and use it (overrides legacy config)
+        S3UrlComponents url_components = ParseS3Url(cloud_config_.s3_url_);
+        if (!url_components.is_valid)
+        {
+            LOG(FATAL) << "Invalid rocksdb_cloud_s3_url: "
+                       << url_components.error_message
+                       << ". URL format: s3://{bucket}/{path} or "
+                          "http(s)://{host}:{port}/{bucket}/{path}. "
+                       << "Examples: s3://my-bucket/my-path, "
+                       << "http://localhost:9000/my-bucket/my-path";
+        }
+
+        effective_bucket_name = url_components.bucket_name;
+        effective_bucket_prefix = "";  // No prefix in URL-based config
+        effective_object_path = url_components.object_path;
+        effective_endpoint_url = url_components.endpoint_url;
+
+        LOG(INFO) << "Using S3 URL configuration (overrides legacy config if "
+                     "present): "
+                  << cloud_config_.s3_url_
+                  << " (bucket: " << effective_bucket_name
+                  << ", object_path: " << effective_object_path
+                  << ", endpoint: "
+                  << (effective_endpoint_url.empty() ? "default"
+                                                     : effective_endpoint_url)
+                  << ")";
+    }
+
+    cfs_options_.src_bucket.SetBucketName(effective_bucket_name);
+    cfs_options_.src_bucket.SetBucketPrefix(effective_bucket_prefix);
     cfs_options_.src_bucket.SetRegion(cloud_config_.region_);
-    cfs_options_.src_bucket.SetObjectPath(cloud_config_.object_path_);
-    cfs_options_.dest_bucket.SetBucketName(cloud_config_.bucket_name_,
-                                           cloud_config_.bucket_prefix_);
+    cfs_options_.src_bucket.SetObjectPath(effective_object_path);
+    cfs_options_.dest_bucket.SetBucketName(effective_bucket_name);
+    cfs_options_.dest_bucket.SetBucketPrefix(effective_bucket_prefix);
     cfs_options_.dest_bucket.SetRegion(cloud_config_.region_);
-    cfs_options_.dest_bucket.SetObjectPath(cloud_config_.object_path_);
+    cfs_options_.dest_bucket.SetObjectPath(effective_object_path);
     // Add sst_file_cache for accerlating random access on sst files
     // use 2^5 = 32 shards for the cache, each shard has sst_file_cache_size_/32
     // bytes capacity
@@ -338,10 +375,10 @@ bool RocksDBCloudDataStore::StartDB()
               << cfs_options_.purger_periodicity_millis << " ms"
               << ", run_purger: " << cfs_options_.run_purger;
 
-    if (!cloud_config_.s3_endpoint_url_.empty())
+    if (!effective_endpoint_url.empty())
     {
         cfs_options_.s3_client_factory =
-            BuildS3ClientFactory(cloud_config_.s3_endpoint_url_);
+            BuildS3ClientFactory(effective_endpoint_url);
         // Intermittent and unpredictable IOError happend from time to
         // time when using aws transfer manager with minio. Disable aws
         // transfer manager if endpoint is set (minio).
@@ -447,8 +484,8 @@ bool RocksDBCloudDataStore::OpenCloudDB(
     // boost write performance by enabling unordered write
     options.unordered_write = true;
     // skip Consistency check, which compares the actual file size with the size
-    // recorded in the metadata, which can fail when skip_cloud_files_in_getchildren is
-    // set to true
+    // recorded in the metadata, which can fail when
+    // skip_cloud_files_in_getchildren is set to true
     options.paranoid_checks = false;
 
     // print db statistics every 60 seconds
diff --git a/eloq_data_store_service/rocksdb_cloud_dump.cpp b/eloq_data_store_service/rocksdb_cloud_dump.cpp
index 058e78c..1044f11 100644
--- a/eloq_data_store_service/rocksdb_cloud_dump.cpp
+++ b/eloq_data_store_service/rocksdb_cloud_dump.cpp
@@ -39,16 +39,23 @@
 #include
 #include
 
+#include "rocksdb_config.h"  // For S3UrlComponents and ParseS3Url
+
 // Define command line flags
 DEFINE_string(aws_access_key_id, "", "AWS access key ID");
 DEFINE_string(aws_secret_key, "", "AWS secret access key");
-DEFINE_string(bucket_name, "", "S3 bucket name");
-DEFINE_string(bucket_prefix, "", "S3 bucket prefix");
+DEFINE_string(s3_url,
+              "",
+              "S3 URL. Format: s3://{bucket}/{path} or "
+              "http(s)://{host}:{port}/{bucket}/{path}. "
+              "Takes precedence over legacy config if provided");
+DEFINE_string(bucket_name, "", "S3 bucket name (legacy, use s3_url instead)");
+DEFINE_string(bucket_prefix, "", "S3 bucket prefix (legacy, not supported with s3_url)");
 DEFINE_string(object_path,
               "rocksdb_cloud",
-              "S3 object path for RocksDB Cloud storage");
+              "S3 object path for RocksDB Cloud storage (legacy, use s3_url instead)");
 DEFINE_string(region, "us-east-1", "AWS region");
-DEFINE_string(s3_endpoint, "", "Custom S3 endpoint URL (optional)");
+DEFINE_string(s3_endpoint, "", "Custom S3 endpoint URL (optional, legacy, use s3_url instead)");
 DEFINE_string(db_path, "./db", "Local DB path");
 DEFINE_bool(list_cf, false, "List all column families");
 DEFINE_bool(opendb, false, "Open the DB only");
@@ -114,11 +121,39 @@ CmdLineParams parse_arguments()
 
     params.aws_access_key_id = FLAGS_aws_access_key_id;
     params.aws_secret_key = FLAGS_aws_secret_key;
-    params.bucket_name = FLAGS_bucket_name;
-    params.bucket_prefix = FLAGS_bucket_prefix;
-    params.object_path = FLAGS_object_path;
+
+    // Check if s3_url was provided (takes precedence over legacy config)
+    if (!FLAGS_s3_url.empty())
+    {
+        EloqDS::S3UrlComponents url_components = EloqDS::ParseS3Url(FLAGS_s3_url);
+        if (!url_components.is_valid)
+        {
+            throw std::runtime_error("Invalid s3_url: " + url_components.error_message +
+                                     ". URL format: s3://{bucket}/{path} or "
+                                     "http(s)://{host}:{port}/{bucket}/{path}");
+        }
+
+        params.bucket_name = url_components.bucket_name;
+        params.bucket_prefix = "";  // No prefix in URL-based config
+        params.object_path = url_components.object_path;
+        params.s3_endpoint_url = url_components.endpoint_url;
+
+        LOG(INFO) << "Using S3 URL configuration: " << FLAGS_s3_url
+                  << " (bucket: " << params.bucket_name
+                  << ", object_path: " << params.object_path
+                  << ", endpoint: " << (params.s3_endpoint_url.empty() ? "default" : params.s3_endpoint_url)
+                  << ")";
+    }
+    else
+    {
+        // Use legacy configuration
+        params.bucket_name = FLAGS_bucket_name;
+        params.bucket_prefix = FLAGS_bucket_prefix;
+        params.object_path = FLAGS_object_path;
+        params.s3_endpoint_url = FLAGS_s3_endpoint;
+    }
+
     params.region = FLAGS_region;
-    params.s3_endpoint_url = FLAGS_s3_endpoint;
     params.db_path = FLAGS_db_path;
     params.list_cf = FLAGS_list_cf;
     params.opendb = FLAGS_opendb;
diff --git a/eloq_data_store_service/rocksdb_config.cpp b/eloq_data_store_service/rocksdb_config.cpp
index 22d936e..cb297d6 100644
--- a/eloq_data_store_service/rocksdb_config.cpp
+++ b/eloq_data_store_service/rocksdb_config.cpp
@@ -272,9 +272,7 @@ DEFINE_uint32(rocksdb_cloud_db_file_deletion_delay_sec,
 DEFINE_uint32(rocksdb_cloud_warm_up_thread_num,
               1,
               "Rocksdb cloud warm up thread number");
-DEFINE_bool(rocksdb_cloud_run_purger,
-            true,
-            "Rocksdb cloud run purger");
+DEFINE_bool(rocksdb_cloud_run_purger, true, "Rocksdb cloud run purger");
 DEFINE_uint32(rocksdb_cloud_purger_periodicity_secs,
               10 * 60, /*10 minutes*/
               "Rocksdb cloud purger periodicity seconds");
@@ -295,6 +293,15 @@ DEFINE_string(rocksdb_cloud_s3_endpoint_url,
               "S3 compatible object store (e.g. minio) endpoint URL only for "
               "development purpose");
 
+DEFINE_string(rocksdb_cloud_s3_url,
+              "",
+              "RocksDB cloud S3 URL. Format: s3://{bucket}/{path} or "
+              "http(s)://{host}:{port}/{bucket}/{path}. "
+              "Examples: s3://my-bucket/my-path, "
+              "http://localhost:9000/my-bucket/my-path. "
+              "This option takes precedence over legacy configuration options "
+              "if both are provided");
+
 namespace EloqDS
 {
 bool CheckCommandLineFlagIsDefault(const char *name)
@@ -509,6 +516,14 @@ RocksDBCloudConfig::RocksDBCloudConfig(const INIReader &config)
 
 #endif
 
+    // Get the S3 URL configuration (new style)
+    s3_url_ =
+        !CheckCommandLineFlagIsDefault("rocksdb_cloud_s3_url")
+            ? FLAGS_rocksdb_cloud_s3_url
+            : config.GetString(
+                  "store", "rocksdb_cloud_s3_url", FLAGS_rocksdb_cloud_s3_url);
+
+    // Get legacy configuration
     bucket_name_ = !CheckCommandLineFlagIsDefault("rocksdb_cloud_bucket_name")
                        ? FLAGS_rocksdb_cloud_bucket_name
                        : config.GetString("store",
@@ -565,8 +580,7 @@ RocksDBCloudConfig::RocksDBCloudConfig(const INIReader &config)
                                  "rocksdb_cloud_run_purger",
                                  FLAGS_rocksdb_cloud_run_purger);
     uint64_t rocksdb_cloud_purger_periodicity_secs =
-        !CheckCommandLineFlagIsDefault(
-            "rocksdb_cloud_purger_periodicity_secs")
+        !CheckCommandLineFlagIsDefault("rocksdb_cloud_purger_periodicity_secs")
             ? FLAGS_rocksdb_cloud_purger_periodicity_secs
             : config.GetInteger("store",
                                 "rocksdb_cloud_purger_periodicity_secs",
diff --git a/eloq_data_store_service/rocksdb_config.h b/eloq_data_store_service/rocksdb_config.h
index d1d715a..5204e59 100644
--- a/eloq_data_store_service/rocksdb_config.h
+++ b/eloq_data_store_service/rocksdb_config.h
@@ -71,6 +71,111 @@ struct RocksDBConfig
 
 #if (defined(DATA_STORE_TYPE_ELOQDSS_ROCKSDB_CLOUD_S3) || \
      defined(DATA_STORE_TYPE_ELOQDSS_ROCKSDB_CLOUD_GCS))
+struct S3UrlComponents
+{
+    std::string protocol;  // "s3", "http", "https"
+    std::string bucket_name;
+    std::string object_path;
+    std::string endpoint_url;  // derived from http/https URLs
+    bool is_valid{false};
+    std::string error_message;
+};
+
+// Parse S3 URL in format:
+//   s3://{bucket_name}/{object_path}
+//   http://{host}:{port}/{bucket_name}/{object_path}
+//   https://{host}:{port}/{bucket_name}/{object_path}
+// Examples:
+//   s3://my-bucket/my-path
+//   http://localhost:9000/my-bucket/my-path
+//   https://s3.amazonaws.com/my-bucket/my-path
+inline S3UrlComponents ParseS3Url(const std::string &s3_url)
+{
+    S3UrlComponents result;
+
+    if (s3_url.empty())
+    {
+        result.error_message = "S3 URL is empty";
+        return result;
+    }
+
+    // Find protocol separator
+    size_t protocol_end = s3_url.find("://");
+    if (protocol_end == std::string::npos)
+    {
+        result.error_message =
+            "Invalid S3 URL format: missing '://' separator";
+        return result;
+    }
+
+    result.protocol = s3_url.substr(0, protocol_end);
+
+    // Validate protocol
+    if (result.protocol != "s3" && result.protocol != "http" &&
+        result.protocol != "https")
+    {
+        result.error_message = "Invalid protocol '" + result.protocol +
+                               "'. Must be one of: s3, http, https";
+        return result;
+    }
+
+    // Extract the part after protocol
+    size_t path_start = protocol_end + 3;  // Skip "://"
+    if (path_start >= s3_url.length())
+    {
+        result.error_message = "Invalid S3 URL format: no content after protocol";
+        return result;
+    }
+
+    std::string remaining = s3_url.substr(path_start);
+
+    // For http/https, extract the endpoint (host:port) and then the bucket/path
+    // Format: http(s)://{host}:{port}/{bucket_name}/{object_path}
+    if (result.protocol == "http" || result.protocol == "https")
+    {
+        // Find the first slash after the host:port
+        size_t first_slash = remaining.find('/');
+        if (first_slash == std::string::npos)
+        {
+            result.error_message =
+                "Invalid S3 URL format: missing bucket and object path";
+            return result;
+        }
+
+        // Store the full endpoint URL including protocol
+        result.endpoint_url = result.protocol + "://" + remaining.substr(0, first_slash);
+        remaining = remaining.substr(first_slash + 1);
+    }
+
+    // Now extract bucket_name and object_path from remaining
+    // Format: {bucket_name}/{object_path}
+    size_t first_slash = remaining.find('/');
+    if (first_slash == std::string::npos)
+    {
+        result.error_message =
+            "Invalid S3 URL format: missing object path (format: {bucket_name}/{object_path})";
+        return result;
+    }
+
+    result.bucket_name = remaining.substr(0, first_slash);
+    result.object_path = remaining.substr(first_slash + 1);
+
+    if (result.bucket_name.empty())
+    {
+        result.error_message = "Bucket name cannot be empty";
+        return result;
+    }
+
+    if (result.object_path.empty())
+    {
+        result.error_message = "Object path cannot be empty";
+        return result;
+    }
+
+    result.is_valid = true;
+    return result;
+}
+
 struct RocksDBCloudConfig
 {
     RocksDBCloudConfig() = default;
@@ -89,10 +194,14 @@ struct RocksDBCloudConfig
     uint32_t db_ready_timeout_us_;
     uint32_t db_file_deletion_delay_;
     std::string s3_endpoint_url_;
+    std::string s3_url_;  // New URL-based configuration
     size_t warm_up_thread_num_;
     bool run_purger_{true};
     size_t purger_periodicity_millis_{10 * 60 * 1000};  // 10 minutes
     std::string branch_name_;
+
+    // Returns true if S3 URL configuration is being used
+    bool IsS3UrlConfigured() const { return !s3_url_.empty(); }
 };
 
 inline rocksdb::Status NewCloudFileSystem(