* Deleted 20 redundant .opt files that only contained
  --plugin-load-add=ha_tidesdb + --plugin-maturity=unknown
* Stripped redundant lines from the 7 remaining .opt files that had
  actual unique options (encryption keys, loose test vars)
…2/#73/#74/#77/#78/#74/#75/#76/#79/#80/#81/#82/#83/#84
Correct handlerton compatibility for MariaDB 11.4+/12+. Correct MTR suite for 11.4+ and 12+. Fix concurrent conflict handling (issue #77). Major performance pass eliminating per-row overhead in hot paths.
--- Correctness fixes ---
Isolation levels now respected:
Add resolve_effective_isolation() that maps the MariaDB session's
SET TRANSACTION ISOLATION LEVEL (via thd_tx_isolation()) to the
corresponding TidesDB isolation level:
  READ UNCOMMITTED -> TDB_ISOLATION_READ_UNCOMMITTED
  READ COMMITTED   -> TDB_ISOLATION_READ_COMMITTED
  REPEATABLE READ  -> TDB_ISOLATION_REPEATABLE_READ
                      (or TDB_ISOLATION_SNAPSHOT if the table option says so)
  SERIALIZABLE     -> TDB_ISOLATION_SERIALIZABLE
Previously ensure_stmt_txn() and external_lock() used
share->isolation_level (static table option) for multi-statement
txns and hard-coded READ_COMMITTED for autocommit.
start_consistent_snapshot hard-coded REPEATABLE_READ. The session's
tx_isolation was completely ignored.
All three paths now call resolve_effective_isolation(thd,
share->isolation_level). TidesDB's SNAPSHOT level (no SQL
equivalent) is honoured when the session is at REPEATABLE READ and
the table option specifies SNAPSHOT.
Map MariaDB REPEATABLE_READ -> TidesDB SNAPSHOT, which is the
semantic equivalent of InnoDB's repeatable-read (write-write
conflict detection only, no read-set tracking).
dup_ref population (root cause of 'Can't find record' on IODKU):
write_row() now copies the conflicting row's PK bytes into dup_ref
before returning HA_ERR_FOUND_DUPP_KEY, for both PK duplicates and
unique secondary index duplicates.
REPLACE INTO secondary index orphan fix:
Remove write_can_replace_ from the skip_unique check entirely. Now
only the tidesdb_skip_unique_check session variable skips uniqueness
checks. Both REPLACE INTO and INSERT ON DUPLICATE KEY UPDATE let
write_row() return HA_ERR_FOUND_DUPP_KEY so the server properly
handles delete+reinsert and cleans up old secondary index entries
via delete_row().
extra() fix:
HA_EXTRA_INSERT_WITH_UPDATE no longer sets write_can_replace_.
DDL isolation fix:
inplace_alter_table() (ALTER TABLE ADD INDEX) previously used
share->isolation_level for scan transactions. If the table was
created with SERIALIZABLE, the index build would track millions of
read-set entries causing unbounded memory growth. Now hard-coded to
TDB_ISOLATION_READ_COMMITTED for both initial and batch-restart
transactions. DDL never needs OCC conflict detection.
Bulk insert mid-commit isolation:
The batch re-begin in write_row() previously used
share->isolation_level. Now uses TDB_ISOLATION_READ_COMMITTED
since bulk inserts do not need snapshot consistency across batches.
Bulk insert iterator invalidation:
After bulk insert mid-commit creates a fresh transaction, the old
scan_iter held a dangling pointer to the freed txn. Now invalidate
the iterator (tidesdb_iter_free + NULL) and update scan_txn to point
to the new transaction. Bump txn_generation so other handler
objects detect the change.
--- Library fixes (src/tidesdb.c) ---
SNAPSHOT isolation read-set tracking:
tidesdb_txn_add_to_read_set() was only skipping tracking for
isolation levels < TDB_ISOLATION_REPEATABLE_READ (i.e. RU and RC).
SNAPSHOT (enum value 3) fell through and tracked every read, but
SNAPSHOT only needs write-write conflict detection.
Fix: skip read tracking when isolation_level == TDB_ISOLATION_SNAPSHOT.
Also fix tidesdb_txn_check_read_conflicts() to skip for SNAPSHOT.
Previously SNAPSHOT ran full read-conflict and SSI checks at commit,
making it behave identically to SERIALIZABLE.
--- Performance fixes ---
Debug logging gated behind srv_debug_trace:
~25 sql_print_information('[T77]...') calls were firing
unconditionally on every INSERT, SELECT, scan, lock, and commit.
At high throughput this serialises all threads on the error-log
mutex. All calls now gated with if (unlikely(srv_debug_trace)),
matching the existing TDB_TRACE macro pattern. Zero overhead when
disabled.
tidesdb_skip_unique_check session variable:
New per-session boolean: SET SESSION tidesdb_skip_unique_check=1.
Skips both PK and UNIQUE secondary index uniqueness checks during
INSERT. Same pattern as MyRocks' rocksdb_skip_unique_check.
Eliminates the expensive per-row tidesdb_txn_get point lookup
(traverses all LSM levels) during bulk loads.
Stack-allocated index key buffers in update_row():
Replace two std::unique_ptr<uchar[]>(new uchar[MAX_KEY_LENGTH*2+2])
heap allocations per UPDATE with stack-allocated buffers (~4KB each,
well within stack limits).
Dup-check iterator caching:
Cache iterators per unique-index slot in write_row(). Reuse via
seek() instead of iter_new()/iter_free() per row. Add
free_dup_iter_cache() helper; free in close(), delete_all_rows(),
external_lock(F_UNLCK), and bulk insert mid-commit.
Transaction reuse:
Use tidesdb_txn_reset() instead of tidesdb_txn_free() +
tidesdb_txn_begin_with_isolation() in bulk insert mid-commit and
autocommit paths.
Iterator survival:
Preserve iterators across read-only autocommit statements instead of
tearing down and rebuilding.
Skip PK dup check for REPLACE INTO without secondary indexes.
Zero-copy fetch path:
Use get_val_buf_ for BLOB/encrypted path in fetch_row_by_pk().
HA_CLUSTERED_INDEX:
Add HA_CLUSTERED_INDEX to table_flags().
Encryption buffer reuse:
serialize_row() now writes to enc_buf_ and returns it, preserving
row_buf_ capacity across rows. Avoids re-allocation per encrypted
write.
Covering index decode expansion:
New decode_sort_key_part() handles INT, DATE, DATETIME, TIMESTAMP,
YEAR, and CHAR/BINARY (binary/latin1). try_keyread_from_index()
and icp_check_secondary() now use it instead of the old
integer-only type gate.
Encryption key version caching:
Call encryption_key_get_latest_version() once per statement, cache
in cached_enc_key_ver_. Invalidate via enc_key_ver_valid_ = false
at statement end in external_lock(F_UNLCK).
Per-statement syscall/THDVAR caching:
New per-statement caches in ha_tidesdb.h:
  cached_time_ / cached_time_valid_ - time(NULL) for TTL
  cached_sess_ttl_ - THDVAR(thd, ttl)
  cached_skip_unique_ - THDVAR(thd, skip_unique_check)
  cached_thdvars_valid_ - invalidation flag
write_row: ha_thd() 4x, thd_get_ha_data() 3x, THDVAR() 3x -> each
called once and cached. Eliminates ~9 indirect/virtual calls per
row. Similar reductions in compute_row_ttl(), update_row(),
rnd_init(), index_init(), ensure_scan_iter().
All caches invalidated in external_lock(F_UNLCK).
Inplace index build:
Replace std::set<std::string> with std::unordered_set<std::string>
for O(1) duplicate detection.
--- Feature additions ---
Issue #76: tidesdb_data_home_dir
MYSQL_SYSVAR_STR (read-only) that overrides the auto-computed data
directory. When set, the plugin uses that path instead of
<mysql_datadir>/../tidesdb_data.
Issue #73: SHOW ENGINE TIDESDB STATUS
tidesdb_show_status() callback registered on the handlerton.
Displays DB-level stats (memory, storage, background queues) and
block cache stats (hits, misses, hit rate, entries).
Issue #79: Per-index USE_BTREE
ha_index_option_struct with use_btree boolean, registered via
tidesdb_index_option_list. index_type() checks per-index option
first, falls back to table-level. create() and
inplace_alter_table() apply per-index USE_BTREE when creating
secondary index CFs. Syntax: KEY idx_a (a) USE_BTREE=1.
Issue #78: index_type() override
Returns 'LSM' for default block-based tables and 'BTREE' for
USE_BTREE=1 tables. SHOW KEYS now displays correct index type.
Issue #74: ANALYZE TABLE cardinality
Samples up to 100K entries from each secondary index CF, counts
distinct key prefixes to compute rec_per_key. Before: EXPLAIN
showed rows=1 for any index lookup. After: accurate estimates.
Issue #82: Table option default variables
5 session-scoped dynamic system variables via HA_TOPTION_SYSVAR:
tidesdb_default_compression (NONE/SNAPPY/LZ4/ZSTD/LZ4_FAST)
tidesdb_default_write_buffer_size (default 32MB)
tidesdb_default_bloom_filter (default ON)
tidesdb_default_use_btree (default OFF)
tidesdb_default_block_indexes (default ON)
Issue #81: Conflict information
tidesdb_print_all_conflicts global dynamic sysvar (like
innodb_print_all_deadlocks). When ON, every TDB_ERR_CONFLICT logs
a warning. Last conflict info displayed in SHOW ENGINE TIDESDB
STATUS under 'Conflicts' section.
--- New MTR tests ---
tidesdb_replace_iodku - REPLACE INTO + IODKU with PK, secondary, unique
tidesdb_index_stats - issue #78 index_type + issue #74 cardinality
tidesdb_isolation - session isolation level mapping
tidesdb_engine_status - SHOW ENGINE TIDESDB STATUS
tidesdb_per_index_btree - per-index USE_BTREE
tidesdb_data_home_dir - sysvar visibility and read-only enforcement
tidesdb_insert_conflict - INSERT vs INSERT conflict (awaits library fix)
--- Known issues ---
Issue #77: concurrent UPDATE+DELETE crashes the server (SIGSEGV) in
core MVCC conflict detection. Not fixable in the plugin layer;
requires changes to src/tidesdb.c.
Issue #83: INSERT vs INSERT conflict not detected. Library-level
bug; tidesdb_txn_get within a transaction's snapshot cannot see
uncommitted writes from other transactions by design. Enforcement
must happen at commit time via TDB_ERR_CONFLICT. Fix confirmed
coming in next library release.
UNIQUE secondary index check still creates a full merge-heap
iterator via tidesdb_iter_new per UNIQUE index per INSERT when
skip_unique_check is off. A future tidesdb_txn_exists_prefix() API
would avoid building the full merge heap.