Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions include/async_io_manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,15 @@
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <deque>
#include <memory>
#include <optional>
#include <span>
#include <string>
#include <string_view>
#include <tuple>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <variant>
#include <vector>
Expand Down Expand Up @@ -481,6 +483,7 @@ class IouringMgr : public AsyncIoManager
return 0; // IouringMgr doesn't use local file caching
}

KvError TryCleanupLocalPartitionDir(const TableIdent &tbl_id);
void CleanManifest(const TableIdent &tbl_id) override;

static constexpr uint64_t oflags_dir = O_DIRECTORY | O_RDONLY;
Expand Down Expand Up @@ -528,6 +531,8 @@ class IouringMgr : public AsyncIoManager
* @brief mu_ avoids open/close file concurrently.
*/
Mutex mu_;
bool opening_{false};
WaitingZone open_waiting_;
int fd_{FdEmpty};
int reg_idx_{-1};
bool dirty_{false};
Expand Down Expand Up @@ -899,6 +904,8 @@ class CloudStoreMgr final : public IouringMgr
const TableIdent &tbl_id) override;
std::pair<ManifestFilePtr, KvError> RefreshManifest(
const TableIdent &tbl_id, std::string_view archive_tag);
void RequestGcLocalCleanup(const TableIdent &tbl_id,
const std::vector<std::string> &filenames);
Comment on lines +907 to +908
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect all mutation/read sites for pending GC cleanup structures and lock usage.
rg -n "RequestGcLocalCleanup|pending_gc_cleanup_|pending_gc_cleanup_queue_|std::lock_guard|Mutex|mu_" include/async_io_manager.h src/async_io_manager.cpp -C4

Repository: eloqdata/eloqstore

Length of output: 19467


Add synchronization for concurrent access to GC cleanup tracking structures.

RequestGcLocalCleanup() and the file cleaner background task access pending_gc_cleanup_ and pending_gc_cleanup_queue_ without visible synchronization. Protect all accesses to these structures with a consistent lock, ensuring the insert→queue pair remains atomic.

Also applies to: 1022-1023

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/async_io_manager.h` around lines 907 - 908, Concurrent accesses to
pending_gc_cleanup_ and pending_gc_cleanup_queue_ (used by RequestGcLocalCleanup
and the file-cleaner background task) must be protected by a single mutex so the
insert→queue operation is atomic; add a dedicated mutex (e.g., gc_cleanup_mu_)
and wrap all reads/writes to pending_gc_cleanup_ and pending_gc_cleanup_queue_
with a std::lock_guard or std::scoped_lock, update RequestGcLocalCleanup to lock
before checking/inserting and pushing to the queue, and update the file cleaner
background task (and any other places that read or pop from these structures) to
acquire the same lock around those operations. Ensure no access to these two
containers occurs without holding gc_cleanup_mu_ so the pair of operations
remains atomic.

KvError DownloadFile(const TableIdent &tbl_id,
FileId file_id,
uint64_t term,
Expand Down Expand Up @@ -1012,6 +1019,8 @@ class CloudStoreMgr final : public IouringMgr
* @brief Locally cached files that are not currently opened.
*/
std::unordered_map<FileKey, CachedFile> closed_files_;
std::unordered_set<FileKey> pending_gc_cleanup_;
std::deque<FileKey> pending_gc_cleanup_queue_;
CachedFile lru_file_head_;
CachedFile lru_file_tail_;
size_t used_local_space_{0};
Expand Down
3 changes: 2 additions & 1 deletion include/file_gc.h
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,8 @@ KvError DeleteUnreferencedCloudFiles(
const std::vector<uint64_t> &manifest_terms,
const RetainedFiles &retained_files,
FileId least_not_archived_file_id,
CloudStoreMgr *cloud_mgr);
CloudStoreMgr *cloud_mgr,
std::vector<std::string> &deleted_filenames);
} // namespace FileGarbageCollector

} // namespace eloqstore
1 change: 1 addition & 0 deletions include/storage/index_page_manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ class IndexPageManager
KvError MakeCowRoot(const TableIdent &tbl_ident, CowRootMeta &cow_meta);

void UpdateRoot(const TableIdent &tbl_ident, CowRootMeta new_meta);
void MarkManifestMissing(const TableIdent &tbl_ident);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify locking/serialization around MarkManifestMissing and RootMeta updates.
rg -n "IndexPageManager::MarkManifestMissing|RootMetaMgr::Find\\(|RootMetaMgr::UpdateBytes\\(|manifest_size_" include src -C3

Repository: eloqdata/eloqstore

Length of output: 9831


🏁 Script executed:

# Find RootMetaMgr class definition with lock information
rg -n "class RootMetaMgr|mu_|lock_|mutex_|rwlock_" include/storage src/storage -B2 -A10 | head -100

Repository: eloqdata/eloqstore

Length of output: 4975


🏁 Script executed:

# Check the full Find() implementation for any lock acquisition
sed -n '204,220p' src/storage/root_meta.cpp

Repository: eloqdata/eloqstore

Length of output: 404


🏁 Script executed:

# Check the full UpdateBytes() implementation
sed -n '250,270p' src/storage/root_meta.cpp

Repository: eloqdata/eloqstore

Length of output: 502


🏁 Script executed:

# Look for lock guards around MarkManifestMissing
sed -n '343,368p' src/storage/index_page_manager.cpp

Repository: eloqdata/eloqstore

Length of output: 829


🏁 Script executed:

# Get full RootMetaMgr class definition
sed -n '96,250p' include/storage/root_meta.h

Repository: eloqdata/eloqstore

Length of output: 3146


🏁 Script executed:

# Search for all callers of MarkManifestMissing to see the call context
rg -n "MarkManifestMissing" include src

Repository: eloqdata/eloqstore

Length of output: 351


🏁 Script executed:

# Look for mutex or lock fields in root_meta files
rg -n "mu_|mutex_|lock_|rwlock_|RWLock" include/storage/root_meta.h src/storage/root_meta.cpp

Repository: eloqdata/eloqstore

Length of output: 44


🏁 Script executed:

# Get Pin/Unpin implementation
rg -n "void RootMetaMgr::Pin|void RootMetaMgr::Unpin" src/storage -A10

Repository: eloqdata/eloqstore

Length of output: 1245


🏁 Script executed:

# Check context around MarkManifestMissing call in file_gc.cpp
sed -n '770,795p' src/file_gc.cpp

Repository: eloqdata/eloqstore

Length of output: 665


🏁 Script executed:

# Search for any synchronization around the GC operations
sed -n '750,800p' src/file_gc.cpp

Repository: eloqdata/eloqstore

Length of output: 1412


Add synchronization to MarkManifestMissing() or document its single-threaded usage.

The implementation calls Find() and directly accesses meta_ fields without synchronization. While RootMetaMgr provides Handle with Pin/Unpin for reference counting, MarkManifestMissing() bypasses this pattern entirely, leaving concurrent access to manifest_size_ (and other meta fields) unprotected. Either acquire the entry via Handle before mutation or add explicit locking, or clearly document that this method is only safe when called from a single-threaded context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/storage/index_page_manager.h` at line 58, MarkManifestMissing()
mutates meta_ (e.g. manifest_size_) after calling Find() without using the
RootMetaMgr Handle/Pin/Unpin or any lock, risking concurrent access; fix it by
acquiring the entry via RootMetaMgr::Handle (call Find(tbl_ident) to get a
Handle), Pin() before mutating meta_ and Unpin() when done (or wrap the mutation
in the existing per-entry mutex if one exists), or alternatively add explicit
synchronization around access to meta_ (e.g. a std::mutex guarding
manifest_size_ updates) and use it in MarkManifestMissing(); if you
intentionally rely on single-threaded use, add clear documentation on
MarkManifestMissing() stating that it must only be called from a single-threaded
context.


std::pair<MemIndexPage::Handle, KvError> FindPage(MappingSnapshot *mapping,
PageId page_id);
Expand Down
Loading
Loading