Skip to content

Conversation

@armyhaylenko
Copy link
Contributor

No description provided.

@coderabbitai
Copy link

coderabbitai bot commented Mar 31, 2025

Summary by CodeRabbit

  • New Features

    • Introduced configurable options for slot backfilling with new interval and offset settings.
    • Enabled concurrent processing for regular and missed slot backfilling to boost throughput.
    • Enhanced slot persister functionality with improved checkpointing and missed slot management.
  • Refactor

    • Streamlined slot tracking and in-memory management for increased reliability.
  • Chores

    • Updated service configuration by removing outdated environment variable settings.

Walkthrough

This pull request introduces new environment variables for configuring slot persistence and enhances the backfilling functionality to support both missed and regular slots. It adds asynchronous functions for processing missed slots in the ingester module, along with a cancellation mechanism. The slot persister module is updated with new constants and command-line arguments to manage missed slots through periodic checks and database updates. Additionally, the RocksDB storage layer is expanded with new columns, enum variants, and key encoders to facilitate tracking and processing of missed slots.

Changes

File(s) Change Summary
.env.example Added new environment variables for slot persistence configuration: SLOT_BACKFILL_INTERVAL=120, SLOT_BACKFILL_OFFSET=500000.
nft_ingester/src/backfiller.rs Added asynchronous functions run_backfill and backfill_missed for missed slots processing; modified run_backfill to handle both regular and missed slots.
nft_ingester/src/bin/ingester/main.rs Added a new async task that spawns run_backfill with BackfillTarget::MissedSlots, enabling parallel processing of missed slots alongside regular slot backfilling.
nft_ingester/src/bin/slot_persister/main.rs Introduced constants (DEFAULT_SLOT_BACKFILL_INTERVAL_SECS, DEFAULT_SLOT_BACKFILL_OFFSET), extended the Args struct with new fields, and added functions get_checkpoint and get_missed_slots for handling missed slots.
rocks-db/src/columns/parameters.rs Added new enum variant LastProcessedMissedSlotKey to the Parameter enum.
rocks-db/src/columns/raw_block.rs Introduced new structs MissedSlotsIdx and SlotConsistencyCheckpoint, both implementing the TypedColumn trait.
rocks-db/src/key_encoders.rs Added methods encode_u64x2 and decode_u64x2 for encoding and decoding pairs of u64 values.
rocks-db/src/lib.rs Expanded SlotStorage struct with new fields slot_consistency_checkpoint, missed_slots_idx, and missed_slots_last_seq; added method next_missed_slots_seq.
backfill_rpc/src/rpc.rs Added #[derive(Clone)] to BackfillRPC struct for cloning capabilities.
nft_ingester/src/bin/slot_checker/main.rs Removed InMemorySlotsDumper struct and its associated methods.
nft_ingester/src/inmemory_slots_dumper.rs Updated slots field type from Mutex<BTreeSet<u64>> to Arc<Mutex<BTreeSet<u64>>> and added #[derive(Clone)].
usecase/src/bigtable.rs Added #[derive(Clone)] to BigTableClient struct.
usecase/src/slots_collector.rs Added #[derive(Clone)] to SlotsCollector struct.
docker-compose.yaml Removed environment variables SLOTS_DB_PRIMARY_PATH and METRICS_PORT from slot-persister service.

Sequence Diagram(s)

sequenceDiagram
  participant Main
  participant Executor
  participant Backfiller
  participant DB

  Main->>Executor: Spawn run_backfill task (RegularSlots)
  Main->>Executor: Spawn run_backfill task (MissedSlots)
  Executor->>Backfiller: run_backfill(BackfillTarget::MissedSlots)
  Backfiller->>DB: Retrieve last processed missed slot
  Backfiller->>DB: Process missed slots in a loop
  Note over Backfiller: Loop until cancellation signal is received
Loading
sequenceDiagram
  participant Persister as SlotPersisterMain
  participant BackfillSource
  participant DB
  participant CancellationToken

  Persister->>Persister: Spawn periodic get_missed_slots task
  loop Check for missed slots
     Persister->>DB: Retrieve last consistency checkpoint
     DB-->>Persister: Return checkpoint
     Persister->>BackfillSource: Fetch missed slots within range
     BackfillSource-->>Persister: Return missed slots data
     Persister->>DB: Update consistency checkpoint after processing
     CancellationToken-->>Persister: Check cancellation signal
  end
Loading

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 6c87eac and e1d53e2.

📒 Files selected for processing (1)
  • nft_ingester/src/bin/slot_persister/main.rs (7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
nft_ingester/src/bin/slot_persister/main.rs (5)
metrics_utils/src/utils.rs (1)
  • start_metrics (79-90)
nft_ingester/src/config.rs (1)
  • parse_json (554-556)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
nft_ingester/src/backfiller.rs (5)
  • new (49-66)
  • new (121-127)
  • new (364-369)
  • db (192-192)
  • db (245-245)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
🔇 Additional comments (2)
nft_ingester/src/bin/slot_persister/main.rs (2)

1-7: No concerns for newly added imports.
These new import statements appear correct and useful for the added functionality.

Also applies to: 10-10, 17-17, 23-27


57-59: A straightforward helper function.
Returning one-fifth of the slot backfill offset is concise and unlikely to cause confusion.


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@armyhaylenko armyhaylenko force-pushed the feature/MTG-1070-periodic-missed-slot-fetch branch from 86286c9 to 4db4e8f Compare March 31, 2025 14:16
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91bb450 and 4db4e8f.

📒 Files selected for processing (8)
  • .env.example (2 hunks)
  • nft_ingester/src/backfiller.rs (4 hunks)
  • nft_ingester/src/bin/ingester/main.rs (2 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (6 hunks)
  • rocks-db/src/columns/parameters.rs (1 hunks)
  • rocks-db/src/columns/raw_block.rs (2 hunks)
  • rocks-db/src/key_encoders.rs (1 hunks)
  • rocks-db/src/lib.rs (4 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
nft_ingester/src/backfiller.rs (2)
rocks-db/src/columns/raw_block.rs (8)
  • encode_key (18-20)
  • encode_key (41-43)
  • encode_key (59-61)
  • encode_key (85-87)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
rocks-db/src/columns/parameters.rs (2)
  • encode_key (26-28)
  • decode_key (30-33)
rocks-db/src/columns/raw_block.rs (2)
rocks-db/src/lib.rs (3)
  • Self (441-441)
  • new (93-105)
  • new (227-320)
rocks-db/src/key_encoders.rs (4)
  • encode_u64x2 (157-162)
  • decode_u64x2 (164-166)
  • encode_u64 (88-90)
  • decode_u64 (92-95)
nft_ingester/src/bin/slot_persister/main.rs (1)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-base-image
  • GitHub Check: test
  • GitHub Check: lint
🔇 Additional comments (16)
rocks-db/src/columns/parameters.rs (1)

11-11: Adding new Parameter enum variant for missed slot tracking

This addition of LastProcessedMissedSlot to the Parameter enum looks good. It effectively extends the parameter tracking capabilities to support the new missed slot processing functionality.

.env.example (1)

42-46: Configuration parameters for slot persister functionality

The new environment variables for slot persistence are well-structured and provide the necessary configuration options for the missed slot fetch functionality:

  • Path to slots data
  • Chunk size for processing
  • Backfill interval (21600 seconds = 6 hours)
  • Backfill offset (1,000,000 slots)

These values look reasonable as defaults for an example configuration.

nft_ingester/src/bin/ingester/main.rs (1)

569-590:

Details

❓ Verification inconclusive

Parallel task for missed slot backfilling looks good

The addition of a separate task for backfilling missed slots follows the same pattern as the existing backfill task, which is good for consistency. This implementation correctly reuses the same components (consumer, database, metrics) for both regular and missed slot backfilling.

One thing to note: this task runs only when both run_backfiller and run_bubblegum_backfiller are true. Is this dependency on the Bubblegum backfiller intentional?


🏁 Script executed:

#!/bin/bash
# Check if the run_backfill_missed function has any additional flags or options compared to run_backfill_slots
# to understand if the dependency on Bubblegum backfiller is intentional or if it should have its own flag

rpc_source_files=$(fd -e rs -e toml backfill nft_ingester/)
echo "Searching for run_backfill_missed implementation to check for flags..."
grep -A 10 "run_backfill_missed" $rpc_source_files

echo "Checking if there are other places where run_backfill_missed is called..."
grep -l "run_backfill_missed" --include="*.rs" -r . | grep -v "$(basename "$0")"

Length of output: 2880


Missed Slot Backfiller Task: Structure Looks Good – Please Verify Flag Dependency

The new asynchronous task for missed slot backfilling is well aligned with the established backfiller pattern by reusing the same components (consumer, database, and metrics). However, note that this task is executed only when both run_backfiller and run_bubblegum_backfiller are enabled. Our investigation confirms that the run_backfill_missed function itself does not introduce any additional flags or options compared to its standard backfiller counterparts, which suggests that the dependency on the Bubblegum backfiller flag is imposed externally.

  • Location: nft_ingester/src/bin/ingester/main.rs (lines 569–590)

Please verify that coupling the missed slot backfiller’s execution to both flags is intentional and that no separate flag should control this behavior.

rocks-db/src/columns/raw_block.rs (2)

8-8: Good addition of serialization/deserialization imports.

Importing Serialize and Deserialize is consistent with usage in the new typed columns.


50-74: Struct-based key-only storage looks fine.

By returning an empty vector in encode, this approach relies solely on the key. This is acceptable if there's no need to store associated data for MissedSlotsIdx. Everything else looks correct.

nft_ingester/src/backfiller.rs (3)

14-16: Appropriate import additions.

These newly introduced columns and parameters are aligned with the missed slots backfilling logic.


23-23: Expanded logging.

Adding warn to the logging import set is aligned with the new missed slots warnings below.


252-308: Verify the decision to break on missing raw block.

When raw_block is missing, the code logs a warning and breaks from iteration, preventing subsequent missed slots from processing. This may be intentional to avoid marking progress, but could also stall future missed slots indefinitely.

Would you like to skip or remove the slot instead, allowing the code to continue with the next missed slot?

rocks-db/src/lib.rs (5)

23-26: Concurrency imports look suitable.

Atomic operations are necessary to coordinate the missed slot sequence safely across threads.


86-88: New columns and sequence tracker introduced.

Declaring slot_consistency_checkpoint, missed_slots_idx, and missed_slots_last_seq extends slot-related tracking capabilities.


93-105: Constructor initialization is correct.

All new columns are properly instantiated via Storage::column, and the atomic sequence is set to zero for a fresh start.


107-115: Column families updated.

Including SlotConsistencyCheckpoint::NAME and MissedSlotsIdx::NAME in cf_names() ensures database column families are recognized.


156-172: Double-check partial key decoding.

Currently, only the first 8 bytes of the last key are decoded to obtain seq. If the second 64-bit portion of (u64, u64) is relevant, consider decoding it as well or clarifying why it's ignored.

Would you like a script to search for all calls referencing next_missed_slots_seq to confirm the usage is strictly tied to the sequence portion?

nft_ingester/src/bin/slot_persister/main.rs (3)

16-20: Imports look appropriate.
These imports correctly bring in the required types and column families for slot indexing and checkpointing.


37-43: Constants for slot backfill configuration look good.
Defining these default values and accompanying documentation is a clear way to control slot backfill scheduling and offset.


237-242: Concurrent task spawn is appropriate.
Spawning get_missed_slots in a separate task ensures the main loop isn’t blocked by periodic scanning, maintaining responsiveness.

@armyhaylenko armyhaylenko force-pushed the feature/MTG-1070-periodic-missed-slot-fetch branch from 4db4e8f to 44939d4 Compare March 31, 2025 14:52
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🔭 Outside diff range comments (2)
nft_ingester/src/bin/ingester/main.rs (1)

548-592: 🧹 Nitpick (assertive)

Implementation of dual backfill processes for regular and missed slots

The code effectively sets up two parallel backfill processes - one for regular slots and another for missed slots. Both use identical infrastructure components but target different types of slots to be processed.

However, there's significant code duplication between the two spawned tasks. The initialization of components (consumer, db, metrics, slot_db) is identical in both blocks.

Consider extracting the common setup into a helper function to reduce duplication:

-            usecase::executor::spawn({
-                let cancellation_token = cancellation_token.child_token();
-                let consumer = Arc::new(DirectBlockParser::new(
-                    tx_ingester.clone(),
-                    primary_rocks_storage.clone(),
-                    metrics_state.backfiller_metrics.clone(),
-                ));
-                let db = primary_rocks_storage.clone();
-                let metrics: Arc<BackfillerMetricsConfig> =
-                    metrics_state.backfiller_metrics.clone();
-                let slot_db = slot_db.clone();
-                async move {
-                    nft_ingester::backfiller::run_backfill(
-                        cancellation_token,
-                        db,
-                        slot_db,
-                        consumer,
-                        metrics,
-                        nft_ingester::backfiller::BackfillTarget::RegularSlots,
-                    )
-                    .await;
-                }
-            });
-            usecase::executor::spawn({
-                let cancellation_token = cancellation_token.child_token();
-                let consumer = Arc::new(DirectBlockParser::new(
-                    tx_ingester.clone(),
-                    primary_rocks_storage.clone(),
-                    metrics_state.backfiller_metrics.clone(),
-                ));
-                let db = primary_rocks_storage.clone();
-                let metrics: Arc<BackfillerMetricsConfig> =
-                    metrics_state.backfiller_metrics.clone();
-                let slot_db = slot_db.clone();
+            // Helper function to spawn a backfill task with the specified target
+            let spawn_backfill_task = |target: nft_ingester::backfiller::BackfillTarget| {
+                let cancellation_token = cancellation_token.child_token();
+                let consumer = Arc::new(DirectBlockParser::new(
+                    tx_ingester.clone(),
+                    primary_rocks_storage.clone(),
+                    metrics_state.backfiller_metrics.clone(),
+                ));
+                let db = primary_rocks_storage.clone();
+                let metrics: Arc<BackfillerMetricsConfig> =
+                    metrics_state.backfiller_metrics.clone();
+                let slot_db = slot_db.clone();
+                
+                usecase::executor::spawn(async move {
+                    nft_ingester::backfiller::run_backfill(
+                        cancellation_token,
+                        db,
+                        slot_db,
+                        consumer,
+                        metrics,
+                        target,
+                    )
+                    .await;
+                });
+            };
+            
+            // Spawn tasks for both types of backfilling
+            spawn_backfill_task(nft_ingester::backfiller::BackfillTarget::RegularSlots);
+            spawn_backfill_task(nft_ingester::backfiller::BackfillTarget::MissedSlots);
-                async move {
-                    nft_ingester::backfiller::run_backfill(
-                        cancellation_token,
-                        db,
-                        slot_db,
-                        consumer,
-                        metrics,
-                        nft_ingester::backfiller::BackfillTarget::MissedSlots,
-                    )
-                    .await;
-                }
-            });
.env.example (1)

89-91: 🧹 Nitpick (assertive)

Consider standardizing file ending format

The file ends with a vim mode line but has an empty line before it. Consider standardizing the format of all configuration files to follow the same pattern for EOF handling.

-
 # vim: set ft=bash
♻️ Duplicate comments (1)
nft_ingester/src/bin/slot_persister/main.rs (1)

551-654: 🛠️ Refactor suggestion

Address nested function naming and improve error handling.

  1. The nested function fn get_missed_slots shares the same name as the enclosing async fn get_missed_slots, which can be confusing. A different helper name (e.g., retrieve_missed_slots_range) clarifies intent.
  2. Multiple .expect(...) calls for database writes could lead to panics under failure. Consider returning errors or using a graceful fallback strategy to avoid abruptly terminating the process.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 4db4e8f and 44939d4.

📒 Files selected for processing (8)
  • .env.example (2 hunks)
  • nft_ingester/src/backfiller.rs (5 hunks)
  • nft_ingester/src/bin/ingester/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (6 hunks)
  • rocks-db/src/columns/parameters.rs (1 hunks)
  • rocks-db/src/columns/raw_block.rs (2 hunks)
  • rocks-db/src/key_encoders.rs (1 hunks)
  • rocks-db/src/lib.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
rocks-db/src/key_encoders.rs (2)
Learnt from: armyhaylenko
PR: metaplex-foundation/aura#465
File: rocks-db/src/key_encoders.rs:157-162
Timestamp: 2025-03-31T14:24:35.687Z
Learning: The `encode_u64x2` function in the key_encoders module intentionally returns a fixed-size array `[u8; 16]` instead of `Vec<u8>` to avoid heap allocations (syscalls) for performance reasons, despite the API inconsistency with other encoder functions.
Learnt from: armyhaylenko
PR: metaplex-foundation/aura#465
File: rocks-db/src/key_encoders.rs:164-166
Timestamp: 2025-03-31T14:25:35.651Z
Learning: The `decode_u64x2` function in the key_encoders module doesn't need explicit size validation because the `try_into()` method will implicitly validate the input size when converting byte slices to fixed-size arrays.
🧬 Code Definitions (5)
nft_ingester/src/bin/ingester/main.rs (1)
nft_ingester/src/backfiller.rs (3)
  • db (192-192)
  • db (245-245)
  • run_backfill (135-180)
rocks-db/src/columns/raw_block.rs (2)
rocks-db/src/lib.rs (3)
  • Self (441-441)
  • new (93-105)
  • new (227-320)
rocks-db/src/key_encoders.rs (4)
  • encode_u64x2 (157-162)
  • decode_u64x2 (164-166)
  • encode_u64 (88-90)
  • decode_u64 (92-95)
rocks-db/src/lib.rs (1)
rocks-db/src/columns/asset.rs (1)
  • new (3345-3347)
nft_ingester/src/backfiller.rs (2)
rocks-db/src/columns/raw_block.rs (8)
  • encode_key (18-20)
  • encode_key (41-43)
  • encode_key (59-61)
  • encode_key (85-87)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
rocks-db/src/columns/parameters.rs (2)
  • encode_key (26-28)
  • decode_key (30-33)
nft_ingester/src/bin/slot_persister/main.rs (1)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
🔇 Additional comments (19)
rocks-db/src/columns/parameters.rs (1)

10-11: New parameter for tracking missed slots processing

The addition of LastProcessedMissedSlot to the Parameter enum properly supports the missed slots backfilling functionality. This parameter will be used to track the last processed missed slot, which is essential for the missed slots backfill process.

.env.example (1)

42-46: Environment variables added for slot persister configuration

The new environment variables for slot persister are well-documented and provide sensible default values for:

  • Path to slot data
  • Chunk size for processing
  • Interval for backfilling
  • Offset for backfilling

These additions properly support the new missed slots backfill functionality.

rocks-db/src/key_encoders.rs (2)

157-162: Optimized encode_u64x2 implementation using fixed-size array

The implementation efficiently encodes a tuple of two u64s using a fixed-size array instead of Vec. This avoids heap allocations (syscalls) for better performance.

I note this intentionally differs from other encoder return types for performance reasons.


164-166: Concise decode_u64x2 implementation

The decoder function is correctly implemented without explicit size validation as the try_into() method will implicitly validate input size when converting to fixed-size arrays.

rocks-db/src/columns/raw_block.rs (2)

8-8: Serde usage is appropriate.
Importing serde::{Deserialize, Serialize} is correct for implementing data serialization in newly added structs.


50-74: Verify empty vector encoding for MissedSlotsIdx.
This struct encodes and decodes to empty byte slices, which may be intentional if the key alone is meaningful. Confirm that no actual body data needs to be persisted for each entry.

rocks-db/src/lib.rs (4)

86-88: Check concurrency implications for missed_slots_last_seq.
Storing the sequence in AtomicU64 with Relaxed ordering may cause stale reads if used concurrently across threads. Verify if stronger ordering is required.


93-105: Fields' initialization matches their usage.
The constructor properly wires up the new columns and missed_slots_last_seq. No issues spotted.


108-115: Column families updated correctly.
Adding SlotConsistencyCheckpoint::NAME and MissedSlotsIdx::NAME to cf_names() ensures the DB columns are properly recognized.


155-172: Validate logic in next_missed_slots_seq.

  1. The function attempts to load the highest sequence from the last key if missed_slots_last_seq is zero.
  2. The iteration accessor iter_end returns the last item, but confirm there's no edge case if the database is empty.
  3. The function uses Relaxed ordering, which might cause concurrency concerns with partial updates.
nft_ingester/src/backfiller.rs (3)

6-6: Imports and logging additions look fine.
The additions of Either, new column references, and the warn log level are consistent with the code.

Also applies to: 15-17, 24-24


135-180: Clean approach to branching backfill logic.
Using Either for concurrent calls to specialized backfill_missed or backfill_slots is concise. Code duplication is minimized.


235-291: Evaluate the early break condition when raw_block is unavailable.
Currently, if a raw_block is missing at a certain (seq, slot), the loop breaks, potentially skipping subsequent missed slots. Confirm this is desired behavior or consider continuing with later entries.

nft_ingester/src/bin/slot_persister/main.rs (6)

16-20: Imports look good.
No issues found with referencing MissedSlotsIdx and SlotConsistencyCheckpoint from the new columns in RocksDB.


45-45: Annotation is fine.
No concerns with the #[derive(Parser, Debug, Clone)] attribute.


86-87: Command-line argument interval looks good.
Properly references the newly introduced DEFAULT_SLOT_BACKFILL_INTERVAL_SECS.


89-90: Backfill offset parameter is acceptable.
This aligns with the constant defined above, though see prior note regarding large offsets.


102-127: Potential risk in unchecked usage of unwrap().
While the design might be sufficient for a strictly controlled environment, errors with missing column families or key decoding will panic. Consider robust error handling to prevent unexpected panics.


237-242: Spawning the missed slots routine is appropriate.
No immediate concurrency issues spotted, as RocksDB can handle multiple writers. Continue monitoring for potential concurrency interactions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 44939d4 and ec53185.

📒 Files selected for processing (7)
  • backfill_rpc/src/rpc.rs (1 hunks)
  • nft_ingester/src/backfiller.rs (6 hunks)
  • nft_ingester/src/bin/slot_checker/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (6 hunks)
  • nft_ingester/src/inmemory_slots_dumper.rs (2 hunks)
  • usecase/src/bigtable.rs (1 hunks)
  • usecase/src/slots_collector.rs (1 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
nft_ingester/src/bin/slot_persister/main.rs (4)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
usecase/src/slots_collector.rs (1)
  • new (81-87)
metrics_utils/src/lib.rs (8)
  • new (34-41)
  • new (68-86)
  • new (138-144)
  • new (174-183)
  • new (249-256)
  • new (302-318)
  • new (417-431)
  • new (668-675)
nft_ingester/src/inmemory_slots_dumper.rs (3)
nft_ingester/src/backfiller.rs (3)
  • new (49-66)
  • new (121-127)
  • new (372-377)
usecase/src/slots_collector.rs (1)
  • new (81-87)
usecase/src/bigtable.rs (1)
  • new (27-32)
🔇 Additional comments (13)
usecase/src/bigtable.rs (1)

20-20: Adding Clone derive is a good improvement for concurrent usage.

This change enables BigTableClient instances to be cloned efficiently, which is useful for concurrent processing scenarios. Since the struct fields are already wrapped in Arc, cloning is efficient as it only increases reference counts rather than duplicating the underlying data.

usecase/src/slots_collector.rs (1)

65-65: Implementing Clone for SlotsCollector aligns with concurrent processing improvements.

Adding the Clone trait to SlotsCollector enables more flexible ownership management in asynchronous contexts, which aligns with the PR objectives for enhanced slot processing. Since the struct's fields are already using Arc, cloning is efficient.

backfill_rpc/src/rpc.rs (1)

40-40: Adding Clone derive provides consistent behavior with other components.

This change makes BackfillRPC cloneable, which is consistent with the changes to other structs in the codebase. This helps support the periodic missed slot fetching functionality by allowing the RPC client to be easily shared across tasks.

nft_ingester/src/inmemory_slots_dumper.rs (3)

1-1: Added Arc import for shared ownership.

The addition of Arc to the imports supports the change in the struct's field type. This is necessary for the new implementation that enables shared ownership of the slots collection.


7-9: Enhanced for thread-safety with Clone and Arc.

Making InMemorySlotsDumper cloneable and changing slots to use Arc<Mutex<BTreeSet<u64>>> improves thread safety and enables shared state between cloned instances. This is important for the periodic missed slot fetch functionality where multiple tasks might access the same slot collection.


19-19: Updated constructor to initialize with Arc-wrapped Mutex.

The constructor now correctly initializes the slots field with an Arc-wrapped Mutex, ensuring that cloned instances of InMemorySlotsDumper share the same underlying collection. This aligns with the thread-safety improvements in the struct definition.

nft_ingester/src/bin/slot_checker/main.rs (3)

3-3: HashSet import usage looks fine.

No immediate issues identified. The HashSet usage aligns well with deduplication logic below.


15-15: Inconsistent summary regarding InMemorySlotsDumper.

The AI-generated summary mentions removing InMemorySlotsDumper, but this import and subsequent uses remain. This discrepancy indicates that the summary may be out of date or incomplete.

Likely an incorrect or invalid review comment.


20-20: Graceful shutdown approach is valid.

Leveraging tokio::signal and CancellationToken for Ctrl+C handling is a robust solution.

nft_ingester/src/backfiller.rs (2)

136-181: Flexible control flow for slot backfilling.

Using Either to branch between MissedSlots and RegularSlots is clean and extensible. The periodic loop logic also covers cancellation gracefully. No major issues found.


193-234:

Details

✅ Verification successful

Leveraging database parameter for last backfilled slot.

Retrieving and updating the last backfilled slot via Parameter::LastBackfilledSlot ensures continuity between backfill iterations. Ensure that calling code or external logic properly sets/reads this parameter if external changes occur.

Run the following script to confirm the presence of references to Parameter::LastBackfilledSlot in the codebase:


🏁 Script executed:

#!/bin/bash
# Searching for references to Parameter::LastBackfilledSlot beyond this file
ast-grep --pattern $'Parameter::LastBackfilledSlot'

Length of output: 3543


Parameter LastBackfilledSlot Usage Confirmed

The verification confirms that references to Parameter::LastBackfilledSlot are consistently present across the codebase (including tests and client implementations). The usage in nft_ingester/src/backfiller.rs is in line with its application elsewhere. As a reminder, ensure that any external logic or calling code properly sets and updates this parameter to maintain continuity between backfill iterations.

nft_ingester/src/bin/slot_persister/main.rs (2)

1-7: New imports align with additional slot checking logic.

The newly introduced items (e.g., HashMap, Duration, Mutex) facilitate missed slot handling and concurrency. No immediate issues.


44-50: Useful defaults for backfill settings.

DEFAULT_SLOT_BACKFILL_INTERVAL_SECS and DEFAULT_SLOT_BACKFILL_OFFSET appear to be reasonable defaults. Ensure that large offsets don’t cause excessive memory usage or scanning overhead.

Please confirm in the documentation or tests that using the default offset of 1,000,000 slots won’t exceed memory constraints under production workloads.

@armyhaylenko armyhaylenko force-pushed the feature/MTG-1070-periodic-missed-slot-fetch branch from ec53185 to cd61aad Compare April 1, 2025 14:34
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
nft_ingester/src/backfiller.rs (1)

129-134: 🧹 Nitpick (assertive)

Add doc comments to clarify usage of BackfillTarget.

Providing a short doc comment explaining when to use MissedSlots vs. RegularSlots would help readers quickly understand the backfill variants.

 #[derive(Debug)]
+/// Defines which type of slots to backfill: `RegularSlots` or `MissedSlots`.
 pub enum BackfillTarget {
     RegularSlots,
     MissedSlots,
 }
nft_ingester/src/bin/slot_persister/main.rs (1)

92-97: 🛠️ Refactor suggestion

Consider validating user-supplied intervals.

If a user sets slot_backfill_interval or slot_backfill_offset to zero or an unexpectedly large value, it could lead to performance or logic issues. A sanity check can help avoid misconfiguration.

 #[arg(long, env, help = "How often to check for missed slots (in seconds)", default_value_t = DEFAULT_SLOT_BACKFILL_INTERVAL_SECS)]
 pub slot_backfill_interval: u64,

 #[arg(long, env, help = "The number of slots by which to check & advance the next sequential slots", default_value_t = DEFAULT_SLOT_BACKFILL_OFFSET)]
 pub slot_backfill_offset: u64,

+ // Example validation at startup:
+ if args.slot_backfill_interval == 0 {
+     panic!("slot_backfill_interval must not be zero!");
+ }
+ if args.slot_backfill_offset == 0 {
+     panic!("slot_backfill_offset must not be zero!");
+ }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between ec53185 and cd61aad.

📒 Files selected for processing (7)
  • backfill_rpc/src/rpc.rs (1 hunks)
  • nft_ingester/src/backfiller.rs (6 hunks)
  • nft_ingester/src/bin/slot_checker/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (6 hunks)
  • nft_ingester/src/inmemory_slots_dumper.rs (2 hunks)
  • usecase/src/bigtable.rs (1 hunks)
  • usecase/src/slots_collector.rs (1 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
nft_ingester/src/inmemory_slots_dumper.rs (4)
nft_ingester/src/backfiller.rs (3)
  • new (49-66)
  • new (121-127)
  • new (372-377)
usecase/src/bigtable.rs (1)
  • new (27-32)
usecase/src/slots_collector.rs (1)
  • new (81-87)
metrics_utils/src/lib.rs (11)
  • new (34-41)
  • new (68-86)
  • new (138-144)
  • new (174-183)
  • new (249-256)
  • new (302-318)
  • new (417-431)
  • new (668-675)
  • new (732-749)
  • new (844-853)
  • new (939-958)
nft_ingester/src/backfiller.rs (1)
rocks-db/src/columns/parameters.rs (2)
  • encode_key (26-28)
  • decode_key (30-33)
nft_ingester/src/bin/slot_persister/main.rs (5)
rocks-db/src/lib.rs (3)
  • column (444-452)
  • new (93-105)
  • new (227-320)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
nft_ingester/src/backfiller.rs (5)
  • new (49-66)
  • new (121-127)
  • new (372-377)
  • db (193-193)
  • db (246-246)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
metrics_utils/src/lib.rs (8)
  • new (34-41)
  • new (68-86)
  • new (138-144)
  • new (174-183)
  • new (249-256)
  • new (302-318)
  • new (417-431)
  • new (668-675)
🔇 Additional comments (18)
usecase/src/bigtable.rs (1)

20-20: Appropriate addition of Clone derivation for BigTableClient

Adding #[derive(Clone)] to the BigTableClient struct is a good change since it enables cloning of client instances, which is likely needed for the periodic missed slot fetch functionality. Since the struct fields are already wrapped in Arc, cloning will be efficient and only copy the reference-counted pointers.

backfill_rpc/src/rpc.rs (1)

40-40: Appropriate addition of Clone derivation for BackfillRPC

Adding #[derive(Clone)] to the BackfillRPC struct is sensible because it allows the creation of duplicate instances when needed for async operations. The contained BackoffRpcClient already implements Clone (as seen on line 206), so this is a consistent change.

usecase/src/slots_collector.rs (1)

65-65: Appropriate addition of Clone derivation for SlotsCollector

Adding #[derive(Clone)] to the SlotsCollector struct is appropriate. All internal fields already use Arc for shared ownership, making the Clone implementation efficient by only copying the Arc pointers rather than the underlying data. This change aligns with the other modifications in this PR and supports the new missed slot fetching functionality.

nft_ingester/src/backfiller.rs (3)

6-6: Imports and Clone derivation look good.

The newly introduced imports (Either, expanded columns, and additional tracing macros) and the #[derive(Clone)] annotation for BackfillSource are appropriate and pose no issues.

Also applies to: 15-17, 24-24, 42-42


136-181: Good approach to unify backfill entry points.

run_backfill cleanly delegates to either backfill_missed or backfill_slots based on the BackfillTarget enum. This is straightforward to maintain, and the logging is suitably descriptive. If further logic overlaps, consider factoring out shared steps.


236-292: Consider refining the break condition.

When a raw block is missing, you break the entire loop to avoid advancing LastProcessedMissedSlot. This ensures no missed slots are skipped, but it also permanently halts backfilling. A single missing block can stall future missed slots.

nft_ingester/src/inmemory_slots_dumper.rs (1)

1-1: Switch to Arc<Mutex<_>> and #[derive(Clone)] is appropriate.

Wrapping BTreeSet<u64> in Arc<Mutex<...>> and deriving Clone allows multiple components to share and synchronize writes. This change is consistent with typical concurrency patterns.

Also applies to: 7-7, 9-9, 19-19

nft_ingester/src/bin/slot_checker/main.rs (1)

3-3: New imports are well-aligned with the refactor.

Adding HashSet ensures straightforward slot deduplication, and using tokio::signal plus InMemorySlotsDumper is consistent with the revised slot-handling logic.

Also applies to: 15-15, 20-20

nft_ingester/src/bin/slot_persister/main.rs (10)

1-7: Imports look good.

No issues found with these newly introduced imports.


10-10: No concerns for BUBBLEGUM_PROGRAM_ID import.

This change appears correct and consistent with the rest of the codebase.


17-17: Metrics import looks fine.

No issues found with the newly introduced metrics imports.


23-27: RocksDB columns usage is consistent.

No concerns regarding these references to typed columns.


44-50: Large default offset can be memory-intensive.

This offset (1,000,000) may cause significant memory usage if the range of missing slots is massive. Consider chunking or streaming.


52-52: Deriving Parser, Debug, and Clone looks standard.

No additional feedback needed.


109-134: Consider robust error handling for missing/corrupted column families.

Reusing a comment from a previous review. The current approach relies on unwrap() and unwrap_or_default(), which may not gracefully handle unexpected DB corruption or missing column families.


559-559: No additional feedback.

This blank line change requires no further remarks.


560-770: Naming conflict with nested get_missed_slots.

A local async function named get_missed_slots exists within another function of the same name. This can be confusing; consider renaming the inner function (e.g., retrieve_missed_slots_range) for clarity.


560-770: Potential memory overhead with large offset.

Collecting up to slot_backfill_offset worth of slots in memory may be costly if extremely large. This concern was previously noted. A chunked or streaming approach might be more robust.

@armyhaylenko armyhaylenko force-pushed the feature/MTG-1070-periodic-missed-slot-fetch branch from cd61aad to 7d944de Compare April 1, 2025 14:52
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
nft_ingester/src/bin/slot_persister/main.rs (1)

560-606: 🧹 Nitpick (assertive)

Inner function shadowing the parent function name
Declaring async fn get_missed_slots(...) inside async fn get_missed_slots(...) may confuse future maintainers. Consider renaming the inner helper (e.g., retrieve_missed_slots_range) to differentiate them clearly.

- async fn get_missed_slots(
+ async fn retrieve_missed_slots_range(
    in_mem_dumper: Arc<InMemorySlotsDumper>,
    ...
) -> Option<(u64, u64, Vec<u64>)> {
    ...
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between cd61aad and 7d944de.

📒 Files selected for processing (7)
  • backfill_rpc/src/rpc.rs (1 hunks)
  • nft_ingester/src/backfiller.rs (6 hunks)
  • nft_ingester/src/bin/slot_checker/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (6 hunks)
  • nft_ingester/src/inmemory_slots_dumper.rs (2 hunks)
  • usecase/src/bigtable.rs (1 hunks)
  • usecase/src/slots_collector.rs (1 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
nft_ingester/src/inmemory_slots_dumper.rs (3)
nft_ingester/src/backfiller.rs (3)
  • new (49-66)
  • new (121-127)
  • new (372-377)
usecase/src/slots_collector.rs (1)
  • new (81-87)
usecase/src/bigtable.rs (1)
  • new (27-32)
nft_ingester/src/backfiller.rs (2)
rocks-db/src/columns/raw_block.rs (8)
  • encode_key (18-20)
  • encode_key (41-43)
  • encode_key (59-61)
  • encode_key (85-87)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
rocks-db/src/columns/parameters.rs (2)
  • encode_key (26-28)
  • decode_key (30-33)
nft_ingester/src/bin/slot_persister/main.rs (3)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
usecase/src/slots_collector.rs (1)
  • new (81-87)
🔇 Additional comments (14)
usecase/src/bigtable.rs (1)

20-20: Approved adding Clone trait

Adding the Clone trait to the BigTableClient struct is appropriate since both fields are already wrapped in Arc, making cloning cheap and safe.

backfill_rpc/src/rpc.rs (1)

40-40: Appropriate addition of Clone trait

Adding the Clone trait to BackfillRPC is appropriate since its only field (BackoffRpcClient) already implements Clone (as seen on line 206). This change is consistent with the PR objective to enhance slot processing capabilities.

usecase/src/slots_collector.rs (1)

65-65: Approved adding Clone trait

The Clone trait addition is appropriate for SlotsCollector as all fields are already wrapped in Arc, making the clone operation efficient by only cloning the reference-counted pointers rather than the underlying data.

nft_ingester/src/inmemory_slots_dumper.rs (1)

1-1: Appropriate modification for shared state between clones

Wrapping the Mutex in an Arc and adding the Clone trait ensures that all clones of InMemorySlotsDumper share the same underlying state. This is a good approach for maintaining consistency when the struct needs to be cloned.

Also applies to: 7-9, 19-19

nft_ingester/src/bin/slot_checker/main.rs (3)

3-3: Proper use of HashSet for duplicate filtering
The addition of HashSet is a straightforward approach to eliminate duplicate slots. This looks correct and is used effectively below.


15-15: Importing InMemorySlotsDumper
The import is used when creating the in_mem_dumper instance. This aligns with the logic for collecting slots and seems valid.


20-20: Graceful shutdown signal import
Using tokio::signal for handling Ctrl+C helps ensure a graceful shutdown. No concerns here.

nft_ingester/src/bin/slot_persister/main.rs (4)

1-7: New imports for HashMap, Arc, Mutex, etc.
These imports (e.g., HashMap, Arc, Mutex) are consistent with the concurrency and data structures used later in the file.


93-98: Added CLI arguments for slot backfill interval & offset
Introducing slot_backfill_interval and slot_backfill_offset provides flexible, configurable backfill scheduling. Be mindful of large default offsets (1,000,000) if the data range is extremely large.


109-134: New function get_checkpoint merges checkpoint logic
This function retrieves the last consistency checkpoint or falls back to the first RawBlock. The approach is clean and idiomatic. One minor caution: if the SlotConsistencyCheckpoint CF is missing or corrupted, the default fallback is 0. Ensure this behavior is intentional.


708-770: Periodic loop for missed slots backfill
The main loop repeatedly attempts to find and process missed slots, then sleeps based on slot_backfill_interval. The logic is sound. Confirm that partial progress is persisted before shutdown to avoid redoing work.

nft_ingester/src/backfiller.rs (3)

6-6: Introducing Either for branching futures
Using Either from futures::future is an elegant way to conditionally run backfill_missed or backfill_slots. This is a clean abstraction.


136-168: run_backfill orchestrating different backfill targets
The loop pattern with Either for missed or regular slots is straightforward. The retry interval and cancellation token usage are appropriate. This structure is well-organized for extension.


236-292: New backfill_missed function
Backfilling missed slots by iterating over MissedSlotsIdx and stopping on unreadable slots ensures we don’t lose track of unprocessed data. Logging warnings for missing blocks is helpful. Possibly queue them for later re-try if feasible.

@armyhaylenko armyhaylenko force-pushed the feature/MTG-1070-periodic-missed-slot-fetch branch from 7d944de to 702f0ed Compare April 1, 2025 15:26
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
nft_ingester/src/bin/slot_persister/main.rs (1)

244-251: 🧹 Nitpick (assertive)

Spawned task lacks join handle management.
Spawning get_missed_slots and not storing the JoinHandle can obscure potential runtime errors or panics. Consider capturing and handling the task result for better observability.

+ let missed_slots_handle = usecase::executor::spawn(get_missed_slots(
+    backfill_source.clone(),
+    target_db.clone(),
+    args.clone(),
+    metrics_state.backfiller_metrics.clone(),
+    cancellation_token.child_token(),
+ ));
+ // Optionally: missed_slots_handle.await?;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 7d944de and 702f0ed.

📒 Files selected for processing (7)
  • backfill_rpc/src/rpc.rs (1 hunks)
  • nft_ingester/src/backfiller.rs (6 hunks)
  • nft_ingester/src/bin/slot_checker/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (6 hunks)
  • nft_ingester/src/inmemory_slots_dumper.rs (2 hunks)
  • usecase/src/bigtable.rs (1 hunks)
  • usecase/src/slots_collector.rs (1 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
nft_ingester/src/inmemory_slots_dumper.rs (3)
nft_ingester/src/backfiller.rs (3)
  • new (49-66)
  • new (121-127)
  • new (372-377)
usecase/src/slots_collector.rs (1)
  • new (81-87)
usecase/src/bigtable.rs (1)
  • new (27-32)
nft_ingester/src/backfiller.rs (2)
rocks-db/src/columns/raw_block.rs (8)
  • encode_key (18-20)
  • encode_key (41-43)
  • encode_key (59-61)
  • encode_key (85-87)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
rocks-db/src/columns/parameters.rs (2)
  • encode_key (26-28)
  • decode_key (30-33)
nft_ingester/src/bin/slot_persister/main.rs (2)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
🔇 Additional comments (17)
usecase/src/bigtable.rs (1)

20-20: Added Clone derive: Good for shared instance needs.

This addition of #[derive(Clone)] to BigTableClient enables instances to be cloned, which is useful when sharing the client across different components or tasks. This is a safe change because the internal fields are already wrapped in Arc, ensuring that cloning will be shallow and efficient.

usecase/src/slots_collector.rs (1)

65-65: Added Clone derive: Appropriate for this Arc-containing struct.

Adding #[derive(Clone)] to SlotsCollector is a good enhancement that enables cloning of collector instances. Since all fields are already wrapped in Arc, this implementation of Clone will perform shallow cloning, which is efficient and appropriate for this use case.

backfill_rpc/src/rpc.rs (1)

40-40: Added Clone derive: Good addition for RPC client sharing.

Adding #[derive(Clone)] to BackfillRPC aligns with the struct's design and allows for easier sharing of RPC client instances. The field client is already of type BackoffRpcClient which itself derives Clone (line 206), so this change is consistent with the existing architecture.

nft_ingester/src/inmemory_slots_dumper.rs (2)

1-1: Modified struct design for shared state across clones.

The changes to add #[derive(Clone)] and wrap the Mutex in an Arc are good architectural improvements. This ensures that when InMemorySlotsDumper instances are cloned, they will share the same underlying slot collection, allowing multiple components to access and modify the same set of slots.

Also applies to: 7-7, 9-9


19-19: Constructor updated to match the field type change.

The constructor has been properly updated to initialize the slots field with an Arc-wrapped Mutex, consistent with the field's new type definition. This ensures proper initialization when new instances are created.

nft_ingester/src/bin/slot_checker/main.rs (3)

3-3: Confirm intended usage of HashSet.
Importing collections::HashSet for deduplicating or membership checks is perfectly valid. If you ever need sorted ordering when removing duplicates, consider BTreeSet.


15-15: New import for InMemorySlotsDumper is consistent.
The InMemorySlotsDumper aligns well with your slot-collection logic. No issues found.


20-20: Graceful shutdown approach is appropriate.
Using tokio::signal for Ctrl+C handling is a good practice for clean cancellation.

nft_ingester/src/bin/slot_persister/main.rs (5)

1-7: No issues with the newly introduced imports.
Imports for HashMap, PathBuf, Arc<Mutex>, and Duration appear standard and correctly utilized for concurrency, path handling, and timing.


10-10: Bubblegum program ID import is straightforward.
Referencing the BUBBLEGUM_PROGRAM_ID here is consistent with the ingester's usage.


15-15: Metrics utilities import looks clean.
No functionality issues spotted; metrics usage seems well-structured elsewhere in the file.


23-24: New columns for missed slots and slot consistency.
Adding MissedSlotsIdx and SlotConsistencyCheckpoint columns aligns with the new backfill logic. Ensure the DB migration and usage patterns are well-tested to avoid potential key collisions.


559-769: Nested get_missed_slots naming and potential memory overhead.

  1. Nested function name clash: Declaring a nested function with the same name (get_missed_slots) can be confusing. Consider renaming the inner helper to avoid ambiguity.
  2. Memory usage: Large slot offset ranges may collect many slots in memory. For extremely large ranges, implement chunked processing to avoid excessive memory usage.

These concerns mirror previous suggestions in older commits regarding naming and chunked or streaming approaches, so tagging as repeated feedback.

nft_ingester/src/backfiller.rs (4)

6-6: Conditional concurrency utility is well-chosen.
Importing futures::future::Either provides a concise way to branch based on conditions, particularly in run_backfill.


15-17: Adding new columns to handle missed slots and parameters.
Your usage of parameters::Parameter and raw_block::MissedSlotsIdx is consistent with indexing missed slots and saving config. Just ensure that your DB schema or migrations handle the new columns properly.


24-24: Tracing macros for error/info/warn are standard.
No issues found.


136-181: run_backfill logic is well-structured with branching.
Using Either::Left or Right is a neat pattern for picking the correct backfill function. The small sleep approach is simple, though you might want to ensure no busy waiting in advanced scenarios.

@armyhaylenko armyhaylenko force-pushed the feature/MTG-1070-periodic-missed-slot-fetch branch from 702f0ed to 4d29349 Compare April 1, 2025 15:46
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 702f0ed and 4d29349.

📒 Files selected for processing (7)
  • backfill_rpc/src/rpc.rs (1 hunks)
  • nft_ingester/src/backfiller.rs (6 hunks)
  • nft_ingester/src/bin/slot_checker/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (6 hunks)
  • nft_ingester/src/inmemory_slots_dumper.rs (2 hunks)
  • usecase/src/bigtable.rs (1 hunks)
  • usecase/src/slots_collector.rs (1 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
nft_ingester/src/inmemory_slots_dumper.rs (3)
nft_ingester/src/backfiller.rs (3)
  • new (49-66)
  • new (121-127)
  • new (372-377)
usecase/src/slots_collector.rs (1)
  • new (81-87)
usecase/src/bigtable.rs (1)
  • new (27-32)
nft_ingester/src/bin/slot_persister/main.rs (3)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (63-65)
  • decode_key (89-91)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
usecase/src/slots_collector.rs (1)
  • new (81-87)
🔇 Additional comments (17)
usecase/src/slots_collector.rs (1)

65-65: LGTM: Adding Clone derive is appropriate.

Adding #[derive(Clone)] to the SlotsCollector struct is a good addition since all fields are already wrapped in Arc, which makes them safely shareable across multiple clones. This change aligns with the PR objective of supporting periodic missed slot fetch functionality.

usecase/src/bigtable.rs (1)

20-20: LGTM: Adding Clone derive is appropriate.

Adding #[derive(Clone)] to the BigTableClient struct is correct since both fields are already wrapped in Arc, enabling safe sharing across threads. This facilitates using this struct in the new missed slot fetch functionality.

backfill_rpc/src/rpc.rs (1)

40-40: LGTM: Adding Clone derive is appropriate.

Adding #[derive(Clone)] to the BackfillRPC struct is a good addition. The struct has a single field client of type BackoffRpcClient, which is already Clone, making this change straightforward and safe. This enhancement supports the periodic missed slot fetch functionality.

nft_ingester/src/inmemory_slots_dumper.rs (4)

1-1: LGTM: Adding Arc import is correct.

The import of Arc from std::sync is necessary to support the changes made to the internal field type in this struct.


7-7: LGTM: Adding Clone derive is appropriate.

The #[derive(Clone)] addition is consistent with the other structs modified in this PR and enables sharing this struct across multiple consumers in the slot persistence system.


9-9: LGTM: Wrapping Mutex in Arc enables shared ownership.

Changing the slots field from Mutex<BTreeSet<u64>> to Arc<Mutex<BTreeSet<u64>>> is a good enhancement. This allows multiple clones of the InMemorySlotsDumper to share ownership of the same underlying collection, which is important for the periodic missed slot fetch functionality.


19-19: LGTM: Constructor updated to use Arc.

The constructor has been correctly updated to initialize the slots field with an Arc wrapper around the Mutex.

nft_ingester/src/bin/slot_checker/main.rs (3)

3-3: Using HashSet for duplicate handling looks good.
This import is appropriate to ensure we can efficiently track distinct slots.


15-15: Adding InMemorySlotsDumper usage.
Ensures consistency with in-memory slot-logging logic. No further issues noted.


20-20: Graceful shutdown mechanism via tokio::signal.
Using signal::ctrl_c is standard practice for clean teardown.

nft_ingester/src/bin/slot_persister/main.rs (5)

1-7: Consolidated imports from std.
These additions (e.g., Ordering, HashMap, etc.) appear necessary and well-organized.


10-10: Bubblegum program import.
No immediate concerns; usage looks aligned with broader references to BUBBLEGUM_PROGRAM_ID.


17-27: Including metrics and RocksDB columns (MissedSlotsIdx, SlotConsistencyCheckpoint).
These imports align with the missed slot backfill feature. They appear correctly referenced elsewhere.


52-52: Struct deriving Clone for command-line parser.
This is helpful, especially if the struct is cloned across threads. Looks good.


560-789: Nested get_missed_slots naming and large range handling.

  1. The inner function shadowing the outer function name can confuse maintainers; consider a distinct name (e.g., retrieve_missed_slots_range).
  2. When handling large offsets, be mindful of potential resource usage.
nft_ingester/src/backfiller.rs (2)

6-6: Introducing futures::future::Either.
No issues noted; it simplifies branching logic for asynchronous tasks.


136-181: Refined run_backfill with a BackfillTarget parameter.
The branching with Either::Left and Either::Right is a neat approach to unify missed vs. regular slot handling.

@armyhaylenko armyhaylenko marked this pull request as draft April 1, 2025 15:51
@armyhaylenko armyhaylenko marked this pull request as ready for review April 2, 2025 13:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (3)
rocks-db/src/columns/raw_block.rs (1)

79-103: 🧹 Nitpick (assertive)

Consider documenting the SlotConsistencyCheckpoint semantics.

While the implementation is correct, adding a brief docstring describing the purpose and usage patterns of this checkpoint would improve maintainability for future contributors.

+/// Represents a checkpoint for slot consistency verification.
+/// Used to track which slots have been verified for consistency
+/// in the missed slots processing pipeline.
 #[derive(Serialize, Deserialize, Clone)]
 pub struct SlotConsistencyCheckpoint;
nft_ingester/src/bin/slot_persister/main.rs (2)

248-255: 🛠️ Refactor suggestion

Spawned missed slot processing without join handle

Consider capturing the join handle to detect panics or failures in get_missed_slots. Unnoticed panic may suppress error visibility.

-usecase::executor::spawn(get_missed_slots(
+let missed_slots_handle = usecase::executor::spawn(get_missed_slots(
     backfill_source.clone(),
     target_db.clone(),
     args.clone(),
     metrics_state.backfiller_metrics.clone(),
     cancellation_token.child_token(),
 ));

+// Optionally: missed_slots_handle.await?;

97-101: 🛠️ Refactor suggestion

New CLI fields for missed slot backfill

slot_backfill_interval and slot_backfill_offset could benefit from minimal validation (e.g., ensuring nonzero values) to prevent misconfiguration.

 #[arg(long, env, help = "How often to check for missed slots (in seconds)", default_value_t = DEFAULT_SLOT_BACKFILL_INTERVAL_SECS)]
 pub slot_backfill_interval: u64,

 #[arg(long, env, help = "The number of slots by which to check & advance the next sequential slots", default_value_t = DEFAULT_SLOT_BACKFILL_OFFSET)]
 pub slot_backfill_offset: u64,

+ // Example validation approach (pseudocode):
+ fn validate_backfill_args(&self) -> Result<(), String> {
+     if self.slot_backfill_interval == 0 || self.slot_backfill_offset == 0 {
+         return Err("Values must be nonzero".into());
+     }
+     Ok(())
+ }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 3c12360 and 97da515.

📒 Files selected for processing (6)
  • .env.example (2 hunks)
  • nft_ingester/src/backfiller.rs (6 hunks)
  • nft_ingester/src/bin/ingester/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (7 hunks)
  • rocks-db/src/columns/parameters.rs (1 hunks)
  • rocks-db/src/columns/raw_block.rs (2 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
rocks-db/src/columns/raw_block.rs (2)
rocks-db/src/columns/parameters.rs (4)
  • encode_key (26-28)
  • decode_key (30-33)
  • encode (35-38)
  • decode (40-43)
rocks-db/src/key_encoders.rs (4)
  • encode_u64x2 (157-162)
  • decode_u64x2 (164-166)
  • encode_u64 (88-90)
  • decode_u64 (92-95)
nft_ingester/src/backfiller.rs (2)
rocks-db/src/columns/raw_block.rs (8)
  • encode_key (18-20)
  • encode_key (41-43)
  • encode_key (62-64)
  • encode_key (88-90)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
rocks-db/src/columns/parameters.rs (2)
  • encode_key (26-28)
  • decode_key (30-33)
nft_ingester/src/bin/slot_persister/main.rs (7)
nft_ingester/src/config.rs (1)
  • parse_json (554-556)
nft_ingester/src/api/dapi/change_logs.rs (1)
  • rocks_db (146-166)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
nft_ingester/src/backfiller.rs (5)
  • new (49-66)
  • new (121-127)
  • new (362-367)
  • db (192-192)
  • db (245-245)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
usecase/src/slots_collector.rs (1)
  • new (81-87)
metrics_utils/src/lib.rs (8)
  • new (34-41)
  • new (68-86)
  • new (138-144)
  • new (174-183)
  • new (249-256)
  • new (302-318)
  • new (417-431)
  • new (668-675)
🔇 Additional comments (18)
rocks-db/src/columns/raw_block.rs (1)

50-77: Good implementation of MissedSlotsIdx with clear documentation.

The implementation correctly handles the encoding and decoding of the composite key type (u64, u64) using the appropriate key encoders. The comment explaining the purpose of the sequence number for decoupling missed slots order is helpful.

.env.example (1)

42-44: Well-structured environment variables for slot persistence.

The added environment variables provide good defaults for configuring the slot backfill process:

  • SLOT_BACKFILL_INTERVAL=120 - Sets a reasonable default interval (in seconds)
  • SLOT_BACKFILL_OFFSET=1000000 - Sets a large enough offset to handle historical missed slots

These values will help users configure the slot persister's operation without code changes.

nft_ingester/src/bin/ingester/main.rs (2)

553-571: Good refactoring to support both regular and missed slot processing.

The implementation properly initializes a separate task for regular slot backfilling, reusing the existing consumer, database, metrics and slot storage. This maintains the original functionality while setting up for the new missed slots feature.


572-590: Effective implementation of missed slots processing.

You've correctly updated the task to use BackfillTarget::MissedSlots while sharing the consumer with the regular slots processing. This approach avoids code duplication and maintains consistent behavior across both backfill types.

rocks-db/src/columns/parameters.rs (1)

11-11: Good addition of the LastProcessedMissedSlotKey parameter.

This new parameter provides the necessary state tracking for the missed slots processing feature, following the same pattern as the other slot-tracking parameters in the enum.

nft_ingester/src/bin/slot_persister/main.rs (7)

1-7: Added imports for ordering, collections, and concurrency

No immediate concerns. These imports align with the new missed slots logic for comparisons and data handling.


10-10: Alias bubblegum program ID

No issues with importing and aliasing the bubblegum program ID for reference.


17-17: Ensure all newly imported traits are used

MetricsTrait is imported. Confirm that it’s consistently utilized in the code to avoid dead imports.


23-27: New columns for missed slots and checkpoint

No issues. MissedSlotsIdx and SlotConsistencyCheckpoint help track missed slots and store consistency checkpoints.


48-54: Default backfill settings

DEFAULT_SLOT_BACKFILL_INTERVAL_SECS = 120 seconds and DEFAULT_SLOT_BACKFILL_OFFSET = 1,000,000. Verify they suit the expected load and memory constraints, especially if large slot ranges are processed.


113-138: Checkpoint fallback logic

This function checks SlotConsistencyCheckpoint first, then falls back to scanning RawBlock. The approach is sound. For large databases, consider indexing or early exit if no checkpoints.


738-803: Periodic missed slot backfill loop

The scheduling loop updates the checkpoint, processes missed slots, and then sleeps. This is straightforward. Ensure logs remain manageable if missed slot checks happen frequently without results.

nft_ingester/src/backfiller.rs (6)

6-6: Introduced Either from futures

This is used to branch logic in run_backfill. No issues identified.


15-17: Adding columns and parameters references

MissedSlotsIdx and Parameter are integrated here. No immediate concerns.


24-24: Expanded tracing levels

Using error, info, and warn. No issues; consistent practice for diagnostic visibility.


136-142: Refined backfill dispatcher

run_backfill delegates to either backfill_slots or backfill_missed based on target. Clean approach. Confirm test coverage.


155-161: Handling missed slots with Either::Left

Combining Either with tokio::select! is an elegant way to handle concurrent tasks. No immediate issues.


235-282: Partial processing in backfill_missed

When a missed slot has no raw block, the loop breaks and subsequent missed slots remain unprocessed. If partial missed-slot handling is undesirable, consider skipping only that missing slot or logging it differently.

Would you like a script to scan for any large sequences of missed slots that might remain unprocessed?

@armyhaylenko armyhaylenko requested a review from StanChe April 3, 2025 13:37
@armyhaylenko armyhaylenko force-pushed the feature/MTG-1070-periodic-missed-slot-fetch branch from 97da515 to 40df4ee Compare April 3, 2025 14:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

♻️ Duplicate comments (1)
rocks-db/src/key_encoders.rs (1)

157-166: 🧹 Nitpick (assertive)

Missing unit tests for new encode/decode functions.

While the implementation is correct, there are no unit tests for the new encode_u64x2 and decode_u64x2 functions, unlike other encoder/decoder pairs in this file.

Consider adding tests to maintain consistent test coverage:

#[test]
fn test_encode_decode_u64x2() {
    let val1 = 12345u64;
    let val2 = 67890u64;
    
    let encoded = encode_u64x2((val1, val2));
    let decoded = decode_u64x2(&encoded).unwrap();
    
    assert_eq!(decoded.0, val1);
    assert_eq!(decoded.1, val2);
}

#[test]
fn test_decode_u64x2_invalid_data() {
    let invalid_data = [1u8, 2, 3]; // Too short
    assert!(decode_u64x2(&invalid_data).is_err());
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 97da515 and 40df4ee.

📒 Files selected for processing (13)
  • .env.example (2 hunks)
  • backfill_rpc/src/rpc.rs (1 hunks)
  • nft_ingester/src/backfiller.rs (6 hunks)
  • nft_ingester/src/bin/ingester/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_checker/main.rs (1 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (7 hunks)
  • nft_ingester/src/inmemory_slots_dumper.rs (2 hunks)
  • rocks-db/src/columns/parameters.rs (1 hunks)
  • rocks-db/src/columns/raw_block.rs (2 hunks)
  • rocks-db/src/key_encoders.rs (1 hunks)
  • rocks-db/src/lib.rs (4 hunks)
  • usecase/src/bigtable.rs (1 hunks)
  • usecase/src/slots_collector.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
rocks-db/src/key_encoders.rs (2)
Learnt from: armyhaylenko
PR: metaplex-foundation/aura#465
File: rocks-db/src/key_encoders.rs:157-162
Timestamp: 2025-04-03T14:15:26.033Z
Learning: The `encode_u64x2` function in the key_encoders module intentionally returns a fixed-size array `[u8; 16]` instead of `Vec<u8>` to avoid heap allocations (syscalls) for performance reasons, despite the API inconsistency with other encoder functions.
Learnt from: armyhaylenko
PR: metaplex-foundation/aura#465
File: rocks-db/src/key_encoders.rs:164-166
Timestamp: 2025-04-03T14:15:26.033Z
Learning: The `decode_u64x2` function in the key_encoders module doesn't need explicit size validation because the `try_into()` method will implicitly validate the input size when converting byte slices to fixed-size arrays.
🧬 Code Definitions (6)
nft_ingester/src/bin/ingester/main.rs (1)
nft_ingester/src/backfiller.rs (3)
  • db (192-192)
  • db (245-245)
  • run_backfill (136-180)
nft_ingester/src/inmemory_slots_dumper.rs (3)
nft_ingester/src/backfiller.rs (3)
  • new (49-66)
  • new (121-127)
  • new (362-367)
usecase/src/slots_collector.rs (1)
  • new (81-87)
usecase/src/bigtable.rs (1)
  • new (27-32)
rocks-db/src/columns/raw_block.rs (2)
rocks-db/src/lib.rs (3)
  • Self (441-441)
  • new (93-105)
  • new (227-320)
rocks-db/src/key_encoders.rs (4)
  • encode_u64x2 (157-162)
  • decode_u64x2 (164-166)
  • encode_u64 (88-90)
  • decode_u64 (92-95)
rocks-db/src/lib.rs (2)
rocks-db/src/columns/asset.rs (1)
  • new (3345-3347)
rocks-db/src/column.rs (1)
  • iter (301-308)
nft_ingester/src/bin/slot_persister/main.rs (6)
nft_ingester/src/config.rs (1)
  • parse_json (554-556)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
nft_ingester/src/backfiller.rs (5)
  • new (49-66)
  • new (121-127)
  • new (362-367)
  • db (192-192)
  • db (245-245)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
usecase/src/slots_collector.rs (1)
  • new (81-87)
metrics_utils/src/lib.rs (8)
  • new (34-41)
  • new (68-86)
  • new (138-144)
  • new (174-183)
  • new (249-256)
  • new (302-318)
  • new (417-431)
  • new (668-675)
nft_ingester/src/backfiller.rs (2)
rocks-db/src/columns/raw_block.rs (8)
  • encode_key (18-20)
  • encode_key (41-43)
  • encode_key (62-64)
  • encode_key (88-90)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
rocks-db/src/columns/parameters.rs (2)
  • encode_key (26-28)
  • decode_key (30-33)
🔇 Additional comments (38)
.env.example (1)

42-44: LGTM: Clear and well-documented environment variable additions.

The two new environment variables for slot persister configuration are properly documented and have sensible default values. This aligns well with the PR objectives of adding periodic missed slot fetch functionality.

usecase/src/bigtable.rs (1)

20-20: Appropriate Clone derivation added.

Adding #[derive(Clone)] to BigTableClient is appropriate as it enables multiple references to the same client instance without transferring ownership, which is particularly useful in asynchronous contexts for the slot persistence functionality.

backfill_rpc/src/rpc.rs (1)

40-40: Appropriate Clone derivation added.

Adding #[derive(Clone)] to BackfillRPC is appropriate as it enables multiple references to the same RPC client, which is necessary for the new periodic missed slot fetch functionality.

usecase/src/slots_collector.rs (1)

65-65: Appropriate Clone derivation added.

Adding #[derive(Clone)] to SlotsCollector allows the collector to be cloned when needed, which aligns with the other similar changes in BigTableClient and BackfillRPC. This modification enables better handling in concurrent contexts for the new missed slot processing functionality.

nft_ingester/src/bin/ingester/main.rs (2)

552-571: Added separate task for regular slot backfilling.

This addition properly implements a dedicated task for processing regular slots using the backfill functionality with the appropriate target enum value. Using a separate asynchronous task allows for parallel processing of different slot types.


572-590: Added parallel task for missed slot backfilling.

The implementation correctly reuses the same consumer instance for both backfill tasks, which is efficient. This missed slots backfill task runs concurrently with the regular slots task, allowing the system to process both types of slots simultaneously.

rocks-db/src/columns/parameters.rs (1)

10-11: Added parameter enum variant for tracking missed slots.

The new LastProcessedMissedSlotKey enum variant provides storage capability for tracking the last processed missed slot, which is essential for the new missed slots backfilling functionality.

nft_ingester/src/inmemory_slots_dumper.rs (3)

1-1: Updated imports to include Arc.

The import of Arc from std::sync is needed for the modification to the InMemorySlotsDumper struct.


7-9: Made InMemorySlotsDumper cloneable with thread-safe state.

The addition of #[derive(Clone)] and changing the slots field to use Arc<Mutex<...>> enables safe sharing of the InMemorySlotsDumper instance across multiple concurrent tasks.

This change is necessary to support the parallel backfill tasks for both regular and missed slots.


19-19: Updated constructor to initialize with Arc-wrapped Mutex.

The constructor now correctly initializes the slots field with an Arc wrapper around the Mutex, which ensures consistent behavior with the updated struct definition.

rocks-db/src/key_encoders.rs (2)

157-162: Added efficient encoder for u64 pairs.

The implementation of encode_u64x2 returns a fixed-size array instead of a Vec to avoid heap allocations, which is a deliberate performance optimization.


164-166: Added decoder for u64 pairs with implicit size validation.

The implementation of decode_u64x2 correctly handles conversion from byte slices to u64 values. The try_into() method implicitly validates the input size when converting byte slices to fixed-size arrays.

nft_ingester/src/bin/slot_checker/main.rs (3)

3-3: Use of HashSet for duplicate filtering
Importing HashSet simplifies deduplication of slots. This is a clean and efficient approach.


15-15: Potential inconsistency with InMemorySlotsDumper usage
The AI-generated summary mentioned removing InMemorySlotsDumper, yet this addition reintroduces its usage. Please ensure consistency with the rest of the slot-management logic to avoid confusion or orphaned references.

Likely an incorrect or invalid review comment.


20-20: Graceful shutdown handling
Using tokio::signal is a standard and effective way to handle Ctrl+C and other termination signals. No issues found.

rocks-db/src/lib.rs (5)

23-26: Importing concurrency primitives
These imports (AtomicU64, Arc, etc.) are necessary for thread-safe slot tracking. No issues found here.


64-64: Importing typed columns for missed slot tracking
Bringing MissedSlotsIdx and SlotConsistencyCheckpoint into scope aligns with the new missed slots and consistency checkpoint logic. This is consistent with their usage below.


86-88: Extending SlotStorage fields
Adding slot_consistency_checkpoint, missed_slots_idx, and missed_slots_last_seq suitably extends the storage capabilities for missed slot indexing and slot consistency tracking.


95-104: Consistent initialization of new columns
Initializing the new SlotStorage fields ensures they are ready for use upon creation. The approach of returning a Self with all references wired up is standard.


108-114: Adding CF names for the new tables
Explicitly including SlotConsistencyCheckpoint::NAME and MissedSlotsIdx::NAME in cf_names() ensures these column families are recognized on database creation or opening.

nft_ingester/src/bin/slot_persister/main.rs (14)

1-7: Approving new standard library imports
These imports (Ordering, HashMap, PathBuf, Arc/Mutex, Duration) appear correct and necessary for the subsequent logic.


10-10: New import for Bubblegum program ID
This aligns well with the usage in collect_slots. No issues found.


17-17: Confirmed correct usage of metrics utilities
Importing start_metrics, BackfillerMetricsConfig, and MetricState is consistent with how these are used below.


23-27: Imports for missed slots tracking
Bringing in MissedSlotsIdx and SlotConsistencyCheckpoint is appropriate for managing missed slot data in RocksDB.


47-55: Well-documented constants
Defining DEFAULT_SLOT_BACKFILL_INTERVAL_SECS and DEFAULT_SLOT_BACKFILL_OFFSET with inline comments is clear.


56-57: Derived CLI parser for new arguments
Marking the struct with Parser is correct to expose new fields as command-line flags.


65-65: Environment-based DB path
Linking target_db_path to the ROCKS_SLOTS_DB_PATH environment variable is a flexible approach.


69-69: Enabling environment variable for RPC host
No issues. This helps configure the RPC endpoint dynamically.


86-86: Chunk size default
A default of 200 slots per batch is generally reasonable.


90-90: Max concurrency
Using a default concurrency of 20 can balance load and performance.


93-96: Additional CLI argument configurations
Your new or modified #[arg] attributes are consistent with the rest of the CLI fields. No immediate concerns.


113-138: Retrieving slot consistency checkpoint
Good fallback logic: tries SlotConsistencyCheckpoint, then RawBlock.
Optionally, log when fallback is used for easier debugging.


563-563: No functional changes on this blank line


738-803: Periodic missed slot loop
Sleeping and re-checking is straightforward. Confirm that partial progress is handled if the process is interrupted mid-backfill.

nft_ingester/src/backfiller.rs (4)

6-6: New import for branching futures
Pulling in Either from futures::future is a valid approach for the backfill logic.


25-25: New tracing import
Adding warn is consistent with the usage for backfill operations.


130-134: New enum for backfill target
Distinguishing RegularSlots from MissedSlots is a clean approach. Adding doc strings for each variant could further improve clarity.


136-169: Refactored run_backfill
Switching execution paths based on BackfillTarget is straightforward. Verify that resource usage is handled uniformly in both branches.

Copy link
Contributor

@StanChe StanChe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, thank you!
Some minor concerns for the potential infinite loop and some optional refactor.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 40df4ee and b9b74ac.

📒 Files selected for processing (2)
  • docker-compose.yaml (0 hunks)
  • nft_ingester/src/bin/slot_persister/main.rs (7 hunks)
💤 Files with no reviewable changes (1)
  • docker-compose.yaml
🧰 Additional context used
🧬 Code Definitions (1)
nft_ingester/src/bin/slot_persister/main.rs (6)
nft_ingester/src/sequence_consistent.rs (1)
  • None (72-72)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
nft_ingester/src/backfiller.rs (5)
  • new (49-66)
  • new (121-127)
  • new (362-367)
  • db (192-192)
  • db (245-245)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
usecase/src/slots_collector.rs (1)
  • new (81-87)
metrics_utils/src/lib.rs (8)
  • new (34-41)
  • new (68-86)
  • new (138-144)
  • new (174-183)
  • new (249-256)
  • new (302-318)
  • new (417-431)
  • new (668-675)
🔇 Additional comments (3)
nft_ingester/src/bin/slot_persister/main.rs (3)

48-55: Validate or cap default values to avoid extreme scenarios.

A default backfill offset of 1,000,000 slots could trigger high memory usage for edge cases or enormous backlogs. Although your code uses min(slot_backfill_offset), consider adding explicit validation or bounding logic to assure the offset is never excessively large and to avoid memory or performance bottlenecks.


113-138: Handle the empty database case gracefully in get_checkpoint.

Falling back to zero when neither SlotConsistencyCheckpoint nor RawBlock has entries might lead to processing from slot 0 if the database is actually empty or corrupted. Ensure that this scenario aligns with requirements and won't re-ingest large ancient slot ranges unintentionally.


711-718: Prevent repeated scanning when no slots exist.

If the RocksDB iterator yields no new valid keys and the collected slots are empty (or all missing), the checkpoint may not advance properly, so you might repeatedly scan the same range. Review the logic to ensure the next checkpoint moves forward or an alternate exit condition is used to avoid indefinite looping on empty ranges.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (1)
nft_ingester/src/backfiller.rs (1)

130-134: Consider adding doc comments for BackfillTarget.
This mirrors a previous suggestion to document the usage of RegularSlots vs. MissedSlots.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between b9b74ac and 19d207b.

📒 Files selected for processing (1)
  • nft_ingester/src/backfiller.rs (6 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
nft_ingester/src/backfiller.rs (2)
rocks-db/src/columns/raw_block.rs (8)
  • encode_key (18-20)
  • encode_key (41-43)
  • encode_key (62-64)
  • encode_key (88-90)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
rocks-db/src/columns/parameters.rs (2)
  • encode_key (26-28)
  • decode_key (30-33)
🔇 Additional comments (8)
nft_ingester/src/backfiller.rs (8)

6-6: Use of Either import is clear and fine.
This import helps handle the branching between missed and regular slots in a single code path.


24-24: Consistent use of warn log level.
Including warn alongside info and error is consistent with best practices for nuanced log reporting.


42-42: #[derive(Clone)] is appropriate.
Cloning is only duplicating references within Arc, so it remains efficient.


146-146: Logging target type is helpful.
Using structured logging with ?target provides clarity when checking logs for each backfill mode.


154-168: Split futures with Either.
Branching into backfill_missed or backfill_slots with Either neatly avoids a complex match. This is clean and maintainable.


175-175: Graceful shutdown logic is consistent.
Repeating the same message for a shutdown signal keeps logs consistent across backfill flows.


235-248: Initialize LastProcessedMissedSlotKey with care.
Storing (0u64, 0u64) upon None ensures a starting point, but ensure concurrency or re-entrancy doesn't overwrite any in-progress updates from another process or thread.


276-278: Ignored consumer failure.
Unlike the missing block scenario, consumer failures do not halt processing. If these errors represent unrecoverable states, a consistent break or retry might be warranted.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 19d207b and 40e1bb0.

📒 Files selected for processing (1)
  • nft_ingester/src/bin/slot_persister/main.rs (8 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
nft_ingester/src/bin/slot_persister/main.rs (5)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
nft_ingester/src/sequence_consistent.rs (1)
  • None (72-72)
nft_ingester/src/backfiller.rs (5)
  • new (49-66)
  • new (121-127)
  • new (364-369)
  • db (192-192)
  • db (245-245)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
usecase/src/slots_collector.rs (1)
  • new (81-87)
🔇 Additional comments (7)
nft_ingester/src/bin/slot_persister/main.rs (7)

47-55: Defaults look reasonable, but watch for extreme scenarios.

The defaults for DEFAULT_SLOT_BACKFILL_INTERVAL_SECS (2 minutes) and DEFAULT_SLOT_BACKFILL_OFFSET (500k slots) make sense for moderate workloads. If usage grows, consider verifying that these defaults won't cause delays or excessive memory usage.


56-66: Good use of environment variables and CLI arguments.

The newly introduced CLI fields (slot_backfill_interval and slot_backfill_offset) facilitate configuration flexibility. This approach is a solid pattern. Ensure to handle invalid or zero values gracefully in practice, if there's a risk of misconfiguration.


69-70: RPC host CLI argument is clear.

Using #[arg(short, long, env)] for rpc_host is straightforward. No immediate concerns noticed.


228-228: Straightforward usage for reading last persisted slot.

Calling get_last_persisted_slot here is appropriate. No concerns.


572-572: No functional changes observed.

Line 572 seems to be a closing brace or whitespace. No further action needed.


573-579: New get_missed_slots function signature.

This high-level async function integrates backfill logic. Be mindful of potential large internal state if the range is huge. Consider short-circuiting or chunking for extreme scenarios.

Would you like a script to check if usage patterns in your codebase always stay within safe bounds?


580-603: Informative example doc comments.

The doc comments illustrate boundary calculations well, helping maintainers understand the arrangement of offsets. The examples are well-structured, though ensure any future changes to offset logic are reflected here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 40e1bb0 and fd80ba2.

📒 Files selected for processing (1)
  • .env.example (2 hunks)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between fd80ba2 and 6c87eac.

📒 Files selected for processing (1)
  • nft_ingester/src/bin/slot_persister/main.rs (8 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
nft_ingester/src/bin/slot_persister/main.rs (4)
nft_ingester/src/config.rs (1)
  • parse_json (554-556)
rocks-db/src/columns/raw_block.rs (4)
  • decode_key (22-24)
  • decode_key (45-47)
  • decode_key (66-68)
  • decode_key (92-94)
nft_ingester/src/backfiller.rs (5)
  • new (49-66)
  • new (121-127)
  • new (364-369)
  • db (192-192)
  • db (245-245)
nft_ingester/src/inmemory_slots_dumper.rs (1)
  • new (18-20)
🔇 Additional comments (8)
nft_ingester/src/bin/slot_persister/main.rs (8)

48-48: Documentation note.
This doc comment helps new contributors quickly see the default interval. No issues here.


49-55: Descriptive commentary is clear.
These lines accurately describe how the checkpoint gap is computed and used.


56-56: Clap usage looks good.
Deriving multiple traits for Args is a common practice and poses no concerns.


65-65: Consistent env usage.
Referencing ROCKS_SLOTS_DB_PATH matches the pattern used elsewhere.


69-69: Environment variable reference.
No issues, this is a typical Clap attribute for environment variables.


228-228: Trivial usage of helper function.
Calling get_last_persisted_slot here is straightforward and consistent with the rest of the code.


572-572: No revision needed.
This blank line doesn’t introduce problems or changes worthy of comment.


573-808: Naming conflict and potential memory usage concerns.

  1. The outer async fn get_missed_slots contains another function also named get_missed_slots. This can lead to confusion. Consider renaming the inner helper (e.g., retrieve_missed_slots_range or similar).
  2. While this implementation is robust, it accumulates missed slots in memory. For extremely large ranges, consider chunked or streaming approaches to limit peak memory usage.
    [duplicate_comment, suggest_optional_refactor]

Comment on lines +44 to +46
// Offset to start collecting missed slots from. This exists to prevent concurrent collection of
// later slots and marking of those as missed.
const MISSED_SLOT_COLLECTION_OFFSET: u64 = 10 * SLOT_COLLECTION_OFFSET;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Watch for large offset memory usage.
A multiplier of 10× can create substantial in-memory ranges if real-world offsets become large. Consider verifying that this is acceptable in your deployment environment or adding chunking logic to avoid potential high memory usage.

Comment on lines +97 to +98
#[arg(long, env, help = "How often to check for missed slots (in seconds)", default_value_t = DEFAULT_SLOT_BACKFILL_INTERVAL_SECS)]
slot_backfill_interval: u64,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Validate interval.
It may be helpful to ensure this value is above zero or within a practical range to avoid extremes.

Comment on lines +100 to +101
#[arg(long, env, help = "The number of slots by which to check & advance the next sequential slots", default_value_t = DEFAULT_SLOT_BACKFILL_OFFSET)]
slot_backfill_offset: u64,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Ensure a reasonable backfill offset.
Similarly, consider validating that slot_backfill_offset is non-zero and does not exceed safe bounds to avoid extended loops or excessive memory usage.

}

pub fn get_last_persisted_slot(rocks_db: Arc<SlotStorage>) -> u64 {
pub fn get_last_persisted_slot(rocks_db: &SlotStorage) -> u64 {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Log potential decode errors.
Using unwrap_or_default() may silently hide data corruption. Consider logging an error before falling back to zero.

Comment on lines 113 to 120
pub fn get_first_persisted_slot(rocks_db: &SlotStorage) -> u64 {
let mut it = rocks_db.db.raw_iterator_cf(&rocks_db.db.cf_handle(RawBlock::NAME).unwrap());
it.seek_to_first();
if !it.valid() {
return 0;
}
it.key().map(|b| RawBlock::decode_key(b.to_vec()).unwrap_or_default()).unwrap_or_default()
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Check for abnormal earliest slot conditions.
This logic is consistent with get_last_persisted_slot, but a warning log on decode failure might be helpful for diagnostic purposes.

Comment on lines +122 to +138
pub fn get_checkpoint(rocks_db: &SlotStorage) -> u64 {
let mut checkpoint = None;
let mut it = rocks_db
.db
.raw_iterator_cf(&rocks_db.db.cf_handle(SlotConsistencyCheckpoint::NAME).unwrap());
it.seek_to_first();
if it.valid() {
checkpoint = Some(
it.key()
.map(|b| SlotConsistencyCheckpoint::decode_key(b.to_vec()).unwrap_or_default())
.unwrap_or_default(),
)
} else {
let mut it = rocks_db.db.raw_iterator_cf(&rocks_db.db.cf_handle(RawBlock::NAME).unwrap());
it.seek_to_first();
if it.valid() {
checkpoint = Some(
it.key()
.map(|b| RawBlock::decode_key(b.to_vec()).unwrap_or_default())
.unwrap_or_default(),
);
}
}

checkpoint.unwrap_or_default()
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Silent fallback for missing checkpoint.
Currently, missing or corrupted data sets the checkpoint to zero without notification. Consider logging a warning or returning an error if that’s not the intended design.

Comment on lines +257 to +254
usecase::executor::spawn(get_missed_slots(
backfill_source.clone(),
target_db.clone(),
args.clone(),
metrics_state.backfiller_metrics.clone(),
cancellation_token.child_token(),
));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider capturing the spawned task handle.
Without storing the JoinHandle, the main thread cannot detect critical failures in get_missed_slots. If that’s important, returning or awaiting the handle might help.

@armyhaylenko armyhaylenko force-pushed the feature/MTG-1070-periodic-missed-slot-fetch branch from 6c87eac to 19d207b Compare April 8, 2025 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants