Owner

@martinffx martinffx commented Oct 15, 2025

Feature Overview

Implements core Raft consensus components for Seshat. This PR adds the foundational pieces needed for distributed consensus but does not wire them into a working cluster yet.

What Was Built

  1. MemStorage - Raft Log Storage

Implements raft::Storage trait for in-memory Raft log management:

  • Log entry storage with append/compact operations
  • Hard state persistence (term, vote, commit index)
  • Snapshot creation and restoration
  • Thread-safe with RwLock
  2. RaftNode - Consensus Coordinator

Orchestrates Raft consensus operations:

  • Ready processing loop (tick, propose, apply)
  • Proposal submission interface
  • Leader status queries
  • Hard state management
  3. StateMachine - KV Operations

Deterministic state machine for KV operations:

  • Operation application (GET, SET, DEL, EXISTS, PING)
  • Snapshot generation and restoration
  • Bincode serialization for operations
  4. gRPC Transport Layer

Custom protobuf-based transport for Raft messages:

  • Own proto definitions (not using raft-proto)
  • Bridges raft-rs (prost 0.11) with modern gRPC (tonic 0.14)
  • Server/client stubs for node-to-node communication
  5. Common Types & KV Operations

Shared foundation:

  • Error types across crates (250 lines)
  • Type-safe wrappers: NodeId, LogIndex, Term (192 lines)
  • KV operation definitions with serialization (405 lines)

martinffx and others added 25 commits October 12, 2025 17:57
Implement Phase 1 (Common Types Foundation) of Raft consensus feature:
- Add type aliases: NodeId, Term, LogIndex with comprehensive docs
- Define Error enum with thiserror for ergonomic error handling
- Add 32 passing unit tests (100% Phase 1 test coverage)
- Update task tracking with executive summary and progress metrics

Phase 1 Status: 2/2 tasks complete (100%)
Overall Progress: 2/24 tasks (8%)

Test Coverage:
- crates/common/src/types.rs: 10 tests passing
- crates/common/src/errors.rs: 20 tests passing
- All doctests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implement Phase 4 Storage Layer task 1 (mem_storage_skeleton):
- Create MemStorage struct with thread-safe RwLock fields
- Add comprehensive test coverage (13 storage tests + 2 doctests)
- Switch raft-rs to prost-codec to avoid protobuf version conflicts

Implementation Details:
- MemStorage with HardState, ConfState, Vec<Entry>, Snapshot fields
- Thread-safe design (Send + Sync) using RwLock for concurrent access
- new() constructor with Default trait implementation
- Comprehensive documentation with usage examples

Dependencies:
- raft = { version = "0.7", default-features = false, features = ["prost-codec"] }
- tokio = { version = "1", features = ["full"] }

Fixes:
- Fix clippy warnings in common crate (inline format args, assign ops)
- Fix mise lint task (remove --all-features flag causing protobuf conflicts)

Test Results:
- 46 tests passing workspace-wide
- 14/14 raft crate tests passing
- 32/32 common crate tests passing
- No clippy warnings

Progress:
- Phase 4 (Storage Layer): 1/7 tasks complete (14%)
- Overall: 3/24 tasks complete (12.5%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implement Phase 4 Storage Layer task 2 (mem_storage_initial_state):
- Add initial_state() method returning RaftState with HardState and ConfState
- Add helper methods set_hard_state() and set_conf_state() for testing
- 11 comprehensive tests covering defaults, mutations, thread safety, and edge cases

Implementation Details:
- initial_state() acquires read locks for efficient concurrent access
- Returns cloned data to prevent mutation leaks
- Thread-safe with multiple concurrent readers
- Follows raft-rs API conventions (raft::Result<RaftState>)
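The read-lock-and-clone pattern described above can be sketched roughly as follows. This is a minimal std-only illustration, not the code from this PR: `HardState`, `ConfState`, and `RaftState` are simplified stand-ins for the raft-rs types, and the return type is a plain `RaftState` rather than `raft::Result<RaftState>`.

```rust
use std::sync::RwLock;

// Simplified stand-ins for the raft-rs types (assumptions for this sketch).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct HardState {
    pub term: u64,
    pub vote: u64,
    pub commit: u64,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct ConfState {
    pub voters: Vec<u64>,
    pub learners: Vec<u64>,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct RaftState {
    pub hard_state: HardState,
    pub conf_state: ConfState,
}

#[derive(Default)]
pub struct MemStorage {
    hard_state: RwLock<HardState>,
    conf_state: RwLock<ConfState>,
}

impl MemStorage {
    pub fn new() -> Self {
        Self::default()
    }

    /// Takes read locks only, so many readers can call this concurrently.
    /// The returned value is a clone, so callers cannot mutate storage through it.
    pub fn initial_state(&self) -> RaftState {
        let hard_state = self.hard_state.read().expect("hard_state lock poisoned").clone();
        let conf_state = self.conf_state.read().expect("conf_state lock poisoned").clone();
        RaftState { hard_state, conf_state }
    }

    /// Test helper: replaces the stored hard state.
    pub fn set_hard_state(&self, hs: HardState) {
        *self.hard_state.write().expect("hard_state lock poisoned") = hs;
    }

    /// Test helper: replaces the stored conf state.
    pub fn set_conf_state(&self, cs: ConfState) {
        *self.conf_state.write().expect("conf_state lock poisoned") = cs;
    }
}
```

Because `initial_state()` hands back owned copies, changing the returned `RaftState` leaves the stored state untouched, which is what the "cloned data" test in this commit checks.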

Helper Methods:
- set_hard_state(hs: HardState) - Updates storage hard state
- set_conf_state(cs: ConfState) - Updates storage conf state

Test Coverage (11 new tests):
- test_initial_state_returns_defaults - Verifies term=0, vote=0, commit=0
- test_initial_state_reflects_hard_state_changes - State updates reflected
- test_initial_state_reflects_conf_state_changes - Config updates reflected
- test_initial_state_is_thread_safe - 10 concurrent threads
- test_initial_state_returns_cloned_data - Data isolation verified
- test_initial_state_multiple_calls_are_consistent - 100 consecutive calls
- test_set_hard_state_updates_storage - Direct storage verification
- test_set_conf_state_updates_storage - Direct storage verification
- test_initial_state_with_empty_conf_state - Partial state updates
- test_initial_state_with_complex_conf_state - Joint consensus scenarios
- Edge cases for configuration changes

Fixes:
- Use struct initialization syntax to satisfy clippy::field_reassign_with_default
- All 24 tests passing (13 original + 11 new)
- No clippy warnings

Progress:
- Phase 4 (Storage Layer): 2/7 tasks complete (29%)
- Overall: 4/24 tasks complete (16.7%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add MemStorage::entries() method with comprehensive range query support:
- Range queries [low, high) with proper bounds checking
- Size-limited queries using prost::Message::encoded_len()
- Error handling for compacted (StorageError::Compacted) and unavailable entries
- Helper methods: first_index(), last_index(), append()
- Guarantees at least one entry is returned, even if it exceeds max_size
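A rough sketch of the range query and size limit described above, using only the standard library. The `Entry` struct and `entry_size()` helper are hypothetical stand-ins: the commit uses raft-rs entries and `prost::Message::encoded_len()`, and the `with_entries` constructor exists only for this illustration.

```rust
use std::sync::RwLock;

/// Simplified stand-in for the raft-rs `Entry` type (assumption for this sketch).
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub index: u64,
    pub term: u64,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum StorageError {
    Compacted,
    Unavailable,
}

pub struct MemStorage {
    /// Index covered by the latest snapshot (0 when there is none).
    snapshot_index: u64,
    /// Contiguous log entries that follow the snapshot.
    entries: RwLock<Vec<Entry>>,
}

/// Stand-in for `prost::Message::encoded_len()`: two u64 fields plus the payload.
fn entry_size(entry: &Entry) -> u64 {
    16 + entry.data.len() as u64
}

impl MemStorage {
    /// Hypothetical constructor used only by this sketch.
    pub fn with_entries(snapshot_index: u64, entries: Vec<Entry>) -> Self {
        Self { snapshot_index, entries: RwLock::new(entries) }
    }

    /// Returns entries in `[low, high)`, stopping before `max_size` is exceeded.
    pub fn entries(
        &self,
        low: u64,
        high: u64,
        max_size: Option<u64>,
    ) -> Result<Vec<Entry>, StorageError> {
        // A single read guard covers the bounds checks and the slice.
        let log = self.entries.read().expect("entries lock poisoned");
        let first = log.first().map_or(self.snapshot_index + 1, |e| e.index);
        let last = log.last().map_or(self.snapshot_index, |e| e.index);

        if low < first {
            return Err(StorageError::Compacted);
        }
        if high > last + 1 {
            return Err(StorageError::Unavailable);
        }
        if low >= high {
            return Ok(Vec::new());
        }

        let limit = max_size.unwrap_or(u64::MAX);
        let (start, end) = ((low - first) as usize, (high - first) as usize);
        let mut out = Vec::new();
        let mut total = 0u64;
        for entry in &log[start..end] {
            total += entry_size(entry);
            // The first entry is always returned, even if it alone exceeds the limit.
            if !out.is_empty() && total > limit {
                break;
            }
            out.push(entry.clone());
        }
        Ok(out)
    }
}
```

The `!out.is_empty()` guard is what gives the "at least one entry" guarantee: a caller with a tiny `max_size` still makes progress one entry at a time.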

Test coverage (12 new tests):
- Empty and normal range queries
- Size limits and partial results
- Boundary conditions and error cases
- Thread safety with concurrent access

Dependencies: Added prost = "0.11" for message size calculation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add MemStorage::term() method with comprehensive term lookup support:
- Special case: term(0) always returns 0 (Raft convention)
- Returns snapshot.metadata.term for snapshot index
- Proper bounds checking with first_index() and last_index()
- Error handling for compacted (StorageError::Compacted) entries
- Error handling for unavailable (StorageError::Unavailable) entries
- Thread-safe with RwLock read access
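The lookup order listed above (index 0, then the snapshot index, then bounds checks) might look like the following sketch. `LogView` and its fields are hypothetical simplifications; locking is left out here to keep the term logic visible, whereas the commit reads through an `RwLock`.

```rust
/// Simplified stand-in for a log entry (assumption for this sketch).
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub index: u64,
    pub term: u64,
}

#[derive(Debug, PartialEq)]
pub enum StorageError {
    Compacted,
    Unavailable,
}

/// Hypothetical unlocked view of the log state.
pub struct LogView {
    pub snapshot_index: u64,
    pub snapshot_term: u64,
    /// Contiguous entries that follow the snapshot.
    pub entries: Vec<Entry>,
}

impl LogView {
    pub fn term(&self, idx: u64) -> Result<u64, StorageError> {
        // Raft convention: the term at index 0 is 0.
        if idx == 0 {
            return Ok(0);
        }
        // The snapshot remembers the term of the last entry it covers.
        if idx == self.snapshot_index {
            return Ok(self.snapshot_term);
        }
        let first = self.entries.first().map_or(self.snapshot_index + 1, |e| e.index);
        let last = self.entries.last().map_or(self.snapshot_index, |e| e.index);
        if idx < first {
            return Err(StorageError::Compacted);
        }
        if idx > last {
            return Err(StorageError::Unavailable);
        }
        Ok(self.entries[(idx - first) as usize].term)
    }
}
```

With a snapshot at index 5 (term 2) and entries 6 and 7, `term(5)` is answered from the snapshot metadata, `term(3)` is reported as compacted, and `term(8)` as unavailable.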

Test coverage (9 new tests):
- Index 0 returns 0
- Valid indices return correct terms
- Snapshot index returns snapshot term
- Compacted and unavailable error cases
- Empty storage and snapshot-only scenarios
- Thread safety with concurrent access
- Boundary conditions

Progress: 6/24 tasks (25%), Storage Layer 57% (4/7)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add 18 comprehensive tests for existing first_index() and last_index() methods:

first_index() tests (6 tests):
- Empty log returns 1
- After append returns correct index
- With snapshot returns snapshot.index + 1
- Snapshot with entries scenario
- After compaction with sparse entries
- Entries not starting at index 1

last_index() tests (6 tests):
- Empty log returns 0
- After append returns last entry index
- Snapshot only returns snapshot.index
- Snapshot with entries returns last entry
- Multiple appends update correctly
- Single entry edge case

Invariant & safety tests (6 tests):
- Verify first_index <= last_index + 1 always holds
- Boundary conditions (empty, single, multiple)
- Thread safety with concurrent access
- Consistency across multiple calls
- Large snapshot indices handling
- Multiple scenario lifecycle testing

Both methods were already implemented and working; this commit formalizes them
with the test coverage required by the acceptance criteria.
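The index rules these tests exercise can be summarized in a small sketch. `LogBounds` is a hypothetical type that tracks only indices; it assumes the entries are contiguous and directly follow the snapshot.

```rust
/// Hypothetical index-only view of the log (assumption for this sketch).
pub struct LogBounds {
    /// Index covered by the latest snapshot (0 when there is none).
    pub snapshot_index: u64,
    /// Indices of the entries currently held, in ascending order.
    pub entry_indices: Vec<u64>,
}

impl LogBounds {
    /// First index still available: the first entry, or just past the snapshot.
    pub fn first_index(&self) -> u64 {
        self.entry_indices.first().copied().unwrap_or(self.snapshot_index + 1)
    }

    /// Last index known: the last entry, or the snapshot index if the log is empty.
    pub fn last_index(&self) -> u64 {
        self.entry_indices.last().copied().unwrap_or(self.snapshot_index)
    }
}
```

An empty log without a snapshot yields `first_index() == 1` and `last_index() == 0`, which is the boundary case of the invariant `first_index <= last_index + 1`.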

Progress: 7/24 tasks (29.2%), Storage Layer 71% (5/7)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add MemStorage::snapshot() method with Phase 1 simplified implementation:
- Always returns current snapshot (ignores request_index in Phase 1)
- Thread-safe with RwLock read access
- Returns cloned snapshot to prevent mutation leaks
- Comprehensive documentation with Phase 1 simplification note

Test coverage (7 new tests):
- Default snapshot on new storage
- Stored snapshot retrieval
- Phase 1 behavior (ignores request_index)
- Complex metadata (ConfState with voters/learners)
- Large data payloads (10KB)
- Clone independence validation
- Thread safety (10 threads × 100 iterations)

Implementation notes:
- Phase 1: Simple read-lock-clone-return pattern
- Future phases may return SnapshotTemporarilyUnavailable
- Validates snapshot data integrity (metadata + data)
- 1000 total concurrent reads tested

Progress: 8/24 tasks (33.3%), Storage Layer 86% (6/7)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update mise check task to:
- Format code (not just check formatting)
- Include build step
- Use cleaner depends pattern

Now runs: format → lint → build → test
Implement apply_snapshot() and wl_append_entries() to complete the
Storage Layer implementation. Both methods use proper Raft semantics:

- apply_snapshot(): Replaces storage state with snapshot, clears
  covered entries, updates hard_state and conf_state
- wl_append_entries(): Appends entries with conflict resolution
  (compares terms, truncates on mismatch)
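One way to express the conflict resolution described above is sketched below, assuming both the existing log and the incoming batch are contiguous. `append_entries`, `Entry`, and `AppendError` are hypothetical names for this illustration; the real `wl_append_entries()` operates on raft-rs entries behind a write lock and may handle bad input differently.

```rust
/// Simplified stand-in for a log entry (assumption for this sketch).
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub index: u64,
    pub term: u64,
}

#[derive(Debug, PartialEq)]
pub enum AppendError {
    /// The batch starts before the first entry still held.
    Compacted,
    /// The batch would leave a hole after the last entry.
    Gap,
}

/// Appends `incoming` to `log`, truncating the log at the first term mismatch.
pub fn append_entries(log: &mut Vec<Entry>, incoming: &[Entry]) -> Result<(), AppendError> {
    let first_new = match incoming.first() {
        Some(e) => e.index,
        None => return Ok(()),
    };
    if let (Some(first), Some(last)) = (log.first(), log.last()) {
        if first_new < first.index {
            return Err(AppendError::Compacted);
        }
        if first_new > last.index + 1 {
            return Err(AppendError::Gap);
        }
    }
    for (i, new) in incoming.iter().enumerate() {
        // Position of `new` relative to the start of the existing log.
        let pos = log.first().map_or(0, |f| (new.index - f.index) as usize);
        if pos < log.len() {
            if log[pos].term == new.term {
                // Same index and term: the entry is already present.
                continue;
            }
            // Conflict: drop this entry and everything after it.
            log.truncate(pos);
        }
        log.extend_from_slice(&incoming[i..]);
        break;
    }
    Ok(())
}
```

Note that a batch whose entries all match the existing log leaves the log unchanged, so a delayed duplicate append does not discard newer entries.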

Adds 16 comprehensive tests covering:
- Snapshot installation with state updates
- Entry appending with conflict resolution
- Thread safety with concurrent operations
- Edge cases (empty log, conflicting terms)

All 86 tests passing with zero clippy warnings.

Storage Layer (Phase 4) now 100% complete (7/7 tasks).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Corrected protobuf enum variant names (Normal, ConfChange, Noop) and
updated all format strings to use inline variable syntax for clippy
compliance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add Operation types and StateMachine implementation for Raft consensus:

Protocol Layer (operations.rs):
- Operation enum with Set/Del variants for key-value mutations
- Serialization/deserialization using bincode
- apply() method for executing operations on HashMap
- 17 tests covering all operation scenarios

State Machine (state_machine.rs):
- StateMachine struct with HashMap data and last_applied tracking
- Core methods: new(), get(), exists(), last_applied()
- apply() method with Operation deserialization and idempotency
- 19 tests covering all state machine operations
- Integration with protocol crate Operation types
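As a rough illustration of how the operation enum and state machine fit together, here is a std-only sketch. It skips the bincode step (the operation is passed in already decoded), and the return values (`OK`, `1`, `0`), the `ApplyError` type, and the choice to reject replayed indices are assumptions made for this example rather than details confirmed by the commit.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub enum Operation {
    Set { key: Vec<u8>, value: Vec<u8> },
    Del { key: Vec<u8> },
}

impl Operation {
    /// Executes the operation against the map and returns a response payload.
    pub fn apply(&self, data: &mut HashMap<Vec<u8>, Vec<u8>>) -> Vec<u8> {
        match self {
            Operation::Set { key, value } => {
                data.insert(key.clone(), value.clone());
                b"OK".to_vec()
            }
            Operation::Del { key } => {
                if data.remove(key).is_some() { b"1".to_vec() } else { b"0".to_vec() }
            }
        }
    }
}

/// Hypothetical error for entries that were already applied.
#[derive(Debug, PartialEq)]
pub struct ApplyError {
    pub index: u64,
    pub last_applied: u64,
}

#[derive(Default)]
pub struct StateMachine {
    data: HashMap<Vec<u8>, Vec<u8>>,
    last_applied: u64,
}

impl StateMachine {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }

    pub fn exists(&self, key: &[u8]) -> bool {
        self.data.contains_key(key)
    }

    pub fn last_applied(&self) -> u64 {
        self.last_applied
    }

    /// Applies the operation at `index` unless that index was already applied.
    pub fn apply(&mut self, index: u64, op: &Operation) -> Result<Vec<u8>, ApplyError> {
        if index <= self.last_applied {
            return Err(ApplyError { index, last_applied: self.last_applied });
        }
        let response = op.apply(&mut self.data);
        self.last_applied = index;
        Ok(response)
    }
}
```

Tracking `last_applied` is what keeps replays harmless: re-delivering the entry at an index that was already applied never mutates the map a second time.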

Progress: 16/24 tasks complete (66.7%), Phase 5 at 67% (2/3)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implement snapshot/restore functionality for log compaction:
- Add snapshot() method to serialize state machine using bincode
- Add restore() method to deserialize and replace state
- Add Serialize/Deserialize derives to StateMachine struct
- Add bincode 1.3 dependency to raft crate

Tests: 9 new unit tests + 2 doc tests covering:
- Empty snapshot creation
- Snapshot with data
- Restore from snapshot
- Roundtrip serialization
- Error handling for invalid data
- Large state (100 keys) performance
- State overwrite verification

All 147 tests passing (123 unit + 24 doc tests)

Phase 5 (State Machine) now 100% complete (3/3 tasks)
Overall progress: 70.8% (17/24 tasks)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add RaftNode struct wrapping raft-rs RawNode
- Implement new() for node initialization
- Implement tick() for logical clock advancement
- Implement propose() for client command submission
- Implement handle_ready() for Raft state processing
- Add apply_committed_entries() helper method
- Add MemStorage::append() for entry persistence
- Add comprehensive test coverage (22 tests)
- Update progress: 83.3% complete (20/24 tasks)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Implement is_leader() to check if node is leader
- Implement leader_id() to get current leader ID
- Add 8 comprehensive tests for leader queries
- Complete Phase 6 (Raft Node) - 100% done
- Update progress: 87.5% complete (21/24 tasks)
- Ready for Phase 7 (Integration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Address code review feedback:
- Replace .unwrap() with .expect() for descriptive error messages
- Fix TOCTOU races in entries() and term() by acquiring locks once
- Add defensive logging in apply_committed_entries()
- Document lock poisoning philosophy for Phase 1
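The TOCTOU fix mentioned above comes down to doing the bounds check and the read under the same guard. The sketch below shows that shape on a hypothetical `TermLog` that stores only terms; it is an illustration of the pattern, not the PR's `entries()` or `term()` code.

```rust
use std::sync::RwLock;

/// Hypothetical log of terms, where `terms[i]` belongs to log index `i + 1`.
pub struct TermLog {
    terms: RwLock<Vec<u64>>,
}

impl TermLog {
    pub fn new(terms: Vec<u64>) -> Self {
        Self { terms: RwLock::new(terms) }
    }

    /// Writer that shortens the log, as a conflicting append might.
    pub fn truncate(&self, len: usize) {
        self.terms.write().expect("terms lock poisoned").truncate(len);
    }

    /// Bounds check and read share one guard, so no writer can run between them.
    /// The racy shape would read the length under one guard, drop it, and then
    /// index under a second guard after a writer may have truncated the log.
    pub fn term(&self, idx: u64) -> Option<u64> {
        let terms = self.terms.read().expect("terms lock poisoned");
        if idx == 0 {
            return Some(0);
        }
        if idx > terms.len() as u64 {
            return None;
        }
        Some(terms[(idx - 1) as usize])
    }
}
```

The `.expect()` calls mirror the other change in this commit: a poisoned lock still panics, but with a message that names the lock involved.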

All 199 tests passing, zero clippy warnings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implement transport layer for Raft message communication:
- Add TransportServer/Client with gRPC (tonic 0.12, prost 0.13)
- Bridge prost 0.11 (raft-rs) ↔ 0.13 (transport) via conversion layer
- Extract KV operations to separate crate (seshat-kv)
- Rename protocol → protocol-resp as RESP placeholder
- Remove custom protobuf definitions (use raft-rs built-ins internally)

Benefits:
- Modern gRPC stack (2024/2025 versions) for transport
- No version lock on rest of service
- Clean isolation of old prost dependency

Tests: 203 passing (157 unit + 13 integration + 33 doctests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Upgrade transport layer to latest versions:
- tonic 0.12 → 0.14
- prost 0.13 → 0.14
- Use tonic-prost-build instead of tonic-build (API change)
- Add tonic-prost runtime dependency for generated code

All 203 tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove placeholder add() function from lib.rs
- Add prost version bridging documentation in storage.rs
- Replace eprintln! with log::warn! for structured logging
- Document direct field access rationale in is_leader()
- Remove outdated #[allow(dead_code)] on MemStorage
- Add log dependency for proper logging infrastructure

All 156 library tests and 13 integration tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Integrate complete RESP2/3 parser and encoder from feat/resp branch:
- Full protocol support (14 data types, 487 tests passing)
- Zero-copy parsing with bytes::Bytes
- Tokio codec integration for async I/O
- Command parser for GET, SET, DEL, EXISTS, PING
- Buffer pooling for memory efficiency

Additional changes:
- Simplify CI workflow to use mise for local/CI parity
- Fix duplicate CI runs (removed push on feat/* branches)
- Remove optional raft dependency from common crate to avoid protobuf-build conflicts
- Add --all-features to mise lint task for comprehensive testing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The mise install task warns about protoc but doesn't install it.
CI needs protoc installed before building raft-proto dependency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add protoc to mise.toml tools for automatic installation.
This eliminates manual protoc installation steps and ensures version
consistency across local development and CI environments.

Changes:
- Add protoc = "28" to [tools] in mise.toml
- Remove manual apt-get protoc installation from CI workflow
- Mise action automatically installs all tools defined in mise.toml

Benefits:
- Single source of truth for tool versions
- Automatic protoc installation in CI via mise-action
- Consistent protoc version (28.3) across all environments
- Simpler CI workflow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@martinffx martinffx merged commit d094128 into main Oct 18, 2025
1 check passed
@martinffx martinffx deleted the feat/raft branch October 18, 2025 17:54

2 participants