Copilot AI commented Aug 20, 2025

Summary

Fixes a critical bug in EdgeIter where property values become misaligned during segmented iteration or when crossing chunk boundaries, leading to incorrect data access, IO errors, and crashes.

Problem

The EdgeIter class maintains two state layers that can become desynchronized:

  1. Iterator state: vertex_chunk_index_ and cur_offset_
  2. Property reader state: Each AdjListPropertyArrowChunkReader has its own vertex_chunk_index_, chunk_index_, seek_offset_, and cached chunk_table_

This desynchronization manifests in several critical ways:

// Sequential iteration works correctly
for (auto it = edges->begin(); it != edges->end(); ++it) {
    std::cout << it.property<std::string>("creationDate").value() << std::endl;
}

// But segmented iteration shows misaligned properties
size_t i = 0;  // position counter (was undeclared in the original snippet)
auto begin2 = edges->begin();
for (auto it = begin2; it != edges->end(); ++it, ++i) {
    if (i <= 2000) continue;  // Skip ahead to positions past 2000
    // Property values here are WRONG - misaligned with source/destination
    std::cout << it.property<std::string>("creationDate").value() << std::endl;
}

Symptoms observed:

  • Property values out of sync between it.property<T>() and (*it).property<T>()
  • Runtime errors: Failed to open ... part10/chunk0 when accessing non-existent chunks
  • AddressSanitizer errors during property access
  • Inconsistent results between different iteration patterns

Root Cause

Three key methods had synchronization issues:

  1. property() method: Called reader.seek(cur_offset_) without ensuring the reader was positioned at the correct vertex chunk
  2. operator*() method: Similar issue - didn't synchronize property readers to the current chunk before seeking
  3. operator++() boundary crossing: Used reader.next_chunk() which could fail and leave readers in stale states

Solution

Ensure property readers are always synchronized with the iterator's vertex_chunk_index_ before any seek operation:

// Before (buggy)
for (auto& reader : property_readers_) {
    reader.seek(cur_offset_);  // May be on wrong chunk!
}

// After (fixed)  
for (auto& reader : property_readers_) {
    reader.seek_chunk_index(vertex_chunk_index_);  // Ensure correct chunk
    reader.seek(cur_offset_);
}
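The before/after contrast above can be exercised in a small toy model. This is only a sketch: ToyReader, ToyIter, Chunks, read_unsynced, and read_synced are hypothetical stand-ins written for illustration, not GraphAr classes, and the offset is treated as relative to the current chunk for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Chunks = std::vector<std::vector<int>>;  // chunks[c][row] = property value

// Hypothetical stand-in for a property reader. It tracks its own chunk
// position, separately from the iterator that drives it.
struct ToyReader {
    std::size_t vertex_chunk_index = 0;
    std::size_t seek_offset = 0;

    void seek_chunk_index(std::size_t c) { vertex_chunk_index = c; }
    void seek(std::size_t offset) { seek_offset = offset; }  // chunk index untouched

    int read(const Chunks& chunks) const {
        assert(vertex_chunk_index < chunks.size());
        assert(seek_offset < chunks[vertex_chunk_index].size());
        return chunks[vertex_chunk_index][seek_offset];
    }
};

// Hypothetical stand-in for the iterator's own position.
struct ToyIter {
    std::size_t vertex_chunk_index = 0;
    std::size_t cur_offset = 0;  // offset within the current chunk (simplified)
};

// Buggy pattern: only the offset is forwarded, so a reader left on another
// chunk returns a row from that other chunk.
inline int read_unsynced(ToyReader& r, const ToyIter& it, const Chunks& chunks) {
    r.seek(it.cur_offset);
    return r.read(chunks);
}

// Fixed pattern: align the reader's chunk with the iterator, then seek.
inline int read_synced(ToyReader& r, const ToyIter& it, const Chunks& chunks) {
    r.seek_chunk_index(it.vertex_chunk_index);
    r.seek(it.cur_offset);
    return r.read(chunks);
}
```

With two chunks {10, 11} and {20, 21} and the iterator at chunk 1, offset 1, a fresh reader that is still on chunk 0 yields 11 through read_unsynced, while read_synced yields 21.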

Key changes:

  • Added reader.seek_chunk_index(vertex_chunk_index_) before seeking in property() method
  • Added chunk synchronization in operator*() before creating Edge objects
  • Replaced unreliable reader.next_chunk() with robust reader.seek_chunk_index() in operator++() error handling
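The operator++() change swaps a relative step (next_chunk) for absolute positioning (seek_chunk_index). A rough sketch of why that helps, again using hypothetical toy types rather than the real EdgeIter: after an absolute seek, the reader's position depends only on the iterator state, not on where the reader happened to be before.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reader: only the two position fields matter here.
struct ToyReader {
    std::size_t vertex_chunk_index = 0;
    std::size_t seek_offset = 0;
    void seek_chunk_index(std::size_t c) { vertex_chunk_index = c; seek_offset = 0; }
    void seek(std::size_t offset) { seek_offset = offset; }
};

// Hypothetical iterator that owns its readers, loosely mirroring the
// iterator/reader split described in the Problem section.
struct ToyIter {
    std::vector<std::size_t> chunk_sizes;  // number of edges in each chunk
    std::vector<ToyReader> readers;
    std::size_t vertex_chunk_index = 0;
    std::size_t cur_offset = 0;

    // Advance one edge, moving to the next chunk when the current one is exhausted.
    void advance() {
        assert(vertex_chunk_index < chunk_sizes.size());
        ++cur_offset;
        const bool chunk_exhausted = cur_offset >= chunk_sizes[vertex_chunk_index];
        const bool has_next_chunk = vertex_chunk_index + 1 < chunk_sizes.size();
        if (chunk_exhausted && has_next_chunk) {
            ++vertex_chunk_index;
            cur_offset = 0;
        }
        // Absolute positioning: the outcome does not depend on each reader's
        // previous state, so a stale reader is repaired here as well.
        for (auto& r : readers) {
            r.seek_chunk_index(vertex_chunk_index);
            r.seek(cur_offset);
        }
    }

    bool readers_in_sync() const {
        for (const auto& r : readers) {
            if (r.vertex_chunk_index != vertex_chunk_index) return false;
            if (r.seek_offset != cur_offset) return false;
        }
        return true;
    }
};
```

In this model a reader that starts on an unrelated chunk index is back in sync after a single advance(), which a relative next_chunk-style step could not guarantee.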

Testing

Added a comprehensive test case in test_graph.cc that validates:

  • Consistency between different property access patterns
  • Segmented iteration produces identical results to sequential iteration
  • Cross-chunk boundary handling without crashes
  • Property values remain correctly aligned with source/destination IDs

The test specifically covers the reported bug scenario of jumping to position 2000+ and verifying property alignment.
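The segmented-versus-sequential check can be sketched generically as follows. This is not the test added in test_graph.cc: collect_all, collect_after, and segmented_matches_sequential are hypothetical helpers, and any range of strings stands in for the edge collection. Against the real iterator, the `*it` reads would presumably become `it.property<std::string>("creationDate").value()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect every value by plain sequential iteration.
template <typename Range>
std::vector<std::string> collect_all(const Range& edges) {
    std::vector<std::string> out;
    for (auto it = edges.begin(); it != edges.end(); ++it) out.push_back(*it);
    return out;
}

// Iterate from the beginning but only read values at positions > skip,
// mirroring the "jump to position 2000+" pattern from the bug report.
template <typename Range>
std::vector<std::string> collect_after(const Range& edges, std::size_t skip) {
    std::vector<std::string> out;
    std::size_t i = 0;
    for (auto it = edges.begin(); it != edges.end(); ++it, ++i) {
        if (i <= skip) continue;
        out.push_back(*it);
    }
    return out;
}

// True when the segmented read equals the matching tail of the sequential read.
template <typename Range>
bool segmented_matches_sequential(const Range& edges, std::size_t skip) {
    const auto all = collect_all(edges);
    const auto tail = collect_after(edges, skip);
    if (all.size() <= skip + 1) return tail.empty();
    const auto first = all.begin() + static_cast<std::ptrdiff_t>(skip + 1);
    return std::equal(tail.begin(), tail.end(), first, all.end());
}
```

A plain std::vector<std::string> passes this check trivially; the comparison only becomes meaningful when the range is backed by chunked readers that can drift out of sync.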

Impact

This fix ensures that:

  • All property access methods return consistent values regardless of iteration pattern
  • No runtime crashes occur when crossing chunk boundaries
  • Segmented traversal, batch scanning, and sequential iteration all produce identical results
  • Edge property queries work reliably in high-performance graph analytics workflows


Co-authored-by: Thespica <119573640+Thespica@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Do your best to reproduce this issue and record it in the documentation. Then fix the code, add similar cases to the tests, and make sure the fixed code passes those tests: bug(c++): EdgeIter causes property misalignment, IO errors, and crashes when iterating ac..." to "fix(c++): EdgeIter property misalignment causing IO errors and crashes during chunk traversal" on Aug 20, 2025
Copilot AI requested a review from Thespica August 20, 2025 02:52