Skip to content

3433: Symbol.cc and FileStream.cc: Fix -Wconversion and -Wuseless-cast on 32-bit builds#12

Open
martin-augment wants to merge 1 commit intomainfrom
pr-3433-2025-10-23-10-28-38
Open

3433: Symbol.cc and FileStream.cc: Fix -Wconversion and -Wuseless-cast on 32-bit builds#12
martin-augment wants to merge 1 commit intomainfrom
pr-3433-2025-10-23-10-28-38

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

3433: To review by AI

…2-bit builds

Symbol.cc:

it - rs.begin() returns a value of type std::ptrdiff_t.
On 32-bit systems, this is typically int, so casting it to int is redundant
and triggers -Wuseless-cast.

FileStream.cc:

size_t is 32-bit on 32-bit systems, while position is int64_t.
Casting int64_t to size_t can lead to truncation, causing -Wconversion errors.
The fix ensures that the offset and position are checked to be non-negative before conversion.

Fix:

lib32-avro-c++/1.12/sources/avro-c++-1.12/lang/c++/impl/parsing/Symbol.cc:91:27:
error: useless cast to type 'int' [-Werror=useless-cast]
   91 |             adj.push_back(static_cast<int>(it - rs.begin()));
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors

lib32-avro-c++/1.12/sources/avro-c++-1.12/lang/c++/impl/FileStream.cc:208:41:
error: conversion from 'int64_t' {aka 'long long int'} to 'size_t'
{aka 'unsigned int'} may change value [-Werror=conversion]
  208 |         in_->seek(position - byteCount_ - available_);
      |                   ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
lib32-avro-c++/1.12/sources/avro-c++-1.12/lang/c++/impl/FileStream.cc:209:22:
error: conversion from 'int64_t' {aka 'long long int'} to 'size_t'
{aka 'unsigned int'} may change value [-Werror=conversion]
  209 |         byteCount_ = position;
      |                      ^~~~~~~~
cc1plus: all warnings being treated as errors

These changes fix build failures on 32-bit systems and ensure safe type conversions.

Signed-off-by: Alper Ak <alperyasinak1@gmail.com>
Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted.

if (offset < 0) {
throw Exception("Negative offset in seek");
}
in_->seek(static_cast<size_t>(offset));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 32-bit builds, casting 64-bit offset to size_t can truncate for large values; consider validating that offset fits in size_t before calling in_->seek to avoid incorrect seek.

🤖 React with 👍 or 👎 to let us know if the comment was useful.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:prevented an undesired truncation of int64_t to int32_t on 32bit systems.

if (position < 0) {
throw Exception("Negative position not allowed");
}
byteCount_ = static_cast<size_t>(position);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, casting position (int64_t) to size_t for byteCount_ can truncate on 32-bit; consider checking the range before assignment. (Related to the cast above.)

🤖 React with 👍 or 👎 to let us know if the comment was useful.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:prevented an undesired truncation of int64_t to int32_t on 32bit systems.

} else {
adj.push_back(static_cast<int>(it - rs.begin()));
auto index = it - rs.begin();
if constexpr (std::is_same_v<decltype(index), int>) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::is_same_v requires <type_traits>, but this TU doesn’t include it; consider adding the header to avoid relying on transitive includes and potential build failures.

🤖 React with 👍 or 👎 to let us know if the comment was useful.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback:It seems <type_traits> is included transitively by some other header file but it is always better to be explicit !

@claude
Copy link
Copy Markdown

claude bot commented Oct 23, 2025

PR Review: Symbol.cc and FileStream.cc - Fix -Wconversion and -Wuseless-cast on 32-bit builds

Summary

This PR addresses compiler warnings on 32-bit systems by adding proper type conversions and compile-time type checking. The changes target two specific build failures related to -Wuseless-cast and -Wconversion warnings.


Code Quality & Best Practices

FileStream.cc Changes ✅ Generally Good, ⚠️ Minor Concerns

Positives:

  • Proper validation for negative values before casting
  • Explicit exception handling with descriptive error messages
  • Type-safe conversions with explicit static_cast<int64_t>() to prevent undefined behavior

Concerns:

  1. Potential Integer Overflow (lang/c++/impl/FileStream.cc:208-216)

    • The code casts int64_t position to size_t without checking if position exceeds SIZE_MAX
    • On 32-bit systems, size_t is 32 bits while int64_t is 64 bits
    • If position > 4GB on a 32-bit system, the cast will truncate the value
    if (position < 0) {
        throw Exception("Negative position not allowed");
    }
    // Missing: if (position > static_cast<int64_t>(SIZE_MAX)) check
    byteCount_ = static_cast<size_t>(position);

    Recommendation: Add overflow check:

    if (position < 0) {
        throw Exception("Negative position not allowed");
    }
    if (static_cast<uint64_t>(position) > SIZE_MAX) {
        throw Exception("Position exceeds maximum size_t value");
    }
    byteCount_ = static_cast<size_t>(position);
  2. Member Variable Type Mismatch

    • byteCount_ is declared as size_t but is being assigned from int64_t values
    • Consider whether byteCount_ should be int64_t instead for consistency with the seek() interface
    • This would avoid lossy conversions on 32-bit systems

Symbol.cc Changes ⚠️ Complexity Concern

Positives:

  • Uses if constexpr for compile-time type checking
  • Avoids unnecessary casts on 32-bit systems where the type is already int

Concerns:

  1. Overly Complex Solution (lang/c++/impl/parsing/Symbol.cc:91-96)

    • The if constexpr approach adds complexity for a simple cast
    • The conditional logic checks if decltype(index) is int, but on 32-bit systems std::ptrdiff_t is typically int, while on 64-bit it's long or long long
    • This approach works but is harder to maintain

    Alternative Recommendation: Use a simpler approach that's explicit and safe:

    auto index = it - rs.begin();
    adj.push_back(static_cast<int>(index));

    The original warning -Wuseless-cast was flagging that on 32-bit, the cast is "useless" because it's already an int. However, this warning is pedantic. Consider:

    • Option A: Disable -Wuseless-cast for this specific line with #pragma GCC diagnostic ignored
    • Option B: Accept the simpler cast and configure the build to not treat this specific warning as an error
    • Option C: Keep the current if constexpr approach if code consistency is more valued than simplicity
  2. Range Check Missing

    • Vector iterator difference on 64-bit systems returns ptrdiff_t (64-bit signed)
    • Casting to int (32-bit) could overflow if the vector has > 2^31 elements
    • While enum schemas with 2 billion symbols are unrealistic, defensive programming suggests adding a check:
    auto index = it - rs.begin();
    if (index > std::numeric_limits<int>::max()) {
        throw Exception("Enum index too large");
    }
    adj.push_back(static_cast<int>(index));

Potential Bugs

  1. FileStream.cc:208 - Integer overflow not checked (as detailed above)
  2. Symbol.cc:95 - Potential truncation when casting ptrdiff_t to int on 64-bit systems with large vectors (theoretical edge case)

Performance Considerations

No performance impact:

  • The changes involve compile-time type checking (if constexpr) and runtime casts that are essentially no-ops at the CPU level
  • Exception checks are only in error paths, not hot paths
  • The seek() operation is already not a performance-critical path (I/O bound)

Security Concerns

⚠️ Medium Priority:

  1. Integer Truncation in seek() (FileStream.cc:216)

    • If unchecked, could lead to seeking to incorrect positions in files
    • Could potentially cause data corruption if an application attempts to seek beyond 4GB on 32-bit systems
    • Mitigation: Add the overflow check suggested above
  2. Buffer Overflow Risk (Theoretical)

    • If byteCount_ wraps due to truncation, buffer management logic could be affected
    • This could impact the BufferCopyInInputStream class's internal state consistency
    • The available_ and byteCount_ calculations assume correct values

Test Coverage

Major Concern: No Tests Added

The PR does not include:

  • Unit tests for the new exception paths
  • Tests verifying behavior at type boundaries (SIZE_MAX, INT_MAX)
  • Tests for negative position/offset handling
  • 32-bit specific test cases

Recommendations:

  1. Add tests in lang/c++/test/StreamTests.cc:

    • Test seek() with negative positions (should throw)
    • Test seek() with values near SIZE_MAX on 32-bit systems
    • Test the full range of valid positions
  2. Add enum symbol adjustment tests:

    • Test with large enum symbol sets (if practical)
    • Verify correct index mapping

Example test structure:

void testSeekNegativePosition() {
    // Test that negative positions throw exceptions
    FileRemover fr("test_seek.bin");
    {
        auto os = fileOutputStream("test_seek.bin");
        // Write some data
    }
    auto is = fileInputStream("test_seek.bin");
    auto* seekable = dynamic_cast<SeekableInputStream*>(is.get());
    BOOST_CHECK_THROW(seekable->seek(-1), Exception);
}

Additional Recommendations

  1. Build Verification: Ensure CI includes 32-bit builds to catch these issues automatically
  2. Documentation: Add comments explaining the 32-bit vs 64-bit type differences
  3. Consider Member Type Changes: Evaluate changing byteCount_ from size_t to int64_t for type consistency
  4. Code Review: Have another developer verify the bounds checking logic

Overall Assessment

Verdict: ⚠️ Approve with Changes Recommended

The PR successfully addresses the immediate compiler warnings, but has gaps in:

  • Overflow checking (integer truncation risk)
  • Test coverage (no new tests)
  • Code complexity (Symbol.cc could be simpler)

Suggested Actions Before Merge:

  1. Add overflow check for position > SIZE_MAX in FileStream.cc:216
  2. Add unit tests for new exception paths
  3. Consider simplifying Symbol.cc approach or add explanatory comments

The changes are functional and fix the build issues, but would benefit from the additional safety checks and test coverage to ensure robustness on 32-bit systems.


Review generated with Claude Code

@github-actions github-actions bot added the C++ label Oct 23, 2025
@github-actions
Copy link
Copy Markdown

Findings

  • Blocking lang/c++/impl/parsing/Symbol.cc:92 introduces std::is_same_v but the TU does not include <type_traits>. Neither Symbol.cc nor Symbol.hh pull it in, so this no longer compiles—please add the missing header.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants