Skip to content

Soundness Bug in try_binary when Array is implemented incorrectly in external crate #9106

@shinmao

Description

@shinmao

Bug Description
The root cause is the missing invariant validation between Array::len() and Array::logical_nulls().len() in try_binary, leading to OOB writes when processing arrays with inconsistent metadata. The try_binary function performs unchecked memory writes based on the length reported by logical_nulls(), but only allocates buffer space based on Array::len(). Since Array is not an unsafe trait, user implementations can violate the implicit assumption that these two values must match.

let len = a.len();
if a.null_count() == 0 && b.null_count() == 0 {
try_binary_no_nulls(len, a, b, op)
} else {
let nulls =
NullBuffer::union(a.logical_nulls().as_ref(), b.logical_nulls().as_ref()).unwrap();
let mut buffer = BufferBuilder::<O::Native>::new(len);
buffer.append_n_zeroed(len);
let slice = buffer.as_slice_mut();
nulls.try_for_each_valid_idx(|idx| {
unsafe {
*slice.get_unchecked_mut(idx) = op(a.value_unchecked(idx), b.value_unchecked(idx))?
};
Ok::<_, ArrowError>(())
})?;
let values = buffer.finish().into();
Ok(PrimitiveArray::new(values, Some(nulls)))

Array is not an unsafe trait, so implementors have no safety contract to uphold. The implicit assumption is violated where logical_nulls().len() == len(), but this is not documented as a requirement, not enforced by the type system, and also not validated at runtime. Therefore, the attack can forge len() = 1 and logical_nulls().len() = 2, which causes 1-element buffer allocation, but 2-iteration loop , OOB write at index 1

To Reproduce

use arrow_arith::arity::try_binary;
use arrow_array::{Array, ArrayAccessor, Int32Array};
use arrow_array::types::Int32Type;
use arrow_buffer::NullBuffer;
use arrow_data::ArrayData;
use std::sync::Arc;

#[derive(Debug, Clone)]
struct Evil {
    inner: Int32Array, fake_nulls: NullBuffer,
}

impl Array for Evil {
    fn as_any(&self) -> &dyn std::any::Any { self }
    fn to_data(&self) -> ArrayData { self.inner.to_data() }
    fn into_data(self) -> ArrayData { self.inner.into_data() }
    fn data_type(&self) -> &arrow_schema::DataType { self.inner.data_type() }
    fn slice(&self, o: usize, l: usize) -> Arc<dyn Array> {
        Arc::new(Evil {
            inner: self.inner. slice(o, l),
            fake_nulls: self.fake_nulls.slice(o, l),
        })
    }
    fn len(&self) -> usize { 1 }
    fn is_empty(&self) -> bool { false }
    fn offset(&self) -> usize { 0 }
    fn nulls(&self) -> Option<&NullBuffer> { Some(&self.fake_nulls) }
    fn logical_nulls(&self) -> Option<NullBuffer> {
        Some(NullBuffer::new_valid(2))
    }
    fn get_buffer_memory_size(&self) -> usize { 0 }
    fn get_array_memory_size(&self) -> usize { 0 }
}

impl ArrayAccessor for Evil {
    type Item = i32;
    fn value(&self, i: usize) -> i32 { self.inner.value(i) }
    unsafe fn value_unchecked(&self, i: usize) -> i32 {
        self.inner.value_unchecked(i)
    }
}

fn main() {
    let inner = Int32Array::from(vec![42, 100]);
    let fake_nulls = NullBuffer::from(vec![false]);  // [null]
    let a = Evil { inner:  inner. clone(), fake_nulls: fake_nulls.clone() };
    let b = Evil { inner, fake_nulls };
    let result = try_binary::<_, _, _, Int32Type>(a, b, |x, y| {
        println! ("  op({}, {})", x, y);
        Ok(x + y)
    });
    match result {
        Ok(arr) => println!("\n Result: {:?}", arr),
        Err(e) => println!("\n Error: {}", e),
    }
}

with cargo run,

thread 'main' (3689217) panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-arith-57.1.0/src/arity.rs:285:24:
unsafe precondition(s) violated: slice::get_unchecked_mut requires that the index is within the slice

This indicates a bug in the program. This Undefined Behavior check is optional, and cannot be relied on for safety.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.
Aborted (core dumped)

Additional Context
Relevant PR: #9092

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions