Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I'm integrating Parquet bloom filters into an async pruning pipeline and found a gap in the public API.
The current situation:
- There is a sync API:
Sbbf::read_from_column_chunk(column_meta, reader)
- There is an async method, but only on the async Arrow builder:
ParquetRecordBatchStreamBuilder::get_row_group_column_bloom_filter(...)
- The helper used internally to parse bloom filter headers is pub(crate):
chunk_read_bloom_filter_header_and_offset (in parquet::bloom_filter)
If a downstream application only has ParquetMetaData and an AsyncFileReader, it can't read bloom filters from the parquet crate without re-implementing Parquet bloom filter header parsing.
This blocks async metadata-only pruning libraries (like mine) from using bloom filters safely and efficiently.
Describe the solution you'd like
Expose a public async bloom reader that mirrors the sync API:
pub async fn read_bloom_filter_async<R: AsyncFileReader>(
    column_meta: &ColumnChunkMetaData,
    reader: &mut R,
) -> Result<Option<Sbbf>>;
This would:
- keep internal header parsing private
- allow async pruning without coupling to the Arrow builder
- avoid duplicate parsing logic in downstream crates
- be backwards compatible (pure API addition)
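To make the intended control flow concrete, here is a minimal, self-contained sketch of the shape such a function might take. Everything in it is a hypothetical stand-in and not the parquet crate's API: MemReader plays the role of an AsyncFileReader, ColumnMeta the role of ColumnChunkMetaData, Bitset the role of Sbbf, and the 4-byte length prefix is a toy header (the real Parquet bloom filter header is Thrift-encoded). It only illustrates the two-step read (header first, then bitset) and the Option return when a column chunk has no bloom filter.

```rust
use std::future::Future;
use std::ops::Range;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for an async byte-range reader (NOT parquet's AsyncFileReader).
struct MemReader {
    data: Vec<u8>,
}

impl MemReader {
    // Returns the requested byte range, or an error if it is out of bounds.
    async fn get_bytes(&mut self, range: Range<usize>) -> Result<Vec<u8>, String> {
        self.data
            .get(range)
            .map(|s| s.to_vec())
            .ok_or_else(|| "range out of bounds".to_string())
    }
}

// Hypothetical stand-in for ColumnChunkMetaData: only the bloom filter offset.
struct ColumnMeta {
    bloom_filter_offset: Option<usize>,
}

// Hypothetical stand-in for Sbbf: just the raw bitset bytes.
#[derive(Debug, PartialEq)]
struct Bitset(Vec<u8>);

// Toy header (assumption): a 4-byte little-endian bitset length.
const TOY_HEADER_LEN: usize = 4;

async fn read_bloom_filter_async(
    column_meta: &ColumnMeta,
    reader: &mut MemReader,
) -> Result<Option<Bitset>, String> {
    // No offset in the metadata means the column chunk has no bloom filter.
    let Some(offset) = column_meta.bloom_filter_offset else {
        return Ok(None);
    };
    // First request: fetch and decode the header to learn the bitset length.
    let header = reader.get_bytes(offset..offset + TOY_HEADER_LEN).await?;
    let mut len_bytes = [0u8; TOY_HEADER_LEN];
    len_bytes.copy_from_slice(&header);
    let len = u32::from_le_bytes(len_bytes) as usize;
    // Second request: fetch the bitset that follows the header.
    let start = offset + TOY_HEADER_LEN;
    let bitset = reader.get_bytes(start..start + len).await?;
    Ok(Some(Bitset(bitset)))
}

// Tiny std-only executor so the sketch runs without an async runtime.
// It busy-polls, which is fine here because the futures above never return Pending.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

A caller would invoke it once per column chunk, for example block_on(read_bloom_filter_async(&meta, &mut reader)), and treat Ok(None) as "no bloom filter available, do not prune".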
Alternative
Make chunk_read_bloom_filter_header_and_offset public. However, this is a low-level parsing helper, and exposing it would commit the public API to more implementation detail.