
Commit 19002ab

fix: correct DictEncoder::estimated_memory_size
The returned value should estimate the actual memory usage, but it instead used the encoded size of the dictionary data and omitted the memory used by the Interner's hash table. The implementation of Storage::estimated_memory_size for the unique-key storage was also incorrect, though it was unused. Correct both problems.
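The message separates two quantities: the encoded size of the dictionary and the memory the encoder actually holds. Below is a minimal sketch of how the corrected estimate composes. It uses hypothetical, simplified types that only borrow the names `KeyStorage`, `Interner`, and `DictEncoder`; they are not the parquet crate's definitions, and the values are fixed to `i64`. The hash-table term (capacity times entry size) is an assumed rough approximation. It undercounts, because it ignores per-bucket metadata and the buckets a hash table keeps empty to maintain its load factor.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Hypothetical stand-in for the dictionary's unique-value storage.
struct KeyStorage {
    /// Unique values in insertion order; a value's position is its dictionary key.
    uniques: Vec<i64>,
    /// Encoded size of the dictionary data (the quantity the old code reported).
    size_in_bytes: usize,
}

impl KeyStorage {
    /// Heap memory reserved for the unique values.
    fn estimated_memory_size(&self) -> usize {
        self.uniques.capacity() * size_of::<i64>()
    }
}

/// Hypothetical interner: a deduplication hash table in front of the storage.
struct Interner {
    dedup: HashMap<i64, usize>,
    storage: KeyStorage,
}

impl Interner {
    fn new() -> Self {
        Interner {
            dedup: HashMap::new(),
            storage: KeyStorage { uniques: Vec::new(), size_in_bytes: 0 },
        }
    }

    /// Returns the dictionary key for `value`, storing the value first if it is new.
    fn intern(&mut self, value: i64) -> usize {
        if let Some(&key) = self.dedup.get(&value) {
            return key;
        }
        let key = self.storage.uniques.len();
        self.storage.uniques.push(value);
        // Assumption: each unique INT64 adds its 8-byte encoded width.
        self.storage.size_in_bytes += size_of::<i64>();
        self.dedup.insert(value, key);
        key
    }

    /// Storage memory plus a rough hash-table estimate (capacity times entry size).
    fn estimated_memory_size(&self) -> usize {
        self.storage.estimated_memory_size() + self.dedup.capacity() * size_of::<(i64, usize)>()
    }
}

/// Hypothetical encoder that buffers one dictionary key per input value.
struct DictEncoder {
    interner: Interner,
    indices: Vec<usize>,
}

impl DictEncoder {
    fn put(&mut self, values: &[i64]) {
        for &v in values {
            let key = self.interner.intern(v);
            self.indices.push(key);
        }
    }

    /// Same shape as the fixed formula: interner memory plus the buffered indices.
    fn estimated_memory_size(&self) -> usize {
        self.interner.estimated_memory_size() + self.indices.len() * size_of::<usize>()
    }
}
```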
1 parent 88b7fca commit 19002ab


1 file changed

+2 -2 lines changed

parquet/src/encodings/encoding/dict_encoder.rs

Lines changed: 2 additions & 2 deletions
```diff
@@ -64,7 +64,7 @@ impl<T: DataType> Storage for KeyStorage<T> {
     }
 
     fn estimated_memory_size(&self) -> usize {
-        self.size_in_bytes + self.uniques.capacity() * std::mem::size_of::<T::T>()
+        self.uniques.capacity() * std::mem::size_of::<T::T>()
     }
 }
 
```
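For a fixed-width type such as INT64, assuming each unique value adds its 8-byte PLAIN width to `size_in_bytes`, the old formula counts every stored value twice: once as encoded bytes and once as part of the reserved `Vec` capacity. The two hypothetical helpers below reproduce both formulas for `i64` so the difference can be checked with concrete numbers.

```rust
use std::mem::size_of;

/// Old KeyStorage estimate (hypothetical helper): encoded size plus reserved capacity.
fn old_key_storage_estimate(size_in_bytes: usize, capacity: usize) -> usize {
    size_in_bytes + capacity * size_of::<i64>()
}

/// New KeyStorage estimate (hypothetical helper): reserved capacity only.
fn new_key_storage_estimate(capacity: usize) -> usize {
    capacity * size_of::<i64>()
}
```

With 1,000 unique values in a `Vec` whose capacity is 1,024, the old formula reports 8,000 + 8,192 = 16,192 bytes. The new formula reports 8,192 bytes, which is the size of the allocation itself.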

```diff
@@ -183,6 +183,6 @@ impl<T: DataType> Encoder<T> for DictEncoder<T> {
     ///
     /// For this encoder, the indices are unencoded bytes (refer to [`Self::write_indices`]).
     fn estimated_memory_size(&self) -> usize {
-        self.interner.storage().size_in_bytes + self.indices.len() * std::mem::size_of::<usize>()
+        self.interner.estimated_memory_size() + self.indices.len() * std::mem::size_of::<usize>()
     }
 }
```
