I computed the in-memory (uncompressed) size of each table for scale factor 1 as follows:

    import pandas as pd

    def get_mem(df):
        # Get memory usage in bytes (deep=True also counts object/string payloads)
        memory_usage = df.memory_usage(deep=True).sum()
        print(f"Memory size: {memory_usage:,} bytes")
        print(f"Memory size: {memory_usage / (1024**2):.2f} MB")

    df = pd.read_parquet('building.parquet')
    get_mem(df)

Here are the resulting in-memory sizes for SF1:

| Table | Size (SF1) |
| --- | --- |
| building | 4.06 MB |
| customer | 9.30 MB |
| driver | 0.15 MB |
| trip | 3547.67 MB |
| vehicle | 0.02 MB |
| zone | 1388.61 MB |

Perhaps we can expose a helper to fetch the uncompressed data size and add this info to the documentation?
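
As a rough idea of what such a helper could look like, here is a minimal sketch built on the same `memory_usage(deep=True)` measurement used above. The function names (`uncompressed_size_mb`, `size_report`, `to_markdown`), the `reader` parameter, and the `<name>.parquet` file layout are all hypothetical, not an existing API of the project.

```python
from typing import Callable, Dict, Iterable

import pandas as pd


def uncompressed_size_mb(df: pd.DataFrame) -> float:
    """Return the in-memory size of a DataFrame in MB (1 MB = 1024**2 bytes)."""
    # deep=True also counts the payload of object (e.g. string) columns.
    return float(df.memory_usage(deep=True).sum() / (1024 ** 2))


def size_report(
    tables: Iterable[str],
    reader: Callable[[str], pd.DataFrame] = pd.read_parquet,
) -> Dict[str, float]:
    """Map each table name to its in-memory size in MB.

    Assumption: each table is stored as '<name>.parquet' in the working directory.
    """
    return {name: uncompressed_size_mb(reader(f"{name}.parquet")) for name in tables}


def to_markdown(sizes: Dict[str, float], label: str = "Size (SF1)") -> str:
    """Render the sizes as a Markdown table suitable for the documentation."""
    rows = [f"| Table | {label} |", "| --- | --- |"]
    rows += [f"| {name} | {mb:.2f} MB |" for name, mb in sizes.items()]
    return "\n".join(rows)
```

Calling `print(to_markdown(size_report(["building", "customer", "driver", "trip", "vehicle", "zone"])))` in a directory containing the generated files would then produce a table in the same shape as the one above. Note that this measures the pandas in-memory footprint, which depends on dtypes and can differ from the uncompressed size recorded in the Parquet metadata.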