perf: idempotent_dir for dataset generation #7493
joseph-isaacs wants to merge 3 commits into develop
Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.952x ➖
datafusion / vortex-file-compressed (0.952x ➖, 0↑ 0↓)

File Sizes: PolarSignals Profiling
No file size changes detected.

Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.934x ➖, 2↑ 1↓)
datafusion / vortex-compact (0.975x ➖, 0↑ 0↓)
datafusion / parquet (0.967x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.017x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.033x ➖, 0↑ 2↓)
duckdb / parquet (0.949x ➖, 0↑ 0↓)

File Sizes: FineWeb NVMe
No file size changes detected.

Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.051x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.019x ➖, 0↑ 0↓)
datafusion / parquet (1.033x ➖, 0↑ 1↓)
datafusion / arrow (1.039x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.012x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.021x ➖, 0↑ 0↓)
duckdb / parquet (1.001x ➖, 0↑ 0↓)
duckdb / duckdb (1.009x ➖, 0↑ 0↓)

File Sizes: TPC-H SF=1 on NVME
No file size changes detected.

Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.105x ❌, 0↑ 54↓)
datafusion / vortex-compact (1.098x ➖, 0↑ 44↓)
datafusion / parquet (1.102x ❌, 0↑ 46↓)
duckdb / vortex-file-compressed (1.084x ➖, 0↑ 30↓)
duckdb / vortex-compact (1.060x ➖, 2↑ 20↓)
duckdb / parquet (1.060x ➖, 0↑ 15↓)
duckdb / duckdb (1.067x ➖, 0↑ 25↓)

File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.

Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.954x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.830x ➖, 1↑ 0↓)
datafusion / parquet (0.976x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.003x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.887x ➖, 0↑ 0↓)
duckdb / parquet (0.973x ➖, 0↑ 0↓)

Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.993x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.026x ➖, 0↑ 3↓)
datafusion / parquet (0.991x ➖, 0↑ 0↓)
datafusion / arrow (0.989x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.148x ❌, 0↑ 20↓)
duckdb / vortex-compact (1.125x ❌, 0↑ 18↓)
duckdb / parquet (1.069x ➖, 0↑ 4↓)
duckdb / duckdb (1.017x ➖, 0↑ 2↓)

File Sizes: TPC-H SF=10 on NVME
No file size changes detected.

Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.997x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.990x ➖, 0↑ 0↓)
duckdb / parquet (0.981x ➖, 0↑ 0↓)

File Sizes: Statistical and Population Genetics
No file size changes detected.

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.979x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.911x ➖, 0↑ 0↓)
datafusion / parquet (1.058x ➖, 1↑ 3↓)
duckdb / vortex-file-compressed (1.001x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.992x ➖, 0↑ 0↓)
duckdb / parquet (0.969x ➖, 0↑ 0↓)

Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.003x ➖, 1↑ 2↓)
datafusion / parquet (0.989x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.970x ➖, 7↑ 0↓)
duckdb / parquet (0.998x ➖, 0↑ 1↓)
duckdb / duckdb (0.975x ➖, 2↑ 0↓)

File Sizes: Clickbench on NVME
File Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)

Benchmarks: Random Access
Vortex (geomean): 0.853x ✅
unknown / unknown (0.951x ➖, 8↑ 1↓)

Benchmarks: Compression
Vortex (geomean): 1.004x ➖
unknown / unknown (1.007x ➖, 1↑ 4↓)

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.961x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.907x ➖, 2↑ 0↓)
datafusion / parquet (0.819x ➖, 6↑ 0↓)
duckdb / vortex-file-compressed (1.017x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.021x ➖, 0↑ 0↓)
duckdb / parquet (1.048x ➖, 0↑ 0↓)
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Force-pushed: 21dfaa0 to 95850b6
Add a `dir: &Path` parameter to `download_many`. On entry it skips all downloads if `dir/.success` already exists; on success it writes that marker so subsequent runs skip the whole batch.

Call sites updated: clickbench partitioned, public_bi bzips, vector_dataset train shards.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
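
For readers skimming the commit message, here is a minimal sketch of the marker behaviour it describes. It is not the code from this PR: the real `download_many` signature is not shown here, so `download_many_sketch`, the `fetch` callback, and the `(url, file name)` pairs are hypothetical names chosen for illustration. Only the `dir: &Path` parameter and the `.success` marker come from the commit message.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Name of the marker file described in the commit message.
const SUCCESS_MARKER: &str = ".success";

/// Hypothetical stand-in for `download_many`. The `items` and `fetch`
/// parameters are assumptions; the real function's other arguments are
/// not shown in this PR.
///
/// Returns `Ok(true)` if the downloads ran, `Ok(false)` if they were skipped.
pub fn download_many_sketch<F>(
    dir: &Path,
    items: &[(&str, &str)],
    mut fetch: F,
) -> io::Result<bool>
where
    F: FnMut(&str, &Path) -> io::Result<()>,
{
    let marker = dir.join(SUCCESS_MARKER);

    // On entry: a marker left by an earlier run means the batch is complete.
    if marker.exists() {
        return Ok(false);
    }

    fs::create_dir_all(dir)?;
    for &(url, file_name) in items {
        // An error returns early via `?`, so the marker is never written
        // for a partially downloaded batch.
        fetch(url, &dir.join(file_name))?;
    }

    // On success: record completion so subsequent runs skip the whole batch.
    fs::write(&marker, b"")?;
    Ok(true)
}
```

Writing the marker last is the important ordering in this sketch: a run that fails partway leaves no `.success` file, so the next run retries the batch instead of trusting incomplete data.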
Force-pushed: 18703fd to 762e4c6
Have a `.success` file for dataset generation to speed up repeated benchmark runs.
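
The same idea generalises beyond downloads to any dataset generation step. The sketch below is an assumption about how such a wrapper could look, not the `idempotent_dir` API from this PR (which is not shown in the conversation); `generate_once` and its closure parameter are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: run `generate` for `dir` unless a `.success`
/// marker from a previous run is already present.
pub fn generate_once<F>(dir: &Path, generate: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    let marker = dir.join(".success");

    // A previous run finished successfully: nothing to do.
    if marker.exists() {
        return Ok(());
    }

    fs::create_dir_all(dir)?;

    // If generation fails, return the error without writing the marker,
    // so the next run starts the generation again.
    generate(dir)?;

    fs::write(marker, b"")
}
```

One caveat of any marker-based scheme like this: the marker only records that generation once succeeded. If the generation inputs or parameters change, the directory (or at least the marker) has to be removed for the data to be regenerated.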