The Speed of Rust. The Simplicity of Python.
PardoX is a high-performance DataFrame engine for modern ETL and analytics. A Rust core powers SDKs in Python, Node.js, and PHP, with native database I/O, an ultra-fast binary format, and out-of-core processing for datasets larger than RAM.
v0.3.2 is now available. It adds PRDX Streaming to PostgreSQL (validated on 150M rows), GroupBy, Window Functions, String & Date ops, Lazy Pipeline, SQL over DataFrames, Encryption, Data Contracts, Time Travel, Arrow Flight, Linear Algebra, REST Connector, and Cloud Storage, closing 29 feature gaps in total.
| Feature | Details |
|---|---|
| PRDX Streaming to PostgreSQL | Stream .prdx → PostgreSQL via COPY FROM STDIN with O(block) RAM. Validated: 150M rows / 3.8 GB in ~490s at ~306k rows/s |
| GroupBy Aggregation (Gap 1) | df.groupby(col, {col: agg}) — sum, mean, count, min, max, std — Python, JS, PHP |
| String & Date Operations (Gap 2) | str_upper, str_lower, str_contains, date_extract, date_diff, date_add — all SDKs |
| Decimal Type (Gap 3) | Native Decimal128 column type with configurable precision and scale |
| Window Functions (Gap 4) | row_number, rank, lag, lead, rolling_mean — all SDKs |
| Lazy Pipeline (Gap 5) | scan_csv().select().filter().limit().collect() — all SDKs |
| SQL over DataFrames (Gap 14) | df.sql("SELECT ... FROM df ...") — all SDKs |
| Out-of-Core Processing (Gap 11) | chunked_groupby, external_sort, spill_to_disk — handles datasets > RAM |
| Streaming GroupBy on .prdx (Gap 13) | prdx_groupby() — O(groups) memory on any file size |
| Encryption (Gap 18) | write_prdx_encrypted / read_prdx_encrypted |
| Data Contracts (Gap 19) | df.validate_contract(schema_json) — row-level validation |
| Time Travel (Gap 20) | version_write / version_read / version_list — snapshot history |
| Arrow Flight (Gap 21) | pardox_flight_start / pardox_flight_read — high-throughput Arrow transport |
| Linear Algebra (Gap 28) | cosine_sim, l2_normalize, matmul, pca |
| REST Connector (Gap 29) | read_rest(url, method, headers_json) → DataFrame |
| Cloud Storage (Gap 15) | read_cloud_csv from S3, GCS, Azure |
| 29 Gaps Total | All 29 feature gaps implemented in the Rust core across Python, JS, PHP |
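The O(groups) memory claim for streaming GroupBy can be illustrated with a small, library-independent sketch. This is not PardoX's implementation: the `iter_blocks`, `streaming_sum`, and `synthetic_rows` helpers and the two-column row layout are hypothetical, chosen only to show why block-wise aggregation needs one accumulator per group rather than the whole dataset in RAM.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]  # (group key, value): a hypothetical two-column layout


def iter_blocks(rows: Iterable[Row], block_rows: int) -> Iterator[List[Row]]:
    """Yield fixed-size blocks so only one block is held in memory at a time."""
    block: List[Row] = []
    for row in rows:
        block.append(row)
        if len(block) == block_rows:
            yield block
            block = []
    if block:
        yield block  # final, possibly shorter, block


def streaming_sum(rows: Iterable[Row], block_rows: int = 1000) -> Dict[str, float]:
    """Sum values per key, keeping one accumulator per distinct group."""
    totals: Dict[str, float] = defaultdict(float)
    for block in iter_blocks(rows, block_rows):
        for key, value in block:
            totals[key] += value
    return dict(totals)


def synthetic_rows(n: int) -> Iterator[Row]:
    """Generate rows lazily so the full dataset never exists in memory."""
    regions = ("north", "south", "east", "west")
    for i in range(n):
        yield regions[i % 4], 1.0


result = streaming_sum(synthetic_rows(100_000), block_rows=10_000)
print(result)
```

Peak memory here is bounded by one block plus one running total per region, regardless of how many rows the generator produces.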
- Zero-Copy Architecture: Rust HyperBlock buffers with no intermediate Python/JS/PHP objects.
- SIMD + Multithreading: AVX2/NEON vectorized ops for 5x–20x speedups.
- Native Database I/O: PostgreSQL, MySQL, SQL Server, MongoDB, with no psycopg2 and no pymysql.
- .prdx format: ~4.6 GB/s read throughput, faster than Parquet for repeated workloads.
- GPU Sort: sort_values(gpu=True) uses a WebGPU Bitonic sort with CPU fallback.
- ML Ready: Zero-copy NumPy bridge via the __array__ protocol.
- Multi-SDK: One Rust core, identical API in Python, Node.js, and PHP.
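For readers unfamiliar with the bitonic sort named in the GPU Sort item, the following is a minimal CPU sketch of the sorting network in plain Python. It is not PardoX's WebGPU kernel; the `bitonic_sort` function and its padding strategy are assumptions made for illustration. The network is attractive on GPUs because, within each stage, every compare-and-swap touches a disjoint pair of elements and could run as an independent thread.

```python
import math
from typing import List


def bitonic_sort(values: List[float]) -> List[float]:
    """Sort ascending with an iterative bitonic network (CPU sketch)."""
    n = len(values)
    if n < 2:
        return list(values)
    # The network needs a power-of-two length; pad with +inf, which sorts last.
    size = 1 << (n - 1).bit_length()
    data = list(values) + [math.inf] * (size - n)

    k = 2
    while k <= size:  # k: length of the bitonic sequences being merged
        j = k // 2
        while j >= 1:  # j: distance between compared elements in this stage
            # Each iteration of this loop is independent of the others,
            # which is what a GPU would execute in parallel.
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data[:n]  # drop the padding


print(bitonic_sort([5, 3, 8, 1, 9, 2, 7]))
```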
```bash
# Python
pip install pardox
# Node.js
npm i @pardox/pardox
# PHP
composer require betoalien/pardox-php
```

```python
import pardox as px

# Load 100k rows with the parallel Rust CSV parser
df = px.read_csv("sales.csv")
print(f"{df.shape[0]:,} rows × {df.shape[1]} columns")

# GroupBy in pure Rust
grouped = df.groupby("state", {"revenue": "sum", "qty": "count"})

# Stream 150M rows to PostgreSQL with O(block) RAM
rows = px.write_sql_prdx(
    "sales_150m.prdx",
    "postgresql://user:pass@localhost:5432/db",
    "sales", mode="append", conflict_cols=[], batch_rows=1_000_000
)

# Out-of-core: GroupBy on .prdx without loading all rows into RAM
result = px.prdx_groupby("sales_150m.prdx", ["region"], {"revenue": "sum"})
```

| Operation | Baseline | PardoX v0.3.2 | Speedup |
|---|---|---|---|
| Read CSV (1 GB) | Pandas ~4.2s | ~0.8s | 5x |
| Column multiply (1M rows) | Pandas ~0.15s | ~0.02s | 7.5x |
| PostgreSQL write 50k rows | psycopg2 ~18s | ~0.6s (COPY) | 30x |
| MySQL write 50k rows | pymysql ~22s | ~3s (batch INSERT) | 7x |
| PRDX -> PostgreSQL 150M rows | N/A | ~490s | 306k rows/s |
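The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below only recomputes ratios from the numbers quoted in this README; it assumes the 3.8 GB file size is in decimal gigabytes, and the derived MB/s and bytes-per-row values are not published measurements.

```python
# Figures quoted in this README for the PRDX -> PostgreSQL run.
rows = 150_000_000
seconds = 490
size_gb = 3.8  # assumed to be decimal gigabytes (1 GB = 1e9 bytes)

rows_per_second = rows / seconds          # roughly 306k rows/s
mb_per_second = size_gb * 1000 / seconds  # sustained write rate in MB/s
bytes_per_row = size_gb * 1e9 / rows      # average encoded row size

# Speedup = baseline time / PardoX time, using the table's timings in seconds.
timings = {
    "Read CSV (1 GB)": (4.2, 0.8),
    "Column multiply (1M rows)": (0.15, 0.02),
    "PostgreSQL write 50k rows": (18.0, 0.6),
    "MySQL write 50k rows": (22.0, 3.0),
}
speedups = {name: base / fast for name, (base, fast) in timings.items()}

print(f"{rows_per_second:,.0f} rows/s, {mb_per_second:.2f} MB/s, "
      f"{bytes_per_row:.1f} bytes/row")
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

The recomputed ratios agree with the rounded values in the Speedup column.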
- Full docs: https://www.pardox.io
- Repository: https://github.com/betoalien/PardoX
- X (Twitter): https://x.com/pardox_io
License: MIT