This project benchmarks pandas vs. polars ingestion, computes rolling analytics, explores threading vs. multiprocessing, and aggregates hierarchical portfolio metrics for large financial time-series data. The pipeline produces comparative performance summaries and charts while ensuring correctness through automated tests.

| File | Description |
|---|---|
| `data_loader.py` | Loads CSV market data with profiling for pandas/polars. |
| `metrics.py` | Rolling moving average, volatility, and Sharpe ratio utilities. |
| `parallel.py` | Threaded and process-based execution wrappers with psutil profiling. |
| `portfolio.py` | Sequential and multiprocessing portfolio aggregation logic. |
| `reporting.py` | Assembles benchmark summary tables and matplotlib visualisations. |
| `main.py` | Command-line entry point orchestrating the full workflow. |
| `portfolio_structure-1.json` | Sample nested portfolio hierarchy. |
| `reports/` | Generated charts after running `main.py`. |
| `tests/` | Pytest suite covering rolling metrics, parallelism, and portfolio aggregation. |
| `performance_report.md` | Narrative summary of measured results. |

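
As an illustration of the rolling utilities listed for `metrics.py`, here is a minimal sketch using pandas. It is not the project's actual code: the function name `rolling_metrics`, the output column names, the use of simple returns, the zero risk-free rate, and the 252-period annualisation factor are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def rolling_metrics(prices: pd.Series, window: int = 20,
                    periods_per_year: int = 252) -> pd.DataFrame:
    """Hypothetical helper: rolling moving average, volatility, and Sharpe ratio."""
    # Simple period-over-period returns; the first value is NaN.
    returns = prices / prices.shift(1) - 1

    # Rolling mean of the price level.
    moving_avg = prices.rolling(window).mean()

    # Rolling sample standard deviation of returns (pandas default ddof=1).
    volatility = returns.rolling(window).std()

    # Annualised Sharpe ratio with an assumed zero risk-free rate.
    # Windows with zero volatility produce inf or NaN rather than raising.
    sharpe = np.sqrt(periods_per_year) * returns.rolling(window).mean() / volatility

    return pd.DataFrame(
        {"moving_avg": moving_avg, "volatility": volatility, "sharpe": sharpe}
    )


# Tiny made-up series, used only to show the shape of the output.
example = rolling_metrics(
    pd.Series([100.0, 102.0, 101.0, 103.0, 104.0, 106.0]), window=3
)
```

The first `window - 1` rows of the moving average are NaN, and the return-based columns need one extra row because the first return is undefined.
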
- Python 3.9+
- Required packages: `pandas`, `numpy`, `psutil`, `matplotlib`, `pytest`
- Optional (enables polars comparisons): `polars`
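
One way to treat polars as an optional dependency is to probe for it at runtime without importing it. The sketch below uses only the standard library; the function name `available_backends` is hypothetical and is not taken from the project.

```python
import importlib.util


def available_backends() -> list:
    """Hypothetical helper: list the dataframe backends that can be benchmarked."""
    # pandas is a required dependency, so it is listed unconditionally.
    backends = ["pandas"]

    # find_spec returns None when a top-level package is not installed,
    # so the check itself never raises ImportError.
    if importlib.util.find_spec("polars") is not None:
        backends.append("polars")

    return backends


print(available_backends())
```
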

Create and activate a virtual environment, then install the dependencies:

    python -m venv .venv
    .venv\Scripts\activate            # Windows
    source .venv/bin/activate         # macOS/Linux
    pip install -r requirements.txt   # (see below for suggested list)

If you do not maintain a requirements file yet, install manually:

    pip install pandas numpy psutil matplotlib pytest
    # Optional:
    pip install polars

Run the full workflow:

    python main.py --data market_data-1.csv --portfolio portfolio_structure-1.json --window 20 --report-dir reports

- Outputs a JSON summary to stdout.
- Saves comparison charts inside `reports/`.
- Requires `polars` only if you want to benchmark the polars backend; otherwise that backend is skipped gracefully.
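
The flags in the `main.py` command map naturally onto an `argparse` parser. The following is a sketch, not the real `main.py`: the flag names come from the command, while the help texts, which flags are required, and the defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(
        description="Benchmark ingestion, rolling metrics, and portfolio aggregation."
    )
    parser.add_argument("--data", required=True, help="Path to the market data CSV.")
    parser.add_argument("--portfolio", required=True,
                        help="Path to the portfolio hierarchy JSON.")
    # Assumed default of 20, matching the value in the example command.
    parser.add_argument("--window", type=int, default=20,
                        help="Rolling window length in rows.")
    # argparse exposes --report-dir as the attribute args.report_dir.
    parser.add_argument("--report-dir", default="reports",
                        help="Directory where charts are written.")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args([
    "--data", "market_data-1.csv",
    "--portfolio", "portfolio_structure-1.json",
    "--window", "20",
    "--report-dir", "reports",
])
print(args.window, args.report_dir)
```
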

Run the test suite:

    pytest

The suite validates rolling metric values, parity between threading/multiprocessing and the sequential baseline, and portfolio aggregation invariants.
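
A parity check of the kind described above can be sketched as follows. This is an illustration only: `mean_price`, `run_sequential`, and `run_threaded` are hypothetical stand-ins for the project's wrappers, the input data is made up, and only the threaded path is shown to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_price(prices):
    """Toy per-symbol computation standing in for a real metric."""
    return sum(prices) / len(prices)


def run_sequential(series_by_symbol):
    """Baseline: compute each symbol's result one after another."""
    return {sym: mean_price(p) for sym, p in series_by_symbol.items()}


def run_threaded(series_by_symbol, max_workers=4):
    """Same computation, submitted to a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {sym: pool.submit(mean_price, p)
                   for sym, p in series_by_symbol.items()}
        # Collect results keyed by symbol so completion order does not matter.
        return {sym: fut.result() for sym, fut in futures.items()}


def test_threaded_matches_sequential():
    data = {"AAA": [1.0, 2.0, 3.0], "BBB": [10.0, 20.0]}
    assert run_threaded(data) == run_sequential(data)
```

Keying results by symbol instead of relying on completion order is what makes the comparison with the sequential baseline deterministic.
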