Zero-Copy Binary Data Parser (C++/Python)

Speedup: 23.9x faster than standard Pandas/NumPy ingestion.
Latency: 0.1761s (Zero-Copy) vs 4.2081s (Standard) for 40M records.
Throughput: ~227 million records per second (40M records / 0.1761 s).

Performance Benchmark

This project demonstrates high-performance systems programming by bridging C++ and Python to eliminate data-copying overhead.

Linux mmap: Maps the binary file directly into the process address space, avoiding per-read read() system calls and the extra copy into a user-space buffer.
Custom C++ Structs: Uses fixed-width memory layouts, so record i sits at byte offset i * sizeof(record), giving $O(1)$ random access to financial tick data.
Python Buffer Protocol: Uses pybind11 to expose the mapped memory to NumPy through the buffer protocol, so vectorized analysis runs on the original bytes without copying the record data.
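
The three ideas above can be illustrated with a minimal, standard-library-only Python sketch. It is not the project's C++ engine: the record layout (int64 timestamp, float64 price, int32 volume), the TICK name, and the helper functions are all hypothetical and chosen only for illustration. It maps a file with mmap, wraps the mapping in a memoryview (a buffer-protocol view that does not copy the bytes), and reads records by offset arithmetic.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical tick layout (an assumption, not taken from the project):
# little-endian, packed int64 timestamp, float64 price, int32 volume.
TICK = struct.Struct("<qdi")  # 8 + 8 + 4 = 20 bytes per record


def write_ticks(path, count):
    """Write `count` synthetic fixed-width records to `path`."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(TICK.pack(1_700_000_000 + i, 100.0 + i * 0.25, 10 * i))


def read_tick(buf, index):
    """O(1) random access: record i starts at byte offset i * TICK.size."""
    return TICK.unpack_from(buf, index * TICK.size)


def demo(count=1000):
    """Map a temporary file and read its first and last records in place."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_ticks(path, count)
        with open(path, "rb") as f:
            # Map the whole file read-only; pages are loaded on first access.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # memoryview exposes the mapping through the buffer protocol
                # without copying the underlying bytes.
                with memoryview(mm) as view:
                    n = len(view) // TICK.size
                    first = read_tick(view, 0)
                    last = read_tick(view, n - 1)
        return n, first, last
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

In a NumPy setting, np.frombuffer over the same mapping with a matching structured dtype would play the role that memoryview plays here, giving an array view over the mapped bytes rather than a copy.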

Generate Data (40M Records): python generate_data.py
Compile Engine: g++ -O3 -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) fast_parser.cpp -o fast_parser.so
Run Benchmark: python benchmark.py