Zero-Copy Binary Data Parser (C++/Python)

Speedup: 23.9x faster than standard Pandas/NumPy ingestion.
Latency: 0.1761s (Zero-Copy) vs 4.2081s (Standard) for 40M records.
Throughput: ~227 million records per second (40M records / 0.1761 s).
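
The derived figures follow directly from the two measured latencies and the record count. A quick check, using only the numbers quoted above (the variable names are illustrative):

```python
# Measured values quoted in this README.
RECORDS = 40_000_000
ZERO_COPY_S = 0.1761
STANDARD_S = 4.2081

# Speedup is the ratio of the two wall-clock times.
speedup = STANDARD_S / ZERO_COPY_S

# Throughput is records processed per second on the zero-copy path.
throughput = RECORDS / ZERO_COPY_S

print(f"speedup: {speedup:.1f}x")
print(f"throughput: {throughput / 1e6:.1f} M records/s")
```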

Performance Benchmark

This project demonstrates high-performance systems programming by bridging C++ and Python to eliminate data-copying overhead.

Linux mmap: Maps the binary file directly into the process address space, so records are read straight from the page cache without repeated read() system calls or an extra copy into user-space buffers.
Custom C++ Structs: Uses fixed-width memory layouts to achieve $O(1)$ random access to financial tick data, since record i always starts at byte offset i * sizeof(record).
NumPy Buffer Protocol: Uses pybind11 to expose the mapped memory to Python through the buffer protocol, so NumPy can run vectorized analysis on it without allocating or copying a second buffer for the record data.
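
The three ideas above can be illustrated in pure Python, without the C++ extension. This is a minimal sketch, not the code in fast_parser.cpp: the 24-byte record layout, the field names, and the helpers write_sample, open_ticks, and read_record are all assumptions made for the example. It maps a file with the standard mmap module, wraps the mapping in a NumPy structured array via np.frombuffer (which uses the buffer protocol and does not copy), and reads one record by offset arithmetic.

```python
import mmap
import os
import struct
import tempfile

import numpy as np

# Hypothetical fixed-width tick layout: 24 bytes, little-endian, no padding.
TICK_DTYPE = np.dtype([
    ("timestamp_ns", "<u8"),
    ("price", "<f8"),
    ("volume", "<u4"),
    ("flags", "<u4"),
])


def write_sample(path, n):
    """Write n synthetic records to path in the layout above."""
    ticks = np.zeros(n, dtype=TICK_DTYPE)
    ticks["timestamp_ns"] = np.arange(n, dtype=np.uint64)
    ticks["price"] = 100.0 + 0.25 * np.arange(n)
    ticks["volume"] = 10
    ticks.tofile(path)


def open_ticks(path):
    """Map the file read-only and return (mapping, NumPy view over it)."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # frombuffer wraps the mapped pages; the record data is not copied.
    view = np.frombuffer(mm, dtype=TICK_DTYPE)
    return mm, view


def read_record(mm, i):
    """O(1) random access: record i starts at byte i * itemsize."""
    return struct.unpack_from("<QdII", mm, i * TICK_DTYPE.itemsize)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ticks.bin")
    write_sample(path, 1000)
    mm, ticks = open_ticks(path)
    print("owns its data:", ticks.flags.owndata)
    print("record 742:", read_record(mm, 742))
    print("mean price:", ticks["price"].mean())
```

The view must be released before the mapping is closed, because NumPy holds a reference to the mapped buffer for as long as the array exists.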

Generate Data (40M Records): python generate_data.py
Compile Engine: g++ -O3 -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) fast_parser.cpp -o fast_parser.so
Run Benchmark: python benchmark.py
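
The contents of benchmark.py are not reproduced here. As a rough illustration of how a copy-versus-map comparison might be timed, the sketch below uses np.fromfile as a stand-in for the "standard" path and np.memmap as a stand-in for the zero-copy path. The two-field record layout and the helper names (make_file, time_call, load_copy, load_mapped) are assumptions for the example. Both loaders sum a column so that the mapped pages are actually touched; timing only the map call would measure almost nothing, since pages are faulted in lazily.

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical two-field record, 16 bytes, little-endian.
RECORD = np.dtype([("timestamp_ns", "<u8"), ("price", "<f8")])


def make_file(path, n):
    """Write n synthetic records to path."""
    data = np.zeros(n, dtype=RECORD)
    data["timestamp_ns"] = np.arange(n, dtype=np.uint64)
    data["price"] = 0.5 * np.arange(n)
    data.tofile(path)


def time_call(fn, repeats=3):
    """Return (best wall-clock seconds, last result) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


def load_copy(path):
    # Reads the whole file into a freshly allocated array, then sums.
    return float(np.fromfile(path, dtype=RECORD)["price"].sum())


def load_mapped(path):
    # Maps the file and sums in place; no second buffer is filled.
    return float(np.memmap(path, dtype=RECORD, mode="r")["price"].sum())


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "bench.bin")
    make_file(path, 1_000_000)
    t_copy, s_copy = time_call(lambda: load_copy(path))
    t_map, s_map = time_call(lambda: load_mapped(path))
    print(f"copy:   {t_copy:.4f}s")
    print(f"mapped: {t_map:.4f}s")
```

Results from a sketch like this depend heavily on whether the file is already in the page cache, so they are not comparable to the figures reported at the top of this README.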
