Zero-Copy Binary Data Parser (C++/Python)

Performance Benchmark

  • Speedup: 23.9x over standard Pandas/NumPy ingestion.
  • Load time: 0.1761 s (zero-copy) vs 4.2081 s (standard) for 40M records.
  • Throughput: ~227 million records per second on the zero-copy path (40M / 0.1761 s).
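
The speedup and throughput follow directly from the two timings and the 40M record count. This quick check uses only the numbers listed above:

```python
# Figures taken from the benchmark bullets above.
RECORDS = 40_000_000
T_ZERO_COPY = 0.1761  # seconds, zero-copy path
T_STANDARD = 4.2081   # seconds, standard Pandas/NumPy path

speedup = T_STANDARD / T_ZERO_COPY
throughput = RECORDS / T_ZERO_COPY  # records per second

print(f"speedup:    {speedup:.1f}x")
print(f"throughput: {throughput / 1e6:.1f}M records/s")
```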

Technical Architecture

This project demonstrates high-performance systems programming by bridging C++ and Python to eliminate data-copying overhead.

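  1. Linux mmap: Maps the binary file directly into the process address space, so records are read from the page cache in place instead of through repeated read() calls that copy data into a user-space buffer.
  2. Custom C++ Structs: Uses fixed-width memory layouts, so record i starts at byte offset i * sizeof(record), giving $O(1)$ random access to financial tick data.
  3. Buffer Protocol (pybind11 + NumPy): Uses pybind11 to expose the mapped memory to Python through the buffer protocol, so NumPy can wrap it as an array without copying the records and run vectorized analysis on it directly.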
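
The three ideas above can be sketched in pure Python, with the standard mmap module and np.frombuffer standing in for the project's C++/pybind11 layer. This is an illustration under assumptions, not the project's code: the 24-byte tick layout (TICK_DTYPE, TICK_STRUCT) and the helpers write_ticks, map_ticks and tick_at are hypothetical names chosen for this sketch.

```python
import mmap
import os
import struct
import tempfile

import numpy as np

# Hypothetical 24-byte tick record (the project's real layout is not documented).
# It mirrors a C++ struct with no padding, such as:
#   struct Tick { int64_t ts; double price; int32_t volume; int32_t side; };
TICK_DTYPE = np.dtype([("ts", "<i8"), ("price", "<f8"), ("volume", "<i4"), ("side", "<i4")])
TICK_STRUCT = struct.Struct("<qdii")  # same layout, used for manual offset reads


def write_ticks(path, n):
    """Write n synthetic ticks as raw fixed-width records (test data only)."""
    ticks = np.zeros(n, dtype=TICK_DTYPE)
    ticks["ts"] = np.arange(n, dtype=np.int64)
    ticks["price"] = 100.0 + np.arange(n) * 0.01
    ticks["volume"] = 1 + np.arange(n) % 500
    ticks["side"] = np.arange(n) % 2
    ticks.tofile(path)


def map_ticks(path):
    """Map the file read-only and view it as a structured array, without copying."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    ticks = np.frombuffer(mm, dtype=TICK_DTYPE)  # read-only view of the mapped pages
    return mm, ticks


def tick_at(buf, i):
    """O(1) random access: record i starts at byte offset i * record size."""
    return TICK_STRUCT.unpack_from(buf, i * TICK_STRUCT.size)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ticks.bin")
    write_ticks(path, 1_000)
    mm, ticks = map_ticks(path)
    print(ticks.flags.owndata)                     # False: the array borrows the mapping
    print(tick_at(mm, 42)[0] == ticks["ts"][42])   # True
    print(ticks["price"].mean())                   # vectorized pass over mapped memory
```

The fixed record size is what makes both access paths cheap: NumPy computes the same i * 24 offset internally when indexing ticks[i], and tick_at does it by hand.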

How to Reproduce

  1. Generate Data (40M Records): python3 generate_data.py
  2. Compile Engine: g++ -O3 -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) fast_parser.cpp -o fast_parser.so
  3. Run Benchmark: python3 benchmark.py (use the same interpreter the module was compiled against)
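
For readers who want to see the shape of such a comparison without the C++ build, here is a minimal pure-Python harness. It is hypothetical, not the project's benchmark.py: it assumes the same illustrative 24-byte layout as above, and uses np.fromfile as a stand-in for a copying load, since the exact "standard" path is not documented here.

```python
import mmap
import os
import tempfile
import time

import numpy as np

# Hypothetical 24-byte record layout; the real project's layout is not given.
TICK_DTYPE = np.dtype([("ts", "<i8"), ("price", "<f8"), ("volume", "<i4"), ("side", "<i4")])


def load_copy(path):
    """Copying load: np.fromfile reads the bytes into a freshly allocated array."""
    return np.fromfile(path, dtype=TICK_DTYPE)


def load_mmap(path):
    """Zero-copy load: view the memory-mapped file as a read-only array."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return np.frombuffer(mm, dtype=TICK_DTYPE)


def timed(loader, path):
    """Time the load plus one full pass over 'price', since mmap pages load lazily."""
    start = time.perf_counter()
    ticks = loader(path)
    total = float(ticks["price"].sum())
    return time.perf_counter() - start, total


if __name__ == "__main__":
    n = 1_000_000  # small stand-in for the 40M-record dataset
    path = os.path.join(tempfile.mkdtemp(), "ticks.bin")
    data = np.zeros(n, dtype=TICK_DTYPE)
    data["price"] = np.linspace(100.0, 101.0, n)
    data.tofile(path)

    t_copy, s_copy = timed(load_copy, path)
    t_mmap, s_mmap = timed(load_mmap, path)
    assert s_copy == s_mmap  # both paths must see identical data
    print(f"copy: {t_copy:.4f}s  mmap: {t_mmap:.4f}s  speedup: {t_copy / t_mmap:.1f}x")
```

Touching the data inside the timed region matters: mmap defers the actual I/O to page faults, so timing only the mapping call would overstate the gain. Because the file was just written, both loads here also read from a warm page cache, which can give different numbers from a cold-cache run.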