akshaya1255/zero-copy-binary-parser

Zero-Copy Binary Data Parser (C++/Python)

Performance Benchmark

  • Speedup: 23.9x faster than standard Pandas/NumPy ingestion.
  • Latency: 0.1761s (Zero-Copy) vs 4.2081s (Standard) for 40M records.
  • Throughput: ~227 million records per second (40M records / 0.1761s).
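
The headline numbers can be re-derived from the two latencies. The snippet below is a small arithmetic check, assuming "40M" means exactly 40,000,000 records; the function names are illustrative and not part of the repository.

```python
# Figures quoted in the benchmark list above.
RECORDS = 40_000_000          # assumption: "40M" is exactly 40,000,000
ZERO_COPY_SECONDS = 0.1761
STANDARD_SECONDS = 4.2081


def speedup(baseline_seconds, fast_seconds):
    """How many times faster the fast path is than the baseline."""
    return baseline_seconds / fast_seconds


def throughput(records, seconds):
    """Records processed per second."""
    return records / seconds


if __name__ == "__main__":
    ratio = speedup(STANDARD_SECONDS, ZERO_COPY_SECONDS)
    rate = throughput(RECORDS, ZERO_COPY_SECONDS)
    print(f"speedup: {ratio:.1f}x")
    print(f"throughput: {rate / 1e6:.1f} million records/s")
```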

Technical Architecture

This project demonstrates high-performance systems programming by bridging C++ and Python to eliminate data-copying overhead.

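  1. Linux mmap: Maps the binary file directly into the process address space, so records are read through page faults instead of repeated read() system calls and copies into user-space buffers.
  2. Custom C++ Structs: Uses fixed-width memory layouts to achieve $O(1)$ random access to financial tick data: record $i$ starts at byte offset $i$ times the record size.
  3. NumPy Buffer Protocol: Uses pybind11 to expose the mapped memory to Python through the buffer protocol, so NumPy can run vectorized analysis on the records without copying them into newly allocated arrays.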
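
The three ideas above can be illustrated in pure Python, without the C++ engine. The sketch below is not the repository's implementation: the 24-byte record layout, `TICK_DTYPE`, and all function names are assumptions made for the example. It maps a file with the standard `mmap` module, reads one record by computing its byte offset, and wraps the same mapping in a NumPy array with `np.frombuffer`, which does not copy the data.

```python
import mmap
import os
import struct
import tempfile

import numpy as np

# Hypothetical fixed-width tick layout (an assumption, not the repo's struct):
# uint64 timestamp, float64 price, uint32 volume, uint32 flags = 24 bytes.
TICK_DTYPE = np.dtype([
    ("timestamp", "<u8"),
    ("price", "<f8"),
    ("volume", "<u4"),
    ("flags", "<u4"),
])
TICK_STRUCT = struct.Struct("<QdII")  # same layout, for raw offset reads


def write_sample(path, n):
    """Write n synthetic records so the example is self-contained."""
    ticks = np.zeros(n, dtype=TICK_DTYPE)
    ticks["timestamp"] = np.arange(n)
    ticks["price"] = 100.0 + 0.5 * np.arange(n)
    ticks["volume"] = 10
    ticks.tofile(path)


def map_ticks(path):
    """Map the file read-only and view it as a structured array (no copy)."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mm, np.frombuffer(mm, dtype=TICK_DTYPE)


def read_record(mm, i):
    """O(1) random access: record i starts at byte i * record size."""
    return TICK_STRUCT.unpack_from(mm, i * TICK_STRUCT.size)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ticks.bin")
        write_sample(path, 1000)
        mm, ticks = map_ticks(path)
        print("record size:", TICK_DTYPE.itemsize)
        print("array owns its data:", ticks.flags.owndata)
        print("record 3 via offset:", read_record(mm, 3))
        print("mean price:", float(ticks["price"].mean()))
        del ticks   # release the buffer export before closing the map
        mm.close()


if __name__ == "__main__":
    demo()
```

The array returned by `map_ticks` is read-only and borrows the mapped pages, so it must be released before the mapping is closed. A pybind11 module would expose the same kind of view from the C++ side.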

How to Reproduce

  1. Generate Data (40M Records): python generate_data.py
  2. Compile Engine: g++ -O3 -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) fast_parser.cpp -o fast_parser.so
  3. Run Benchmark: python benchmark.py
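
For readers who want to see the shape of such a comparison without building the extension, here is a minimal timing harness. It is a stand-in under stated assumptions, not the repository's `benchmark.py`: the two-field record layout and every function name are hypothetical, `np.fromfile` plays the role of the copying path, and `np.memmap` plays the role of the mapped path. Both paths sum the price column so the mapped version actually touches its pages.

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical 16-byte record layout, chosen only for this example.
RECORD = np.dtype([("timestamp", "<u8"), ("price", "<f8")])


def make_file(path, n):
    """Write n records with price 1.0 so the expected sum is simply n."""
    data = np.zeros(n, dtype=RECORD)
    data["timestamp"] = np.arange(n)
    data["price"] = 1.0
    data.tofile(path)


def sum_prices_copy(path):
    """Copying path: read the whole file into a newly allocated array."""
    return float(np.fromfile(path, dtype=RECORD)["price"].sum())


def sum_prices_mapped(path):
    """Mapped path: view the file in place; pages load when touched."""
    return float(np.memmap(path, dtype=RECORD, mode="r")["price"].sum())


def time_call(fn, repeats=3):
    """Return (best wall-clock seconds, last result) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.bin")
        make_file(path, 1_000_000)
        t_copy, s_copy = time_call(lambda: sum_prices_copy(path))
        t_map, s_map = time_call(lambda: sum_prices_mapped(path))
        print(f"copy:   {t_copy:.4f}s (sum={s_copy})")
        print(f"mapped: {t_map:.4f}s (sum={s_map})")
```

Timings from a harness like this depend heavily on file size and on whether the file is already in the page cache, so they are not comparable to the figures reported above.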
