jopamo/blake3

BLAKE3 C Library

A modern, high-performance fork of the official BLAKE3 C implementation.

Designed for Linux workstations and high-throughput servers, this project integrates a robust Meson build system, comprehensive SIMD dispatch, and a specialized parallel hashing API not found in the standard C implementation.


🚀 Key Features

  • ⚑ Modern Build System Fully integrated with Meson and Ninja for fast, reliable, and portable builds.

  • 🏎️ Aggressive Optimization Runtime SIMD dispatch for AVX-512, AVX2, SSE4.1, SSE2, and ARM NEON.

  • 🧡 Parallel Hashing API A dedicated blake3_parallel.h API for high-throughput, multi-threaded hashing of in-memory buffers.

  • 🐧 Linux Optimized b3sum uses Linux-specific I/O primitives (preadv2, RWF_NOWAIT) to minimize syscall overhead.

  • πŸ›‘οΈ Robust Testing Integrated fuzzing, fault injection, and regression coverage.
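To make the preadv2 / RWF_NOWAIT bullet concrete, here is a minimal sketch of the general pattern: try a read that succeeds only when the data is already in the page cache, and fall back to an ordinary blocking pread otherwise. The helper names read_cached_first and read_file_prefix are hypothetical and are not taken from b3sum; how this fork actually structures its I/O is not shown in this README.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes preadv2() and RWF_NOWAIT in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes at offset off.
 * First try a read that never blocks on disk (RWF_NOWAIT); if the kernel
 * would have to wait, or the flag is unsupported, use a blocking pread(). */
static ssize_t read_cached_first(int fd, void *buf, size_t len, off_t off) {
    assert(buf != NULL);
#ifdef RWF_NOWAIT
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t n = preadv2(fd, &iov, 1, off, RWF_NOWAIT);
    if (n >= 0)
        return n; /* served without blocking (possibly short), or EOF */
    if (errno != EAGAIN && errno != EOPNOTSUPP &&
        errno != ENOSYS && errno != EINVAL)
        return -1; /* genuine I/O error */
#endif
    return pread(fd, buf, len, off);
}

/* Hypothetical helper: read the first len bytes of a file.
 * Returns the number of bytes read (less than len at EOF) or -1 on error. */
static ssize_t read_file_prefix(const char *path, void *buf, size_t len) {
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    size_t total = 0;
    while (total < len) {
        ssize_t n = read_cached_first(fd, (char *)buf + total,
                                      len - total, (off_t)total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```

The fallback matters because RWF_NOWAIT needs Linux 4.14 or newer and is not honored by every filesystem, so a reader built this way still works where the flag is rejected.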


📊 Performance Highlights

This fork prioritizes real-world performance on Linux, especially for workloads common in build systems, package managers, artifact verification, and CI pipelines.

Benchmarks comparing this implementation against the official Rust b3sum show consistent wins where fixed overhead and execution efficiency matter most.

Reproducibility

Benchmarks are generated by an automated harness that:

  • verifies correctness before measuring performance
  • uses hyperfine with fixed warmups and run counts
  • records CPU topology, kernel version, git commit, and timestamp
  • runs both implementations under identical conditions

Full methodology and raw results are documented in BENCHMARK.md.


🛠️ Building & Installation

Prerequisites

  • Compiler: GCC or Clang (Clang recommended for sanitizers)
  • Build System: Meson (>=1.2.0) and Ninja

Quick Start

meson setup build -Dbuildtype=release
ninja -C build
ninja -C build test
sudo ninja -C build install   # optional

Debugging & Sanitizers

CC=clang meson setup build-san \
  -Ddebug_sanitize=true \
  -Dbuildtype=debugoptimized

ninja -C build-san test

📦 Usage

Command Line Tool (b3sum)

./build/b3sum large_file.iso
./build/b3sum --check checksums.txt

C Library API

Standard API (blake3.h)

Use this for incremental or streaming hashing.

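#include "blake3.h"

// Initialize an unkeyed hasher.
blake3_hasher hasher;
blake3_hasher_init(&hasher);

// Feed input; this may be called repeatedly as more data arrives.
blake3_hasher_update(&hasher, data, len);

// Write the default 32-byte digest.
uint8_t out[BLAKE3_OUT_LEN];
blake3_hasher_finalize(&hasher, out, BLAKE3_OUT_LEN);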
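For streaming input, the update call is typically driven by a read loop. The sketch below shows one way to structure that loop without tying it to a specific hasher: each chunk goes to a callback, which in real use would forward to blake3_hasher_update. The names chunk_sink_fn, stream_chunks, and count_bytes are hypothetical, and the 64 KiB chunk size is an arbitrary choice rather than a recommendation from this project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: receives each chunk as it is read.
 * With the standard API, an implementation would forward to
 * blake3_hasher_update(hasher, data, len). */
typedef void (*chunk_sink_fn)(void *ctx, const void *data, size_t len);

/* Hypothetical helper: stream `in` to `sink` in chunks of up to 64 KiB.
 * Returns 0 on success, -1 if a read error occurred. */
static int stream_chunks(FILE *in, chunk_sink_fn sink, void *ctx) {
    unsigned char buf[64 * 1024];
    size_t n;
    assert(in != NULL && sink != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        sink(ctx, buf, n);
    return ferror(in) ? -1 : 0;
}

/* Example sink that only counts bytes, standing in for a hasher update. */
static void count_bytes(void *ctx, const void *data, size_t len) {
    (void)data;
    *(size_t *)ctx += len;
}
```

Because the hasher accepts input in pieces, the chunk boundaries do not affect the final digest; only the byte sequence does.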

🧡 Parallel Hashing API (blake3_parallel.h)

This API is designed for one-shot hashing of large in-memory buffers where parallelism matters.

It avoids per-chunk state machines and minimizes overhead by hashing the buffer directly across multiple threads.

When to use this API

Use blake3_parallel.h when:

  • you already have a contiguous buffer in memory
  • the buffer is large (megabytes or more)
  • you want maximum throughput
  • incremental streaming is not required

For small or streaming inputs, the standard API is usually better.
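The criteria above can be written as a small decision function. This is only an illustrative sketch: hash_path_t and choose_hash_path are hypothetical names, the threshold is passed in by the caller, and the library's own automatic method selection may apply different rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two APIs described in this README. */
typedef enum { HASH_PATH_STANDARD, HASH_PATH_PARALLEL } hash_path_t;

/* Hypothetical helper: pick an API following the checklist above.
 * - contiguous: the whole input is already in one in-memory buffer
 * - streaming:  the input arrives incrementally
 * - min_parallel_bytes: caller-chosen size threshold (for example 1 MiB) */
static hash_path_t choose_hash_path(size_t len, bool contiguous,
                                    bool streaming,
                                    size_t min_parallel_bytes) {
    assert(min_parallel_bytes > 0);
    if (!contiguous || streaming)
        return HASH_PATH_STANDARD;
    return len >= min_parallel_bytes ? HASH_PATH_PARALLEL
                                     : HASH_PATH_STANDARD;
}
```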


Basic one-shot parallel hash (unkeyed)

#include "blake3_parallel.h"

uint8_t out[32];
b3p_ctx_t *ctx = b3p_create(NULL);  // use default configuration

// Unkeyed hash: pass a zero key and flags = 0
uint8_t zero_key[BLAKE3_KEY_LEN] = {0};

b3p_hash_one_shot(
    ctx,
    input_buf,
    input_len,
    zero_key,
    0,
    B3P_METHOD_AUTO,
    out,
    sizeof(out)
);

b3p_destroy(ctx);

Forcing parallelism and tuning

b3p_config_t cfg = b3p_config_default();
cfg.nthreads = 8;
cfg.min_parallel_bytes = 1 << 20; // 1 MiB

b3p_ctx_t *ctx = b3p_create(&cfg);

This is useful when integrating into systems with known CPU topology.
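When the topology is not known ahead of time, one option is to derive the thread count from the number of online CPUs. The sketch below uses sysconf with _SC_NPROCESSORS_ONLN, which is available on Linux with glibc; pick_thread_count is a hypothetical helper and the cap is an arbitrary caller-supplied limit.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: number of online CPUs, clamped to [1, max_threads]. */
static unsigned pick_thread_count(unsigned max_threads) {
    assert(max_threads >= 1);
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    if (online < 1)
        online = 1; /* sysconf failed or reported nothing usable */
    if ((unsigned long)online > max_threads)
        online = (long)max_threads;
    return (unsigned)online;
}
```

The result could then be assigned to cfg.nthreads before calling b3p_create(&cfg), assuming that field accepts a plain integer count.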


Keyed hashing

uint8_t key[BLAKE3_KEY_LEN] = { /* 32-byte key */ };
uint8_t out[32];

b3p_hash_one_shot(
    ctx,
    input,
    len,
    key,
    BLAKE3_KEYED_HASH,
    B3P_METHOD_AUTO,
    out,
    32
);

Extendable output (XOF / seek)

uint8_t out[64];

b3p_hash_one_shot_seek(
    ctx,
    input,
    len,
    zero_key,   // all-zero key, as in the unkeyed example
    0,
    B3P_METHOD_AUTO,
    64,   // seek offset
    out,
    sizeof(out)
);

Serial fallback (no threads)

b3p_hash_buffer_serial(
    input,
    len,
    zero_key,   // all-zero key, as in the unkeyed example
    0,
    out,
    32
);

This is useful for environments where thread creation is undesirable.


Thread-local cleanup

If a long-running process creates and destroys many contexts, release the thread-local resources explicitly:

b3p_free_tls_resources();

🧩 Architecture

Directory      Description
src/           Core logic, SIMD backends, runtime dispatch
tests/         Determinism, threading, and edge-case tests
meson.build    Build configuration and feature detection

See HACKING.md for internal design details.
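As a rough illustration of what runtime dispatch means for the SIMD backends listed above, the sketch below probes CPU features at run time and picks the widest supported instruction set. It relies on the GCC/Clang builtin __builtin_cpu_supports on x86 and assumes NEON on AArch64; backend_t, detect_backend, and backend_name are hypothetical names, and the project's actual dispatch code in src/ may work differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend labels, ordered from narrowest to widest on x86. */
typedef enum {
    BACKEND_PORTABLE,
    BACKEND_SSE2,
    BACKEND_SSE41,
    BACKEND_AVX2,
    BACKEND_AVX512,
    BACKEND_NEON
} backend_t;

/* Hypothetical helper: choose the best backend the running CPU supports. */
static backend_t detect_backend(void) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init(); /* safe to call more than once */
    /* Assumption: the AVX-512 path needs both the F and VL subsets. */
    if (__builtin_cpu_supports("avx512f") &&
        __builtin_cpu_supports("avx512vl"))
        return BACKEND_AVX512;
    if (__builtin_cpu_supports("avx2"))
        return BACKEND_AVX2;
    if (__builtin_cpu_supports("sse4.1"))
        return BACKEND_SSE41;
    if (__builtin_cpu_supports("sse2"))
        return BACKEND_SSE2;
    return BACKEND_PORTABLE;
#elif defined(__aarch64__)
    return BACKEND_NEON; /* Advanced SIMD is part of the AArch64 baseline */
#else
    return BACKEND_PORTABLE;
#endif
}

/* Human-readable name for logging or diagnostics. */
static const char *backend_name(backend_t b) {
    switch (b) {
    case BACKEND_SSE2:   return "sse2";
    case BACKEND_SSE41:  return "sse4.1";
    case BACKEND_AVX2:   return "avx2";
    case BACKEND_AVX512: return "avx512";
    case BACKEND_NEON:   return "neon";
    default:             return "portable";
    }
}
```

A real dispatcher would usually cache the detected backend so that the feature probe runs once rather than on every hash call.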


📄 License & Credits

Dual-licensed under CC0 1.0 Universal and Apache License 2.0, matching upstream.

Original BLAKE3 Designers

  • Jack O'Connor
  • Jean-Philippe Aumasson
  • Samuel Neves
  • Zooko Wilcox

Contributors welcome.

About

The C implementation of BLAKE3 (and b3sum)
