Releases: SpeyTech/certifiable-data

Deterministic data pipeline for safety-critical ML systems.

17 Jan 00:54

v1.0.0 — Initial Release

Highlights

This release delivers a complete deterministic data pipeline with 8/8 test suites passing (142 tests). Every data transformation produces bit-identical results across platforms, with cryptographic audit trails for certification evidence.


Core Modules

Module | Description | Tests
DVM Primitives | Q16.16 fixed-point arithmetic with fault detection |
PRNG | Counter-based deterministic pseudo-random generation |
Feistel Shuffle | Cycle-walking bijection for any dataset size |
Normalization | Q16.16 standardization with (x - mean) * inv_std |
Augmentation | Deterministic flip, crop, noise transformations |
Batch Construction | Static allocation with Merkle commitment |
Merkle Chain | SHA256 provenance trail per epoch |
Bit Identity | Cross-platform reproducibility verification |

Total: 142 tests passing
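
To make the Normalization row concrete, here is a minimal sketch of Q16.16 standardization, (x - mean) * inv_std, with saturating arithmetic and fault flags. It is not taken from the repository: the names q16_16, fault_flags, saturate, and normalize_q16 are hypothetical, and the rounding mode (nearest, ties away from zero) is an assumption. It avoids right-shifting negative values, which C99 leaves implementation-defined, so every step has one defined result.

```c
#include <stdint.h>

/* Q16.16: 16 integer bits and 16 fractional bits in an int32_t. */
typedef int32_t q16_16;
#define Q16_ONE ((q16_16)65536)

/* Hypothetical fault record; the real project may track faults differently. */
typedef struct {
    unsigned overflow  : 1;
    unsigned underflow : 1;
} fault_flags;

/* Clamp a 64-bit intermediate to int32_t and record which bound was hit. */
static q16_16 saturate(int64_t v, fault_flags *f)
{
    if (v > INT32_MAX) { f->overflow = 1;  return INT32_MAX; }
    if (v < INT32_MIN) { f->underflow = 1; return INT32_MIN; }
    return (q16_16)v;
}

/* Subtraction widened to 64 bits so the difference cannot wrap. */
static q16_16 q16_sub(q16_16 a, q16_16 b, fault_flags *f)
{
    return saturate((int64_t)a - (int64_t)b, f);
}

/* Multiplication: the 64-bit product carries 32 fractional bits, so it is
 * divided by 2^16. Division truncates toward zero in C99; adding half to the
 * magnitude first gives round-to-nearest with ties away from zero. */
static q16_16 q16_mul(q16_16 a, q16_16 b, fault_flags *f)
{
    int64_t p = (int64_t)a * (int64_t)b;
    const int64_t half = 32768;
    p = (p >= 0) ? (p + half) / 65536 : -((-p + half) / 65536);
    return saturate(p, f);
}

/* Standardize one sample: (x - mean) * inv_std, all operands in Q16.16. */
q16_16 normalize_q16(q16_16 x, q16_16 mean, q16_16 inv_std, fault_flags *f)
{
    return q16_mul(q16_sub(x, mean, f), inv_std, f);
}
```

With x = 3.0 (3 * Q16_ONE), mean = 1.0 (Q16_ONE), and inv_std = 0.5 (Q16_ONE / 2), normalize_q16 returns Q16_ONE, that is 1.0, and leaves both flags clear. Precomputing inv_std keeps division out of the per-sample path, which is one plausible reason for the (x - mean) * inv_std form.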


Key Properties

  • Bit-perfect determinism — Same seed → same result, every platform
  • Zero dynamic allocation — All buffers statically allocated
  • Deterministic shuffling — Feistel permutation with test vectors
  • Merkle provenance — Every epoch cryptographically committed
  • Fault detection — Overflow, underflow, domain errors tracked
  • Pure C99 — No platform-specific dependencies
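
The deterministic shuffling property can be illustrated with a small cycle-walking Feistel permutation. This is a sketch under stated assumptions, not the project's implementation: the round function is a generic 32-bit integer mixer chosen for illustration, the round count of 4 is arbitrary, and the names round_fn, half_bits_for, feistel_once, and feistel_index are hypothetical. The function assumes n >= 1 and i < n.

```c
#include <stdint.h>

/* Illustrative keyed round function (an assumption, not the project's). */
static uint32_t round_fn(uint32_t half, uint32_t round, uint64_t seed)
{
    uint32_t x = half ^ (uint32_t)seed ^ (uint32_t)(seed >> 32)
               ^ (round * 0x9E3779B9u);
    x ^= x >> 16; x *= 0x7FEB352Du;
    x ^= x >> 15; x *= 0x846CA68Bu;
    x ^= x >> 16;
    return x;
}

/* Smallest half-width k (in bits) such that the 2k-bit domain covers n. */
static unsigned half_bits_for(uint32_t n)
{
    unsigned k = 1;
    while (k < 16 && ((uint64_t)1 << (2 * k)) < n)
        k++;
    return k;
}

/* One pass of a balanced Feistel network over a 2k-bit value.
 * Each round is invertible, so the pass is a bijection on [0, 2^(2k)). */
static uint32_t feistel_once(uint32_t x, unsigned k, uint64_t seed)
{
    uint32_t mask = ((uint32_t)1 << k) - 1u;
    uint32_t left = x >> k;
    uint32_t right = x & mask;
    for (uint32_t r = 0; r < 4; r++) {
        uint32_t next = left ^ (round_fn(right, r, seed) & mask);
        left = right;
        right = next;
    }
    return (left << k) | right;
}

/* Map index i in [0, n) to a shuffled index in [0, n).
 * Cycle walking: re-apply the permutation until the value lands below n.
 * The walk follows the cycle containing i, so it always terminates and
 * the restriction to [0, n) is still a bijection. */
uint32_t feistel_index(uint32_t i, uint32_t n, uint64_t seed)
{
    unsigned k = half_bits_for(n);
    uint32_t x = i;
    do {
        x = feistel_once(x, k, seed);
    } while (x >= n);
    return x;
}
```

Calling feistel_index(i, n, seed) for every i in [0, n) visits each sample exactly once, using no shuffle buffer and only integer operations with defined wraparound, which fits the zero-allocation and bit-identity goals listed above.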

Quick Start

git clone https://github.com/williamofai/certifiable-data.git
cd certifiable-data
mkdir build && cd build
cmake ..
make
make test

Expected output:

100% tests passed, 0 tests failed out of 8
Total Test time (real) = 0.04 sec

Compliance

Designed for certification under:

  • DO-178C (Aerospace)
  • IEC 62304 (Medical devices)
  • ISO 26262 (Automotive)
  • IEC 61508 (Industrial safety)

Related Projects

Project | Description | Demo
certifiable-inference | Deterministic inference engine | inference.speytech.com
certifiable-training | Deterministic training engine | training.speytech.com
certifiable-data | Deterministic data pipeline |

Together they provide a complete deterministic ML pipeline: data loading → training → inference.


Documentation

  • CT-MATH-001.md — Mathematical foundations
  • CT-STRUCT-001.md — Data structure specifications
  • docs/requirements/ — SRS documents (SRS-001 through SRS-006)

Built by SpeyTech in the Scottish Highlands.

Patent: UK GB2521625.0 — Murray Deterministic Computing Platform (MDCP)

For commercial licensing: william@fstopify.com