Resilient Analytical Pipeline (RAP) Framework

Hardware-Accelerated Real-Time Telemetry Processing

Executive Summary

A production-ready telemetry spine that processes high-velocity data streams with sub-millisecond p95 latency on enterprise GPUs and Apple Silicon, while preserving forensic traceability and local-first resilience.

Core Capabilities:

Semantic Repair: GPU-accelerated BERT kernels reconcile schema drift on-the-fly.
Microsecond Latency: Sustains high-throughput on Blackwell, Hopper, and M4 architectures.
Forensic Provenance: Tamper-evident SHA-256 hash chains for data integrity.
Edge Autonomy: Local-first buffering with SQLite WAL and deterministic Gate SLOs.

Why This Matters

In high-velocity environments like Formula 1 or Critical Care, schema drift is a silent killer of data integrity. Traditional pipelines react to drift by failing: dropping packets or triggering manual alerts that arrive too late.

The Resilient RAP framework solves the "Semantic Gap":

Zero-Loss Attribution: Ensuring that a sensor renamed on-the-fly is still correctly attributed to its historical baseline.
Proactive Engineering: Shifting from "Fixing the Pipeline" to "The Pipeline Fixes Itself."
Scientific Rigor: Providing a proven, 3-tier safety net that maintains sub-millisecond p95 latency even under 1MHz saturation.

Quick Start: High-Frequency Validation Suite

Follow these steps to replicate the sub-millisecond p95 latency benchmarks on your local hardware.

1. Environment Setup

# Initialize virtual environment and dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Build accelerated C++ ingestion kernels for Tier 2 BERT inference
python3 setup.py build_ext --inplace

2. Execute Validation Profiles

# Run 1kHz Sprint Validation (30,000 packets)
PYTHONPATH="." python3 tools/telemetry_gpu_stress_test.py --frequency 1000 --packets 30000 --output-suffix _sprint_1000hz

# Run 1MHz Weekend Endurance Validation (3.6M packets)
PYTHONPATH="." python3 tools/telemetry_gpu_stress_test.py --frequency 1000000 --packets 3600000 --output-suffix _weekend_1mhz

3. Launch Real-Time Dashboard

To monitor the ingestion stream, circuit-breaker status, and autonomous repairs in real-time, launch the browser-based observability dashboard:

# Linux/macOS
PYTHONPATH="." python -m uvicorn src.api_server:app --host 0.0.0.0 --port 5050

# Windows PowerShell
$env:PYTHONPATH="."; python -m uvicorn src.api_server:app --host 0.0.0.0 --port 5050

Then open http://localhost:5050/dashboard in any browser.

System Architecture: 3-Tier Resilient Reconciliation

The architecture prioritizes edge autonomy and "Self-Healing" resilience. Inbound telemetry is validated against a 3-tier reconciliation stack before forensic auditing.

flowchart TD
    RF["Ingress Downlink<br/>(100 Hz telemetry)"]
    CB["Circuit Breaker<br/>Schema + cadence validators"]
    
    subgraph RECON["3-Tier Reconciliation Stack"]
        direction TB
        CACHE["Tier 1: Verified Cache<br/>(O(1) Knowledge Base)"]
        BERT["Tier 2: Semantic Inference<br/>(O(n) GPU BERT)"]
        HITL["Tier 3: HITL Governor<br/>(Expert Correction)"]
        
        CACHE -- "Mismatch" --> BERT
        BERT -- "Low Confidence" --> HITL
        HITL -- "Human Validation" --> CACHE
    end
    
    DLQ[("Dead Letter Queue<br/>SQLite")]
    EDGE[("Edge Buffer<br/>SQLite WAL")]
    AUDIT[("Audit Log<br/>SHA-256 chain")]
    SINK["Central Sink"]
    
    RF --> CB
    CB -->|invalid| DLQ
    CB -->|valid| EDGE
    EDGE --> CACHE
    CACHE -- "Success" --> AUDIT
    BERT -- "Success" --> AUDIT
    AUDIT --> SINK

Core Methodology: 3-Tier Active-Learning Loop

Tier 1: Verified Mapping Cache (O(1)): Prioritizes previously human-validated mappings.
Tier 2: Semantic Inference (BERT): Utilizes GPU-accelerated BERT kernels to reconcile unseen drift.
Tier 3: Human-in-the-Loop Governor: Fallback for low-confidence inferences.

Research Highlights

1. The Resilience Delta (CPU vs. GPU)

Under high-stress conditions, standard CPU-only telemetry stacks consistently trip the circuit breaker and cease processing. This framework introduces a GPU-accelerated semantic safety net that repairs 100% of schema drift on-the-fly, maintaining zero downtime across all high-end NVIDIA architectures including Blackwell, Hopper, and Ada.

2. Reconciliation Ablation Study (BERT vs. Traditional)

A critical challenge in modern telemetry is Sensor Name Drift (e.g., from oil_temp to lubricant_thermal_deg). This framework justifies the use of BERT-based semantic reconciliation by comparing it against character-distance (Levenshtein) and rule-based (Regex) methods.

Algorithm	Mean Accuracy	Avg Latency	Key Performance Gap
BERT (all-MiniLM-L6-v2)	100.0%	~0.012 ms	Passes 100% of Synonym Drift (e.g., gas_reserve_pct)
Levenshtein (Distance)	28.6%	~0.001 ms	Fails 100% of Synonyms; only detects typos.
Regex (Pattern Matching)	85.7%	< 0.001 ms	Brilliant for known keywords; brittle for new sensor names.

Technical Conclusion: While BERT introduces a slight latency overhead (+0.011 ms), it eliminates the 71.4% data loss floor seen in character-distance methods, ensuring zero-loss sensor attribution in evolving telemetry environments.

3. Cross-Domain Portability (Healthcare)

To validate the framework's domain-agnostic capability, I applied the 3-tier architecture to clinical telemetry (FHIR-inspired vitals monitoring).

Metric	Automotive (F1)	Healthcare (Clinical)
Cold-Start Accuracy (BERT)	92.4%	30.4%
Forensic Confidence (Tier 3)	0.85+	0.65+
Healed Accuracy (Tier 1)	100.0%	100.0%

Tip

Clinical Insight: The lower cold-start accuracy in clinical informatics underscores the necessity of the Tier 3 Governor, as medical acronyms (e.g., SpO2, RR) often require human forensic context that transformer models lack in zero-shot scenarios.

4. Cross-Domain Translation Table

The table below summarizes mappings observed from recent domain test runs. Use the dashboard to trigger new runs and expand this table automatically.

Original Field	Translated Field	Domain	Confidence
`post_engagement_metric`	`post_engagement`	social-media	0.92
`follower_cnt`	`user_follower_count`	social-media	0.89
`closing_price`	`closing_price`	finance	1.00
`daily_vol`	`daily_volume`	finance	0.94
`gas_reserve_pct`	`fuel_reserve_percentage`	automotive	0.98
`oil_temp`	`lubricant_temperature`	automotive	0.97
`pulse_bpm`	`heart_rate`	healthcare	0.95
`spo2_saturation`	`blood_oxygen_pct`	healthcare	0.93
`temp_c`	`temperature_celsius`	weather	0.96
`wind_speed_kph`	`wind_speed_kph`	weather	0.94
`alt_m`	`altitude_meters`	aerospace	0.97
`vel_mps`	`velocity_meters_per_second`	aerospace	0.95
`v_rms`	`voltage_rms`	smart-grid	0.98
`f_hz`	`frequency_hertz`	smart-grid	0.96
`goal_cnt`	`goals_scored`	hockey	0.95
`assist_cnt`	`assists`	hockey	0.93
`shots_on_target`	`shots_on_goal`	soccer	0.96
`possession_pct`	`possession_percentage`	soccer	0.98
`td_run`	`rushing_touchdowns`	football	0.94
`yd_gain`	`yards_gained`	football	0.92
`fg_pct`	`field_goal_percentage`	basketball	0.97
`reb_cnt`	`rebounds`	basketball	0.95
`hr_cnt`	`home_runs`	baseball	0.96
`era_val`	`earned_run_average`	baseball	0.94
`item_price_cents`	`price`	ecommerce	0.90
`qty_sold`	`units_sold`	ecommerce	0.88

Notes:

Table generated from JSON results in docs/data/domain-tests/ (timestamped files).
Confidence scores reflect Tier 2 BERT semantic inference probability.

Performance & Scaling Validation

The framework has been validated across eight runtime targets with three independent runs per profile, measuring performance floor (p50), tail latency (p95), and resilience under 5% injected chaos.

1. Cross-Platform Baseline (100 Hz)

Note

All hardware and concurrency benchmarks below represent the Tier 2 (BERT Semantic Inference) processing latency. This is the computational "Deep Inference" baseline and does not include the near-zero O(1) latency of Tier 1 (Verified Cache).

Profile: Sprint (30,000 packets)

Runtime Target	Platform	Total Packets	p95 Latency (Mean)	Resilience Score	Status
NVIDIA B200 (Blackwell)	Linux + CUDA	30,000	0.008 ms	0.9996	[STABLE]
NVIDIA H200 NVL (Hopper)	Linux + CUDA	30,000	0.006 ms	0.9995	[STABLE]
NVIDIA RTX PRO 6000 Ada	Linux + CUDA	30,000	0.007 ms	0.9996	[STABLE]
NVIDIA RTX 5090	Linux + CUDA	30,000	0.011 ms	0.9996	[STABLE]
NVIDIA GTX 1660 Ti	Linux + CUDA	30,000	0.022 ms	0.9995	[STABLE]
AMD Radeon RX 7900 XT	Linux + ROCm	30,000	0.008 ms	0.9996	[STABLE]
Apple M4	macOS (MPS)	30,000	0.004 ms	0.9997	[STABLE]
Intel Core i5-12600K	x86 Fallback	30,000	N/A*	0.9996	[STABLE]

Profile: Weekend (3,600,000 packets)

Runtime Target	Platform	Total Packets	p95 Latency (Mean)	Resilience Score	Status
NVIDIA B200 (Blackwell)	Linux + CUDA	3,600,000	0.007 ms	0.9994	[RELIABLE]
NVIDIA H200 NVL (Hopper)	Linux + CUDA	3,600,000	0.013 ms	0.9993	[RELIABLE]
NVIDIA RTX PRO 6000 Ada	Linux + CUDA	3,600,000	0.006 ms	0.9995	[RELIABLE]
NVIDIA RTX 5090	Linux + CUDA	3,600,000	0.010 ms	0.9994	[RELIABLE]
NVIDIA GTX 1660 Ti	Linux + CUDA	3,600,000	0.019 ms	0.9995	[RELIABLE]
AMD Radeon RX 7900 XT	Linux + ROCm	3,600,000	0.007 ms	0.9994	[RELIABLE]
Apple M4	macOS (MPS)	3,600,000	0.003 ms	0.9995	[RELIABLE]
Intel Core i5-12600K	x86 Fallback	3,600,000	N/A*	0.9995	[RELIABLE]

*N/A: x86 CPU Fallback does not support sub-microsecond hardware-timestamped p95 latency measurement in standard telemetry mode.

2. Concurrency & Team Scaling

This profile validates the ability to handle two simultaneous telemetry streams on a single shared GPU.

Dual Car Benchmarking Comparison (Apple M4)

Profile	Metric	1-Car (Normal)	2-Car (Team)	Comparison
Sprint	Total Packets	30,000	60,000 (30k/car)	2x Load
Sprint	p95 Latency	< 0.004 ms	< 0.006 ms	No measurable overhead
Weekend	Total Packets	3,600,000	7,200,000	2x Extreme Load
Weekend	p95 Latency	0.003 ms	0.005 ms	No measurable overhead
Both	Resilience Score	0.9995	0.9978	[STABLE]

Dual Car Benchmarking Comparison (AMD 7900XT)

Profile	Metric	1-Car (Normal)	2-Car (Team)	Comparison
Sprint	Total Packets	30,000	60,000 (30k/car)	2x Load
Sprint	p95 Latency	< 0.010 ms	< 0.010 ms	Negligible overhead
Weekend	Total Packets	3,600,000	7,200,000	2x Extreme Load
Weekend	p95 Latency	0.007 ms	~0.008 ms	+0.001 ms overhead
Both	Resilience Score	0.9994	0.9995	[STABLE]

3. High-Frequency Stability Analysis

The following matrices validate stability across synthetic frequencies (1kHz to 1MHz) for elite hardware architectures.

Stability Matrix: Apple M4

Profile	Target Frequency	p95 Latency	Resilience Score	Status
Sprint (30k total)	1,000 Hz (1kHz)	0.009 ms	0.9967	[STABLE]
Sprint (30k total)	1,000,000 Hz (1MHz)	0.009 ms	0.9970	[STABLE]
Weekend (3.6M total)	1,000 Hz (1kHz)	0.004 ms	0.9971	[RELIABLE]
Weekend (3.6M total)	1,000,000 Hz (1MHz)	0.005 ms	0.9969	[RELIABLE]

Stability Matrix: AMD Radeon RX 7900 XT

Profile	Target Frequency	p95 Latency	Resilience Score	Status
Sprint (30k total)	1,000 Hz (1kHz)	< 0.001 ms	0.9989	[STABLE]
Sprint (30k total)	1,000,000 Hz (1MHz)	< 0.001 ms	0.9990	[STABLE]
Weekend (3.6M total)	1,000 Hz (1kHz)	< 0.001 ms	0.8820	[RELIABLE]
Weekend (3.6M total)	1,000,000 Hz (1MHz)	< 0.001 ms	0.8699	[RELIABLE]

Tip

Performance Amortization: p95 latency on the M4 actually improves during high-volume 'Weekend' runs (0.004ms) compared to short 'Sprint' runs (0.009ms), demonstrating the efficiency of the framework's GPU-accelerated batching kernels once warm.

4. LLM Chaos Comparison

I added an optional LLM-driven chaos mode that can use a local model through LM Studio or an Ollama-style endpoint. The current default model is gemma-4-e4b-it.

Comparison Summary

Run	Packets	Acceptance	Rejected	Chaos	DLQ	p95 Latency	Corruption Detection	Resilience
Standard baseline	30,000	95.81%	1,256	1,513	1,190	0.005 ms	76.92%	99.98%
Aggressive + Gemma mean	30,000	92.38%	2,285	3,590	2,219	0.021 ms	63.65%	99.98%

What the smoke test showed

I compared the archived 30k standard sprint run against the mean of three corrected 30k aggressive Gemma runs.
The aggressive runs used --chaos-profile aggressive and --chaos-strategy llm with gemma-4-e4b-it.
Aggressive chaos shifted strongly toward schema drift and string corruption: the mean schema-drift count rose from 251 to 1,300, and string-in-numeric rose from 190 to 867.
The aggressive runs are now normalized under data/solo/M4/aggressive/Run1, data/solo/M4/aggressive/Run2, and data/solo/M4/aggressive/Run3.

Practical takeaway

If LM Studio returns a valid JSON plan, the new path can bias chaos mode selection and mutation ranges/tokens.
If the model is unavailable or the response is not usable, the framework safely falls back to the original random chaos behaviour.
The aggressive profile is the right choice when you want the Gemma-backed run to stress the BERT reconciliation path harder than the standard baseline.

Aggressive mode

For heavier BERT stress, run the GPU suite with --chaos-profile aggressive alongside --chaos-strategy llm. That profile biases toward schema drift, adversarial string corruption, and wider numeric flips so the semantic reconciler has to work harder, and it stores runs under data/solo/M4/aggressive/... by default.

Real-World Use Cases

Formula 1 & Elite Motorsport

Dynamic Aero Testing: Reconciling sensor aliasing during mid-session wing or floor swaps without losing calibration data.
Team Scaling: Managing concurrent high-frequency streams (Driver 1 vs. Driver 2) on limited trackside hardware.

Healthcare & ICU Monitoring

Legacy Integration: Mapping heterogeneous bedside monitor outputs (e.g., SpO2 vs. Vitals_Heart) to a standardized clinical record.
Patient Safety: Ensuring 100% data continuity during sensor dropouts or aliasing in high-acuity environments.

Autonomous Systems

Sensor Fusion Drift: Maintaining deterministic temporal sync when LiDAR/Radar schemas evolve across fleet-wide firmware updates.

Real-Time Observability

The framework includes a browser-based observability dashboard for monitoring ingestion health, schema drift detection, and autonomous "Self-Healing" repairs.

Dashboard Features

Circuit Breaker State: Colour-coded status indicator (green: CLOSED, yellow: HALF_OPEN, red: OPEN).
DLQ Depth: Live tracking of quarantined packets over time.
Edge Buffer: Progress indicators for SQLite WAL utilisation and sync status.
SLO Monitoring: Real-time evaluation of all 6 service level objectives.
Autonomous Repairs: Live visualization of Tier 2 and Tier 3 reconciliation events.
Auto-Refresh: Polls every 3 seconds for persistent real-time accuracy.

Operational API Endpoints

A FastAPI-powered REST API exposes the pipeline's health, metrics, and operational controls.

Endpoint	Method	Description
`/health`	GET	Liveness and readiness probe status.
`/metrics`	GET	Live circuit breaker state and buffer utilisation.
`/slo`	GET	Real-time SLO evaluation against 6 production budgets.
`/reports`	GET	List and fetch specific benchmark report JSONs.
`/run`	POST	Trigger smoke or chaos tests through the pipeline.
`/run/chaos`	POST	Trigger a 20-packet chaos test (15% corruption).
`/circuit-breaker/reset`	POST	Manual circuit breaker reset to CLOSED.
`/dashboard`	GET	Serve the browser-based observability UI.

Once the server is running (see Quick Start), access the Interactive API Docs at http://localhost:5050/docs.

Limitations and Future Work

Current Limitations

Latency Budget: While p95 is excellent, the ~10µs BERT overhead at 1MHz makes single-threaded real-time ingestion tight; multi-threading is required for higher throughput.
Cold-Start Domains: Zero-shot accuracy is lower in highly specialized domains (e.g., Clinical Informatics) without Tier 1 cache warm-up.

Future Work

On-Device Quantization: Implementing INT8/GGUF quantization for BERT to enable microsecond-level inference on low-power edge devices.
RL-Guided Repairs: Using Reinforcement Learning to optimize Tier 3 HITL triggers and reduce expert-intervention frequency.
Multi-Modal Reconciliation: Extending the RAP pipeline to reconcile visual (Video/FLIR) and textual (Log) telemetry streams.

Development & CI

Quality gates triggered on every push:

Lint: flake8
Coverage: pytest-cov (75% minimum)
Stress Test: Chaos engine (1,000 packets @ 15% corruption)
Forensic Audit: Batch hash-chain integrity verification

ADRs and Licencing

ADRs: Key decisions are documented in docs/adr/.
Licence: PolyForm Non-commercial Licence 1.0.0.
Contact: Tarek Clarke (tclarke91@proton.me)

Name		Name	Last commit message	Last commit date
Latest commit History 538 Commits
.devcontainer		.devcontainer
.github		.github
adapters/sports		adapters/sports
archive		archive
assets		assets
dashboard		dashboard
data		data
docs		docs
examples		examples
experiments		experiments
frontend		frontend
modules		modules
outputs		outputs
results/figures		results/figures
src		src
tests		tests
tools		tools
.dockerignore		.dockerignore
.gitignore		.gitignore
ARCHIVE_SUMMARY.md		ARCHIVE_SUMMARY.md
CITATION.cff		CITATION.cff
COMPLETION_CHECKLIST.md		COMPLETION_CHECKLIST.md
COMPLETION_STRESS_TEST_CHECKLIST.md		COMPLETION_STRESS_TEST_CHECKLIST.md
CONTRIBUTING.md		CONTRIBUTING.md
DIAGNOSTIC_ANALYSIS_RESULTS.md		DIAGNOSTIC_ANALYSIS_RESULTS.md
DIAGNOSTIC_FRAMEWORK_COMPLETION.md		DIAGNOSTIC_FRAMEWORK_COMPLETION.md
DIAGNOSTIC_QUICK_START.md		DIAGNOSTIC_QUICK_START.md
DUAL_SETUP_GUIDE.md		DUAL_SETUP_GUIDE.md
Dockerfile		Dockerfile
Dockerfile.production		Dockerfile.production
GETTING_STARTED.md		GETTING_STARTED.md
KAFKA_INTEGRATION_SUMMARY.md		KAFKA_INTEGRATION_SUMMARY.md
LICENSE		LICENSE
MODULE_SUMMARY.md		MODULE_SUMMARY.md
OPERATIONS.md		OPERATIONS.md
PHD_VALIDATION_README.md		PHD_VALIDATION_README.md
PHD_VALIDATION_SUMMARY.md		PHD_VALIDATION_SUMMARY.md
PRODUCTION.md		PRODUCTION.md
QUICK_START_VALIDATION.py		QUICK_START_VALIDATION.py
README.md		README.md
START_HERE.md		START_HERE.md
STRESS_TEST_DEPENDENCY_ANALYSIS.md		STRESS_TEST_DEPENDENCY_ANALYSIS.md
WINDOWS_QUICKSTART.md		WINDOWS_QUICKSTART.md
WINDOWS_SETUP.md		WINDOWS_SETUP.md
app_entry.py		app_entry.py
docker-compose.production.yml		docker-compose.production.yml
docker-compose.yml		docker-compose.yml
docker_build_output.txt		docker_build_output.txt
fast_ingest.cpp		fast_ingest.cpp
main.py		main.py
pytest.ini		pytest.ini
pytest_output.txt		pytest_output.txt
pytest_stress_output.txt		pytest_stress_output.txt
requirements-ci.txt		requirements-ci.txt
requirements.txt		requirements.txt
run_benchmarks.sh		run_benchmarks.sh
setup.py		setup.py
setup_windows_hip.bat		setup_windows_hip.bat
setup_windows_hip.ps1		setup_windows_hip.ps1
test_diagnostic_integration.py		test_diagnostic_integration.py
test_fast_ingest.py		test_fast_ingest.py
test_rocm_gpu.py		test_rocm_gpu.py
verify_windows_hip.ps1		verify_windows_hip.ps1

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Resilient Analytical Pipeline (RAP) Framework

Executive Summary

Why This Matters

Quick Start: High-Frequency Validation Suite

1. Environment Setup

2. Execute Validation Profiles

3. Launch Real-Time Dashboard

System Architecture: 3-Tier Resilient Reconciliation

Core Methodology: 3-Tier Active-Learning Loop

Research Highlights

1. The Resilience Delta (CPU vs. GPU)

2. Reconciliation Ablation Study (BERT vs. Traditional)

3. Cross-Domain Portability (Healthcare)

4. Cross-Domain Translation Table

Performance & Scaling Validation

1. Cross-Platform Baseline (100 Hz)

Profile: Sprint (30,000 packets)

Profile: Weekend (3,600,000 packets)

2. Concurrency & Team Scaling

Dual Car Benchmarking Comparison (Apple M4)

Dual Car Benchmarking Comparison (AMD 7900XT)

3. High-Frequency Stability Analysis

Stability Matrix: Apple M4

Stability Matrix: AMD Radeon RX 7900 XT

4. LLM Chaos Comparison

Comparison Summary

What the smoke test showed

Practical takeaway

Aggressive mode

Real-World Use Cases

Formula 1 & Elite Motorsport

Healthcare & ICU Monitoring

Autonomous Systems

Real-Time Observability

Dashboard Features

Operational API Endpoints

Limitations and Future Work

Current Limitations

Future Work

Development & CI

ADRs and Licencing

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages