Production-grade federated learning platform that combines Byzantine-resilient aggregation, trust verification, governance policy controls, tokenomics telemetry, and full-stack observability.
Documentation entrypoint: docs/README.md for active operator guides; full repository documentation indexing lives in Documentation/MASTER_DOCUMENTATION_INDEX.md.
If you just cloned the repo and want to run tests quickly, use this sequence.
```bash
git clone https://github.com/rwilliamspbg-ops/Sovereign_Map_Federated_Learning.git
cd Sovereign_Map_Federated_Learning
make help
make fmt
make lint
make test
make smoke
make observability-smoke
make quickstart-verify
```

Where to get contribution guidance:
- Full contribution process and PR checklist: CONTRIBUTING.md
- Quick contribution opportunities: README.md#help-wanted-quick-wins
- Runtime validation expectations: README.md#contributor-first-steps
- Operations dashboard metric contract: docs/OPERATIONS_DASHBOARD_METRIC_CONTRACT.md
The consolidated validation path now supports profile-based execution, trend SLO enforcement, artifact diff summaries, browser runtime cadence checks, and scheduled deep validation runs.
What was added:
- Required-style PR gate workflow: .github/workflows/full-validation-pr-gate.yml
- Scheduled deep workflow: .github/workflows/full-validation-scheduled-deep.yml
- Fast and deep suite profiles: tests/scripts/python/run_full_validation_suite.py
- Trend SLO checker: tests/scripts/ci/check_validation_trends.py
- CI diff summary writer: tests/scripts/ci/write_validation_diff_summary.py
- Browser runtime E2E cadence check: tests/scripts/python/test_browser_runtime_e2e.py
- Playwright runtime artifacts: tests/e2e/runtime-cadence.spec.js, tests/e2e/playwright.config.js
Canonical commands:
```bash
npm run test:setup
npm run test:full:fast
npm run test:full:deep
npm run test:trends
npm run test:summary:diff
```

Validation artifacts:

- test-results/full-validation/full_validation_<timestamp>.json
- test-results/full-validation/full_validation_<timestamp>.md
- test-results/full-validation/history.jsonl
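The trend SLO checker's contract can be approximated as follows. This is a minimal sketch: the field names (`accuracy`) and thresholds are illustrative assumptions, not the actual history.jsonl schema or the logic in tests/scripts/ci/check_validation_trends.py.

```python
import json

def check_trend_slo(jsonl_text, min_accuracy=0.80, max_regression=0.02):
    """Parse newline-delimited JSON run records and flag SLO breaches.

    Assumes each record carries an `accuracy` field; the real
    history.jsonl schema may differ.
    """
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    failures = []
    for prev, curr in zip(records, records[1:]):
        if curr["accuracy"] < min_accuracy:
            failures.append(f"accuracy below floor: {curr['accuracy']:.3f}")
        if prev["accuracy"] - curr["accuracy"] > max_regression:
            failures.append(f"regression {prev['accuracy'] - curr['accuracy']:.3f}")
    return failures

history = "\n".join([
    '{"run": 1, "accuracy": 0.91}',
    '{"run": 2, "accuracy": 0.90}',
    '{"run": 3, "accuracy": 0.84}',
])
print(check_trend_slo(history))  # flags the drop between runs 2 and 3
```

A checker shaped like this fails fast on per-run floors and on run-over-run regressions, which is why history.jsonl is kept append-only.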
Documentation governance:
- Documentation maintenance runbook: docs/DOCUMENTATION_MAINTENANCE.md
- Test setup details and profile usage: tests/docs/TEST_ENV_SETUP.md
The mobile hardening and store packaging track is now implemented in-repo.
What is now live:
- Hardware-backed mobile signer wrappers for iOS Secure Enclave and Android StrongBox/Keystore integration in app flow.
- Canonical signed gradient envelope adapter on both mobile platforms, aligned to backend verifier contract.
- Backend endpoint for signed mobile gradient verification: `/mobile/verify_gradient`.
- Contract test coverage for valid and invalid mobile signed payloads.
- Production store wrapper packages for both Android Play Store and Apple App Store submission flows.
Primary references:
- Mobile implementation overview: mobile-apps/MOBILE_APP_README.md
- Android store wrapper: mobile-apps/android-node-app/store-wrapper/README.md
- iOS store wrapper: mobile-apps/ios-node-app/store-wrapper/README.md
- iOS submission checklist: mobile-apps/ios-node-app/store-wrapper/APP_STORE_CONNECT_SUBMISSION_CHECKLIST.md
- Backend verifier endpoint implementation: sovereignmap_production_backend_v2.py
- Mobile verifier contract test: tests/scripts/python/test_mobile_verify_gradient_contract.py
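The envelope-verification step can be sketched as below. This is illustrative only: the real mobile flow uses hardware-backed asymmetric keys (Secure Enclave / StrongBox), not a shared HMAC secret, and the envelope field names (`payload`, `signature`) are assumptions, not the backend verifier contract.

```python
import hashlib
import hmac
import json

def verify_envelope(envelope: dict, shared_key: bytes) -> bool:
    """Check a signed gradient envelope against a canonical serialization.

    Hypothetical shape: serialize the payload deterministically, then
    compare signatures in constant time.
    """
    body = json.dumps(envelope["payload"], sort_keys=True).encode()
    expected = hmac.new(shared_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])

key = b"demo-key"
payload = {"node_id": 7, "round": 12, "gradient_digest": "abc123"}
sig = hmac.new(key, json.dumps(payload, sort_keys=True).encode(), hashlib.sha256).hexdigest()
print(verify_envelope({"payload": payload, "signature": sig}, key))       # True
print(verify_envelope({"payload": payload, "signature": "0" * 64}, key))  # False
```

The important property the contract tests exercise is exactly this accept/reject split: a canonical serialization on both sides, and constant-time comparison on the verifier.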
This upgrade ties blockchain and bridge execution telemetry directly into the production Grafana surfaces.
What was added:
- New exporter metrics for blockchain and bridge runtime state:
  - `tokenomics_chain_height`
  - `tokenomics_bridge_transfers_total`
  - `tokenomics_bridge_routes_active`
  - `tokenomics_fl_verification_ratio`
  - `tokenomics_fl_average_confidence_bps`
- Tokenomics telemetry payload now emits these fields from backend runtime calculations.
- Grafana Operations and Tokenomics dashboards now include a dedicated Blockchain and Bridge Runtime section with verification, confidence, transfer throughput, route count, and chain height.
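For reference, gauges like the ones listed above end up in the Prometheus text exposition format. The renderer below is a generic sketch of that format, not the exporter's actual code, and the sample values are made up.

```python
def render_exposition(metrics: dict) -> str:
    """Render gauge metrics in Prometheus text exposition format.

    Each metric gets a `# TYPE` line followed by `name value`.
    """
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

text = render_exposition({
    "tokenomics_chain_height": 184202,
    "tokenomics_bridge_routes_active": 4,
    "tokenomics_fl_verification_ratio": 0.97,
})
print(text)
```

Anything in this shape is scrapeable by the prometheus.yml targets and chartable from the dashboards without further transformation.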
Primary files updated for this upgrade:
- Metrics exporter: tokenomics_metrics_exporter.py
- Backend telemetry payload: sovereignmap_production_backend_v2.py
- Operations dashboard: grafana/provisioning/dashboards/operations_overview.json
- Tokenomics dashboard: grafana/provisioning/dashboards/tokenomics_overview.json
- Prometheus scrape config: prometheus.yml
Dashboard provisioning note:
- Canonical dashboards are served from grafana/provisioning/dashboards.
- Grafana home dashboard now defaults to grafana/provisioning/dashboards/operations_overview.json.
- STARRED live dashboard set:
- grafana/provisioning/dashboards/operations_overview.json
- grafana/provisioning/dashboards/tokenomics_overview.json
- grafana/provisioning/dashboards/llm_overview.json
Operator validation commands:
```bash
make observability-smoke
python3 scripts/check_dashboard_queries.py
```
This upgrade package adds a local-first marketplace and governance workflow with production-facing observability guardrails.
What is included:
- Marketplace flows: offers, intents, matching, escrow release, dispute workflows, and governance proposals/voting.
- Network expansion flows: attestation sharing, self-service invite requests, admin approval/rejection/revocation.
- Dashboard and metrics integration: marketplace/governance snapshots in `/metrics_summary` and expanded HUD browser demo controls.
- Prometheus additions: `marketplace_alerts.yml` with stall/high-watermark detection plus promtool tests in `marketplace_alerts.test.yml`.
- API contract tests: local positive-path and negative-path coverage under `tests/scripts/python/test_marketplace_local_contracts.py` and `tests/scripts/python/test_marketplace_negative_paths.py`.
Primary references:
- First 10 minutes guide: docs/OPEN_ECOSYSTEM_FIRST_10_MINUTES.md
- Sprint 1 roadmap: docs/OPEN_ECOSYSTEM_SPRINT1_ROADMAP.md
- Sprint 2 roadmap: docs/OPEN_ECOSYSTEM_SPRINT2_ROADMAP.md
- API examples: docs/api/http-examples.md
- Backend implementation: sovereignmap_production_backend_v2.py
- Grafana operations dashboard: grafana/provisioning/dashboards/operations_overview.json
Validation commands:
```bash
make observability-smoke
make observability-live-smoke
make alerts-test
python3 tests/scripts/python/test_marketplace_local_contracts.py
python3 tests/scripts/python/test_marketplace_negative_paths.py
```
Canonical auditable artifact capture command:
```bash
RESULTS_ROOT=artifacts/final-verification/$(date +%F) TARGET_NODES=10 STRICT_NPU=0 DURATION_SECONDS=600 INTERVAL_SECONDS=60 bash scripts/demo-10min-auditable.sh
```

The following environment variables are available for safe runtime tuning:

- FL aggregation path selection:
  - `FL_AGGREGATION_MODE=auto|loop|vectorized`
  - `FL_AGGREGATION_VECTORIZE_MIN_CLIENTS` (default `1000`)
  - `FL_AGGREGATION_VECTORIZE_MAX_PEAK_BYTES` (default `536870912`)
- DP/Opacus parameters:
  - `DP_NOISE_MULTIPLIER` (default `1.1`)
  - `DP_MAX_GRAD_NORM` (default `1.0`)
- TPM cache and spike controls:
  - `TPM_ATTESTATION_MAX_REPORTS` (default `256`)
  - `TPM_ATTESTATION_CACHE_TTL` (default `30s`)
  - `TPM_ATTESTATION_SPIKE_THRESHOLD` (default `200us`)
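The aggregation-path variables compose as follows. This is a sketch of how auto-mode could pick a path under the documented defaults; the backend's actual heuristic may differ.

```python
import os

def select_aggregation_path(n_clients: int, peak_bytes: int, env=os.environ) -> str:
    """Choose loop vs vectorized aggregation from the documented env knobs.

    Explicit `loop`/`vectorized` modes win; `auto` vectorizes only when
    the client count clears the minimum and the estimated peak memory
    stays under the byte cap. Sketch only, not the backend's code.
    """
    mode = env.get("FL_AGGREGATION_MODE", "auto")
    if mode in ("loop", "vectorized"):
        return mode
    min_clients = int(env.get("FL_AGGREGATION_VECTORIZE_MIN_CLIENTS", "1000"))
    max_peak = int(env.get("FL_AGGREGATION_VECTORIZE_MAX_PEAK_BYTES", "536870912"))
    if n_clients >= min_clients and peak_bytes <= max_peak:
        return "vectorized"
    return "loop"

print(select_aggregation_path(5000, 64 * 1024 * 1024, env={}))  # vectorized
print(select_aggregation_path(200, 64 * 1024 * 1024, env={}))   # loop
```

The safety property worth noting: when either threshold is exceeded, auto-mode falls back to the loop path rather than risking a memory spike.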
Operational notes:
- Aggregation path usage is exported as
fl_aggregation_path_total{impl="loop|vectorized"}. - The Operations dashboard includes a "FL Aggregation Path Usage" panel to verify auto-mode behavior.
- The Operations dashboard includes a "What Changed (Current vs Prior Window)" panel for rapid delta-based triage.
- The Operations dashboard includes annotations for control/config changes via
sovereign_ops_control_actions_total. - The HUD includes runbook-match cards for active
opsHealth.alertsto accelerate first-response triage.
Roadmap and execution tracking:
Incident bundle export (first-response evidence):
```bash
python3 scripts/export_incident_bundle.py
```

Incident tooling CI guard:
Run the fastest end-to-end path from startup to live dashboards:
```bash
docker compose -f docker-compose.full.yml up -d --scale node-agent=5
make stack-verify
```

Before running the startup command, confirm Docker has enough space for first-time image builds:

```bash
df -h
docker system df
```

If the Docker cache is full, reclaim space and retry:

```bash
docker system prune -af
```

Then open:

- HUD: http://localhost:3000
- Grafana: http://localhost:3001
- Prometheus: http://localhost:9090
Use this flow as the canonical local run sequence.
- Step 1: Preflight checks

```bash
docker --version
docker compose version
df -h
docker system df
```

- Step 2: Build all full-stack images explicitly (no start yet)

```bash
docker compose -f docker-compose.full.yml build
```

- Step 3: Start the full stack with five node agents

```bash
docker compose -f docker-compose.full.yml up -d --scale node-agent=5
```

- Step 4: Verify service state and health

```bash
docker compose -f docker-compose.full.yml ps
curl -fsS http://localhost:8000/status | jq
curl -fsS http://localhost:8000/health | jq
curl -fsS http://localhost:8000/ops/health | jq
```

- Step 5: Follow logs during first run (optional but useful)

```bash
docker compose -f docker-compose.full.yml logs -f backend frontend prometheus grafana alertmanager
```

Common build/start options:

- Rebuild from scratch if dependencies changed: `docker compose -f docker-compose.full.yml build --no-cache`
- Recreate containers after image rebuild: `docker compose -f docker-compose.full.yml up -d --force-recreate --scale node-agent=5`
- Remove stale orphans after compose changes: `docker compose -f docker-compose.full.yml up -d --remove-orphans --scale node-agent=5`

If you hit disk pressure during build (`No space left on device`):

```bash
docker system prune -af
docker builder prune -af
```

Stop and clean up:

```bash
docker compose -f docker-compose.full.yml down --remove-orphans
```

A ready-to-run PySyft x Mohawk integration proof-of-concept lives in examples/pysyft-integration.
Quick start:
```bash
pip install -r examples/pysyft-integration/requirements-pysyft-demo.txt
python examples/pysyft-integration/pysyft_mohawk_poc.py --mode mock --rounds 5 --participants 5
```

Sovereign Map uses a streaming aggregation model instead of loading full model updates into memory at once.
- Memory efficiency: Mohawk-style chunked processing reduces memory pressure by up to 224x for large update sets.
- Byzantine resilience: selective verification and trust scoring reduce adversarial impact with sublinear validation behavior for high node counts.
- Hardware root of trust: every node contributes attestation and certificate telemetry into the same operational control plane.
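The Byzantine-resilience point above rests on robust statistics. The coordinate-wise median below is a minimal, self-contained illustration of why extreme updates have bounded influence; it is not the repository's actual strategy, which layers trust scoring and selective verification on top of ideas like this.

```python
from statistics import median

def robust_aggregate(updates):
    """Aggregate client update vectors with a coordinate-wise median.

    A single outlier cannot drag the result far, unlike a plain mean.
    """
    return [median(coord) for coord in zip(*updates)]

honest = [[0.10, -0.20], [0.12, -0.18], [0.11, -0.22]]
byzantine = [[50.0, 50.0]]  # one malicious, extreme update
print(robust_aggregate(honest + byzantine))  # stays near the honest values
```

A plain mean over the same inputs would be pulled above 12 in both coordinates; the median stays within the honest cluster.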
```mermaid
flowchart LR
    A[Client Updates] --> B{Traditional FL Aggregator}
    B --> C[Load full model deltas in memory]
    C --> D[High RAM footprint per round]
    A --> E{SovereignMap Mohawk Stream Aggregator}
    E --> F[Chunk updates into streaming windows]
    F --> G[Validate trust and policy per chunk]
    G --> H[Aggregate incrementally]
    H --> I[Low steady-state memory use]
```
Mohawk-style streaming aggregation treats model updates as a continuous stream of chunks rather than a monolithic tensor payload. This allows the coordinator to perform verification, filtering, and merge steps incrementally while retaining bounded working memory. In practice, this is what makes high fan-out node participation feasible on commodity infrastructure: memory usage scales with chunk window size instead of full global update size, while trust and policy checks run inline with aggregation.
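The streaming idea can be sketched as a running aggregate over chunk windows. This toy example uses plain Python lists and an unweighted mean, which are simplifying assumptions, not the Mohawk implementation; the point is that only a running sum and count are held in memory, never the full update set.

```python
def stream_aggregate(chunks):
    """Incrementally average model-update vectors arriving in chunks.

    Memory stays bounded by one running-sum vector plus the current
    chunk window, regardless of how many updates arrive in total.
    """
    running_sum, count = None, 0
    for chunk in chunks:  # each chunk: a list of update vectors
        for update in chunk:
            if running_sum is None:
                running_sum = list(update)
            else:
                running_sum = [s + u for s, u in zip(running_sum, update)]
            count += 1
    return [s / count for s in running_sum]

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
chunked = [updates[:2], updates[2:]]  # two streaming windows
print(stream_aggregate(chunked))      # [4.0, 5.0], same as the full-batch mean
```

Because incremental and full-batch means agree exactly, per-chunk trust and policy checks can run inline without changing the aggregation result for accepted updates.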
Sovereign Map currently uses the Mohawk Proto streaming aggregation approach as its default high-scale aggregation path.
Advisory (informational): Mohawk Proto and related aggregation design elements are marked as provisional-patent advisory material by project maintainers.
- The implementation in this repository is provided for evaluation, research, and integration testing.
- Commercial licensing expectations may evolve as provisional filings progress.
- For enterprise/commercial questions, open a discussion in GitHub Discussions.
Legend: scan left-to-right by trust order: quality gates -> security/governance -> SDK/release -> deploy surfaces -> device/runtime footprint -> community signals.
Core Quality Gates:
Security and Governance Gates:
SDK and Release Engineering:
Deployment and Packaging:
Device and Runtime Footprint:
Community and Repository Signals:
- Browse all workflows: GitHub Actions workflow matrix
- Contributor and governance process: CONTRIBUTING.md
- Running secure, federated ML training across distributed nodes where raw data must stay local.
- Operating Byzantine-resilient model aggregation in adversarial or partially trusted environments.
- Building trust-aware AI infrastructure with policy controls, attestation signals, and auditable telemetry.
- Monitoring real-time FL, tokenomics, and system health through Prometheus and Grafana surfaces.
- Prototyping and scaling from local Docker deployments to large Compose/Kubernetes profiles.
| Node Class | Minimum (Functional) | Recommended (Sustained) |
|---|---|---|
| Edge CPU Node | Raspberry Pi 4 (4 GB RAM), 4-core ARM CPU, 32 GB storage, Linux, TPM 2.0 device access | Raspberry Pi 5 / x86 mini PC (8-16 GB RAM), NVMe storage, TPM 2.0, stable wired network |
| Edge GPU/NPU Node | Jetson Nano / Intel NPU-capable edge device, 8 GB RAM, CUDA/NPU drivers | NVIDIA Jetson Orin / equivalent, 16+ GB RAM, tuned CUDA/NPU stack |
| Operator / Aggregator | 8 vCPU, 16 GB RAM, SSD, Docker Compose | 16+ vCPU, 32-64 GB RAM, NVMe, GPU optional, isolated monitoring host |
| Monitoring Stack | 2 vCPU, 4 GB RAM for Prometheus + Grafana | 4-8 vCPU, 8-16 GB RAM with longer retention and dashboard concurrency |
Use hardware_auto_tuner.py to auto-profile host capability and choose an acceleration profile before large-scale runs.
Sovereign Map Federated Learning is a dual-plane runtime:
- Aggregation plane: Flower-based federated coordination with Byzantine-robust strategy logic and convergence tracking.
- Control and telemetry plane: Flask services for health, HUD, trust/policy operations, join lifecycle, and metrics publication.
Core characteristics:
- Byzantine-resilient training strategy with runtime convergence history.
- Trust and verification APIs for attestation-style governance workflows.
- Policy and join-management endpoints for operator-controlled enrollment.
- Prometheus-compatible metrics exporters for operational and tokenomics surfaces.
- Multi-profile deployment via Docker Compose and Kubernetes manifests.
- Hardware-aware tests spanning NPU, XPU, CUDA/ROCm, MPS, and CPU fallbacks.
- Backend aggregation and APIs: sovereignmap_production_backend_v2.py
- Tokenomics metrics exporter: tokenomics_metrics_exporter.py
- TPM metrics exporter: tpm_metrics_exporter.py
- Frontend HUD: frontend/src/HUD.jsx
- Primary compose profile: docker-compose.full.yml
- Kubernetes scale profile: kubernetes-5000-node-manifests.yaml
Visual proof for this project should be treated as release evidence, not optional decoration.
Expected screenshot artifacts per release:
- Operations HUD: trust score, node participation, latency wall, and resilience indicators.
- Grafana Operations Overview: gauge deck + trend wall under live load.
- Grafana Tokenomics Overview: mint/bridge/validator/wallet health sections.
Tracked asset locations:
- docs/screenshots/hud-operations-overview.png
- docs/screenshots/grafana-operations-overview.png
- docs/screenshots/grafana-llm-overview.png
- docs/screenshots/grafana-tokenomics-overview.png
Capture workflow and acceptance checklist:
Feature artifact manifest:
Current status: screenshot paths are defined and release capture workflow is documented; attach rendered PNG/GIF evidence in each tagged release.
```mermaid
sequenceDiagram
    participant Node as Edge Node Client
    participant Flower as Flower Aggregation Plane (:8080)
    participant Mohawk as Mohawk Stream Aggregator
    participant Policy as Trust/Policy Gate
    participant API as Control Plane API (:8000)
    participant Prom as Prometheus
    participant Grafana as Grafana/HUD
    Node->>Flower: Submit model update (FitRes)
    Flower->>Mohawk: Forward update chunks
    Mohawk->>Policy: Validate adapter policy + attestation metadata
    Policy-->>Mohawk: Accept/Reject + reason labels
    Mohawk->>Flower: Incremental aggregate result
    Flower->>API: Publish round metrics + convergence snapshot
    API->>Prom: Expose /metrics and event-derived gauges/counters
    Prom->>Grafana: Scrape telemetry
    API->>Grafana: Serve /health, /ops/health, /hud_data, /metrics_summary
```
| Domain | Runtime Surfaces | Purpose |
|---|---|---|
| Federated learning | sovereignmap_production_backend_v2.py, src/client.py | Round orchestration, aggregation, convergence |
| Trust and attestation | tpm_cert_manager.py, tpm_metrics_exporter.py, secure_communication.py | Identity, verification, trust signals |
| Governance and policy | bridge-policies.json, capabilities.json | Runtime controls and policy surfaces |
| Tokenomics and economics | tokenomics_metrics_exporter.py | Economic telemetry and dashboard inputs |
| Observability | prometheus.yml, alertmanager.yml, fl_slo_alerts.yml | Metrics collection, alerting, SLO validation |
| Operations | deploy.sh, deploy_demo.sh, Makefile | Deployment and repeatable operator workflows |
Use this quick index to jump directly to API command groups and docs.
API command index:
- Control Plane API commands
- Training Service API commands
- TPM Exporter API commands
- Tokenomics Exporter API commands
API docs index:
- HTTP examples: docs/api/http-examples.md
- Swagger UI (multi-spec): docs/api/swagger-ui.html
- OpenAPI specs: docs/api/openapi.yaml, docs/api/openapi.training.yaml, docs/api/openapi.tpm.yaml, docs/api/openapi.tokenomics.yaml
- Postman collection: docs/api/postman_collection.json
- API validator command: `npm run api:validate`
- API docs CI: .github/workflows/api-spec-validation.yml, .github/workflows/api-docs-pages.yml
```bash
# Control Plane API
curl -s http://localhost:8000/status | jq
curl -s http://localhost:8000/health | jq
curl -s http://localhost:8000/ops/health | jq
curl -s -X POST http://localhost:8000/trigger_fl | jq
curl -s http://localhost:8000/metrics_summary | jq
curl -s http://localhost:8000/convergence | jq

# Training Service API
curl -s http://localhost:5001/health | jq
curl -s http://localhost:8000/training/status | jq

# TPM Exporter API
curl -s http://localhost:9091/health | jq
curl -s http://localhost:9091/metrics/summary | jq

# Tokenomics Exporter API
curl -s http://localhost:9105/health | jq
```

| Endpoint | Method | Function | Responsibility |
|---|---|---|---|
| /health | GET | health | Service health, enclave status, HUD telemetry snapshot |
| /status | GET | status | Aggregator runtime identity and core port map |
| /chat | POST | chat_query | HUD assistant query handling for operator prompts |
| /hud_data | GET | hud_data | HUD metrics including audit accuracy and simulation counters |
| /founders | GET | get_founders | Founding-signature identity list for governance views |
| /trigger_fl | POST | trigger_fl_round | Manual FL round simulation and convergence updates |
| /create_enclave | POST | create_enclave | Enclave state transition workflow |
| /convergence | GET | get_convergence | Convergence history arrays for charting |
| /metrics_summary | GET | metrics_summary | Aggregated metrics summary across runtime domains |
| /model_registry | GET | model_registry_recent | Recent persisted model metadata and round snapshots |
| /simulate/<simulation_type> | POST | trigger_hud_simulation | Records HUD simulation events by scenario type |
| /ops/health | GET | ops_health | Operational dependency/system snapshot (ports, memory/disk pressure, Prometheus reachability) |
| /ops/events/recent | GET | ops_events_recent | Returns recent operations events for timeline replay |
| /ops/events | GET (SSE) | ops_events_stream | Server-sent event stream for live operations telemetry |
| Endpoint | Method | Function | Responsibility |
|---|---|---|---|
| /trust_snapshot | GET | trust_snapshot | Current trust mode, policy state, and policy history |
| /verification_policy | POST | update_verification_policy | Runtime policy update surface |
| /llm_policy | GET | llm_policy_view | Exposes active LLM adapter validation policy |
| /join/policy | GET | join_policy_view | Join bootstrap policy and onboarding constraints |
| /join/invite | POST | create_join_invite | Issue join invites with bounded TTL and permissions |
| /join/register | POST | register_join_participant | Register participant certificates and join metadata |
| /join/registrations | GET | list_join_registrations | Admin listing of registered participants |
| /join/revoke/<int:node_id> | POST | revoke_join_participant | Administrative revocation of participant certificate |
| Endpoint | Method | Function | Responsibility |
|---|---|---|---|
| /training/start | POST | start_training | Trigger training start signal for HUD/ops flows |
| /training/stop | POST | stop_training | Trigger training halt signal |
| /training/status | GET | training_status | Current mocked training progress and metrics view |
| Endpoint | Method | Function | Responsibility |
|---|---|---|---|
| /metrics | GET | metrics | Prometheus exposition endpoint for tokenomics gauges |
| /health | GET | health | Tokenomics exporter liveness and source-file metadata |
| /event/tokenomics | POST | event_tokenomics | Ingest tokenomics events and persist canonical payload |
| Endpoint | Method | Function | Responsibility |
|---|---|---|---|
| /metrics | GET | metrics | Prometheus exposition endpoint for TPM/trust metrics |
| /metrics/summary | GET | metrics_summary | Aggregated TPM/trust summary snapshot |
| /health | GET | health | TPM exporter liveness and metadata |
| /event/attestation | POST | event_attestation | Ingest attestation event payloads |
| /event/message | POST | event_message | Ingest trust-related operational messages |
- Auth boundaries: `/join/registrations` and `/join/revoke/<int:node_id>` are admin-gated and require valid admin authorization headers.
- Auth boundaries: `/verification_policy` supports role-aware updates via `X-API-Role` and optional bearer token wiring.
- Status code behavior: `/create_enclave` may return `202` while provisioning is in progress, then `200` once a stable state transition is reached.
- Status code behavior: `/trigger_fl` may return `202` for accepted async execution and non-2xx when round execution cannot proceed.
- Streaming semantics: `/ops/events` is an SSE endpoint and includes heartbeat events to keep long-lived clients connected.
- Streaming semantics: `/ops/events/recent` should be used to backfill timeline state before attaching to SSE.
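A minimal client-side parser for the SSE wire format looks like this. The `heartbeat` event matches the behavior noted above; the `ops` event name and payload in the sample are assumptions for illustration, not the backend's documented event schema.

```python
def parse_sse(stream_text):
    """Parse server-sent-event text into (event, data) pairs.

    Follows the basic SSE framing: `event:` sets the type, `data:`
    lines accumulate, and a blank line dispatches the event.
    """
    events, event, data = [], "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data.append(line.split(":", 1)[1].strip())
        elif line == "" and data:
            events.append((event, "\n".join(data)))
            event, data = "message", []
    return events

raw = 'event: heartbeat\ndata: {}\n\nevent: ops\ndata: {"kind": "TRAINING_ROUND"}\n\n'
print(parse_sse(raw))
```

In practice a client would first fetch `/ops/events/recent` to backfill state, then attach a parser like this to the long-lived `/ops/events` stream.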
| Function | File | Responsibility |
|---|---|---|
| validate_llm_adapter_update | sovereignmap_production_backend_v2.py | Policy validation gate for incoming client updates |
| build_tokenomics_payload | sovereignmap_production_backend_v2.py | Constructs tokenomics publication payload from FL state |
| publish_tokenomics_event | sovereignmap_production_backend_v2.py | Sends tokenomics telemetry to exporter endpoint |
| publish_tpm_attestation_events | sovereignmap_production_backend_v2.py | Emits attestation events for trust metrics pipeline |
| run_flower_server | sovereignmap_production_backend_v2.py | Starts and configures Flower aggregation server |
| run_flask_metrics | sovereignmap_production_backend_v2.py | Starts Flask API plane for control and telemetry |
| create_app | tokenomics_metrics_exporter.py | Constructs exporter app and endpoint bindings |
If you want the fastest production-feel walkthrough, run the two commands below and open HUD + Grafana immediately:
```bash
docker compose -f docker-compose.full.yml up -d --scale node-agent=5
make stack-verify
```

What this startup command does:

- Starts the full runtime profile from docker-compose.full.yml.
- Scales the `node-agent` service to 5 replicas.
- Brings up API, HUD, and observability services in one command.

Recommended verification sequence:

```bash
docker compose -f docker-compose.full.yml up -d --scale node-agent=5
docker compose -f docker-compose.full.yml ps
make stack-verify
```

Teardown after the walkthrough:

```bash
docker compose -f docker-compose.full.yml down
```

Expected outcome in under 2 minutes:

- API health endpoints respond on `:8000`.
- HUD is reachable at http://localhost:3000.
- Grafana is reachable at http://localhost:3001.
- Prometheus is reachable at http://localhost:9090.
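When scripting this walkthrough, it helps to poll the health endpoints until they come up instead of sleeping a fixed time. The helper below is a generic sketch (not a repository utility): pass any callable that checks `/health` or `/ops/health` and returns a boolean.

```python
import time

def wait_until_healthy(check, timeout_s=120.0, interval_s=2.0):
    """Poll a boolean health-check callable until success or timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False

# Simulated check that succeeds on the third attempt.
attempts = {"n": 0}
def fake_check():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_until_healthy(fake_check, timeout_s=5.0, interval_s=0.01))  # True
```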
- Go 1.25+
- Node.js 20+
- npm 10+
- Python 3.11+
- Docker with Compose plugin
```bash
git clone https://github.com/rwilliamspbg-ops/Sovereign_Map_Federated_Learning.git
cd Sovereign_Map_Federated_Learning
./genesis-launch.sh
docker compose -f docker-compose.full.yml up -d --scale node-agent=5
docker compose -f docker-compose.full.yml ps
curl -s http://localhost:8000/status | jq
curl -s http://localhost:8000/health | jq
curl -s http://localhost:8000/ops/health | jq
curl -s http://localhost:8000/training/status | jq
```

Control-plane mTLS verification (expected behavior when the mTLS control plane is enabled):

```bash
# This should fail without a client certificate.
curl -kfsS https://localhost:8080/p2p/info || echo "expected: mTLS client certificate required"
```

If this request succeeds without a client certificate, verify your control-plane TLS policy before production use.

Expected checkpoints:

- `/status` returns service identity and ports.
- `/ops/health` reports API, Flower, and Prometheus reachability.
- Frontend HUD is reachable at http://localhost:3000.
- Grafana is reachable at http://localhost:3001.
CLI flow:
```bash
# Trigger one global round
curl -s -X POST http://localhost:8000/trigger_fl | jq

# Verify round advanced and metrics updated
curl -s http://localhost:8000/metrics_summary | jq '.federated_learning.current_round, .federated_learning.current_accuracy, .federated_learning.current_loss'
curl -s http://localhost:8000/convergence | jq '.current_round, .current_accuracy, .current_loss'
```

UI flow:

- Open http://localhost:3000.
- Confirm Network Operations HUD is the default landing view.
- Click Run Global FL Epoch.
- Confirm the live timeline shows a `TRAINING_ROUND` event and round metrics increment.
Workflow verification (GitHub Actions):
```bash
gh run list --branch main --limit 20
```

```bash
docker compose -f docker-compose.full.yml up -d --scale node-agent=10
```

```bash
# Go and backend tests
go test ./... -count=1

# Monorepo package build and tests
npm ci
npm run build:libs
npm run test:ci

# Frontend build
npm --prefix frontend ci
npm --prefix frontend run build
```

```bash
make benchmark-fedavg-compare
```

To compare against a different baseline and output path:

```bash
BASE_REF=origin/main BENCH_RUNS=3 REPORT_PATH=results/metrics/fedavg_benchmark_compare.md make benchmark-fedavg-compare
```

Generated report:
CI workflow:
Before opening a PR, run the same fast checks maintainers use:
```bash
# Discover all available developer targets
make help

# Required baseline
make fmt
make lint
make test

# Recommended reproducibility smoke checks
make smoke

# Required visual evidence assets
make screenshots-check
```

For runtime-focused changes (HUD, observability, policy endpoints), include at least one local verification artifact in your PR description:

- `/health` and `/ops/health` output snippet.
- One screenshot from HUD or Grafana.
- Command log showing a successful `trigger_fl` round.
- Standard runtime sequence: docker-compose.full.yml
- Contribution guidelines: CONTRIBUTING.md
- Security policy: SECURITY.md
- Changelog: CHANGELOG.md
- License: LICENSE
If you want to contribute quickly, these areas have high impact and low setup friction:
- Test matrix expansion for TPM 2.0 hardware variants and Docker runtimes.
- Apple Silicon (MPS) acceleration optimization and benchmark baselines.
- Additional Grafana panel tuning for high-cardinality node fleets.
- Better synthetic fault workloads for Byzantine and partition simulation paths.
Jump directly to open issues by label:
- Good first issue: good first issue label
- Help wanted: help wanted label
- Documentation: documentation label
- Observability: observability label
Contribution process and coding standards are in CONTRIBUTING.md.
- Symptom: trust/attestation metrics stay flat or backend cannot initialize TPM flows.
- Check: container runtime must expose TPM devices and required permissions.
- Typical fix: run with explicit device mapping and appropriate group permissions for TPM access.
- Symptom: HUD shows backend unreachable or Grafana/Prometheus endpoints fail to bind.
- Check: local services already using frontend/backend ports.
- Typical fix: align Compose port mappings and frontend API base configuration so HUD and backend targets match.
Timestamp: 2026-03-19
- Tokenomics exporter handles directory-valued source path safely and persists payload without IsADirectoryError.
- HUD simulation controls are wired end-to-end (frontend action -> backend endpoint -> HUD counter surface).
- Backend and exporter modules compile successfully with Python syntax checks.
Verified green on main after latest changes:
- Build and Test
- Lint Code Base
- HIL Tests
- Reproducibility Check
- Governance Check
- Workflow Action Pin Check
- CodeQL Security Analysis
- Security Supply Chain
- Secret Scan
- Observability CI
- Build and Deploy
For release candidates, run one additional live smoke test using docker-compose.full.yml and verify /health, /hud_data, and /event/tokenomics before tagging.
Standalone report: SANITY_REPORT.md



