
# ARES-E Responsible AI Plan

Framework for ethical AI governance, bias mitigation, and human oversight across ARES-E operations. Aligned with NIST AI RMF 1.0, Executive Order 14110, DOE AI Principles, and DoD Ethical AI Guidelines.


## 1. Purpose & Scope

This plan governs all AI/ML components within ARES-E, specifically:

- **GridPINN** — Physics-Informed Neural Network for grid dispatch optimization (Topic 16).
- **AdversarialDetector** — Heuristic classifier for injection and data-poisoning detection (Topic 20).
- **DifferentialPrivacyMechanism** — Laplace noise injection for privacy-preserving telemetry (Topic 20).

Non-AI components (thermal hydraulics solver, audit ledger, API routing) are excluded from this plan's AI-specific controls but remain subject to standard software quality assurance.
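The DifferentialPrivacyMechanism listed above relies on the standard Laplace mechanism: noise scaled to sensitivity/epsilon is added before telemetry leaves the system. A minimal stdlib-only sketch of that idea (the `sensitivity` and `epsilon` values in the usage note are illustrative, not ARES-E defaults):

```python
import math
import random

def privatize(value: float, sensitivity: float, epsilon: float,
              rng: random.Random) -> float:
    """Release `value` under the epsilon-DP Laplace mechanism.

    Noise is drawn from Laplace(0, sensitivity/epsilon) via
    inverse-CDF sampling of a uniform variate in (-0.5, 0.5).
    """
    scale = sensitivity / epsilon
    u = rng.random() - 0.5                                   # uniform in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return value + noise
```

Smaller epsilon means a larger noise scale and stronger privacy; a very large epsilon leaves the value essentially unperturbed.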


## 2. NIST AI RMF 1.0 Alignment

### 2.1 GOVERN

| Subcategory | Requirement | ARES-E Implementation |
| --- | --- | --- |
| GV-1 | Policies for AI risk management | This Responsible AI Plan |
| GV-2 | Accountability structures | GenesisHarness orchestrator; immutable audit ledger |
| GV-3 | Workforce diversity & competency | Performer team includes physics, cybersecurity, and ML expertise |
| GV-4 | Organizational commitment | Open-source posture enables public scrutiny |

### 2.2 MAP

| Subcategory | Requirement | ARES-E Implementation |
| --- | --- | --- |
| MP-1 | Context and intended use | Energy grid optimization for DOE Advanced Materials and Systems Centers |
| MP-2 | Identify interdependencies | GridPINN output feeds VVUQ scoring; AdversarialDetector gates all inputs |
| MP-3 | Benefits, costs, and risks | Benefits: autonomous grid dispatch; Risk: adversarial manipulation |
| MP-4 | Positive and negative impacts | Positive: efficiency gain; Negative: model overreliance if VVUQ bypassed |

### 2.3 MEASURE

| Subcategory | Requirement | ARES-E Implementation |
| --- | --- | --- |
| MS-1 | Appropriate metrics identified | VVUQ score (0.0–1.0); `physics_violations` count; AI-Advantage ratio |
| MS-2 | AI system evaluated for trustworthiness | AI-Advantage computed against deterministic classical baseline |
| MS-3 | Internal and external evaluation | 45 automated tests; STIX export for external SOC review |
| MS-4 | Measurement process documentation | VVUQ Framework document; acceptance test matrix |

### 2.4 MANAGE

| Subcategory | Requirement | ARES-E Implementation |
| --- | --- | --- |
| MG-1 | Risk treatment decisions | Physics violations trigger absolute workflow failure |
| MG-2 | Strategies to maximize benefits | Adaptive grid dispatch with real-time constraint feedback |
| MG-3 | Risks and benefits communicated | Audit ledger and STIX reports delivered to program oversight |
| MG-4 | Risk treatments monitored | `verify_chain()` validates ledger integrity continuously |
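The MG-4 control is a hash-chain walk: each ledger block commits to its predecessor's SHA-256 digest, so any retroactive edit breaks every later link. A sketch of the idea behind `verify_chain()`, with hypothetical field names rather than the actual ledger schema:

```python
import hashlib
import json

def block_hash(entry: dict, prev_hash: str) -> str:
    """SHA-256 over the canonicalized entry plus the previous block's hash."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append_block(chain: list, entry: dict) -> None:
    """Link a new block to the chain tip (or to a zeroed genesis anchor)."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"entry": entry, "prev": prev, "hash": block_hash(entry, prev)})

def verify_chain(chain: list) -> bool:
    """Re-derive every hash from genesis; any mismatch means tampering."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(block["entry"], prev):
            return False
        prev = block["hash"]
    return True
```

Because each hash covers the previous one, flipping a single byte in an early block invalidates the entire suffix of the ledger.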

## 3. Model Card — GridPINN

| Field | Value |
| --- | --- |
| Model Name | GridPINN v0.2.0 |
| Model Type | Physics-Informed Neural Network (PINN) |
| Architecture | 3-layer MLP (1→64→32→1) with `torch.tanh` activation |
| Training Regime | Online training per evaluation (20 epochs, Adam, lr=1e-3) |
| Input Domain | 1D synthetic grid load tensor (10 nodes) |
| Output Domain | Real-valued dispatch signals (10 nodes) |
| Loss Function | MSE with physics residual (Kirchhoff's law: sum of flows = 0) |
| Evaluation Method | `torch.no_grad()` inference; classical baseline comparison |
| Known Limitations | Synthetic training data; single-topology network; CPU-only inference |
| Bias Assessment | Uniform random load generation; no demographic or geographic bias vectors |
| Intended Use | Proof-of-concept for AmSC grid dispatch; not production grid control |
| Out-of-Scope Use | Real-time grid control without human-in-the-loop validation |
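The composite loss in the model card combines a data-fit MSE term with a Kirchhoff residual that penalizes dispatch flows failing to balance to zero. A pure-Python sketch of that combination (the weighting factor `lam` is an assumed hyperparameter, not a documented ARES-E value):

```python
def physics_informed_loss(pred, target, lam=1.0):
    """MSE data term plus squared Kirchhoff residual.

    The residual enforces conservation at the bus: the net flow
    across all nodes should sum to zero, so any imbalance is penalized.
    """
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    kirchhoff_residual = sum(pred) ** 2          # squared net flow imbalance
    return mse + lam * kirchhoff_residual
```

A perfectly accurate but unbalanced dispatch still incurs loss through the residual term, which is what distinguishes a PINN objective from plain regression.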

## 4. Model Card — AdversarialDetector

| Field | Value |
| --- | --- |
| Model Name | AdversarialDetector v0.2.0 |
| Model Type | Rule-based heuristic classifier (non-ML) |
| Detection Patterns | 8 regex-based injection patterns; 7 keyword-based poisoning patterns |
| Input Domain | Arbitrary string payloads from workflow submissions |
| Output Domain | Boolean alert with typed alert log |
| False Positive Rate | Low — patterns target known adversarial signatures only |
| False Negative Rate | Moderate — novel attack vectors may evade heuristic detection |
| Known Limitations | Cannot detect semantic adversarial inputs; no ML-based anomaly detection |
| Bias Assessment | Pattern-based; no training-data bias; operates identically across all inputs |
| Intended Use | First-line defense for payload validation |
| Mitigation for FN | Defense-in-depth: Pydantic validation, VVUQ scoring, human review of STIX exports |
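A reduced sketch of the heuristic approach described in the card. The two patterns here are illustrative stand-ins; the shipped detector carries 8 injection and 7 poisoning patterns:

```python
import re

# Illustrative signatures only; the production pattern set is larger.
INJECTION_PATTERNS = [
    re.compile(r"(?i)ignore\s+(all\s+)?previous\s+instructions"),
    re.compile(r"(?i)\b(drop|delete)\s+table\b"),
]

def detect_adversarial(payload: str) -> list:
    """Scan a payload string and return one typed alert per matched pattern."""
    return [{"type": "injection", "pattern": p.pattern}
            for p in INJECTION_PATTERNS if p.search(payload)]
```

Because matching is purely lexical, a semantically adversarial payload that avoids the listed signatures passes untouched, which is exactly the moderate false-negative risk the card flags and why the defense-in-depth layers below it matter.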

## 5. Data Card

| Field | Value |
| --- | --- |
| Training Data | Synthetically generated (no real-world data) |
| Data Source | `torch.rand()` for grid loads; `random.uniform()` for thermal flows |
| PII/PHI Content | None — no personal, health, or demographic data processed |
| Sensitive Categories | None |
| Geographic Scope | N/A — synthetic domain |
| Temporal Scope | N/A — stateless per evaluation |
| Data Quality Controls | Pydantic V2 strict-mode validation at API boundary |
| Data Retention | In-memory only; no persistent storage of evaluation inputs |
| FAIR Compliance | All schemas documented in OpenAPI 3.1; metadata includes domain, version, tags |

## 6. Bias & Fairness Assessment

### 6.1 Applicable Bias Categories

| Bias Type | Applicability | Assessment |
| --- | --- | --- |
| Selection bias | Low | Synthetic data generated uniformly |
| Measurement bias | Low | Physics-based metrics are deterministic |
| Aggregation bias | N/A | No demographic subgroups in grid domain |
| Historical bias | N/A | No historical data utilized |
| Representation bias | Low | Synthetic generation covers parameter space uniformly |
| Automation bias | Medium | Risk that operators over-trust AI dispatch; mitigated by VVUQ threshold and human review |

### 6.2 Mitigation Strategy for Automation Bias

1. **Mandatory VVUQ Scoring** — No workflow output is delivered without a computed VVUQ score.
2. **Classical Baseline** — Every AI dispatch is compared to a deterministic classical baseline; the deviation is reported as AI-Advantage.
3. **Physics Violation Check** — Violations cause absolute workflow failure regardless of AI confidence.
4. **Human-in-the-Loop** — Audit ledger and STIX exports require human review before operational action.
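The four mitigations above compose into a single delivery gate. A minimal sketch of that gating logic; the 0.8 threshold and the cost-ratio definition of AI-Advantage are assumptions for illustration, not documented ARES-E parameters:

```python
def gate_output(vvuq_score: float, physics_violations: int,
                ai_cost: float, classical_cost: float,
                vvuq_threshold: float = 0.8) -> dict:
    """Apply the automation-bias mitigations in order of severity.

    Physics violations fail absolutely regardless of AI confidence,
    a low VVUQ score blocks delivery, and AI-Advantage versus the
    classical baseline is always reported for human review.
    """
    ai_advantage = classical_cost / ai_cost if ai_cost > 0 else float("inf")
    if physics_violations > 0:
        return {"status": "fail", "reason": "physics_violation",
                "ai_advantage": ai_advantage}
    if vvuq_score < vvuq_threshold:
        return {"status": "blocked", "reason": "vvuq_below_threshold",
                "ai_advantage": ai_advantage}
    return {"status": "deliver", "reason": None, "ai_advantage": ai_advantage}
```

Note that AI-Advantage is computed and surfaced even on failure, so the human reviewer always sees the baseline comparison rather than the AI's self-reported confidence.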

## 7. Human Oversight Controls

| Control | Mechanism |
| --- | --- |
| Pre-deployment review | All model weights and hyperparameters are inspectable in source code |
| Runtime intervention | Health endpoint provides ledger integrity check; operators can halt operations |
| Post-evaluation audit | STIX/TAXII 2.1 bundles exported for SOC/program-manager review |
| Override capability | No autonomous execution — API responds to human-initiated requests only |
| Escalation path | Physics violations and adversarial detections generate alert logs for human escalation |

## 8. High-Risk AI Controls

Per Executive Order 14110 and DOE AI Principles:

| Control | Status |
| --- | --- |
| Safety testing before deployment | ✅ 45 automated tests; VVUQ acceptance thresholds |
| Red-teaming for adversarial robustness | ✅ 8 injection patterns + 7 poisoning patterns tested |
| Ongoing monitoring | `verify_chain()` integrity check; health endpoint |
| Content provenance | ✅ SHA-256 hash chain with genesis block |
| Watermarking / AI-generated content labeling | N/A — no generative content produced |
| Dual-use risk assessment | Low — energy grid optimization has no direct weapons application |
| Reporting to oversight bodies | ✅ STIX/TAXII export for DOE and IC consumers |

## 9. Continuous Improvement

| Activity | Frequency | Owner |
| --- | --- | --- |
| Update adversarial detection patterns | Per threat-intelligence cycle | Cybersecurity lead |
| Review VVUQ acceptance thresholds | Per milestone delivery | Physics lead |
| Retrain GridPINN on expanded topologies | Per Topic 16 milestone | ML engineer |
| Audit third-party dependencies | Quarterly | DevOps lead |
| Review this Responsible AI Plan | Annually or upon regulatory change | Program manager |