
feat(evidence): add NIM support to evidence collection and restructure conformance docs#479

Merged
mchmarny merged 1 commit into NVIDIA:main from yuanchen8911:feat/nim-evidence-collection
Apr 2, 2026


@yuanchen8911
Contributor

Summary

  • Add NIM inference and NIM Operator paths to evidence collection script
  • Restructure conformance docs to v1.35/nim-eks/ (mirrors CNCF submission repo)
  • Update all docs for NIM on EKS as the certified product

Changes

Evidence collection (pkg/evidence/scripts/collect-evidence.sh):

  • Add collect_service_metrics_nim(): creates a ServiceMonitor, sends inference requests, waits until Prometheus scrapes the target with health: up, and captures metrics from both the NIM /v1/metrics endpoint and Prometheus queries
  • Add collect_operator_nim(): validates the NIM Operator deployment, its CRDs (apps.nvidia.com) and webhooks (looked up by namespace), NIMService reconciliation, and a webhook rejection test
  • Detection priority: Dynamo > NIM Operator > Kubeflow Trainer (for both metrics and operator checks)
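The detection priority above amounts to "run the first collector whose backend is present". The Python sketch below illustrates that ordering; it is not the bash logic in collect-evidence.sh. It assumes, hypothetically, that NIM Operator presence is inferred from CRDs in the apps.nvidia.com group (as listed by `kubectl get crd -o json`); the backend keys and function names are illustrative only.

```python
from typing import List, Mapping, Optional

# Priority order stated in the PR: Dynamo > NIM Operator > Kubeflow Trainer.
# These keys are hypothetical labels for this sketch, not identifiers from collect-evidence.sh.
BACKEND_PRIORITY = ("dynamo", "nim-operator", "kubeflow-trainer")

# API group of the NIM Operator CRDs, per the PR description.
NIM_CRD_GROUP = "apps.nvidia.com"


def crds_in_group(crd_list: Mapping, group: str = NIM_CRD_GROUP) -> List[str]:
    """Return sorted CRD names whose spec.group matches, given `kubectl get crd -o json` output."""
    return sorted(
        item["metadata"]["name"]
        for item in crd_list.get("items", [])
        if item.get("spec", {}).get("group") == group
    )


def select_backend(detected: Mapping[str, bool]) -> Optional[str]:
    """Return the highest-priority backend that was detected, or None if none was found."""
    for backend in BACKEND_PRIORITY:
        if detected.get(backend, False):
            return backend
    return None
```

On a cluster where both the NIM Operator and Kubeflow Trainer are installed, `select_backend` picks `"nim-operator"`. Dynamo still wins whenever it is present.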

Documentation restructure:

  • Move from the evidence/ + submission/ layout to a v1.35/nim-eks/ structure (mirrors cncf/k8s-ai-conformance/v1.35/<product>/)
  • Supports future submissions (e.g., v1.35/nim-gke/, v1.36/nim-eks/)
  • Update all internal links and PRODUCT.yaml evidence URLs
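Per the test plan, the PRODUCT.yaml evidence URLs should be full GitHub URLs, and nothing should still point at the old submission/ layout. A check along these lines could flag relative links, and absolute links outside the new tree. This is a sketch under assumptions: the evidence mapping is shown as an already-parsed dict (no YAML parsing), and the function name and `required_dir` parameter are hypothetical.

```python
from typing import List, Mapping, Tuple
from urllib.parse import urlparse


def check_evidence_urls(
    evidence: Mapping[str, str], required_dir: str = "v1.35/nim-eks/"
) -> List[Tuple[str, str]]:
    """Return (requirement, problem) pairs for evidence URLs that break the new layout.

    Hypothetical helper: `evidence` maps a requirement name to its evidence URL.
    """
    problems = []
    for requirement, url in sorted(evidence.items()):
        parsed = urlparse(url)
        if parsed.scheme != "https" or parsed.netloc != "github.com":
            # Relative links such as "evidence/foo.md" have no scheme or host.
            problems.append((requirement, "not a full GitHub URL"))
        elif f"/{required_dir}" not in parsed.path:
            # Absolute GitHub URL, but not under the v1.35/<product>/ tree.
            problems.append((requirement, f"outside {required_dir}"))
    return problems
```

An empty result means every evidence link is absolute and lives under the product directory.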

PRODUCT.yaml updates:

  • platformName: "NVIDIA NIM on EKS"
  • websiteUrl: developer.nvidia.com/nim
  • repoUrl: github.com/NVIDIA/k8s-nim-operator
  • documentationUrl: NIM Helm deploy docs
  • NIM-specific notes for ai_service_metrics and robust_controller requirements

Test plan

  • Evidence collection produces 9/9 PASS on EKS with a NIM workload
  • AI Service Metrics: Prometheus discovers NIM target with health: up via ServiceMonitor
  • Robust Operator: NIM Operator webhooks found, NIMService reconciled, rejection test passes
  • All links verified (README → evidence/, PRODUCT.yaml → full GitHub URLs)
  • No stale references to old evidence/ or submission/ paths
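The AI Service Metrics check waits until Prometheus reports the NIM target as health: up. The following Python sketch shows one way to poll for that. It uses the Prometheus HTTP API endpoint /api/v1/targets, whose `data.activeTargets` entries carry `labels` and `health`. It assumes the ServiceMonitor-generated target has a `namespace` label matching the NIM namespace (the Prometheus Operator normally adds this label). The namespace value, function names, and timeouts are hypothetical, and the clock and sleep functions are injectable so the loop can be tested offline.

```python
import json
import time
import urllib.request
from typing import Callable, Mapping


def has_healthy_target(targets: Mapping, namespace: str) -> bool:
    """True if any active target in `namespace` reports health == "up"."""
    for target in targets.get("data", {}).get("activeTargets", []):
        if target.get("labels", {}).get("namespace") == namespace and target.get("health") == "up":
            return True
    return False


def fetch_targets(prometheus_url: str) -> dict:
    """GET <prometheus_url>/api/v1/targets and decode the JSON body."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def wait_for_healthy_target(
    fetch: Callable[[], Mapping],
    namespace: str,
    timeout_s: float = 300,
    interval_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until a healthy target appears in `namespace`; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if has_healthy_target(fetch(), namespace):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Against a live cluster this might be called as `wait_for_healthy_target(lambda: fetch_targets("http://localhost:9090"), "nim")`, assuming Prometheus is port-forwarded to localhost:9090 and NIM runs in a namespace named `nim`.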

Depends on #478

yuanchen8911 requested a review from a team as a code owner April 1, 2026 23:24
yuanchen8911 added enhancement New feature or request area/docs labels Apr 1, 2026
yuanchen8911 requested review from dims and mchmarny April 1, 2026 23:31
Collaborator

dims left a comment


LGTM

mchmarny enabled auto-merge (squash) April 2, 2026 11:30
…formance docs

Add NIM Operator and NIM inference metrics paths to evidence collection,
and update all conformance documentation to reflect NIM on EKS as the
certified product.

Evidence collection:
- Add collect_service_metrics_nim() for NIM /v1/metrics endpoint
- Add collect_operator_nim() for NIM Operator CRDs/webhooks/reconciliation
- Detection priority: Dynamo > NIM Operator > Kubeflow Trainer

Documentation:
- Update PRODUCT.yaml platform to "NVIDIA NIM on EKS"
- Update submission README and evidence index for NIM
- Refresh all 9 evidence files with NIM-based conformance results (9/9 PASS)
yuanchen8911 force-pushed the feat/nim-evidence-collection branch from 2e5e120 to 54fb0ac April 2, 2026 15:28
mchmarny merged commit f2aeaf2 into NVIDIA:main Apr 2, 2026
24 of 25 checks passed

3 participants