Operator/rhoai #97
base: main
- Added "Quick Start from Source" and updated "Quick Start (OCI)" for clarity.
- Included a new section for "Adding New Operators" to guide contributions.
- Revised development workflow steps for better clarity and organization.
rhoai branch
Refactor RHOAI policy documentation and values.yaml.
…uired operators for RHOAI 3.0, including GitOps, ACM, Serverless, Service Mesh, Pipelines, Node Feature Discovery, and NVIDIA GPU Operator.
GPU operator
…rator policy to enforce OwnNamespace install mode, while cleaning up RHOAI policy configuration by removing unnecessary fields.
Update .gitignore to exclude my-values.yaml and modify NVIDIA GPU ope…
- Introduced new dashboard configuration options in values.yaml for tracking, model features, GenAI features, and notebook controller settings.
- Updated policy-rhoai-config.yaml to enforce the new dashboard settings through a ConfigurationPolicy for OdhDashboardConfig.
Add OdhDashboardConfig settings to RHOAI policy
- Added detailed steps for installing OpenShift GitOps and Red Hat ACM operators as prerequisites for deploying AutoShift.
- Updated deployment steps to include configuring OCI registry credentials, applying cluster labels, and verifying deployment.
- Introduced new ConfigurationPolicies for KnativeServing and RHOAI dashboard route in policy-rhoai-config.yaml.
- Updated values.yaml for serverless and servicemesh3 policies to include configurations for KnativeServing and Istio.
RHOAI policies
…ft and NVIDIA GPU operator
- Corrected the OCI registry path for AutoShift installation and updated the values file reference.
- Clarified steps for moving clusters to the 'hub' ClusterSet and applying AutoShift labels for operator installation.
- Enhanced NVIDIA GPU operator policy documentation to emphasize the requirement of Node Feature Discovery (NFD) for GPU detection.
- Updated values.yaml and policy templates to reflect changes in GPU operator configurations and dependencies.
Update deployment documentation and policy configurations for AutoShi…
- Added new ConfigurationPolicies for DataScienceCluster initialization and management, including `rhoai-dsc-bootstrap` and `rhoai-dsc`.
- Updated README.md to include detailed descriptions of new ConfigurationPolicies and their roles in RHOAI 3.0.
- Included troubleshooting steps for components stuck in "Removed" state and clarified management state handling for DataScienceCluster.
- Expanded values.yaml to incorporate new dashboard configuration options and model registry settings.
Enhance RHOAI policy documentation and configuration
andymiller96
left a comment
Rough first pass at PR review
Untitled
Outdated
Probably shouldn't commit a file named "Untitled" in the base directory. Or anywhere for that matter.
Removed. I missed that :(
| istio: | ||
| name: default | ||
| namespace: istio-system | ||
| version: "v1.26.2" # Must match servicemesh3 operator version |
Why are we hard-coding a version? Same for line 29.
| name: servicemeshoperator3 | ||
| namespace: openshift-operators | ||
| channel: stable | ||
| channel: stable-3.2 |
Don't need to default to 3.2 when it's overridden by the template
example here:
autoshiftv2/autoshift/values.hub.yaml
Line 43 in 605402c
| servicemesh3-channel: stable-3.2 |
Should we just leave it blank or comment it out, or does it affect the install if the hub.yaml value overrides the one in values.yaml?
…d file and update values.hub.yaml to include commented-out sections for Istio and GPU operator settings.
Let's kill the helm charts if there's an equivalent OperatorPolicy for those operators. @hultzj
andymiller96
left a comment
- Remove unnecessary helm charts
- Move quickstart to its own .md
- Update CNI namespace
- Address other random comments (versions, etc.)
- Have Ben review the stuff he's tagged in
| name: {{ .Values.servicemesh3.istioCni.name | default "default" }} | ||
| spec: | ||
| version: {{ .Values.servicemesh3.istioCni.version | default "v1.24.3" | quote }} | ||
| namespace: {{ .Values.servicemesh3.istio.namespace | default "istio-system" }} |
istio-cni should be in its own namespace, typically "istio-cni"
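A minimal sketch of what that could look like in the template, assuming a new values key `servicemesh3.istioCni.namespace` is introduced (that key is hypothetical and not part of this PR; the other fields are copied from the snippet above, and the `metadata:` wrapper is assumed):

```yaml
# Sketch only: `istioCni.namespace` is a hypothetical values key.
metadata:
  name: {{ .Values.servicemesh3.istioCni.name | default "default" }}
spec:
  version: {{ .Values.servicemesh3.istioCni.version | default "v1.24.3" | quote }}
  # Fall back to a dedicated namespace instead of reusing the Istio control plane namespace.
  namespace: {{ .Values.servicemesh3.istioCni.namespace | default "istio-cni" }}
```

With this shape the CNI no longer inherits `istio.namespace`, and clusters that need a different namespace could still override it from their values file.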
Move this from quickstart-oci to something like quickstart-rhoai.
The platform version update from 4.18 to 4.20 probably needs to happen across the whole codebase as its own PR. @bcarr-rh
Yep, that is on my to-do list.
I feel like this breaks nodeFeatureDiscovery. I think if you don't use it, you can just disable it in a RHOAI values file.
| ### Development | ||
| - **[Developer Guide](developer-guide.md)** - Contributing to AutoShift and advanced configuration | ||
| - **[Adding New Operators](adding-new-operators.md)** - Step-by-step guide to add operators and contribute upstream |
I think this and the Developer Guide are meant to be the same thing
| ## Deployment Steps | ||
| ### Step 1: Configure OCI Registry Credentials |
The OCI deployment should use OCI throughout, not the local Helm chart at all. The bootstraps are in OCI as well, and there should be scripts to deploy the bootstrap with OCI.
| ``` | ||
| ### Step 3: Verify Deployment | ||
| ### Step 5: Move Cluster to Hub ClusterSet (Required) |
The labels are actually applied to the cluster through a labels policy and shouldn't be applied manually.
Yep, that is on my to-do list.
I feel like this breaks nodeFeatureDiscovery. I think if you don't use it, you can just disable it in a RHOAI values file.
| operatorGroupName: openshift-nfd-operator | ||
| targetNamespaces: # Target namespaces for namespace-scoped operators | ||
| - openshift-nfd | ||
| operandImage: 'registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.18' |
Is this needed?
| @@ -0,0 +1,100 @@ | |||
| # Default values for nvidia-gpu | |||
Should any of these new values be configurable per cluster? If so, you may want to keep these as defaults and add new labels.
No description provided.