pass featuregate args to config-operator to get rendered featuregates #6990
Conversation
/assign

I will try to dig in to assist a bit more here and in openshift/cluster-config-operator#288.
I'm not sure this gotcha is true. If you leave it until after the CVO, there may be a diff between the feature gates observed by the MCO on render and on first boot of the operator; if that happens, the cluster fails to bootstrap. You'd have to make sure the feature gates are consistent in the MCO's view until after it has rendered the first machine config into the cluster. The one that is rendered is not persisted in the API.
Does this file actually have to be called this? Other render commands I've seen search through the manifests folder and look at a partial object meta to identify whether a file is a feature gate. Before the installer supported feature gates, the file could have been called whatever I liked, and nothing says that's not still the case for people who set up clusters without feature-gate support in the installer.
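To make the content-based approach concrete, here is a hypothetical shell sketch (all paths and file names below are invented for illustration, not the installer's actual layout): it identifies a FeatureGate manifest by its kind, regardless of what the file is called.

```shell
# Hypothetical sketch: identify FeatureGate manifests by content, not
# filename. Paths and names are invented for illustration only.
mkdir -p /tmp/fg-demo/manifests

# A feature gate manifest under an arbitrary, user-chosen name.
cat > /tmp/fg-demo/manifests/whatever-i-liked.yaml <<'EOF'
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
EOF

# An unrelated manifest that must not match.
cat > /tmp/fg-demo/manifests/cluster-network.yaml <<'EOF'
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
EOF

# Scan every manifest and report only those whose kind is FeatureGate.
for f in /tmp/fg-demo/manifests/*.yaml; do
  grep -q '^kind: FeatureGate$' "$f" && echo "feature gate manifest: $f"
done
# prints: feature gate manifest: /tmp/fg-demo/manifests/whatever-i-liked.yaml
```

A real implementation would decode the partial object metadata with the Kubernetes machinery rather than grep, but the detection principle is the same: match on kind, not on filename.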
This is a good point, spelling it out a bit more...
This PR will make the installer always lay down a feature set manifest. If there is a conflicting feature set manifest the installer will throw an error pre-install.
It would definitely be possible for a user to delete the installer-generated manifest. But in those cases only the feature set spec would be set and not the status, right?
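As a hypothetical sketch of that pre-install conflict check (paths and file names invented; this is not the installer's actual code): if a user-provided FeatureGate manifest is already present in the manifests dir, refuse to lay down a second one.

```shell
# Hypothetical sketch of the pre-install conflict check described above.
# Paths and file names are placeholders, not the installer's real layout.
mkdir -p /tmp/conflict-demo/manifests

# Simulate a user-provided feature gate manifest already in place.
cat > /tmp/conflict-demo/manifests/user-featuregate.yaml <<'EOF'
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
EOF

# Before generating its own manifest, the tool checks for a conflict.
if grep -q '^kind: FeatureGate$' /tmp/conflict-demo/manifests/*.yaml; then
  echo "error: a FeatureGate manifest already exists; refusing to overwrite"
  # a real installer would exit non-zero here instead of continuing
else
  echo "no conflict: safe to generate the FeatureGate manifest"
fi
```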
> It would definitely be possible for a user to delete the installer-generated manifest. But in those cases only the feature set spec would be set and not the status, right?
Yes. Sometimes that's acceptable, and the customer should know. I'm open to future refinement in 4.14, but I'd like to unstick the Azure CCM.
If we are going to merge as-is, I think we should check how many CI jobs rely on the ability to lay down a feature gate manifest blindly. I suspect the number is non-zero, and this would break anyone currently relying on that mechanism.
A quick scan of the release repo leads me to:
- https://github.com/openshift/release/blob/bb21ccca00ad923ea66912c5897335d48087851d/ci-operator/step-registry/ipi/conf/techpreview/ipi-conf-techpreview-commands.sh#L4
- https://github.com/openshift/release/blob/bb21ccca00ad923ea66912c5897335d48087851d/ci-operator/step-registry/ipi/conf/customnoupgrade/ipi-conf-customnoupgrade-commands.sh#L14
Both of which, if I understand correctly, set up a feature gate manifest under a custom name during the openshift-install manifest stage. The current logic wouldn't pick these up, which would invalidate all the TechPreview and CustomNoUpgrade CI jobs we have.
Either we need to rename those to fit the pattern, or fix the logic to run a search of the manifest dir in CCO (the latter being the preferred long-term fix, IMO).
/test all
patrickdillon left a comment:
This seems sane to me.

> There is some generation I don't know how to do.

Is this still an issue?

> I don't know if I got the filename right.

LGTM

> I don't know if a file can be overwritten during rendering.

The pattern in the PR should work well.

> I don't know where the other input files come from.

It doesn't seem like this is a blocker?
I have pushed an update to set the VERSION, which will prevent flapping during install.
/hold until we've clarified whether this will break the TPNU and CNU variants of CI jobs.
/hold cancel

CI jobs are fixed, and we have PRs up for the CCO changes I wanted to see. I think this is ready.

/lgtm
/test all
Success will have it in featuregates (from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/6990/pull-ci-openshift-installer-master-e2e-aws-ovn/1650960438084505600). The current run failed: the cluster-config-operator had to overwrite it, in https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/6990/pull-ci-openshift-installer-master-e2e-aws-ovn/1651685466501550080. I need to figure out whether it was the version arg or something else.
/hold until I get that worked out.
/retest

/retest
/hold cancel

This one is back to installing correctly. I'd like to unstick the external cloud provider with this, and continue working on handling different filenames later.
The bug is the external cloud provider revert.
/retest

/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: patrickdillon. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/skip
@deads2k: The following tests failed.
[ART PR BUILD NOTIFIER] This PR has been included in build ose-installer-container-v4.14.0-202304290741.p0.g8e8fb72.assembly.stream for distgit ose-installer. |
[ART PR BUILD NOTIFIER] This PR has been included in build ose-baremetal-installer-container-v4.14.0-202304290741.p0.g8e8fb72.assembly.stream for distgit ose-baremetal-installer. |
[ART PR BUILD NOTIFIER] This PR has been included in build ose-installer-artifacts-container-v4.14.0-202304290741.p0.g8e8fb72.assembly.stream for distgit ose-installer-artifacts. |
…nder CI is currently broken: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn/1653511338548269056/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestCreateCluster_PreTeardownClusterDump/namespaces/e2e-clusters-glkxm-example-qmb9l/core/pods/logs/kube-apiserver-79b6c48d7-mzv44-init-bootstrap-previous.log. This introduced a new command which is effectively required: openshift/cluster-config-operator#288. Here it was added to the installer: github.com/openshift/installer/pull/6990
A change to allow feature gates in the openshift installer [1] introduced the use of oc adm release info in the bootkube service, breaking disconnected environments. This workaround overrides the install release image to use the release image from the local registry.

[1] openshift/installer#6990

Change-Id: I42a47d7f8ad175d07f2469e17178e3d6f34e6d11
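A minimal sketch of that kind of workaround, assuming a release image has already been mirrored into a local registry (the registry host and digest below are placeholders, not real values): openshift-install honors the OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE environment variable, so pointing it at the mirror keeps bootkube's oc adm release info from needing internet access.

```shell
# Hypothetical sketch of the disconnected-install workaround: override the
# release image with one mirrored into a local registry. The registry host
# and digest are placeholders, not real values.
export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="registry.local:5000/ocp/release@sha256:placeholder"
echo "using release image: ${OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE}"
# openshift-install create cluster --dir ./install-dir   # would now use the mirror
```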
For openshift/enhancements#1373
I could probably live with this having to be vendored sometimes, but it would make me sad.
Part 5 of the three-step plan to make individual featuregates observable via the API. This was previously avoided because, when featuregates get promoted from TechPreview to Default during an upgrade, an old operator may try to enable a TechPreview variant of a feature.
This builds on openshift/cluster-config-operator#288 with a goal of making the development-time flow
with a runtime flow of
shortcomings