Skip to content

Conversation

@deads2k
Copy link
Contributor

@deads2k deads2k commented Mar 17, 2023

For openshift/enhancements#1373

I could probably live with this having to be vendored sometimes, but it would make me sad.

Part 5 of the three step plan to make individual featuregates observable via the API. This was previously avoided because when featuregates get promoted from TechPreview to Default, during an upgrade, an old operator may try to enable a TechPreview variant of a feature.

This builds on openshift/cluster-config-operator#288 with a goal of making the development-time flow

  1. update openshift/api with new featuregate
  2. vendor into near zero-dep openshfit/cluster-config-operator (automated someday?)
  3. done.

with a runtime flow of

  1. installer builds the featuregate.yaml with spec completed
  2. bootkube passes featuregate.yaml to cluster-config-operator render
  3. cluster-config-operator render completes status and overwrites the featuregate.yaml
  4. other operator in render receive featuregate.yaml and consume featuregates from that status
  5. cluster-bootstrap writes both the spec and status
  6. in the running cluster, cluster-config-operator consistently writes the featuregate.status

shortcomings

  • There is some generation I don't know how to do.
  • It relies on add featuregate status cluster-config-operator#288 to work and that hasn't merged
  • I don't know if I got the filename write
  • I don't know if a file can be overwritten during rendering
  • I don't know where the other input files come from
  • Technically I could leave this after the CVO and get 90% of the benefit. If we have ordering problems I will do so to get unstuck.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2023
@openshift-ci openshift-ci bot requested review from patrickdillon and r4f4 March 17, 2023 19:27
@patrickdillon
Copy link
Contributor

/assign

I will try to dig in to assist a bit more here and in openshift/cluster-config-operator#288

@JoelSpeed
Copy link
Contributor

Technically I could leave this after the CVO and get 90% of the benefit. If we have ordering problems I will do so to get unstuck.

I'm not sure this gotcha is true, if you leave it until after the CVO, there may be a diff between the FGs observed by the MCO on render and on first boot of the operator, if this happens, the cluster fails to bootstrap. You'd have to make sure that the FGs are consistent in the view of MCO until after it has rendered the first machine config into the cluster. The one that is rendered is not persisted in the API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this file actually have to be called this? In other render commands I've seen they search through the manifests folder and look at a partial object meta to identify if the file is a featuregate. Pre the installer supporting feature gates, it could have been called whatever I liked, nothing to say that's not still the case for some who have set up clusters without FG support in the installer before

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, spelling it out a bit more...

This PR will make the installer always lay down a feature set manifest. If there is a conflicting feature set manifest the installer will throw an error pre-install.

It would definitely be possible for a user to delete the installer-generated manifest. But in those cases only the feature set spec would be set and not the status, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would definitely be possible for a user to delete the installer-generated manifest. But in those cases only the feature set spec would be set and not the status, right?

Yes. sometimes that's acceptable and the customer should know. I'm open to future refinement in 4.14, but I'd like to unstick azure ccm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to merge as is, I think we should probably check how many CI jobs rely on the ability to lay down a feature gate manifest blindly. I suspect it's non-zero and that this would break anyone currently relying on that mechanism

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick scan of the release repo leads me to:

Either we need to rename those to fit the pattern, or fix the logic to run a search in CCO of the manifest dir (the latter being the preferred long term IMO)

@deads2k
Copy link
Contributor Author

deads2k commented Apr 25, 2023

/test all

@deads2k deads2k changed the title [wip] pass featuregate args to config-operator to get rendered featuregates pass featuregate args to config-operator to get rendered featuregates Apr 25, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 25, 2023
Copy link
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems sane to me.

There is some generation I don't know how to do.

Is this still an issue?

I don't know if I got the filename write

LGTM

I don't know if a file can be overwritten during rendering

The pattern in the PR should work well.

I don't know where the other input files come from

It doesn't seem like this is a blocker?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, spelling it out a bit more...

This PR will make the installer always lay down a feature set manifest. If there is a conflicting feature set manifest the installer will throw an error pre-install.

It would definitely be possible for a user to delete the installer-generated manifest. But in those cases only the feature set spec would be set and not the status, right?

@deads2k
Copy link
Contributor Author

deads2k commented Apr 25, 2023

I have pushed an update to set the VERSION, which will prevent flapping during install.

@JoelSpeed
Copy link
Contributor

/hold until we've clarified if this will break TPNU and CNU variants of CI jobs

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 26, 2023
@JoelSpeed
Copy link
Contributor

/hold cancel

CI jobs are fixed, we have PRs up for the CCO changes I wanted to see, I think this is ready

/lgtm

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 27, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 27, 2023
@smg247
Copy link
Member

smg247 commented Apr 27, 2023

/test all

@deads2k
Copy link
Contributor Author

deads2k commented Apr 28, 2023

success will have

    - apiVersion: config.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:featureGates:
            .: {}
            k:{"version":"4.14.0-0.ci.test-2023-04-25-223643-ci-op-4mjy16rb-latest"}:
              .: {}
              f:disabled: {}
              f:enabled: {}
              f:version: {}
      manager: cluster-bootstrap
      operation: Update
      subresource: status

in featuregates (from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/6990/pull-ci-openshift-installer-master-e2e-aws-ovn/1650960438084505600)

The current failed

    - apiVersion: config.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status: {}
      manager: cluster-bootstrap
      operation: Update
      subresource: status
      time: "2023-04-27T21:48:16Z"
    - apiVersion: config.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:featureGates:
            .: {}
            k:{"version":"4.14.0-0.ci.test-2023-04-27-213820-ci-op-zfiwkxdz-latest"}:
              .: {}
              f:disabled: {}
              f:enabled: {}
              f:version: {}
      manager: cluster-config-operator
      operation: Update
      subresource: status
      time: "2023-04-27T21:52:20Z"

the cluster-config-operator had to overwrite it in https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/6990/pull-ci-openshift-installer-master-e2e-aws-ovn/1651685466501550080

I need to figure out whether it was the version arg or something else

@deads2k
Copy link
Contributor Author

deads2k commented Apr 28, 2023

/hold until I get that worked out.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2023
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 28, 2023
@deads2k
Copy link
Contributor Author

deads2k commented Apr 28, 2023

/retest

@deads2k
Copy link
Contributor Author

deads2k commented Apr 28, 2023

/retest

@deads2k
Copy link
Contributor Author

deads2k commented Apr 28, 2023

/hold cancel

this one is back to installing correctly. I'd like to unstick the external cloud provider with this and continue to work on handling different filenames later.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2023
@deads2k deads2k added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Apr 28, 2023
@deads2k
Copy link
Contributor Author

deads2k commented Apr 28, 2023

bug is the extgernal cloud provider revert.

@patrickdillon
Copy link
Contributor

/retest

@patrickdillon
Copy link
Contributor

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 28, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2023
@patrickdillon
Copy link
Contributor

/skip
hopefully this will abort the unnecessary retest

@deads2k deads2k added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Apr 28, 2023
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2ffd73e and 2 for PR HEAD 07da028 in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 29, 2023

@deads2k: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-upgrade dec02f912d5d077824bacd5b4adc718002a8c533 link false /test okd-scos-e2e-aws-upgrade
ci/prow/e2e-aws-ovn-workers-rhel8 dec02f912d5d077824bacd5b4adc718002a8c533 link false /test e2e-aws-ovn-workers-rhel8
ci/prow/e2e-aws-ovn-upgrade dec02f912d5d077824bacd5b4adc718002a8c533 link false /test e2e-aws-ovn-upgrade
ci/prow/okd-e2e-aws-ovn 07da028 link false /test okd-e2e-aws-ovn
ci/prow/okd-e2e-aws-ovn-upgrade 07da028 link false /test okd-e2e-aws-ovn-upgrade

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 8e8fb72 into openshift:master Apr 29, 2023
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-installer-container-v4.14.0-202304290741.p0.g8e8fb72.assembly.stream for distgit ose-installer.
All builds following this will include this PR.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-baremetal-installer-container-v4.14.0-202304290741.p0.g8e8fb72.assembly.stream for distgit ose-baremetal-installer.
All builds following this will include this PR.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-installer-artifacts-container-v4.14.0-202304290741.p0.g8e8fb72.assembly.stream for distgit ose-installer-artifacts.
All builds following this will include this PR.

sf-project-io pushed a commit to redhat-cip/dci-openshift-agent that referenced this pull request May 10, 2023
A change to allow feature gates in the openshift installer[1] introduced
the use of oc adm release info in the bootkube service, breaking
disconnected environments. This workaround overrides the install release
image to use the release image from the local registry.

[1] openshift/installer#6990

Change-Id: I42a47d7f8ad175d07f2469e17178e3d6f34e6d11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants