Conversation

@JoelSpeed
Contributor

@JoelSpeed JoelSpeed commented Apr 27, 2023

- What I did

Update library-go and openshift api.
Fixed up the type changes and the access to feature gates to use the new feature gate access defined in library-go.
Note this means feature gate enabled/disabled decisions are now made by CCO and not distributed by revendoring openshift/api everywhere.

Noticed a mistake in the scheme addKnownTypes, see comment I've left.

Details of feature gate changes in openshift/enhancements#1373

- How to verify it

E2E should pass, all units are passing too.

- Description for the changelog

@JoelSpeed
Contributor Author

/test unit

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2023
@openshift-ci
Contributor

openshift-ci bot commented Apr 27, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

where is this used?

Contributor Author

Passed into controllercontext and then passed into the controllers that need it when we call createControllers

Contributor

Is this just for testing? If so, please name it as such.

Contributor Author

It's not. In the kubelet feature gate code this is used to determine whether there is a difference between the default features and the current set of features. The feature gate kubelet config is generated only when there is a difference.
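
A minimal sketch of that comparison, assuming feature gates are represented as plain name-to-enabled maps. The helper names `featuresDiffer` and `renderFeatureGates` are hypothetical and are not the MCO's actual functions; the gate names in `main` are only illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// featuresDiffer reports whether the current feature set differs from the
// defaults. Both maps go from gate name to enabled state (an assumed shape).
func featuresDiffer(defaults, current map[string]bool) bool {
	if len(defaults) != len(current) {
		return true
	}
	for name, enabled := range defaults {
		cur, ok := current[name]
		if !ok || cur != enabled {
			return true
		}
	}
	return false
}

// renderFeatureGates returns a sorted "name=bool" list, or "" when nothing
// differs from the defaults and no kubelet config needs to be generated.
func renderFeatureGates(defaults, current map[string]bool) string {
	if !featuresDiffer(defaults, current) {
		return ""
	}
	parts := make([]string, 0, len(current))
	for name, enabled := range current {
		parts = append(parts, fmt.Sprintf("%s=%t", name, enabled))
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

func main() {
	defaults := map[string]bool{"RotateKubeletServerCertificate": true}
	current := map[string]bool{"RotateKubeletServerCertificate": true, "EventedPLEG": false}
	fmt.Printf("unchanged: %q\n", renderFeatureGates(defaults, defaults))
	fmt.Printf("changed:   %q\n", renderFeatureGates(defaults, current))
}
```

Returning an empty result for the unchanged case keeps the "generate only on difference" decision in one place.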

@JoelSpeed JoelSpeed force-pushed the update-ccm-feature-gates branch from 8cd33cc to 0c56e5d Compare April 27, 2023 16:49
@JoelSpeed
Contributor Author

/test unit
/test bootstrap-integration

Comment on lines 99 to 104
Contributor Author

The controllers depend on the FeatureGate data so we should wait until they are synced before starting the controllers

Contributor Author

@deads2k Correct me if I'm wrong, but this has to match the value of version.Raw if it hasn't been substituted by a build time arg? (It currently doesn't, will fix)

Comment on lines -38 to -39
Contributor Author

You shouldn't be adding resources from other groups to the scheme here. It causes confusion for the decoder (and was breaking/masking issues in tests). The correct fix is to call configv1.Install(scheme.Scheme) at some point in the process that was complaining.

Comment on lines 153 to 155
Contributor Author

The installer will now always create a feature gate manifest, so the MCO now has to depend on that manifest to make decisions during bootstrap.

Comment on lines +195 to +197
Contributor Author

Some controllers don't use the fgAccess so it's ok for those ones to pass a nil through

Contributor Author

Should probably update this at some point to use the new configv1.FeatureGateName instead of a string

Contributor Author

Not all render configs will have the FGAccess. Assume the feature gates are off in that case (it shouldn't actually matter).
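
A small sketch of that fallback, assuming a much-reduced accessor interface. `gateReader`, `staticGates`, and `isEnabled` are hypothetical names, and `FeatureGateName` is a local stand-in for a typed gate name such as the configv1.FeatureGateName mentioned in an earlier comment.

```go
package main

import "fmt"

// FeatureGateName is a local stand-in for a typed feature gate name.
type FeatureGateName string

// gateReader is a hypothetical, minimal view of a feature gate accessor.
type gateReader interface {
	Enabled(name FeatureGateName) bool
}

// staticGates is a trivial gateReader backed by a set of enabled names.
type staticGates map[FeatureGateName]bool

func (s staticGates) Enabled(name FeatureGateName) bool { return s[name] }

// isEnabled treats a missing accessor as "all feature gates off", which is
// the fallback described for render configs that carry no accessor.
func isEnabled(r gateReader, name FeatureGateName) bool {
	if r == nil {
		return false
	}
	return r.Enabled(name)
}

func main() {
	const gate FeatureGateName = "ExampleGate" // illustrative name only
	fmt.Println("no accessor:", isEnabled(nil, gate))
	fmt.Println("with accessor:", isEnabled(staticGates{gate: true}, gate))
}
```

Keeping the nil check in one helper avoids repeating it at every call site that may receive a render config without an accessor.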

Comment on lines 28 to 33
Contributor Author

These were removed from openshift/api

Contributor Author

@deads2k any idea why this doesn't get set on a read?

@JoelSpeed JoelSpeed force-pushed the update-ccm-feature-gates branch from 0c56e5d to 8c032e1 Compare April 27, 2023 17:09
@JoelSpeed JoelSpeed marked this pull request as ready for review April 27, 2023 17:09
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2023
@openshift-ci openshift-ci bot requested review from jkyros and sinnykumari April 27, 2023 17:11
@JoelSpeed JoelSpeed changed the title Update Library-go and API for new featuregate changes [OCPCLOUD-2034] Update Library-go and API for new featuregate changes Apr 27, 2023
@JoelSpeed
Contributor Author

/test bootstrap-unit

@JoelSpeed
Contributor Author

Not expecting the e2e to pass until openshift/installer#6990 has merged

@JoelSpeed JoelSpeed force-pushed the update-ccm-feature-gates branch 2 times, most recently from e4ad6a7 to 9efe1b0 Compare May 2, 2023 15:09
@JoelSpeed
Contributor Author

/retest-required

@JoelSpeed
Contributor Author

On upgrade, the MCO logs an error that may be useful:

E0503 11:03:57.982847       1 simple_featuregate_reader.go:290] cluster failed with : unable to determine features: missing desired version "machine-config-daemon-4.6.0-202006240615.p0-2058-gb0c56150-dirty" in featuregates.config.openshift.io/cluster

So the version injected into the MCO image does not appear to match the versions in the CVO. I wonder if this is a build-time issue for CI jobs 🤔
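
One way to read that error is as a per-version lookup in the FeatureGate status that finds no entry. The sketch below assumes exactly that reading; the types are simplified local stand-ins rather than the openshift/api ones, `gatesForVersion` is a hypothetical helper, and this is not library-go's actual implementation.

```go
package main

import "fmt"

// gateDetails mirrors, in simplified form, one per-version entry of the
// FeatureGate status (local stand-in type, not the openshift/api one).
type gateDetails struct {
	Version  string
	Enabled  []string
	Disabled []string
}

// gatesForVersion finds the entry matching the version the binary reports.
// A binary built with an unexpected version string finds no entry.
func gatesForVersion(status []gateDetails, desired string) (gateDetails, error) {
	for _, d := range status {
		if d.Version == desired {
			return d, nil
		}
	}
	return gateDetails{}, fmt.Errorf("missing desired version %q in featuregates status", desired)
}

func main() {
	// "4.14.0" is an illustrative payload version, not taken from this PR.
	status := []gateDetails{{Version: "4.14.0", Disabled: []string{"EventedPLEG"}}}
	// The desired version is the string from the log line above.
	_, err := gatesForVersion(status, "machine-config-daemon-4.6.0-202006240615.p0-2058-gb0c56150-dirty")
	fmt.Println(err)
}
```

Under this reading, the lookup can only succeed if the version string baked into the binary at build time equals one of the versions recorded in the status.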

@sinnykumari
Contributor

To review kubelet-config related changes
/cc @rphillips

@sinnykumari
Contributor

Retesting since openshift/installer#7160 has been merged now
/retest

@JoelSpeed JoelSpeed changed the title [OCPCLOUD-2034] Update Library-go and API for new featuregate changes OCPBUGS-13547: [OCPCLOUD-2034] Update Library-go and API for new featuregate changes May 17, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 17, 2023
@JoelSpeed
Contributor Author

/test e2e-hypershift

More fixes merged into HyperShift this morning, let's see how this goes

@JoelSpeed
Contributor Author

https://pr-payload-tests.ci.openshift.org/runs/ci/d6bc72d0-0377-11ee-9090-d9743d629fca-0

These both went green now! I'm dropping the EventedPLEG featuregate out of the payload in openshift/cluster-config-operator#317. Once that's merged, I'll drop that final commit from my branch and I think we are good to go.

@sergiordlr Did you want to test the branch as it is now or wait until the config operator change is merged? They achieve the same thing, so I think testing now should show that the features issue we observed on Friday is gone.

@sinnykumari
Contributor

sinnykumari commented Jun 5, 2023

All tech preview jobs are green! Also, openshift/cluster-config-operator#317 has been merged.

@JoelSpeed
Contributor Author

Hypershift is green too! I'll work on getting the config operator change merged, but I think otherwise, once I drop that final commit, this is good to go

Anyone got any final reservations?

@JoelSpeed
Contributor Author

The config operator change has now merged, so I've re-pushed what the actual branch should look like without my hack in place. As far as I'm concerned, this is now ready to merge.

@sinnykumari
Contributor

Thanks Joel, good from my end.
@sergiordlr Can you do any final testing needed and add the qe-approved label?

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2023
@sinnykumari
Contributor

Adding lgtm as well, techpreview and MCO tests are passing on the PR. We will get to know later if we missed something. Thanks Joel for all the hard work!
/lgtm

Hold will be removed once QE has added the qe-approved label. Since the master branch is not gated on the QE label, we need to do this manually for pre-merge testing.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 5, 2023
Member

@jkyros jkyros left a comment

Thanks Joel! I can't say enough about how much we appreciate your work here (vs us having to muddle through doing all of this on our own).

Things I feel are important to note here behavior-wise for the MCO team:

  • the machine-config-controller and machine-config-operator will both exit 0 immediately when feature gates change after the initial observation (this is probably good because it means we don't have to coordinate some sort of "are our operator and controller using the same set of feature gates" sort of thing)
  • the machine-config-daemon does not currently react to feature gates. While the operator/controller/daemon all share that same ctrlcommon.CreateControllerContext() that starts the featuregate controller, the daemon never starts the config informer that feeds it (and I don't think the daemon has the RBAC anyway) so it doesn't react to featureGates and won't get interrupted (this is good, right now it doesn't need to)
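
The first behavior above (exit 0 on any change after the initial observation) can be sketched as a change handler. This is a sketch under assumptions: `gateState` and `newChangeHandler` are hypothetical names, and the exit function is injected so the example runs without terminating the process. A real process would pass `os.Exit`.

```go
package main

import (
	"fmt"
	"reflect"
)

// gateState is a simplified snapshot of the enabled feature gates.
type gateState struct {
	Enabled []string
}

// newChangeHandler returns a handler that ignores the initial observation
// (previous == nil) and calls exit(0) when a later observation differs.
func newChangeHandler(exit func(code int)) func(previous, current *gateState) {
	return func(previous, current *gateState) {
		if previous == nil {
			return // initial observation: nothing to restart for
		}
		if !reflect.DeepEqual(previous, current) {
			exit(0)
		}
	}
}

func main() {
	handler := newChangeHandler(func(code int) {
		fmt.Println("feature gates changed, would exit with code", code)
	})
	initial := &gateState{Enabled: []string{"A"}}
	handler(nil, initial)                                     // initial observation, no exit
	handler(initial, &gateState{Enabled: []string{"A", "B"}}) // change, exit requested
}
```

Exiting cleanly and letting the pod restart means each process only ever runs with the feature gate set it observed at startup.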

The MCO probably should at least follow up by:

  • migrating to klog and away from glog (rather than mix and match -- I assume klog should be the "winner" because it's better maintained)

@openshift-ci
Contributor

openshift-ci bot commented Jun 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkyros, JoelSpeed, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed
Contributor Author

/retest-required

@sergiordlr
Contributor

sergiordlr commented Jun 6, 2023

Verified using IPI on GCP.

  1. All critical MCO test cases passed.

  2. TechPreviewNoUpgrade was enabled and there was no problem in the cluster. EventedPLEG remained disabled.

  status:
    featureGates:
    - disabled:
      - name: EventedPLEG

These tests passed after enabling the features:

  • 59867-Create files specifying user and group
  • 42365-add real time kernel
  • 54085-Update osImage

  3. CustomNoUpgrade is working properly.

  spec:
    customNoUpgrade:
      enabled:
      - AdmissionWebhookMatchConditions
    featureSet: CustomNoUpgrade

We add the qe-approved label:

/label qe-approved

NOTE: We observed that after enabling TechPreviewNoUpgrade the MCO controller pod is restarted and takes about 3 minutes to acquire the leader lease. The delay happens only in that case: if we delete the MCO controller pod manually, the leader lease is acquired immediately. We do not consider this a problem.

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Jun 6, 2023
@JoelSpeed
Contributor Author

JoelSpeed commented Jun 6, 2023

QE approved, so can remove the hold now, thanks for your work on this @sergiordlr

@JoelSpeed
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jun 6, 2023

@JoelSpeed: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                             Commit   Details  Required  Rerun command
ci/prow/e2e-alibabacloud-ovn          3de0564  link     false     /test e2e-alibabacloud-ovn
ci/prow/okd-scos-e2e-gcp-ovn-upgrade  3de0564  link     false     /test okd-scos-e2e-gcp-ovn-upgrade
ci/prow/okd-scos-e2e-aws-ovn          3de0564  link     false     /test okd-scos-e2e-aws-ovn

@JoelSpeed
Contributor Author

/retest-required

@openshift-merge-robot openshift-merge-robot merged commit dadba5b into openshift:master Jun 6, 2023
@openshift-ci-robot
Contributor

@JoelSpeed: Jira Issue OCPBUGS-13547: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-13547 has not been moved to the MODIFIED state.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR

9 participants