@JoelSpeed
Contributor

As part of the Cloud Controller Manager (CCM) project, we are making some significant changes that will, at first, be under the TechPreviewNoUpgrade (TPNU) feature set.
To gain confidence in the new cloud controller managers, we would like to add some signal to the release-informing jobs that the TPNU clusters are still viable and healthy.

This PR adds periodics for the parallel and serial jobs for AWS, Azure and OpenStack, the three platforms we are initially targeting for the CCM project.

/assign @deads2k

@JoelSpeed
Contributor Author

/hold

This isn't doing what I want yet: it applies the feature gate (FG) after install, whereas we need it in the manifests before install.
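
To illustrate the "in the manifests before install" idea, here is a minimal sketch that writes a FeatureGate manifest into an installer manifests directory. It is not the step from this PR: the function name, the directory layout, and the file name `manifest_feature_gate.yaml` are assumptions made for illustration. Only the `FeatureGate` resource with `featureSet: TechPreviewNoUpgrade` reflects the feature set named in the description.

```python
from pathlib import Path
import tempfile

# FeatureGate resource that selects the TechPreviewNoUpgrade feature set.
FEATURE_GATE_MANIFEST = """\
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
"""


def write_feature_gate(manifests_dir: Path,
                       filename: str = "manifest_feature_gate.yaml") -> Path:
    """Write the FeatureGate manifest into the given manifests directory.

    Both the directory and the file name are hypothetical; a real CI step
    would use whatever location the installer expects.
    """
    manifests_dir.mkdir(parents=True, exist_ok=True)
    target = manifests_dir / filename
    target.write_text(FEATURE_GATE_MANIFEST)
    return target


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of a real install dir.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_feature_gate(Path(tmp) / "manifests")
        print(path.read_text())
```

The point of the sketch is ordering: the manifest has to exist on disk before the cluster is created, rather than being applied to a running cluster afterwards.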

@openshift-ci bot added the do-not-merge/hold label ("Indicates that a PR should not merge because someone has issued a /hold command.") on Jun 30, 2021
@JoelSpeed force-pushed the add-tpnu-step branch 4 times, most recently from 801dfba to f61962a, on July 5, 2021 13:57
@JoelSpeed
Contributor Author

/retest

@JoelSpeed
Contributor Author

/retest

Build01 was broken yesterday, so the jobs weren't starting correctly. I'm told this is now fixed.

@JoelSpeed
Contributor Author

Looks like this is now working as expected. The tests have failed due to a known issue with the MCO; I grabbed the below from the machine-config ClusterOperator:

"extension": {
                    "master": "pool is degraded because nodes fail with \"3 nodes are reporting degraded status on sync\": \"Node ip-10-0-172-141.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-0453ee164ce7ad45cc973e0bae480a25\\\\\\\" not found\\\", Node ip-10-0-139-14.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-0453ee164ce7ad45cc973e0bae480a25\\\\\\\" not found\\\", Node ip-10-0-210-207.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-0453ee164ce7ad45cc973e0bae480a25\\\\\\\" not found\\\"\"",
                    "worker": "all 3 nodes are at latest configuration rendered-worker-2b21538e8756e810dba78aa2edace63d"
                },

I believe the MCO team have a fix in place for this that is currently under review
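
For anyone triaging similar output, a small sketch of how the degraded pools could be picked out of that `extension` map. The helper name is hypothetical, and the sample messages are abridged from the snippet above.

```python
def degraded_pools(extension: dict) -> list:
    """Return the names of pools whose status message mentions 'degraded'."""
    return sorted(name for name, message in extension.items()
                  if "degraded" in message)


# Abridged copy of the "extension" block quoted above.
sample_extension = {
    "master": 'pool is degraded because nodes fail with '
              '"3 nodes are reporting degraded status on sync"',
    "worker": "all 3 nodes are at latest configuration "
              "rendered-worker-2b21538e8756e810dba78aa2edace63d",
}

print(degraded_pools(sample_extension))  # ['master']
```

This is only a string match on the status message, so it is a triage aid rather than a reliable health check.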

@JoelSpeed
Contributor Author

/hold cancel

This is now working as intended, though we know the jobs are broken for now because of the MCO issue above.

@openshift-ci bot removed the do-not-merge/hold label on Jul 6, 2021
Contributor

not speaking cron, how frequent is this?

Contributor Author

It is every 8 hours, copied from the existing OpenStack jobs, which this would synchronise with. I will talk to the release team about why OpenStack is so frequent and whether being in sync is going to cause issues.
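
For readers who also don't speak cron, a rough sketch of how an hour field expands into run times. The schedule string below is an assumption for illustration (the PR's actual cron expression is not quoted in this thread), and the helper only handles the simple `*`, `*/N`, and comma-list forms.

```python
def expand_hour_field(field: str) -> list:
    """Expand a cron hour field ('*', '*/N', or 'a,b,c') into hours of the day."""
    if field == "*":
        return list(range(24))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 24, step))
    return sorted(int(part) for part in field.split(","))


# Hypothetical "every 8 hours" schedule: minute hour day-of-month month day-of-week.
schedule = "0 */8 * * *"
minute, hour, *_ = schedule.split()
print(expand_hour_field(hour))  # [0, 8, 16]
```

Under that assumed schedule, the job would start three times a day, at 00:00, 08:00, and 16:00.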

@deads2k
Contributor

deads2k commented Jul 6, 2021

Not sure I have rights here actually, let's try.

/approve

@stbenjam
Member

stbenjam commented Jul 6, 2021

  • Do you only want them running on cron or do you want feedback on each nightly release? If the latter you probably want to add them to the release controller as optional informers: https://docs.ci.openshift.org/docs/architecture/release-gating/

  • The OWNERS files for the steps seem to be missing the PR author, should you be including yourself to maintain these new steps?

Member

@stbenjam left a comment

The comments on each of the new steps, chains and workflows should be updated to reflect their actual purposes

Member

This comment seems to be incorrect

Member

This step documentation doesn't seem to be accurate as to the purpose of this chain

Member

Why is parallel in the test name? It's assumed that's what's running unless otherwise specified.

Contributor Author

I copied this from the test above, e2e-openstack-parallel, on line 221 in this diff. I'm not sure why it's called that, but this keeps it consistent with the rest of the OpenStack tests.

Contributor

typo: azure?

Contributor

No serial test-suite for OpenStack?

Contributor Author

For OpenStack, this appears to be overridden by an environment variable within the job configuration itself. I'm just trying to be consistent with the existing OpenStack jobs by copying that pattern.

Contributor

Ohh, so we just have periodics for serial suite?

Contributor Author

We have serial and the regular tests (parallel?). The block above this should be for the OpenStack regular tests.

Contributor Author

@JoelSpeed left a comment

I will go through and fix up all the comments on the files today; that was just an omission while I was copy/pasting.

> Do you only want them running on cron or do you want feedback on each nightly release? If the latter you probably want to add them to the release controller as optional informers: https://docs.ci.openshift.org/docs/architecture/release-gating/

I'll look into that link, thank you. I think we want both the cron and the release optional informers.

> The OWNERS files for the steps seem to be missing the PR author, should you be including yourself to maintain these new steps?

I deliberately didn't do that initially since this isn't really my remit. I've been asked to add these by archs because we are making the tech preview more interesting this release (with the CCMs), so it was suggested that it would be good in general to have this, rather than it being something specifically tied to the CCM project which I own. My thought here was that this should be owned in general by the release teams who own the rest of the release informing jobs.

If that's not the correct assumption, I'll update the PR to include myself as an owner on all of these as well

@JoelSpeed
Contributor Author

@stbenjam @ravisantoshgudimetla I think I've addressed and replied to all the comments as appropriate, PTAL when you have a moment

@JoelSpeed
Contributor Author

The MCO failure here will be fixed once openshift/machine-config-operator#2668 is merged

@JoelSpeed
Contributor Author

The latest results show that the storage issues are gone now that the rebase has landed. They do, however, highlight two tests that fail because of alerts that fire when running as TechPreview. I'll raise follow-up PRs to get those alerts ignored in tech preview mode.

Member

Contributor Author

That one is specific to CCM. This PR adds generic TechPreviewNoUpgrade tests to prove out TP features in general, now and in the future; CCM just happens to be a subset of that for now.

@JoelSpeed
Contributor Author

/retest

@JoelSpeed
Contributor Author

/retest

@JoelSpeed
Contributor Author

Test failures now seem to be unrelated to the TechPreview testing and may actually be genuine problems/flakes. I think this is ready to go in now.

The Azure and OpenStack serials have passed already which is a good sign :)

One more test for luck
/retest

@stbenjam
Member

/lgtm

@openshift-ci bot added the lgtm label ("Indicates that a PR is ready to be merged.") on Aug 18, 2021
Member

I guess inherited from master, but ideally each directory will have more than one approver. Can we find anyone else who can sign up to help maintain this chain going forward?

Contributor Author

Yeah I just copied this from the parent folder, perhaps @abhinavdahiya has a suggestion for someone else who can help own the azure configs? Perhaps someone from splat might be able to help?

Member

I feel like there should be a chain gathering up some of this, to reduce the chance that someone adds a new OpenStack knob similar to the existing FIPS, but forgets to update this tech-preview chain to respect that knob here.

Contributor Author

I'll admit I'm not an expert in these ci-operator configs, but I couldn't see a common chain. I'll have a look and see if there's a chain that contains this stuff; if not, perhaps a separate PR would be appropriate to merge them and update all the affected jobs?

This one is based off the existing chains, just adding the techpreview step

Contributor Author

I can't find anything today that resembles a shared chain for this stuff. I have poked the ShiftStack team about the issue to see if this is something on their radar.

Contributor Author

@pierreprinetti suggested that we can actually avoid this duplication by not using the ipi-conf like this and instead adding the techpreview step directly into the workflow. Since the AWS version of this is already merged, #21227 is a PR which cleans it up to show what Pierre meant. Do you think this is a better approach that I should adopt for all of these?

@stbenjam
Member

Looks like you'll need someone like @jupierce or @bradmwilliams for final approval

@openshift-ci bot removed the lgtm label on Aug 24, 2021
@JoelSpeed
Contributor Author

Just rebased to fix conflicts; someone updated the master periodics to move them from build01 to build02.

@openshift-ci
Contributor

openshift-ci bot commented Aug 24, 2021

@JoelSpeed: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-release-master-ci-4.9-e2e-azure-techpreview | bcb1e12 | link | /test pj-rehearse |
| ci/rehearse/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-techpreview-serial | bcb1e12 | link | /test pj-rehearse |
| ci/rehearse/periodic-ci-openshift-release-master-ci-4.9-e2e-azure-techpreview-serial | bcb1e12 | link | /test pj-rehearse |
| ci/rehearse/periodic-ci-openshift-release-master-ci-4.9-e2e-openstack-techpreview-serial | bcb1e12 | link | /test pj-rehearse |
| ci/prow/pj-rehearse | bcb1e12 | link | /test pj-rehearse |

@openshift-ci bot added the lgtm label on Aug 24, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 24, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bradmwilliams, deads2k, JoelSpeed, stbenjam

@openshift-ci bot added the approved label ("Indicates a PR has been approved by an approver from all required OWNERS files.") on Aug 24, 2021
@openshift-merge-robot merged commit 551fe7d into openshift:master on Aug 24, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 24, 2021

@JoelSpeed: Updated the following 3 configmaps:

  • ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-release-master__ci-4.9.yaml using file ci-operator/config/openshift/release/openshift-release-master__ci-4.9.yaml
  • job-config-master configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-release-master-periodics.yaml using file ci-operator/jobs/openshift/release/openshift-release-master-periodics.yaml
  • step-registry configmap in namespace ci at cluster app.ci using the following files:
    • key OWNERS using file ci-operator/step-registry/ipi/azure/pre/techpreview/OWNERS
    • key ipi-azure-pre-techpreview-chain.metadata.json using file ci-operator/step-registry/ipi/azure/pre/techpreview/ipi-azure-pre-techpreview-chain.metadata.json
    • key ipi-azure-pre-techpreview-chain.yaml using file ci-operator/step-registry/ipi/azure/pre/techpreview/ipi-azure-pre-techpreview-chain.yaml
    • key OWNERS using file ci-operator/step-registry/ipi/conf/azure/techpreview/OWNERS
    • key ipi-conf-azure-techpreview-chain.metadata.json using file ci-operator/step-registry/ipi/conf/azure/techpreview/ipi-conf-azure-techpreview-chain.metadata.json
    • key ipi-conf-azure-techpreview-chain.yaml using file ci-operator/step-registry/ipi/conf/azure/techpreview/ipi-conf-azure-techpreview-chain.yaml
    • key OWNERS using file ci-operator/step-registry/ipi/conf/openstack/techpreview/OWNERS
    • key ipi-conf-openstack-techpreview-chain.metadata.json using file ci-operator/step-registry/ipi/conf/openstack/techpreview/ipi-conf-openstack-techpreview-chain.metadata.json
    • key ipi-conf-openstack-techpreview-chain.yaml using file ci-operator/step-registry/ipi/conf/openstack/techpreview/ipi-conf-openstack-techpreview-chain.yaml
    • key OWNERS using file ci-operator/step-registry/ipi/openstack/pre/techpreview/OWNERS
    • key ipi-openstack-pre-techpreview-chain.metadata.json using file ci-operator/step-registry/ipi/openstack/pre/techpreview/ipi-openstack-pre-techpreview-chain.metadata.json
    • key ipi-openstack-pre-techpreview-chain.yaml using file ci-operator/step-registry/ipi/openstack/pre/techpreview/ipi-openstack-pre-techpreview-chain.yaml
    • key OWNERS using file ci-operator/step-registry/openshift/e2e/azure/techpreview/OWNERS
    • key openshift-e2e-azure-techpreview-workflow.metadata.json using file ci-operator/step-registry/openshift/e2e/azure/techpreview/openshift-e2e-azure-techpreview-workflow.metadata.json
    • key openshift-e2e-azure-techpreview-workflow.yaml using file ci-operator/step-registry/openshift/e2e/azure/techpreview/openshift-e2e-azure-techpreview-workflow.yaml
    • key OWNERS using file ci-operator/step-registry/openshift/e2e/azure/techpreview/serial/OWNERS
    • key openshift-e2e-azure-techpreview-serial-workflow.metadata.json using file ci-operator/step-registry/openshift/e2e/azure/techpreview/serial/openshift-e2e-azure-techpreview-serial-workflow.metadata.json
    • key openshift-e2e-azure-techpreview-serial-workflow.yaml using file ci-operator/step-registry/openshift/e2e/azure/techpreview/serial/openshift-e2e-azure-techpreview-serial-workflow.yaml
    • key OWNERS using file ci-operator/step-registry/openshift/e2e/openstack/techpreview/OWNERS
    • key openshift-e2e-openstack-techpreview-workflow.metadata.json using file ci-operator/step-registry/openshift/e2e/openstack/techpreview/openshift-e2e-openstack-techpreview-workflow.metadata.json
    • key openshift-e2e-openstack-techpreview-workflow.yaml using file ci-operator/step-registry/openshift/e2e/openstack/techpreview/openshift-e2e-openstack-techpreview-workflow.yaml