Add bootstrap vs day 2 integration tests based on envtest #2687
Conversation
My main question, given the above: are we already saying this is going to fail? (Maybe I'm misreading; please correct me.) Generally, I'm really concerned about the number of e2e tests being added specifically to the MCO repo that don't pass, give a good signal, or produce actionable results for the MCO team. Do you plan on monitoring this e2e? Would it be better suited to a periodic? We currently have 19 jobs running on every PR in the repo, and that generally seems like too much at this point (a general concern, not specific to this test).
I completely understand your concern! As for whether this fails today: yes, it does. But I've written this test with #2668 and #2547 in mind. The idea, based on my conversation with Ryan and Jerry, is that it should pass as soon as both of those PRs are merged. If it ever fails in the future, either the test needs updating for some legitimate reason, or the MCO broke its day 1 to day 2 compatibility. There are several options for this:
I'll set up a branch tomorrow that includes this, #2547 and #2668, and verify that the tests do indeed all pass :)
test/e2e-bootstrap/bootstrap_test.go
Outdated
It's a bit unclear to me: are these individual test cases (applying manifests) running during bootstrap, and are we later comparing the rendered config from the bootstrap version against the rendered config available in a real cluster? How are we ensuring that each manifest gets applied with a fresh bootstrap process?
This might be better explained via a call; does your team have office hours I could attend, to demo this test and explain properly how it works?
The envtest tool I've set up in this PR runs an etcd and kube-apiserver locally on your machine (or in CI), so there isn't actually a real cluster per se.
This test suite runs the controllers from the MCC against this local kube-apiserver. It then creates all the resources that would normally be created during bootstrap, which the MCC controllers react to, rendering the config as they would in a real cluster.
Alongside this, we take the same resources and put them into a temp folder, which we then run the bootstrap against.
Because this isn't a real cluster, and the only controllers running are the MCC controllers, we shut down the controllers between each test and restart them, and we clean the environment by deleting all of the resources that we created.
One of the really cool things about this style of testing is that you can see how your controller reacts to real API resources, without fakes and without a real cluster :)
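For anyone new to the pattern, here is a minimal sketch of the envtest flow. This is not the test code from this PR: the package name and the placeholder steps are illustrative, and it assumes controller-runtime's sigs.k8s.io/controller-runtime/pkg/envtest package plus the etcd/kube-apiserver binaries it expects to find locally.

```go
package e2e_bootstrap_test

import (
	"testing"

	"k8s.io/client-go/kubernetes"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

// TestEnvtestSketch shows the general envtest flow: start a local etcd and
// kube-apiserver, build clients against them, run controllers, then tear
// everything down. No real cluster is involved at any point.
func TestEnvtestSketch(t *testing.T) {
	testEnv := &envtest.Environment{}

	// Start launches etcd and kube-apiserver and returns a rest.Config
	// pointing at the local API server.
	cfg, err := testEnv.Start()
	if err != nil {
		t.Fatalf("failed to start test environment: %v", err)
	}
	defer func() {
		if err := testEnv.Stop(); err != nil {
			t.Errorf("failed to stop test environment: %v", err)
		}
	}()

	// Build whatever clients the controllers under test need.
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		t.Fatalf("failed to build client: %v", err)
	}
	_ = client

	// From here, the real suite would start the MCC controllers against
	// cfg, create the bootstrap resources, wait for rendered configs,
	// then stop the controllers and delete the resources between cases.
}
```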
Thanks for explaining. We don't have office hours scheduled so far; we can plan a meeting, but the earliest that can happen is next week.
Not sure; I think merging the actual code-flow PR with a passing test is better, as it would catch regressions early.
If I understand correctly, this test adds e2e coverage of the bootstrap process for issues we have seen on several occasions, i.e. rendered-config mismatches between bootstrap and the actual cluster. If my understanding is correct, I find this test useful in general, and more coverage (from encountered bugs) can be added via additional test cases.
+1, that will provide some real data. I am wondering, if we add this new test, do we really need ...
I think we are on the same page here based on this comment :)
Here we go: https://github.com/JoelSpeed/machine-config-operator/tree/bootstrap-e2e%2B2547%2B2668. This branch is #2668, then #2547, then this PR, plus 2 extra commits. For the first extra commit: I noticed that in other controllers we default the feature gate if it's nil, which isn't the case in bootstrap, so to avoid a panic I allowed the feature map generation to handle a nil value. This could come as part of #2547 IMO. The second extra commit integrates the refactoring done in #2668 into #2547 and allows it to pass the feature gate through, fixing the compatibility with the tests from the feature gate perspective. I've uploaded the output of a test run to a gist. From this, we can see that the first two tests pass (i.e. no extra manifests, and a feature gate only), but the 3rd and 4th tests fail. In the 3rd test, we expect the kubelet config to attach only to the masters, but it is attaching to both the master and worker during bootstrap, hence the worker configs differ, which is a known issue that @yuqi-zhang pointed out with #2547. The same thing is happening in the 4th test, except we expect it only in the worker config. Once the label-matching element of #2547 is fixed, I expect these tests to pass without issue.
That is completely up to your team. I added that one first as it was an easy add to prove #2668 fixed cluster bootstrap. I think this test is better in the long term (you can run it locally, it gives you full diffs, and it doesn't need a cluster to run against). But I appreciate this is new and takes more time to review and understand, so I left it as the second part of the testing :)
Force-pushed from c78319b to 22635e2.
@sinnykumari @kikisdeliveryservice Now that 4.10 is open for business, I've just rebased this. Hoping we can merge soon if you both think this is a valid addition to the test suite; I can't remember where we left it after the call at the end of July.
@sinnykumari What would you like to do with this PR? Is it still useful?
Sorry, it got missed. Yup, it will be nice to have this test in the MCO.
Force-pushed from 22635e2 to 20eb5b0.
Thanks @sinnykumari, rebased
hack/fetch-ext-bins.sh
Outdated
Looks like this is an unused variable in the test.
sinnykumari
left a comment
Added a few comments, rest lgtm.
Force-pushed from 20eb5b0 to 5edae9d.
JoelSpeed
left a comment
Thanks for the review @sinnykumari, I think I've addressed the comments. I've added a filter to pick out the not-found machine config names in the helper, in an additional commit on top; happy to squash that into another commit if you think that's appropriate.
sinnykumari
left a comment
Thanks Joel for getting this test added and helping the team understand envtest.
LGTM.
@kikisdeliveryservice @yuqi-zhang any final thoughts before we merge this?
Also, the CI job appears to be failing with ...
While we work on the new e2e tests in openshift/machine-config-operator#2687, set always_run to false to avoid having extra noise in the repo.
Force-pushed from 4570e05 to 1c115a7.
/test bootstrap-e2e
/test bootstrap-e2e
@yuqi-zhang I found this little gem in machine-config-operator/hack/build-go.sh (lines 38 to 44 in 7615a61). Setting CGO_ENABLED=0 and GOTAGS in the make task for the bootstrap-e2e seems to have resolved the issue with the extra C dependencies.
When the HOME directory is unset or set to /, kubebuilder tries to create a cache directory but fails. Make sure that in these cases we set it to a temp folder so that the test can create its cache.
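As a rough illustration only (the actual change in this PR lives in the hack scripts, not in Go), the same guard could be expressed in a test's TestMain; the package name and temp-dir prefix below are made up for the example.

```go
package e2e_bootstrap_test

import (
	"os"
	"testing"
)

// TestMain guards against an unset or unusable HOME before envtest runs,
// since kubebuilder/envtest wants to create a cache directory under $HOME.
func TestMain(m *testing.M) {
	cleanup := func() {}
	if home := os.Getenv("HOME"); home == "" || home == "/" {
		// Point HOME at a throwaway directory so the cache can be created.
		tmp, err := os.MkdirTemp("", "bootstrap-e2e-home")
		if err != nil {
			panic(err)
		}
		os.Setenv("HOME", tmp)
		cleanup = func() { os.RemoveAll(tmp) }
	}
	code := m.Run()
	cleanup()
	os.Exit(code)
}
```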
/test bootstrap-e2e
Had to do a couple of fixups to the scripts (stuff we had already encountered but that I hadn't copied over), but the tests are now working: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2687/pull-ci-openshift-machine-config-operator-master-bootstrap-e2e/1452580424227229696
@JoelSpeed: The following tests failed: ...
Ah ok, I completely forgot about that snippet, good catch. Just to make sure, I tested one more time locally. I presume that is expected because it's lacking the ... Overall lgtm, will also let Sinny take a final look.
LGTM, the log from the bootstrap-e2e test looks good. Let's get this in, and we can discuss with Joel offline if we need help with running it locally.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: JoelSpeed, sinnykumari, yuqi-zhang.
/retest-required Please review the full test history for this PR and help us cut down flakes.
3 similar comments
I have the binaries required for this (etcd, kube-apiserver) installed locally on my machine, as I use this style of testing in a lot of places. With ...
- What I did
Created a new test suite using the controller-runtime tool envtest. Note that the new dependency is just so we can use this envtest tool for these tests; no other packages are being imported.
With envtest, the new test suite in test/e2e-bootstrap/bootstrap_test.go sets up the controllers from the MCC against a running kubeapi/etcd (provided by envtest). It then performs the following steps:
- creates the resources that would normally be created during bootstrap, which the MCC controllers react to and render config from
- runs the bootstrap rendering against the same resources, written out to a temp folder (the inputs live under the pkg/controller/bootstrap/testdata/bootstrap folder)
- compares the two rendered results
The above is captured in the first couple of commits; the remaining commits: ...
- Why I did
The aim of this test suite is to ensure that the day 1 and day 2 configurations that the MCC generates are consistent, no matter the inputs. This isn't the case today, so the full suite won't pass, but we can aspire to fixing that. It should also be relatively easy to add new test cases in the future if we identify other ways day 1/day 2 can differ.
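To make the consistency check concrete, the comparison boils down to something like the sketch below. This is not the helper from this PR; the function name is hypothetical, and the go-cmp dependency and the mcfgv1 import path are assumptions.

```go
package e2e_bootstrap_test

import (
	"testing"

	"github.com/google/go-cmp/cmp"
	// Import path assumed for the MCO MachineConfig API types.
	mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
)

// assertRenderedConfigsMatch is a hypothetical helper: the MachineConfig the
// bootstrap path renders for a pool must match the one the in-cluster
// controllers render from the same inputs.
func assertRenderedConfigsMatch(t *testing.T, bootstrapMC, clusterMC *mcfgv1.MachineConfig) {
	t.Helper()
	// Rendered names embed content hashes, so comparing the specs (rather
	// than names or whole objects) is what tells us day 1 and day 2 agree.
	if diff := cmp.Diff(bootstrapMC.Spec, clusterMC.Spec); diff != "" {
		t.Errorf("bootstrap and in-cluster rendered configs differ (-bootstrap +cluster):\n%s", diff)
	}
}
```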
- How to verify it
You should be able to run make bootstrap-e2e on any machine and observe that the first test currently passes, while the remaining three fail. The second should be fixed by #2668 (I have tested this already), and the remaining two should be fixed by #2547. If you already have the "kubebuilder" binaries installed, you can run make bootstrap-e2e-local to prevent the test from fetching new binaries.
If this is an acceptable test suite (I appreciate the envtest concept is new to this team), then I will set up a CI check for this test explicitly, so that you have a clear signal if there's drift/regressions in this bootstrapping in the future.
I'm also happy to jump on a call to do a demo / talk through what this does and how it works.
Also happy to go on an OWNERS file for this if that would make the team feel more comfortable.