blog: updated E2E best practices #392
It would be great to get this published before KubeCon EU 2023 because then readers have the opportunity to ask questions on-site in person. @aojea: The intro, the architecture and the "next steps" are new. The rest is text that you already reviewed earlier for kubernetes/community#7021, just updated a bit to make it flow better in a blog post. |
|
/assign @mrbobbytables For approval. Let's get this published before KubeCon, then folks can chat with me about it there. |
| `ginkgo.DeferCleanup` executes code in the more useful last-in-first-out order, | ||
| i.e. things that get set up first get removed last. | ||
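The last-in-first-out order described above can be illustrated with a minimal, standard-library-only sketch. The `cleanupStack` type and the resource names are hypothetical and exist only for this illustration; this is a model of the ordering, not how `ginkgo.DeferCleanup` is actually implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanupStack is a hypothetical stand-in for the bookkeeping behind
// ginkgo.DeferCleanup; it is not ginkgo's actual implementation.
type cleanupStack struct {
	callbacks []func()
}

// Defer registers a callback that should run once the test is done.
func (s *cleanupStack) Defer(f func()) {
	s.callbacks = append(s.callbacks, f)
}

// Run invokes the registered callbacks in last-in-first-out order.
func (s *cleanupStack) Run() {
	for i := len(s.callbacks) - 1; i >= 0; i-- {
		s.callbacks[i]()
	}
	s.callbacks = nil
}

// teardownOrder "sets up" three made-up resources in dependency order and
// returns the order in which their cleanups ran.
func teardownOrder() string {
	var stack cleanupStack
	var removed []string
	for _, name := range []string{"storage class", "claim", "pod"} {
		name := name // per-iteration copy, needed before Go 1.22
		stack.Defer(func() { removed = append(removed, name) })
	}
	stack.Run()
	return strings.Join(removed, ", ")
}

func main() {
	fmt.Println(teardownOrder())
}
```

In this sketch the pod, which was set up last and depends on the other two resources, is the first one to be removed.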
|
|
||
| Objects created in the test namespace do not need to be deleted because |
There was a problem hiding this comment.
Maybe you should introduce this at the beginning of the section: the framework creates a test namespace to avoid test pollution. I don't know if this behavior of "test namespaces" is well known outside of Kubernetes.
|
|
||
| Objects created in the test namespace do not need to be deleted because | ||
| deleting the namespace will also delete them. However, if deleting an object | ||
| may fail, then explicitly cleaning it up is better because then failures or |
There was a problem hiding this comment.
Also, if you create objects that are going to take some time to be completely deleted. I made this mistake with "terminating pods": I set a grace period of 300 seconds and the pod blocked the garbage collector for that time.
| may fail, then explicitly cleaning it up is better because then failures or | ||
| timeouts related to it will be more obvious. | ||
|
|
||
| In cases where the test may have removed the object, `framework.IgnoreNotFound` |
|
|
||
| ## Polling and timeouts | ||
|
|
||
| When waiting for something to happen, use a reasonable timeout. Without it, a |
There was a problem hiding this comment.
| When waiting for something to happen, use a reasonable timeout. Without it, a | |
| When waiting for something to happen and you need to do asynchronous assertions, use a reasonable timeout. Without it, a |
There was a problem hiding this comment.
or "asynchronous checks"; I think people are familiar with this term
| When waiting for something to happen, use a reasonable timeout. Without it, a | ||
| test might keep running until the entire test suite gets killed by the | ||
| CI. Beware that the CI under load may take a lot longer to complete some | ||
| operation compared to running the same test locally. On the other hand, a too |
There was a problem hiding this comment.
"On the other hand," is misleading here. I think you could spell out the two main problems:
- short timeout: the test will flake, for example if the CI is slow
- long timeout: the test may hide underlying issues, for example if there are races with other components and the condition eventually passes
The thing with timeouts is that you should also define the time you consider valid for an operation to succeed. E2E tests are not only functional: creating a pod that takes more than 10 minutes to run should not pass just because the environment is too busy, and Services cannot take more than 1 minute to program the data plane. Timeouts are also important for setting upper limits on some behaviors.
There was a problem hiding this comment.
The key is to strike the right balance: a timeout that doesn't cause flakes and that is still considered acceptable for that operation.
There was a problem hiding this comment.
There are two problems with too long timeouts:
- a feature is broken and some expected state will never occur, but the test needs to run till it times out waiting for that state
- a feature is normally supposed to work within a certain time frame and for some reason is taking too long
The problem is that we don't have good enough control over the performance of the clusters that we test against, nor do many features have any solid "must work within XYZ seconds". Solving that problem goes beyond what we can solve right now.
I'll clarify the first point and add a comment about the second.
| - informative during interactive use (i.e. intermediate reports, either | ||
| periodically or on demand) | ||
| - little to no output during a CI run except when it fails |
There was a problem hiding this comment.
These two sentences can be confusing, since it is difficult to be informative without giving output 😄
There was a problem hiding this comment.
I'll explain that the amount of information should depend on how the E2E suite was invoked.
| - extension mechanism for writing custom checks | ||
| - early abort when condition cannot be reached anymore | ||
|
|
||
| [`gomega.Eventually`](https://pkg.go.dev/github.com/onsi/gomega#Eventually) |
There was a problem hiding this comment.
Which is part of "all criteria", right? So no need to change anything in the text.
There was a problem hiding this comment.
Yeah, it was just a personal comment to show my +1 for this function.
| area, so beware that these APIs may | ||
| change at some point. | ||
|
|
||
| - Use `gomega.Consistently` to ensure that some condition is true |
There was a problem hiding this comment.
this is important, unfortunately I'm afraid we are not doing it as much as we should
|
LGTM
I left some comments that are not blockers and, if necessary, can be follow-ups.
/assign @sftim
You need docs people IIRC for approval.
|
/hold
I'll follow up on some of the suggestions before this gets merged.
c66c356 to
846fba3
|
/hold cancel
PR updated, ready for another LGTM and approval.
sftim
left a comment
There was a problem hiding this comment.
/lgtm cancel
The publication date should be in the future at the time we merge it. However, it looks otherwise OK.
| --- | ||
| layout: blog | ||
| title: "E2E Testing Best Practices, Reloaded" | ||
| date: 2023-04-04 |
There was a problem hiding this comment.
For a publication date, how about 2023-04-12?
There was a problem hiding this comment.
Works for me, updated accordingly: https://github.com/kubernetes/contributor-site/compare/846fba3cd23f37924ea22e687ad015d838a4a7fd..7f1a7e9808047ed5c84b577428ccdfab209c4d4e
|
BTW, approval is SIG ContribEx blog team. |
|
/lgtm Thanks |
"Writing good E2E tests" was already updated a while ago in kubernetes/community#7021. As suggested there (kubernetes/community#7021 (comment)), we should bring this update to the attention of more contributors, hence this blog post.
846fba3 to
7f1a7e9
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mrbobbytables, pohly, xmcqueen
/cc @aojea @jberkus