Fix #21: Prevent pytest from re-running full suite on coverage failure #175
TheDevCodingKen wants to merge 1 commit into ansible:devel
Conversation
Force-pushed from c757698 to 509615a
I'm not sure we were in agreement regarding what the behavior should be in the end.
I self-assigned this because I wanted to do some more thinking. I'm not entirely convinced that we should be addressing this at all — this patch makes the step run and result in "hey, I ran but not really, and also here's your failure — guess why". I'm worried it'd make the UX even more confusing. This is mainly about the UX of the CI, so local testing won't demonstrate anything interesting to us. If you want to see how this changes the experience for real, you can simulate the situation within your fork (make an additional PR with the simulation against …)
Thanks for looking into this, @webknjaz. That makes complete sense. Optimizing CI resource usage is great, but I completely agree that if it comes at the cost of DevEx and confusing CI logs, it defeats the purpose. I really appreciate the context on the UX side of things. I'll run a simulation in my personal fork as you suggested to see exactly what the GitHub Actions output looks like with this patch applied. Let me know if you decide to go a different route with the communication output; I'm happy to pivot or close this out based on what's best for the project's DevEx.
Dunno. Perhaps. FWIW, #21 seems mainly about the DX, not the resources, though. cc @chrismeyersfsu
That makes total sense regarding the DX focus. Shifting the fail-fast behavior to the initial run using `--exitfirst` is an interesting idea — it would definitely save the developer from waiting through a long initial run if a test breaks early. I'll hold off on running the simulation or pushing any alternative commits for now so you and @chrismeyersfsu can sync up on the best architectural approach for the pipeline's DevEx. Happy to implement whatever workflow you both decide is best.
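For reference, the fail-fast alternative discussed above would look roughly like this; the flag combination is a sketch, not the project's actual invocation:

```shell
# --exitfirst (short form: -x) stops the pytest session at the first
# failing test. Tests after that point never run, so any coverage they
# would have contributed is lost from the report.
pytest --exitfirst --cov
```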
But this also means that such runs may result in reporting heavily reduced coverage metrics. So I've no definitive opinion on this thus far. |
Ah, that is a great point, @webknjaz. Aborting the suite early would definitely skew the coverage report since the remaining tests wouldn't execute to hit those lines of code. It's a tricky balance between optimizing the DevEx and maintaining accurate coverage metrics! I'll leave the PR open and as-is for now while you and the team weigh the trade-offs. Absolutely no rush on my end! |
May I ask you what motivated you to pick up that issue? |
To be completely transparent, there were two main drivers, @webknjaz. First, I'm a dev at IBM actively seeking my next role and I wanted to keep my DevOps and infrastructure skills sharp. Second, I had a technical interview round with the Ansible Hub team on Wed, 3/18. While I ultimately didn't land the role this time around, I wanted to get my hands dirty in a mature Ansible codebase (a) to see how the broader organization handles testing and pipeline architecture at scale, and (b) to prepare hands-on rather than just reading documentation. So, I was looking for an opportunity to learn, and working through these DevEx and coverage trade-offs with you has been incredibly insightful. I'm still happy to help figure out the best architectural approach for this pipeline.
Fixes #21
Summary
When `pytest-cov` triggers a failure due to the coverage threshold not being met, the `pytest --lf` command in the `tox-rerun-posargs` list evaluates an empty failure cache. By default, `pytest` responds to an empty `--lf` cache by re-running the entire test suite, which leads to inefficient use of CI resources.

This PR adds the `--last-failed-no-failures none` flag to the `ci-cd.yml` workflow. This overrides the default behavior, instructing `pytest` to fail fast and skip test execution if there are no actual functional test failures in the cache.

Local Verification
Reproduced the issue locally by temporarily raising the `fail_under` threshold in `.coveragerc` to force a coverage failure, then executing the rerun logic.

Before (Simulating coverage drop):
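The local reproduction can be sketched like this; the threshold value is illustrative, since the project's real `.coveragerc` contents are not shown in this PR:

```ini
; .coveragerc (sketch): raise fail_under above the project's actual
; coverage so pytest-cov reports a threshold failure even though
; every test passes.
[report]
fail_under = 100
```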
After (With flag applied):
Result: Pytest correctly identified no test failures and deselected the suite.
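The rerun step of the check can be approximated as follows; the exact tox environment and invocation used locally are assumptions:

```shell
# With the flag applied and an empty last-failed cache, pytest deselects
# everything and exits without executing the suite, instead of falling
# back to a full re-run.
pytest --last-failed --last-failed-no-failures none
```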