
Conversation

@memodi
Member

@memodi memodi commented Oct 14, 2025

Description

NETOBSERV-2443: fix bug, improve cleanup and file writing

With Claude's help, I was able to trace the flakiness back to pty handling and made a number of improvements, listed below:

  • Complete output capture: all lines are captured, with no race conditions.
  • Proper timeout handling: API calls respect the polling context timeouts.
  • Reliable cleanup: the cleanup ignores SIGHUP so deletion completes.
  • Absolute paths are used for file reads.
  • The output/flow directory is cleaned up after every test, so the next test won't read from a stale file.
  • Output files for the collector and the cleanup command are named using the OCP-XXXX and test-label combination.

After several runs, the CLI tests are now much more stable.
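For illustration, here is a minimal Go sketch of the cleanup idea described above; the helper names, file layout, and `oc` invocation are assumptions, not the repo's actual e2e helpers.

```go
// Minimal sketch, assuming hypothetical helper names and paths; the real
// e2e helpers live in e2e/common.go and may differ.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"path/filepath"
	"syscall"
)

// runCleanup ignores SIGHUP while the cleanup command runs, so the deletion
// of the capture daemonset completes even if the spawning pty goes away.
// The output file is named from the OCP test ID and test label so that
// consecutive tests never share a log file.
func runCleanup(testID, label string) error {
	signal.Ignore(syscall.SIGHUP)

	logPath := filepath.Join(os.TempDir(), fmt.Sprintf("%s-%s-cleanup.log", testID, label))
	logFile, err := os.Create(logPath)
	if err != nil {
		return err
	}
	defer logFile.Close()

	cmd := exec.Command("oc", "netobserv", "cleanup") // assumed CLI invocation
	cmd.Stdout = logFile
	cmd.Stderr = logFile
	return cmd.Run()
}

// removeFlowOutput wipes the flow output directory (via an absolute path)
// after each test so the next test cannot read a stale capture file.
func removeFlowOutput(outputDir string) error {
	absDir, err := filepath.Abs(outputDir)
	if err != nil {
		return err
	}
	return os.RemoveAll(absDir)
}

func main() {
	if err := runCleanup("OCP-XXXX", "flow-capture"); err != nil {
		fmt.Fprintln(os.Stderr, "cleanup failed:", err)
	}
	if err := removeFlowOutput("output/flow"); err != nil {
		fmt.Fprintln(os.Stderr, "output cleanup failed:", err)
	}
}
```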

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user-facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@memodi memodi added the no-qe label Oct 14, 2025
@codecov

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 0% with 102 lines in your changes missing coverage. Please review.
✅ Project coverage is 13.54%. Comparing base (1654142) to head (ceb7f6a).
⚠️ Report is 6 commits behind head on main.

Files with missing lines            Patch %   Lines
e2e/common.go                       0.00%     64 Missing ⚠️
e2e/integration-tests/cli.go        0.00%     26 Missing ⚠️
e2e/integration-tests/cluster.go    0.00%     12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #404      +/-   ##
==========================================
- Coverage   13.84%   13.54%   -0.30%     
==========================================
  Files          18       18              
  Lines        2731     2326     -405     
==========================================
- Hits          378      315      -63     
+ Misses       2329     1987     -342     
  Partials       24       24              
Flag        Coverage Δ
unittests   13.54% <0.00%> (-0.30%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines            Coverage Δ
e2e/integration-tests/cluster.go    0.00% <0.00%> (ø)
e2e/integration-tests/cli.go        0.00% <0.00%> (ø)
e2e/common.go                       0.00% <0.00%> (ø)

... and 13 files with indirect coverage changes


@memodi
Member Author

memodi commented Oct 14, 2025

/test ?

@openshift-ci

openshift-ci bot commented Oct 14, 2025

@memodi: The following commands are available to trigger required jobs:

/test images
/test integration-tests

Use /test all to run all jobs.

Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@memodi
Member Author

memodi commented Oct 14, 2025

/test integration-tests

@memodi memodi requested a review from jpinsonneau October 16, 2025 21:00
@memodi
Member Author

memodi commented Oct 17, 2025

/test integration-tests

@memodi
Member Author

memodi commented Oct 17, 2025

Integration tests are failing because, for some reason, the CI cluster is taking too long to pull images.

/test integration-tests

@memodi
Member Author

memodi commented Oct 21, 2025

/test integration-tests

@memodi
Member Author

memodi commented Oct 24, 2025

/test integration-tests

memodi and others added 4 commits October 24, 2025 11:45
- Increase waitDaemonset timeout from 50s to 5 minutes (30×10s)
  * CI environments often have slow image pulls
  * Previous timeout was too aggressive for registry operations

- Add comprehensive diagnostic output on pod startup failure:
  * Pod status with node placement (get pods -o wide)
  * Recent events to identify ImagePullBackOff, etc.
  * Pod event details from describe output
  * Daemonset logs if containers started

This helps diagnose ContainerCreating issues in CI where pods
fail to start due to image pull problems or resource constraints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
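A hedged sketch of what a wait-with-diagnostics loop like the one described in this commit can look like; the function name, namespace, object names, and jsonpath are assumptions, not the repo's actual code.

```go
// Illustrative only: poll the daemonset for up to 5 minutes (30 x 10s) and
// dump pod status, events, and describe output if it never becomes ready.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// run executes an oc subcommand and returns its combined output, ignoring
// errors because the output is only used for diagnostics here.
func run(args ...string) string {
	out, _ := exec.Command("oc", args...).CombinedOutput()
	return string(out)
}

func waitDaemonset(ns, name string) error {
	const attempts = 30
	const interval = 10 * time.Second // generous for slow CI image pulls
	for i := 0; i < attempts; i++ {
		status := run("get", "daemonset", name, "-n", ns,
			"-o", "jsonpath={.status.numberReady}/{.status.desiredNumberScheduled}")
		var ready, desired int
		if _, err := fmt.Sscanf(status, "%d/%d", &ready, &desired); err == nil && desired > 0 && ready == desired {
			return nil
		}
		time.Sleep(interval)
	}
	// Diagnostics on failure: node placement, recent events, pod details.
	fmt.Fprintln(os.Stderr, run("get", "pods", "-n", ns, "-o", "wide"))
	fmt.Fprintln(os.Stderr, run("get", "events", "-n", ns, "--sort-by=.lastTimestamp"))
	fmt.Fprintln(os.Stderr, run("describe", "daemonset", name, "-n", ns))
	return fmt.Errorf("daemonset %s/%s not ready after %d attempts", ns, name, attempts)
}

func main() {
	if err := waitDaemonset("netobserv-cli", "netobserv-cli"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```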
CI runs showed 4/6 pods ready with 5 minute timeout, indicating
image pulls need more time. Increasing to 10 minutes (60×10s) to
accommodate slower CI registry pulls and pod scheduling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
In E2E test mode, the bash script's waitDaemonset() could exit with
error after 10 minutes while the Go test's isDaemonsetReady() was
still polling. This created a race where:

1. Go test calls StartCommand() which runs bash script async
2. Bash script calls waitDaemonset() and waits 10 mins
3. Go test calls isDaemonsetReady() and waits 10 mins
4. If bash times out first, it calls exit 1, killing the process
5. Go test is left polling a dead command

Solution: When isE2E=true, skip the bash-level wait since the Go
test framework handles pod readiness checking via isDaemonsetReady().

For manual CLI usage (isE2E=false), the wait still runs as before
to provide user feedback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
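A rough Go-side sketch of the flow after this change; startCLICapture, isDaemonsetReady, the isE2E environment variable, and the namespace are illustrative assumptions that mirror the description above rather than the repo's exact signatures. The point is that only the Go poller owns the wait, so a bash-level timeout can no longer kill the process mid-poll.

```go
// Hedged sketch: the test launches the capture asynchronously and polls the
// daemonset itself under a single context timeout.
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func startCLICapture(ctx context.Context) (*exec.Cmd, error) {
	cmd := exec.CommandContext(ctx, "oc", "netobserv", "flows", "--background")
	// Assumed mechanism: the script checks an env var and skips its own
	// waitDaemonset() loop when driven by the e2e framework.
	cmd.Env = append(cmd.Environ(), "isE2E=true")
	return cmd, cmd.Start()
}

func isDaemonsetReady(ctx context.Context) error {
	for {
		out, _ := exec.CommandContext(ctx, "oc", "get", "daemonset", "netobserv-cli",
			"-n", "netobserv-cli", "-o", "jsonpath={.status.numberReady}").CombinedOutput()
		if s := string(out); s != "" && s != "0" {
			return nil // at least one agent pod is ready; real code checks all of them
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(10 * time.Second):
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()
	if _, err := startCLICapture(ctx); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("ready:", isDaemonsetReady(ctx))
}
```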
Tests were failing because:
1. Commands ran with --max-time=1m in foreground mode
2. After 1 minute, capture finished and auto-cleanup ran
3. Cleanup deleted the daemonset
4. isDaemonsetReady() was polling for a deleted daemonset
5. Test failed with context deadline exceeded

Using --background mode prevents automatic cleanup when the
capture finishes, allowing the test to verify daemonset
privilege settings before cleanup runs.
Also, check that the CLI is running instead of just the daemonset.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
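For example, with the capture kept alive via --background, the agent daemonset is still present, so the test can inspect its privileged setting before any cleanup runs. The jsonpath and object names below are assumptions used only to illustrate the check.

```go
// Illustrative check only: query the daemonset's securityContext while the
// background capture keeps it from being cleaned up.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	out, err := exec.Command("oc", "get", "daemonset", "netobserv-cli", "-n", "netobserv-cli",
		"-o", "jsonpath={.spec.template.spec.containers[0].securityContext.privileged}").CombinedOutput()
	if err != nil {
		fmt.Println("query failed:", err, string(out))
		return
	}
	fmt.Println("agent privileged:", strings.TrimSpace(string(out)) == "true")
}
```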
@memodi
Member Author

memodi commented Nov 3, 2025

/test integration-tests

@memodi
Member Author

memodi commented Nov 4, 2025

/test integration-tests

@memodi
Member Author

memodi commented Nov 5, 2025

/test integration-tests

@memodi
Member Author

memodi commented Nov 5, 2025

/needs-review

@memodi
Member Author

memodi commented Nov 5, 2025

@jpinsonneau - any idea why e2e tests are failing?

@memodi memodi added the needs-review Tells that the PR needs a review label Nov 5, 2025
@jpinsonneau
Contributor

jpinsonneau commented Nov 6, 2025

@jpinsonneau - any idea why e2e tests are failing?

If we expect "command terminated" in the output, we should rely on the RunCommandAndTerminate function.

I'm having issues with my local kind cluster, so I can't test that right now. Trying to fix that ASAP.

working locally: ceb7f6a
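A rough sketch of how a test might use such a helper; the signature and behavior of RunCommandAndTerminate are assumed here, not taken from the repo.

```go
// Hedged sketch only: start the capture, let it run briefly, terminate it,
// and assert on the "command terminated" line from the collected output
// instead of racing a background process.
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"syscall"
	"time"
)

// runCommandAndTerminate is a stand-in for the repo's RunCommandAndTerminate.
func runCommandAndTerminate(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	var out strings.Builder
	cmd.Stdout = &out
	cmd.Stderr = &out
	if err := cmd.Start(); err != nil {
		return "", err
	}
	time.Sleep(30 * time.Second)            // let the capture collect some flows
	_ = cmd.Process.Signal(syscall.SIGTERM) // ask the CLI to shut down cleanly
	_ = cmd.Wait()                          // reap; a non-zero exit is expected here
	return out.String(), nil
}

func main() {
	output, err := runCommandAndTerminate("oc", "netobserv", "flows")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("terminated message seen:", strings.Contains(output, "command terminated"))
}
```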

@Amoghrd
Member

Amoghrd commented Nov 7, 2025

LGTM
Will wait for @oliver-smakal @kapjain-rh to review as well

@kapjain-rh
Member

/lgtm

@openshift-ci

openshift-ci bot commented Nov 7, 2025

@kapjain-rh: changing LGTM is restricted to collaborators

Details

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@oliver-smakal

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 10, 2025
@memodi memodi removed the needs-review Tells that the PR needs a review label Nov 10, 2025
@memodi
Member Author

memodi commented Nov 10, 2025

@jpinsonneau - is this okay to merge? Not sure if you had chance to review.

Contributor

@jpinsonneau jpinsonneau left a comment


That looks good to me! Thanks @memodi!

@openshift-ci

openshift-ci bot commented Nov 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit bd46a3e into netobserv:main Nov 14, 2025
12 checks passed
