Fix45 probes #19

Merged
psturc merged 321 commits into psturc:main from jhutar:fix45-probes on Nov 3, 2025

Conversation

@psturc psturc commented Nov 3, 2025

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change.

Issue ticket number and link

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added a meaningful description with the JIRA/GitHub issue key (if applicable), for example HASSuiteDescribe("STONE-123456789 devfile source")
  • I have updated labels (if needed)

jhutar and others added 30 commits October 31, 2025 10:47
… runtime error: invalid memory address or nil pointer dereference
…41-ltqhj in namespace jhutar-tenant failed: {Type:Released Status:False ObservedGeneration:0 LastTransitionTime:2025-06-24 04:20:33 +0000 UTC Reason:Failed Message:Release validation failed}
…plication jhutar-app-wpyjx: failed to get API group resources: unable to retrieve the complete list of server APIs: appstudio.redhat.com/v1alpha1: Get https://api.stone-prod-p02.hjvn.p1.openshiftapps.com:6443/apis/appstudio.redhat.com/v1alpha1: net/http: TLS handshake timeout
…6e-dgznx in namespace jhutar-tenant failed: {Type:Released Status:False ObservedGeneration:0 LastTransitionTime:2025-07-02 06:44:05 +0000 UTC Reason:Progressing Message:}
…try.access.redhat.com returned 403

Error with some context:

    [2025-07-01T19:20:02,003805722+00:00] buildah build --volume /tmp/entitlement:/etc/pki/entitlement --security-opt=unmask=/proc/interrupts --label build-date=2025-07-01T19:20:01 --label architecture=x86_64 --label vcs-type=git --label vcs-ref=7ec92bec984d038dfc2d1fb6241044431f113cc8 --label quay.expires-after=5d --tls-verify=true --no-cache --ulimit nofile=4096:4096 -f /tmp/Dockerfile.iS78Jt -t quay.io/redhat-user-workloads/jhutar-1-tenant/jhutar-1-app-ghefc-comp-0:on-pr-7ec92bec984d038dfc2d1fb6241044431f113cc8 .
    [1/2] STEP 1/3: FROM registry.access.redhat.com/ubi8/nodejs-18:latest
    Trying to pull registry.access.redhat.com/ubi8/nodejs-18:latest...
    Error: creating build container: internal error: unable to copy from source docker://registry.access.redhat.com/ubi8/nodejs-18:latest: copying system image from manifest list: determining manifest MIME type for docker://registry.access.redhat.com/ubi8/nodejs-18:latest: reading manifest sha256:3a895f2b85ffeda82b2d50ce1ae554bc5bc62448aba48b3fd56ce94b694b3b2a in registry.access.redhat.com/ubi8/nodejs-18: StatusCode: 403, "<html>\r\n<head><title>403 Forbidden</title></head>\r..."

Link:

    https://workdir-exporter-jenkins-csb-perf.apps.int.gpc.ocp-hub.prod.psi.redhat.com/workspace/StoneSoupLoadTestProbe_stone_prd_rh01/e2e-tests/tests/load-tests/OLD/run-stone-prd-rh01-2025_07_01T19_17_10_668176918_00_00/collected-data/jhutar-1-tenant/1/pod-jhutar-1-app-ghefc-comp-0-oe35b8003686447efeb763c670815dd4c-pod-step-build.log
…onfig.json

Error:

    Artifact type will be determined by introspection.
    Checking the media type of the OCI artifact...
    The media type of the OCI artifact is application/vnd.oci.image.manifest.v1+json.
    Looking for image labels that indicate this might be an operator bundle...
    time="2025-07-02T00:58:14Z" level=fatal msg="Error parsing image name \"docker://quay.io/redhat-user-workloads/jhutar-tenant/jhutar-app-youop-comp-0:on-pr-267c18212324a4f9f4b2fd8f30225e4b6ba1e4c0\": getting username and password: reading JSON file \"/tekton/home/.docker/config.json\": unmarshaling JSON at \"/tekton/home/.docker/config.json\": unexpected end of JSON input"

Link:

    https://workdir-exporter-jenkins-csb-perf.apps.int.gpc.ocp-hub.prod.psi.redhat.com/workspace/StoneSoupLoadTestProbe_stone_prod_p02/e2e-tests/tests/load-tests/OLD/run-stone-prod-p02-2025_07_02T00_55_12_206383611_00_00/collected-data/jhutar-tenant/1/pod-jhutar-app-youop-comp-0-on-d9711596435debc219a10437940162fb-pod-step-introspect.log

Error:

    [2025-07-03T11:43:36,850878165+00:00] Update CA trust
    INFO: Using mounted CA bundle: /mnt/trusted-ca/ca-bundle.crt
    '/mnt/trusted-ca/ca-bundle.crt' -> '/etc/pki/ca-trust/source/anchors/ca-bundle.crt'
    [2025-07-03T11:43:37,651927712+00:00] Prepare Dockerfile
    Cannot find Dockerfile Dockerfile

Link:

    https://workdir-exporter-jenkins-csb-perf.apps.int.gpc.ocp-hub.prod.psi.redhat.com/workspace/StoneSoupLoadTestProbe_kflux_rhel_p01_RPM/e2e-tests/tests/load-tests/OLD/run-kflux-rhel-p01-2025_07_03T11_39_39_192503922_00_00/collected-data/jhutar-tenant/1/pod-jhutar-app-svayh-comp-0-on-push-2wvnb-build-container-pod-step-build.log

This is a problem on the load test side, I guess. It looks like we fail to
switch the pipeline run to the RPM build one, so the container build pipeline
tries to build the repo the same way as if it were a container.
…ddresses in subnet

Error was in collected-data/jhutar-tenant/1/pod-jhutar-app-wvdca-comp-0-on-push-kwln8-calculate-deps-x86-64-pod-step-mock-build.log:

    + mkdir -p /root/.ssh
    + '[' -e /ssh/error ']'
    + cat /ssh/error
    Error allocating host: failed to launch EC2 instance for jhutar-app-wvdca-comp-0-on-push-kwln8-calculate-deps-x86-64: operation error EC2: RunInstances, https response error StatusCode: 400, RequestID: 19286adc-c47b-411d-b896-2d77368509dd, api error InsufficientFreeAddressesInSubnet: There are not enough free addresses in subnet 'subnet-0aa719a6c5b602b16' to satisfy the requested number of instances.

    Context info:
      Platform: linux/amd64
      File:     /opt/app-root/src/pkg/reconciler/taskrun/taskrun.go
      Line:     458

    + exit 1
jhutar added 28 commits October 31, 2025 10:47
Normally we are using this template repo:

    https://github.com/rhtap-perf-test/konflux-probe-test-templates

That repo lives in the same organization as all the component repos we work with.

When Zhiming tried to use his fork with one small change:

    https://github.com/zxiong/konflux-probe-test-templates

fetching the file from there did not pick up the change.

Turns out our code was actually getting the file from `rhtap-perf-test`.

This fixes the issue: as long as the GitHub token is sufficient to fetch
that file from a different repo in a different organization, the code will
now try to do so.
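
For illustration only, a minimal Go sketch of the kind of fix described above: derive the owner and repository from the configured template repo URL instead of hardcoding the `rhtap-perf-test` organization. The helper name and how the URL is passed in are assumptions, not the actual probe code.

    // Hypothetical helper, not the actual probe code: split a configured
    // template repo URL such as
    // https://github.com/zxiong/konflux-probe-test-templates into its owner
    // ("zxiong") and repo ("konflux-probe-test-templates") parts, so files
    // are fetched from whatever repo is configured rather than a hardcoded org.
    package journey

    import (
        "fmt"
        "net/url"
        "strings"
    )

    func ownerAndRepoFromURL(repoURL string) (string, string, error) {
        u, err := url.Parse(repoURL)
        if err != nil {
            return "", "", fmt.Errorf("parsing template repo URL %q: %w", repoURL, err)
        }
        parts := strings.Split(strings.Trim(u.Path, "/"), "/")
        if len(parts) < 2 {
            return "", "", fmt.Errorf("unexpected template repo URL %q", repoURL)
        }
        return parts[0], parts[1], nil
    }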
Incident caused by the load test was discussed here: https://redhat-internal.slack.com/archives/C04F4NE15U1/p1759163666162299

We need to serialize component creation and onboarding to keep the test
relevant even at higher scale, because the build controller reconciles new
components in sequence and needs ~1 minute per component.

If we did not serialize that, we would quickly start hitting timeouts.
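
As a rough illustration (not the actual load-test code), serializing onboarding can be as simple as a shared mutex around the onboarding step of otherwise-parallel per-component goroutines; the function and variable names below are made up:

    package journey

    import "sync"

    // onboardingMu ensures only one component is being onboarded at a time.
    var onboardingMu sync.Mutex

    // onboardComponentSerially wraps a (hypothetical) onboarding callback so
    // that parallel per-component goroutines take turns. Since the build
    // controller reconciles new components one by one (~1 minute each), this
    // spreads onboarding out instead of flooding the controller and hitting
    // per-component timeouts.
    func onboardComponentSerially(name string, onboard func(string) error) error {
        onboardingMu.Lock()
        defer onboardingMu.Unlock()
        return onboard(name)
    }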
Adds a new function DoHarmlessCommit to pkg/journey/handle_repo_templating.go.
This function creates or updates a file named 'just-trigger-build' with the
current date and time and commits it. It is designed to work with both
GitHub and GitLab repositories.

Generated-by: Gemini
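
A minimal sketch of what the GitHub half of such a function could look like, using the google/go-github client; the actual DoHarmlessCommit in pkg/journey/handle_repo_templating.go also handles GitLab and may differ in signature and error handling:

    // Sketch only: create or update the "just-trigger-build" file with the
    // current timestamp so a fresh commit lands on the branch and triggers a
    // new build, without touching any real source file.
    package journey

    import (
        "context"
        "fmt"
        "time"

        "github.com/google/go-github/v60/github"
    )

    func doHarmlessCommitGitHub(ctx context.Context, client *github.Client, owner, repo, branch string) error {
        const path = "just-trigger-build"
        content := []byte(time.Now().Format(time.RFC3339) + "\n")
        message := github.String("Harmless commit to trigger a build")

        existing, _, _, err := client.Repositories.GetContents(ctx, owner, repo, path,
            &github.RepositoryContentGetOptions{Ref: branch})
        if err != nil {
            // Assume the file does not exist yet and create it.
            _, _, err = client.Repositories.CreateFile(ctx, owner, repo, path,
                &github.RepositoryContentFileOptions{
                    Message: message,
                    Content: content,
                    Branch:  github.String(branch),
                })
            if err != nil {
                return fmt.Errorf("creating %s in %s/%s: %w", path, owner, repo, err)
            }
            return nil
        }

        // The file already exists: update it in place, passing the current blob SHA.
        _, _, err = client.Repositories.UpdateFile(ctx, owner, repo, path,
            &github.RepositoryContentFileOptions{
                Message: message,
                Content: content,
                SHA:     existing.SHA,
                Branch:  github.String(branch),
            })
        if err != nil {
            return fmt.Errorf("updating %s in %s/%s: %w", path, owner, repo, err)
        }
        return nil
    }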
This is needed because component onboarding is very slow; if we want to avoid
it and still put significant load on the cluster, we need to create (and
onboard) the components in advance.

We still need to update the collection code to make this work.
When running user journeys with `JourneyReuseComponents` or `JourneyReuseApplications` enabled, a race condition would occur. The setup functions (`PerComponentSetup` and `PerApplicationSetup`) would attempt to copy the name from the first component or application to subsequent ones before the first one had been created, resulting in an empty name.

This commit resolves the issue by moving the name-copying logic into the respective `HandleComponent` and `HandleApplication` functions. This ensures that subsequent components or applications wait for the first one to be fully initialized before attempting to reuse its name, using `utils.WaitUntilWithInterval` to poll for the name to become available.

Generated-by: Gemini
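
A minimal sketch of the waiting side of this fix, using k8s.io/apimachinery's wait package as a stand-in for the repo's utils.WaitUntilWithInterval; struct, field, and function names are assumptions, not the actual journey types:

    package journey

    import (
        "context"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    type perComponentContext struct {
        ComponentName string
    }

    // reuseFirstComponentName blocks until the first component of the journey
    // has been created and its name filled in, then copies that name so the
    // current iteration reuses the existing component instead of creating one.
    func reuseFirstComponentName(ctx context.Context, first, current *perComponentContext) error {
        err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 10*time.Minute, true,
            func(ctx context.Context) (bool, error) {
                // Keep polling until the first iteration has populated the name.
                return first.ComponentName != "", nil
            })
        if err != nil {
            return err
        }
        current.ComponentName = first.ComponentName
        return nil
    }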
When a journey is repeated with component reuse enabled, the HandleComponent function should check for existing components from the *first* journey iteration.

The previous logic incorrectly looked for the component's context within the current iteration, leading to a timeout because the component name was always empty.

This change adjusts the lookup to reference the PerApplicationContext from the initial journey, ensuring the correct component is found and the wait loop does not time out.

Generated-by: Gemini
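
Sketched differently, the corrected lookup reads roughly like the snippet below; the struct and field names are assumptions used only to illustrate indexing into the first journey iteration:

    package journey

    // These types stand in for the real context structs; only the indexing
    // pattern matters here.
    type PerComponentContext struct{ ComponentName string }
    type PerApplicationContext struct{ PerComponentContexts []*PerComponentContext }
    type JourneyContext struct{ PerApplicationContexts []*PerApplicationContext }

    // firstIterationComponentName resolves the component name from the *first*
    // journey iteration, which is the only one that actually creates components
    // when reuse is enabled; looking it up in the current iteration would always
    // return an empty name and time out.
    func firstIterationComponentName(journeys []*JourneyContext, appIdx, compIdx int) string {
        first := journeys[0]
        return first.PerApplicationContexts[appIdx].PerComponentContexts[compIdx].ComponentName
    }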
…r these to ensure these passes are considered complete as well
…ib.go

Corrected the field names used to initialize the utils.Options struct in the NewFramework function to match the actual struct definition. This resolves a compile-time error.

Generated-by: Gemini
@psturc psturc merged commit 65478b6 into psturc:main Nov 3, 2025
3 of 4 checks passed
