
@BenamarMk

No description provided.

ShyamsundarR and others added 30 commits November 6, 2024 10:54
Update works to ensure VRG is updated with peerClasses that it requires,
based on reported PVCs that the VRG is attempting to protect. If a VRG
is attempting to protect a PVC for which it is lacking a peerClass, and
that peerClass is available as part of the DRPolicy, its peerClasses are updated.

For existing peerClasses the VRG information is not updated, this is done
to avoid any protection mechanism conflicts. For example, if a VRG
carried a peerClass without the replicationID (i.e., it would choose to
protect the PVC using Volsync and VolumeSnapshots), then it is not
updated with a peerClass that NOW supports native VolumeReplication, as
that would void existing protection.

To change replication schemes a workload needs to be DR disabled and then
reenabled to catch up to the latest available peer information for an SC.
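
Below is a minimal sketch of the append-only peerClass update described above. The types and field names are illustrative stand-ins, not the actual Ramen API:

    package main

    import "fmt"

    // PeerClass is an illustrative stand-in for the peerClass information
    // carried by the VRG spec and the DRPolicy status.
    type PeerClass struct {
        StorageClassName string
        ReplicationID    string
    }

    // updatePeerClasses appends policy peerClasses for storage classes the VRG
    // does not already carry. Existing entries are never overwritten, so a PVC
    // protected via Volsync/VolumeSnapshots is not silently switched to native
    // VolumeReplication.
    func updatePeerClasses(vrgPeers, policyPeers []PeerClass, protectedSCs map[string]bool) []PeerClass {
        known := map[string]bool{}
        for _, pc := range vrgPeers {
            known[pc.StorageClassName] = true
        }

        for _, pc := range policyPeers {
            // Only add peerClasses for storage classes that back PVCs the VRG
            // is protecting, and only if the VRG does not already have one.
            if protectedSCs[pc.StorageClassName] && !known[pc.StorageClassName] {
                vrgPeers = append(vrgPeers, pc)
            }
        }

        return vrgPeers
    }

    func main() {
        vrg := []PeerClass{{StorageClassName: "cephfs-sc"}} // kept as-is, even if the policy now has a replicationID
        policy := []PeerClass{
            {StorageClassName: "cephfs-sc", ReplicationID: "r1"},
            {StorageClassName: "rbd-sc", ReplicationID: "r2"}, // missing, appended
        }

        fmt.Println(updatePeerClasses(vrg, policy, map[string]bool{"cephfs-sc": true, "rbd-sc": true}))
    }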

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
Secondary VRG should carry the same peerClass information as the Primary
VRG, such that on action changes that promote the Secondary to a Primary,
the same peerClasses are used to recover and protect the PVCs.

This commit addresses updating Secondary VRG with Primary VRG peerClasses
for async cases when Volsync is in use.

Currently, a VRG as Secondary is created when there are any PVCs protected
by Volsync. Once the issue of mixed workloads is addressed, a Secondary
VRG would exist for all cases, VolRep and sync DR setups as well. This
will ensure that in all cases the Secondary peerClasses are a copy of
the Primary.

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
Use a sorted list, one item per line. This way it is easy to add or remove
packages.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Lima 1.0.0 was released[1], and it includes everything we need. Update
the docs to install lima from brew instead of source.

[1] https://github.com/lima-vm/lima/releases/tag/v1.0.0

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add storageID, replicationID to StorageClass, replicationClasses
and snapshotClasses. This is to facilitate filtering of
multiple storageClasses.
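
A hedged sketch of the label-based filtering this enables; the label key below is hypothetical and only for illustration:

    package main

    import "fmt"

    // storageIDLabel is a hypothetical label key used for illustration only.
    const storageIDLabel = "ramendr.example/storageid"

    // filterByStorageID returns the class names whose labels carry the wanted
    // storageID, mimicking how replication and snapshot classes can be matched
    // to a StorageClass when multiple classes exist.
    func filterByStorageID(classLabels map[string]map[string]string, wantSID string) []string {
        var matched []string
        for name, labels := range classLabels {
            if labels[storageIDLabel] == wantSID {
                matched = append(matched, name)
            }
        }
        return matched
    }

    func main() {
        classes := map[string]map[string]string{
            "vrc-rbd":    {storageIDLabel: "sid-1"},
            "vrc-cephfs": {storageIDLabel: "sid-2"},
        }
        fmt.Println(filterByStorageID(classes, "sid-1")) // [vrc-rbd]
    }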

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
It was added in a failed state (Status=False, Reason=PrerequisiteNotMet)
by default, which is wrong. The issue was hidden since we check the
condition only when deleting the VRG. We want to inspect the condition
when the VRG is live, so we can report the condition status in
the protected pvcs conditions.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Based on the messages added in
csi-addons/kubernetes-csi-addons#691, we want to
propagate the error messages to the protected pvcs conditions.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When a VR condition is not met, we set the protected PVC condition
message using the error message returned from isVRConditionMet(). When
using csi-addons > 0.10.0, we now use the message from the condition
instead of the default message.

Since the Validated condition is not reported by older versions of
csi-addons, and we must wait until the Validated condition status is
known when the VRG is deleted, isVRConditionMet() now also returns the
state of the condition, which can be:

- missing: condition not found
- stale: observed generation does not match object generation
- unknown: the special "Unknown" value
- known: status is True or False

When we validate the Validated condition we have these cases:

- Condition is missing: continue to next condition.

- Condition is met: continue to the next condition.

- Condition not met and its status is False. This VR will never
  complete and it is safe to delete since replication will never start.
  If the VRG is deleted, we return true since the VR reached the desired
  state. Otherwise we return false. In this case we update the
  protected pvc condition with the message from the VR condition.

- Condition is not met and is stale or unknown: we need to check again
  later. There is no point in checking the completed condition since a VR
  cannot complete without validation. In this case we update the
  protected pvc condition with the message generated by
  isVRConditionMet() for stale or unknown conditions.

Example protected pvc DataReady condition with propagated message when
VR validation failed:

    conditions:
      - lastTransitionTime: "2024-11-06T15:33:06Z"
        message: 'failed to meet prerequisite: rpc error: code = FailedPrecondition
          desc = system is not in a state required for the operation''s execution:
          failed to enable mirroring on image "replicapool/csi-vol-fe2ca7f8-713c-4c51-bf52-0d4b2c11d329":
          parent image "replicapool/csi-snap-e2114105-b451-469b-ad97-eb3cbe2af54e"
          is not enabled for mirroring'
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataReady
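
The sketch below illustrates the condition-state handling described above using the standard metav1.Condition type; the function name and return shape are illustrative, not the exact Ramen helpers:

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    type conditionState int

    const (
        stateMissing conditionState = iota // condition not found
        stateStale                         // observed generation does not match object generation
        stateUnknown                       // the special "Unknown" value
        stateKnown                         // status is True or False
    )

    // checkCondition mirrors the isVRConditionMet() behavior: it reports
    // whether the condition is met, in which state it was observed, and a
    // message suitable for the protected pvc condition.
    func checkCondition(conds []metav1.Condition, condType string, generation int64) (bool, conditionState, string) {
        cond := meta.FindStatusCondition(conds, condType)
        switch {
        case cond == nil:
            return false, stateMissing, fmt.Sprintf("condition %q missing", condType)
        case cond.ObservedGeneration != generation:
            return false, stateStale, fmt.Sprintf("condition %q is stale", condType)
        case cond.Status == metav1.ConditionUnknown:
            return false, stateUnknown, fmt.Sprintf("condition %q status is unknown", condType)
        default:
            return cond.Status == metav1.ConditionTrue, stateKnown, cond.Message
        }
    }

    func main() {
        conds := []metav1.Condition{{
            Type:               "Validated",
            Status:             metav1.ConditionFalse,
            ObservedGeneration: 1,
            Message:            "failed to meet prerequisite",
        }}

        // Known False: terminal for this VR, propagate the message.
        met, state, msg := checkCondition(conds, "Validated", 1)
        fmt.Println(met, state, msg)
    }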

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Verify that when VR Validated condition is False, the condition message
is propagated to the protected pvc DataReady condition message.

To make this easy to test, we have a new helper for waiting until
protected pvc condition status and message are updated to specified
values.
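
A rough Ginkgo/Gomega-style sketch of such a wait helper; the condition lookup function, timeout, and interval are assumptions, not the actual test code:

    package volsync_test

    import (
        "time"

        . "github.com/onsi/gomega"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    const (
        waitTimeout  = 10 * time.Second
        pollInterval = 100 * time.Millisecond
    )

    // waitForProtectedPVCCondition polls until the protected pvc condition has
    // the expected status and message. getCondition is a stand-in for however
    // the suite fetches the condition from the VRG under test.
    func waitForProtectedPVCCondition(
        getCondition func() *metav1.Condition,
        wantStatus metav1.ConditionStatus,
        wantMessage string,
    ) {
        Eventually(func() bool {
            cond := getCondition()
            return cond != nil && cond.Status == wantStatus && cond.Message == wantMessage
        }, waitTimeout, pollInterval).Should(BeTrue(), "waiting for protected pvc condition")
    }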

We propagate the same message to the DataProtected condition, but this
seems like unwanted behavior that should change and is not worth
testing.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Improve filtering of replicationClass

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Add StorageClass Labels, ReplicationClass Labels
and PeerClasses

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Add StorageClass Labels, VolumeSnapshotClass Labels
and PeerClasses

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Add tests with no peerClasses and no replicationID
in VRC

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Skip filtering using sid with replicationClass if peerClass
is not found

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Co-authored-by: Shyamsundar Ranganathan <srangana@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Updated documentation step "drenv setup" as it led to the following error:

    $ drenv setup
    usage: drenv setup [-h] [-v] [--name-prefix PREFIX] filename
    drenv setup: error: the following arguments are required: filename

Signed-off-by: pruthvitd <prd@redhat.com>
…resource

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
…cross relevant components

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Update test/envs/vm.yaml with a CPU count of 2 to resolve the following error:
	$ drenv start envs/vm.yaml
	2024-11-11 01:35:05,809 INFO    [vm] Starting environment
	2024-11-11 01:35:05,943 INFO    [cluster] Starting minikube cluster
	2024-11-11 01:35:06,330 ERROR   Command failed
	:
	  File "/home/ramenuser/ramen/test/drenv/commands.py", line 207, in watch
	    raise Error(args, error, exitcode=p.returncode)
	drenv.commands.Error: Command failed:
	   command: ('minikube', 'start', '--profile', 'cluster', '--driver', 'kvm2', '--container-runtime', 'containerd', '--disk-size', '20g', '--nodes', '1', '--cni', 'auto', '--cpus', '1', '--memory', '2g', '--extra-config', 'kubelet.serialize-image-pulls=false', '--insecure-registry=host.minikube.internal:5000')
	   exitcode: 29
	   error:
	      X Exiting due to RSRC_INSUFFICIENT_CORES: Requested cpu count 1 is less than the minimum allowed of 2

Signed-off-by: pruthvitd <prd@redhat.com>
1. Add check hook implementation during backup
2. Add check hook implementation during restore
3. Add roles required for reading deployments and statefulsets
4. Add timeout operation for check hook (a minimal sketch follows this list)
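
A minimal sketch of a timeout-bounded check hook (item 4), using client-go; the readiness predicate and the deployment reference are illustrative assumptions:

    package hooks

    import (
        "context"
        "fmt"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // runCheckHook polls a deployment until it is ready or the hook timeout
    // expires. In the real implementation the condition to evaluate comes from
    // the hook spec; checking ReadyReplicas is just an example predicate.
    func runCheckHook(ctx context.Context, cs kubernetes.Interface, ns, name string, timeout time.Duration) error {
        ctx, cancel := context.WithTimeout(ctx, timeout)
        defer cancel()

        ticker := time.NewTicker(2 * time.Second)
        defer ticker.Stop()

        for {
            dep, err := cs.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
            if err == nil && dep.Spec.Replicas != nil && dep.Status.ReadyReplicas == *dep.Spec.Replicas {
                return nil // check hook passed
            }

            select {
            case <-ctx.Done():
                return fmt.Errorf("check hook timed out for %s/%s: %w", ns, name, ctx.Err())
            case <-ticker.C:
                // retry on the next tick
            }
        }
    }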

Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
raghavendra-talur and others added 30 commits February 13, 2025 06:41
This is not the right place to protect the kube resources as one final
sync when we are relocating. It is too late as the user has already been
informed to clean up the resources, and some of the resources might have
been deleted already.

The right time to do it is before the application resource cleanup but
we cannot do it right now because of the difference in how volrep and
volsync behave.

We will fix it in a future release.

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Move/update info logs to debug in EnableProtection and DisableProtection for
managed and discovered apps

Signed-off-by: Parikshith <pbyregow@redhat.com>
Add debug logs for Create and delete ManagedClusterSetBinding,
createPlacementManagedByRamen, CreatePlacement and DeletePlacement

Signed-off-by: Parikshith <pbyregow@redhat.com>
Add debug logs in create, delete, generate and wait drpc funcs

Signed-off-by: Parikshith <pbyregow@redhat.com>
Replace zapcore.AddSync with zapcore.Lock for safe concurrent writes to logfile and console.
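
A hedged sketch of the resulting logger setup; encoders and levels are illustrative, the relevant change is wrapping each sink with zapcore.Lock:

    package main

    import (
        "os"

        "go.uber.org/zap"
        "go.uber.org/zap/zapcore"
    )

    // newLogger writes JSON to a logfile and console output to stderr.
    // zapcore.Lock wraps each WriteSyncer with a mutex, so concurrent writes
    // from multiple goroutines cannot interleave, which a bare zapcore.AddSync
    // does not guarantee.
    func newLogger(logFile *os.File) *zap.Logger {
        encCfg := zap.NewDevelopmentEncoderConfig()

        fileSink := zapcore.Lock(zapcore.AddSync(logFile))
        consoleSink := zapcore.Lock(os.Stderr)

        core := zapcore.NewTee(
            zapcore.NewCore(zapcore.NewJSONEncoder(encCfg), fileSink, zapcore.DebugLevel),
            zapcore.NewCore(zapcore.NewConsoleEncoder(encCfg), consoleSink, zapcore.InfoLevel),
        )

        return zap.New(core)
    }

    func main() {
        f, err := os.CreateTemp("", "e2e-*.log")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        log := newLogger(f)
        defer log.Sync()
        log.Info("logger configured with locked sinks")
    }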

Signed-off-by: Parikshith <pbyregow@redhat.com>
* The 'enforce-err-cuddling' property under 'linters-settings.wsl' is not allowed in the latest jsonschema for golangci-lint. Replace with 'force-err-cuddling'.
* Changing to 'force-err-cuddling' revealed existing lint errors wrt: if statements that check an error must be cuddled with the statement that assigned the error. Fixed all revealed lint issues.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Signed-off-by: Oded Viner <oviner@redhat.com>
- Moved functions responsible for retrieving the current cluster from dractions/retry.go to util/placement.go
- Moved getPlacement from dractions/crug.go to util/placement.go
- Refactored getTargetCluster to get drPolicy and also changed it to pass currentCluster as an argument.
- Made GetCurrentCluster (added doc) and GetPlacement functions public.
- Updated references to use the new public functions.

Signed-off-by: Parikshith <pbyregow@redhat.com>
In the referenced PRs [1] and [2], two StorageClasses were
created for RBD and two for CephFS. Both RBD and CephFS
StorageClasses had duplicate storageIDs. This commit resolves
the duplicate storageID issue.

References:
[1] RamenDR#1756
[2] RamenDR#1770

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
- Add debug log after namespace creation
- Add debug log after adding namespace annotations
  and a private constant for the volsync namespace annotation
- Update info to debug log in DeleteNamespace

Signed-off-by: Parikshith <pbyregow@redhat.com>
With this command, the test would fail without the fix:
cmd: # ginkgo -focus="When no running pod is mounting the PVC to be protected" -repeat=10

With this fix, the test consistently succeeds.
Here is the result of 100 attempts:

Ran 1 of 56 Specs in 12.390 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 55 Skipped
PASS

All tests passed...
This was attempt 100 of 101.
Running Suite: Volsync Suite - /root/DR/ut-fix/ramen/internal/controller/volsync
================================================================================
Random Seed: 1739983377

Will run 1 of 56 specs
SSSSSSS•SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

Ran 1 of 56 Specs in 11.616 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 55 Skipped
PASS

Ginkgo ran 1 suite in 19m32.890356929s
Test Suite Passed

Signed-off-by: pruthvitd <prd@redhat.com>
NetworkFence resources are created with
the prefix "network-fence-". This prefix
needs to be included in the NF-MCV name for
the lookup to be successful.
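
A tiny sketch of the naming detail; the suffix used after the prefix is an assumption here:

    package main

    import "fmt"

    // networkFencePrefix matches the prefix NetworkFence resources are created with.
    const networkFencePrefix = "network-fence-"

    // networkFenceName builds the name used for the NF ManagedClusterView
    // lookup; deriving the suffix from the cluster name is illustrative only.
    func networkFenceName(clusterName string) string {
        return networkFencePrefix + clusterName
    }

    func main() {
        fmt.Println(networkFenceName("dr1")) // network-fence-dr1
    }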

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Update info logs in deploy/undeploy functions for appset, subscription
and discovered apps in the deployers package to log the cluster name of the workload.

Signed-off-by: Parikshith <pbyregow@redhat.com>
This change adds a new Makefile target `lint-config-verify`,
which runs `golangci-lint config verify --config=./.golangci.yaml`
locally to detect issues in `.golangci.yaml`. This allows detecting
issues with the config file quickly without running the entire CI.
Additionally, `lint` now depends on `lint-config-verify` to ensure
the config is always valid before running lint checks.

Signed-off-by: Parikshith <pbyregow@redhat.com>
…ntrol Reconciler

Signed-off-by: Oded Viner <oviner@redhat.com>
Earlier, a failure in hook execution was wrongly exiting out of the for
loop and treating the backup process as a success.

We return and come back later to continue processing.
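
Schematically, the fix looks like the sketch below (hook types and names are illustrative): a failing hook returns the error so the reconciler retries later, instead of breaking out and reporting the backup as complete.

    package hooks

    import "fmt"

    // Hook is an illustrative stand-in for a captured workload hook.
    type Hook struct {
        Name string
        Run  func() error
    }

    // executeHooks runs the hooks in order. A failure returns immediately so
    // the caller requeues and comes back later; it no longer falls out of the
    // loop and treats the backup as a success.
    func executeHooks(hooks []Hook) error {
        for _, h := range hooks {
            if err := h.Run(); err != nil {
                return fmt.Errorf("hook %q failed: %w", h.Name, err)
            }
        }

        return nil
    }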

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Convert Info logs to Debug logs for functions in the deployers
and workload packages. Add additional debug logs at the end of
resource creation and deletion, and remove logs before operations
when we log the operation in helper functions. Update debug logs
by adding more information, such as resource names, namespaces, etc.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Signed-off-by: Parikshith <pbyregow@redhat.com>
- Ignore Ginkgo test report file
- Ignore placeholder CRD file

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>
Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>
This change enables logging cluster names in the next step, improving debugging
and log readability by clearly identifying the cluster in log messages for
different operations.

- Updated Cluster struct to include Name field.
- Added `ClusterConfig` struct for cluster properties: name and kubeconfigpath
  (a minimal sketch follows the generated config below).
- Added default cluster names constants
- Updated `NewContext` to ensure each cluster gets a default name if the `name`
  field is missing or empty in the config.
- Updated comment in config.yaml.sample & added name property for all clusters.
- Updated drenv/ramen.py to include cluster names along with kubeconfigpaths

  Generated drenv config:

      clusters:
        hub:
          name: hub
          kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/hub
        c1:
          name: dr1
          kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/dr1
        c2:
          name: dr2
          kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/dr2
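
A hedged sketch of the Cluster/ClusterConfig shape and the default-name fallback; field and constant names are approximations of the e2e code, not guaranteed to match it:

    package util

    import "sigs.k8s.io/controller-runtime/pkg/client"

    // Default cluster names, used when the config omits the name field.
    const (
        DefaultHubName = "hub"
        DefaultC1Name  = "c1"
        DefaultC2Name  = "c2"
    )

    // ClusterConfig holds per-cluster properties read from config.yaml.
    type ClusterConfig struct {
        Name           string `yaml:"name"`
        KubeconfigPath string `yaml:"kubeconfigpath"`
    }

    // Cluster carries the resolved name alongside the client, so log messages
    // can identify which cluster an operation ran against.
    type Cluster struct {
        Name   string
        Client client.Client
    }

    // resolveName returns the configured name, or the default when the name
    // field is missing or empty.
    func resolveName(cfg ClusterConfig, defaultName string) string {
        if cfg.Name == "" {
            return defaultName
        }

        return cfg.Name
    }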

Signed-off-by: Parikshith <pbyregow@redhat.com>
Refactored functions in the utils package to replace the `client client.Client`
parameter with `cluster Cluster`. Updated client calls to `cluster.Client`.
Also modified caller functions to pass cluster instead of client.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Refactored functions in the dractions package to replace the `client client.Client`
parameter with `cluster Cluster`. Updated client calls to `cluster.Client`.
Also modified caller functions to pass cluster instead of client.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Updated `getDRClusterClient` to `getDRCluster` to return `util.Cluster`
instead of `client.Client`. Refactored functions in the deployers, types
and workloads package to replace the `client client.Client` parameter
with `cluster Cluster`. Updated client calls to `cluster.Client`. Also
modified caller functions to pass cluster instead of client.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>