forked from RamenDR/ramen
Test e2e failure #28
Open
BenamarMk wants to merge 1,520 commits into main from test-e2e-failure
Updates ensure the VRG is updated with the peerClasses it requires, based on the reported PVCs that the VRG is attempting to protect. If a VRG is attempting to protect a PVC for which it lacks a peerClass, and that peerClass is available as part of the DRPolicy, the VRG's peerClasses are updated. Existing peerClasses in the VRG are not updated; this is done to avoid any protection mechanism conflicts. For example, if a VRG carried a peerClass without the replicationID (i.e. it would choose to protect the PVC using VolSync and VolumeSnapshots), then it is not updated with a peerClass that now supports native VolumeReplication, as that would void existing protection. To change replication schemes, a workload needs to be DR disabled and then re-enabled to catch up to the latest available peer information for a StorageClass. Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
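A rough sketch of the merge rule above (type and function names here are illustrative, not the actual Ramen API):

    package sketch

    // PeerClass is a simplified stand-in for the peerClass entries carried by
    // the DRPolicy and the VRG.
    type PeerClass struct {
        StorageClassName string
        ReplicationID    string
    }

    // addMissingPeerClasses appends DRPolicy peerClasses that the VRG needs for
    // the StorageClasses of its protected PVCs, without touching peerClasses
    // the VRG already carries.
    func addMissingPeerClasses(vrgPeers, policyPeers []PeerClass, protectedSCs []string) []PeerClass {
        for _, scName := range protectedSCs {
            if hasPeerClass(vrgPeers, scName) {
                // Never overwrite an existing peerClass: upgrading, say, a
                // VolSync-based peer to one with a replicationID would change
                // the protection mechanism and void existing protection.
                continue
            }

            for _, pc := range policyPeers {
                if pc.StorageClassName == scName {
                    vrgPeers = append(vrgPeers, pc)
                    break
                }
            }
        }

        return vrgPeers
    }

    func hasPeerClass(peers []PeerClass, scName string) bool {
        for _, pc := range peers {
            if pc.StorageClassName == scName {
                return true
            }
        }

        return false
    }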
A Secondary VRG should carry the same peerClass information as the Primary VRG, such that on action changes that promote the Secondary to a Primary, the same peerClasses are used to recover and protect the PVCs. This commit addresses updating the Secondary VRG with the Primary VRG peerClasses for async cases when VolSync is in use. Currently, a VRG as Secondary is created only when there are PVCs protected by VolSync. Once the issue of mixed workloads is addressed, a Secondary VRG would exist in all cases, for VolRep and sync DR setups as well. This will ensure that in all cases the Secondary peerClasses are a copy of the Primary's. Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
Use a sorted list, one item per line. This way it is easy to add or remove packages. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Lima 1.0.0 was released[1], and it includes everything we need. Update the docs to install lima from brew instead of source. [1] https://github.com/lima-vm/lima/releases/tag/v1.0.0 Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add storageID and replicationID to StorageClass, replicationClasses and snapshotClasses. This is to facilitate filtering of multiple storageClasses. Signed-off-by: rakeshgm <rakeshgm@redhat.com>
It was added in a failed state (Status=False, Reason=PrerequisiteNotMet) by default, which is wrong. The issue was hidden since we check the condition only when deleting the VRG. We want to inspect the condition when the VRG is live, so we can report the condition status in the protected PVC conditions. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Based on the messages added in csi-addons/kubernetes-csi-addons#691. We want to propagate the error messages to the protected PVC conditions. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When a VR condition is not met, we set the protected PVC condition
message using the error message returned from isVRConditionMet(). When
using csi-addons > 0.10.0, we now use the message from the condition
instead of the default message.
Since the Validated condition is not reported by older versions of
csi-addons, and we must wait until the Validated condition status is
known when the VRG is deleted, isVRConditionMet() now also returns the
state of the condition, which can be:
- missing: condition not found
- stale: observed generation does not match object generation
- unknown: the special "Unknown" value
- known: status is True or False
When we evaluate the Validated condition we have these cases:
- Condition is missing: continue to the next condition.
- Condition is met: continue to the next condition.
- Condition is not met and its status is False: this VR will never
  complete, and it is safe to delete it since replication will never
  start. If the VRG is deleted, we return true since the VR reached the
  desired state. Otherwise we return false. In this case we update the
  protected PVC condition with the message from the VR condition.
- Condition is not met and is stale or unknown: we need to check again
  later. There is no point in checking the Completed condition since a
  VR cannot complete without validation. In this case we update the
  protected PVC condition with the message generated by
  isVRConditionMet() for stale or unknown conditions.
Example protected PVC DataReady condition with the propagated message when
VR validation failed:

    conditions:
    - lastTransitionTime: "2024-11-06T15:33:06Z"
      message: 'failed to meet prerequisite: rpc error: code = FailedPrecondition
        desc = system is not in a state required for the operation''s execution:
        failed to enable mirroring on image "replicapool/csi-vol-fe2ca7f8-713c-4c51-bf52-0d4b2c11d329":
        parent image "replicapool/csi-snap-e2114105-b451-469b-ad97-eb3cbe2af54e"
        is not enabled for mirroring'
      observedGeneration: 1
      reason: Error
      status: "False"
      type: DataReady
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
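A simplified sketch of the state handling described above (the function shape and names are illustrative; the real isVRConditionMet() works on the VolumeReplication object):

    package sketch

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // conditionState mirrors the states listed above.
    type conditionState string

    const (
        conditionMissing conditionState = "missing" // condition not found
        conditionStale   conditionState = "stale"   // observed generation does not match object generation
        conditionUnknown conditionState = "unknown" // the special "Unknown" status value
        conditionKnown   conditionState = "known"   // status is True or False
    )

    // checkCondition reports whether the condition is met, its state, and a
    // message suitable for propagating to the protected PVC condition.
    func checkCondition(conditions []metav1.Condition, condType string, generation int64) (bool, conditionState, string) {
        cond := meta.FindStatusCondition(conditions, condType)

        switch {
        case cond == nil:
            return false, conditionMissing, fmt.Sprintf("%s condition missing", condType)
        case cond.ObservedGeneration != generation:
            return false, conditionStale, fmt.Sprintf("%s condition is stale", condType)
        case cond.Status == metav1.ConditionUnknown:
            return false, conditionUnknown, fmt.Sprintf("%s condition status is unknown", condType)
        default:
            // Known state: propagate the condition's own message.
            return cond.Status == metav1.ConditionTrue, conditionKnown, cond.Message
        }
    }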
Verify that when the VR Validated condition is False, the condition message is propagated to the protected PVC DataReady condition message. To make this easy to test, we have a new helper for waiting until the protected PVC condition status and message are updated to specified values. We propagate the same message to the DataProtected condition, but this seems like unwanted behavior that should change and is not worth testing. Signed-off-by: Nir Soffer <nsoffer@redhat.com>
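A minimal sketch of such a wait helper, written here as a plain polling loop (the helper name and the conditions getter are hypothetical; the actual test uses the suite's existing utilities and timeouts):

    package sketch

    import (
        "context"
        "fmt"
        "time"

        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // waitForCondition polls until the named condition reaches the wanted
    // status and message, or the context expires.
    func waitForCondition(ctx context.Context, get func() []metav1.Condition, condType string,
        wantStatus metav1.ConditionStatus, wantMessage string) error {
        for {
            cond := meta.FindStatusCondition(get(), condType)
            if cond != nil && cond.Status == wantStatus && cond.Message == wantMessage {
                return nil
            }

            select {
            case <-ctx.Done():
                return fmt.Errorf("timed out waiting for condition %q: %w", condType, ctx.Err())
            case <-time.After(100 * time.Millisecond):
            }
        }
    }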
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Improve replicationClass filtering. Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Add StorageClass labels, ReplicationClass labels and PeerClasses. Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Add StorageClass label, VolumeSnapshotClass labels and PeerClasses. Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Adding tests with no peerClasses and no replicationID in VRC. Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Skip filtering replicationClass using sid if the peerClass is not found. Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com> Co-authored-by: Shyamsundar Ranganathan <srangana@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Updated documentation step "drenv setup" as it led to the following error:
$ drenv setup
usage: drenv setup [-h] [-v] [--name-prefix PREFIX] filename
drenv setup: error: the following arguments are required: filename
Signed-off-by: pruthvitd <prd@redhat.com>
…resource Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
…cross relevant components Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Update test/envs/vm.yaml with a CPU count of 2 to resolve the following error:
$ drenv start envs/vm.yaml
2024-11-11 01:35:05,809 INFO [vm] Starting environment
2024-11-11 01:35:05,943 INFO [cluster] Starting minikube cluster
2024-11-11 01:35:06,330 ERROR Command failed
File "/home/ramenuser/ramen/test/drenv/commands.py", line 207, in watch
raise Error(args, error, exitcode=p.returncode)
drenv.commands.Error: Command failed:
command: ('minikube', 'start', '--profile', 'cluster', '--driver', 'kvm2', '--container-runtime', 'containerd', '--disk-size', '20g', '--nodes', '1', '--cni', 'auto', '--cpus', '1', '--memory', '2g', '--extra-config', 'kubelet.serialize-image-pulls=false', '--insecure-registry=host.minikube.internal:5000')
exitcode: 29
error:
X Exiting due to RSRC_INSUFFICIENT_CORES: Requested cpu count 1 is less than the minimum allowed of 2
Signed-off-by: pruthvitd <prd@redhat.com>
1. Add check hook implementation during backup
2. Add check hook implementation during restore
3. Add roles required for reading deployments and statefulsets
4. Add timeout operation for check hook
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
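A rough sketch of a check hook with a timeout, assuming the check is an arbitrary boolean probe (runCheckHook and the check signature are illustrative, not the actual hook API):

    package sketch

    import (
        "context"
        "fmt"
        "time"
    )

    // runCheckHook evaluates the check repeatedly until it passes or the hook
    // timeout expires; a timeout is treated as a hook failure.
    func runCheckHook(ctx context.Context, timeout time.Duration, check func(context.Context) (bool, error)) error {
        ctx, cancel := context.WithTimeout(ctx, timeout)
        defer cancel()

        for {
            ok, err := check(ctx)
            if err != nil {
                return fmt.Errorf("check hook failed: %w", err)
            }
            if ok {
                return nil
            }

            select {
            case <-ctx.Done():
                return fmt.Errorf("check hook timed out after %s", timeout)
            case <-time.After(time.Second):
            }
        }
    }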
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
This is not the right place to protect the kube resources as one final sync when we are relocating. It is too late, as the user has already been informed to clean up the resources, and some of the resources might have been deleted already. The right time to do it is before the application resource cleanup, but we cannot do that right now because of the difference in how volrep and volsync behave. We will fix it in a future release. Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Move/update info logs to debug in EnableProtection and DisableProtection for managed and discovered apps. Signed-off-by: Parikshith <pbyregow@redhat.com>
Add debug logs for Create and delete ManagedClusterSetBinding, createPlacementManagedByRamen, CreatePlacement and DeletePlacement Signed-off-by: Parikshith <pbyregow@redhat.com>
Add debug logs in create, delete, generate and wait drpc funcs Signed-off-by: Parikshith <pbyregow@redhat.com>
Replace zapcore.AddSync with zapcore.Lock for safe concurrent writes to logfile and console. Signed-off-by: Parikshith <pbyregow@redhat.com>
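For reference, a minimal example of this pattern with the zap API: zapcore.Lock wraps a WriteSyncer with a mutex so the log file and console sinks can be written to concurrently (the encoder settings and levels below are only illustrative, not the test framework's actual logger setup):

    package main

    import (
        "os"

        "go.uber.org/zap"
        "go.uber.org/zap/zapcore"
    )

    func newLogger(logFile *os.File) *zap.Logger {
        encoder := zapcore.NewConsoleEncoder(zap.NewDevelopmentEncoderConfig())

        // zapcore.Lock serializes concurrent writes to each sink; a bare
        // zapcore.AddSync wrapper does not provide that guarantee.
        fileSink := zapcore.Lock(zapcore.AddSync(logFile))
        consoleSink := zapcore.Lock(os.Stderr)

        core := zapcore.NewTee(
            zapcore.NewCore(encoder, fileSink, zapcore.DebugLevel),
            zapcore.NewCore(encoder, consoleSink, zapcore.InfoLevel),
        )

        return zap.New(core)
    }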
* The 'enforce-err-cuddling' property under 'linters-settings.wsl' is not allowed in the latest jsonschema for golangci-lint. Replace it with 'force-err-cuddling'.
* Changing to 'force-err-cuddling' revealed existing lint errors: if statements that check an error must be cuddled with the statement that assigned the error. Fixed all revealed lint issues.
Signed-off-by: Parikshith <pbyregow@redhat.com>
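An example of the kind of fix this rule requires: the error assignment and the if statement that checks it must be adjacent (cuddled), with no blank line between them.

    package sketch

    import "os"

    func readConfig(path string) ([]byte, error) {
        // wsl's force-err-cuddling flags a blank line between the assignment
        // of err and the if statement that checks it; keeping them cuddled,
        // as below, satisfies the linter.
        data, err := os.ReadFile(path)
        if err != nil {
            return nil, err
        }

        return data, nil
    }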
Signed-off-by: Oded Viner <oviner@redhat.com>
- Moved functions responsible for retrieving the current cluster from dractions/retry.go to util/placement.go
- Moved getPlacement from dractions/crug.go to util/placement.go
- Refactored getTargetCluster to get drPolicy and also changed to pass currentCluster as an argument.
- Made GetCurrentCluster (added doc) and GetPlacement functions public.
- Updated references to use the new public functions.
Signed-off-by: Parikshith <pbyregow@redhat.com>
In the referenced PRs [1] and [2], two StorageClasses were created for RBD and two for CephFS. Both RBD and CephFS StorageClasses had duplicate storageIDs. This commit resolves the duplicate storageID issue. References: [1] RamenDR#1756 [2] RamenDR#1770 Signed-off-by: rakeshgm <rakeshgm@redhat.com>
- Add debug log after namespace creation
- Add debug log after adding namespace annotations, and a private constant for the volsync namespace annotation
- Update info to debug log in DeleteNamespace
Signed-off-by: Parikshith <pbyregow@redhat.com>
With this command, the test would fail without the fix:

    # ginkgo -focus="When no running pod is mounting the PVC to be protected" -repeat=10

With this fix the result is consistently succeeding. Here is the result of 100 attempts:

    Ran 1 of 56 Specs in 12.390 seconds
    SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 55 Skipped
    PASS
    All tests passed...
    This was attempt 100 of 101.
    Running Suite: Volsync Suite - /root/DR/ut-fix/ramen/internal/controller/volsync
    ================================================================================
    Random Seed: 1739983377
    Will run 1 of 56 specs
    SSSSSSS•SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
    Ran 1 of 56 Specs in 11.616 seconds
    SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 55 Skipped
    PASS
    Ginkgo ran 1 suite in 19m32.890356929s
    Test Suite Passed

Signed-off-by: pruthvitd <prd@redhat.com>
NetworkFence resources are created with the prefix "network-fence-". This prefix needs to be included in the NF-MCV name for the lookup to be successful. Signed-off-by: rakeshgm <rakeshgm@redhat.com>
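Schematically, and with a hypothetical helper name and base-name argument, the lookup name is built as:

    package sketch

    // networkFencePrefix is the prefix used when NetworkFence resources are
    // created, as noted in the commit message.
    const networkFencePrefix = "network-fence-"

    // nfMCVResourceName builds the NetworkFence name to use in the MCV lookup;
    // without the prefix the lookup fails.
    func nfMCVResourceName(base string) string {
        return networkFencePrefix + base
    }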
Update info logs in deploy/undeploy functions for appset, subscription and discovered apps in the deployers package to log the cluster name of the workload. Signed-off-by: Parikshith <pbyregow@redhat.com>
This change adds a new Makefile target `lint-config-verify`, which runs "golangci-lint config verify --config=./.golangci.yaml" locally to detect issues in `.golangci.yaml`. This allows detecting issues with the config file quickly without running the entire CI. Additionally, `lint` now depends on `lint-config-verify` to ensure the config is always valid before running lint checks. Signed-off-by: Parikshith <pbyregow@redhat.com>
…ntrol Reconciler Signed-off-by: Oded Viner <oviner@redhat.com>
Earlier, a failure in hook execution was wrongly exiting the for loop and treating the backup process as a success. We now return and come back later to continue processing. Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
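A sketch of the control-flow change described above: on a hook failure the function returns an error (so the caller requeues and continues later) instead of breaking out of the loop and letting the backup be marked successful. Names are illustrative:

    package sketch

    import "fmt"

    type hook struct{ name string }

    // executeHook is a placeholder for the real hook execution.
    func executeHook(h hook) error { return nil }

    // runHooks previously broke out of the loop on failure, which let the
    // caller treat the backup as a success. Returning the error instead lets
    // the reconciler come back later and continue processing.
    func runHooks(hooks []hook) error {
        for _, h := range hooks {
            if err := executeHook(h); err != nil {
                return fmt.Errorf("hook %q failed: %w", h.name, err)
            }
        }

        return nil
    }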
Convert Info logs to Debug logs for functions in the deployers and workload packages. Add additional debug logs at the end of resource creation and deletion, and remove logs before operations when we log the operation in helper functions. Update debug logs by adding more information, such as resource names, namespaces, etc. Signed-off-by: Parikshith <pbyregow@redhat.com>
Signed-off-by: Parikshith <pbyregow@redhat.com>
- Ignore Ginkgo test report file - Ignore placeholder CRD file Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>
Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>
This change enables logging cluster names in the next step, improving debugging
and log readability by clearly identifying the cluster in log messages for
different operations.
- Updated Cluster struct to include Name field.
- Added `ClusterConfig` struct for cluster properties: name and kubeconfigpath.
- Added default cluster name constants.
- Updated `NewContext` to ensure each cluster gets a default name if the `name`
field is missing or empty in the config.
- Updated comment in config.yaml.sample & added name property for all clusters.
- Updated drenv/ramen.py to include cluster names along with kubeconfigpaths
Generated drenv config:

    clusters:
      hub:
        name: hub
        kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/hub
      c1:
        name: dr1
        kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/dr1
      c2:
        name: dr2
        kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/dr2
Signed-off-by: Parikshith <pbyregow@redhat.com>
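A simplified sketch of the structs and default-name handling described above (field and constant names follow the commit text but are abbreviated, and the default values are assumed for illustration; the real e2e types may differ):

    package sketch

    import "sigs.k8s.io/controller-runtime/pkg/client"

    // Default cluster names used when the config omits the name field
    // (values assumed for illustration).
    const (
        defaultHubName = "hub"
        defaultC1Name  = "c1"
        defaultC2Name  = "c2"
    )

    // ClusterConfig holds the per-cluster properties read from the config:
    // name and kubeconfigpath.
    type ClusterConfig struct {
        Name           string
        KubeconfigPath string
    }

    // Cluster carries the name alongside the client, which is what enables
    // cluster-aware log messages in the next step.
    type Cluster struct {
        Name   string
        Client client.Client
    }

    // defaultedName is the behavior NewContext follows: use the configured
    // name, falling back to a default when it is missing or empty.
    func defaultedName(cfg ClusterConfig, def string) string {
        if cfg.Name == "" {
            return def
        }

        return cfg.Name
    }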
Refactored functions in the utils package to replace the `client client.Client` parameter with `cluster Cluster`. Updated client calls to `cluster.Client`. Also modified caller functions to pass cluster instead of client. Signed-off-by: Parikshith <pbyregow@redhat.com>
Refactored functions in the dractions package to replace the `client client.Client` parameter with `cluster Cluster`. Updated client calls to `cluster.Client`. Also modified caller functions to pass cluster instead of client. Signed-off-by: Parikshith <pbyregow@redhat.com>
Updated `getDRClusterClient` to `getDRCluster` to return `util.Cluster` instead of `client.Client`. Refactored functions in the deployers, types and workloads packages to replace the `client client.Client` parameter with `cluster Cluster`. Updated client calls to `cluster.Client`. Also modified caller functions to pass cluster instead of client. Signed-off-by: Parikshith <pbyregow@redhat.com>
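Schematically, the refactor in these commits changes signatures roughly like this (the function is a stand-in, not a specific one from the e2e code; Cluster is as sketched above):

    package sketch

    import (
        "context"

        appsv1 "k8s.io/api/apps/v1"
        "sigs.k8s.io/controller-runtime/pkg/client"
    )

    type Cluster struct {
        Name   string
        Client client.Client
    }

    // Before: a bare client is passed, so log messages cannot name the cluster
    // being acted on.
    func deleteDeploymentOld(ctx context.Context, c client.Client, d *appsv1.Deployment) error {
        return c.Delete(ctx, d)
    }

    // After: passing the Cluster gives access to both the client and the
    // cluster name for logging.
    func deleteDeployment(ctx context.Context, cluster Cluster, d *appsv1.Deployment) error {
        return cluster.Client.Delete(ctx, d)
    }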
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
No description provided.