
@BenamarMk

No description provided.

ShyamsundarR and others added 30 commits November 6, 2024 10:54
Update works to ensure VRG is updated with peerClasses that it requires,
based on reported PVCs that the VRG is attempting to protect. If a VRG
is attempting to protect a PVC for which it is lacking a peerClass, and
that peerClass is available as part of the DRPolicy, its peerClasses are updated.

For existing peerClasses the VRG information is not updated, this is done
to avoid any protection mechanism conflicts. For example, if a VRG
carried a peerClass without the replicationID (i.e., it would choose to
protect the PVC using Volsync and VolumeSnapshots), then it is not
updated with a peerClass that NOW supports native VolumeReplication, as
that would void existing protection.

To change replication schemes a workload needs to be DR disabled and then
reenabled to catch up to the latest available peer information for an SC.
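
Below is a minimal sketch of the append-only peerClass update described above. The types and field names are illustrative stand-ins, not the actual Ramen API:

    package main

    import "fmt"

    // PeerClass is an illustrative stand-in for the peerClass information
    // carried by the VRG spec and the DRPolicy status.
    type PeerClass struct {
        StorageClassName string
        ReplicationID    string
    }

    // updatePeerClasses appends policy peerClasses for storage classes the VRG
    // does not already carry. Existing entries are never overwritten, so a PVC
    // protected via Volsync/VolumeSnapshots is not silently switched to native
    // VolumeReplication.
    func updatePeerClasses(vrgPeers, policyPeers []PeerClass, protectedSCs map[string]bool) []PeerClass {
        known := map[string]bool{}
        for _, pc := range vrgPeers {
            known[pc.StorageClassName] = true
        }

        for _, pc := range policyPeers {
            // Only add peerClasses for storage classes that back PVCs the VRG
            // is protecting, and only if the VRG does not already have one.
            if protectedSCs[pc.StorageClassName] && !known[pc.StorageClassName] {
                vrgPeers = append(vrgPeers, pc)
            }
        }

        return vrgPeers
    }

    func main() {
        vrg := []PeerClass{{StorageClassName: "cephfs-sc"}} // kept as-is, even if the policy now has a replicationID
        policy := []PeerClass{
            {StorageClassName: "cephfs-sc", ReplicationID: "r1"},
            {StorageClassName: "rbd-sc", ReplicationID: "r2"}, // missing, appended
        }

        fmt.Println(updatePeerClasses(vrg, policy, map[string]bool{"cephfs-sc": true, "rbd-sc": true}))
    }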

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
Secondary VRG should carry the same peerClass information as the Primary
VRG, such that on action changes that promote the Secondary to a Primary,
the same peerClasses are used to recover and protect the PVCs.

This commit addresses updating Secondary VRG with Primary VRG peerClasses
for async cases when Volsync is in use.

Currently, a VRG as Secondary is created when there are any PVCs protected
by Volsync. Once the issue of mixed workloads is addressed, a Secondary
VRG would exist for all cases, VolRep and sync DR setups as well. This
will ensure that in all cases the Secondary peerClasses are a copy of
the Primary.

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
Use a sorted list, one item per line. This way it is easy to add or remove
packages.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Lima 1.0.0 was released[1], and it includes everything we need. Update
the docs to install lima from brew instead of source.

[1] https://github.com/lima-vm/lima/releases/tag/v1.0.0

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Add storageID, replicationID to StorageClass, replicationClasses
and snapshotClasses. This is to facilitate filtering of
multiple storageClasses.
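
A hedged sketch of the label-based filtering this enables; the label key below is hypothetical and only for illustration:

    package main

    import "fmt"

    // storageIDLabel is a hypothetical label key used for illustration only.
    const storageIDLabel = "ramendr.example/storageid"

    // filterByStorageID returns the class names whose labels carry the wanted
    // storageID, mimicking how replication and snapshot classes can be matched
    // to a StorageClass when multiple classes exist.
    func filterByStorageID(classLabels map[string]map[string]string, wantSID string) []string {
        var matched []string
        for name, labels := range classLabels {
            if labels[storageIDLabel] == wantSID {
                matched = append(matched, name)
            }
        }
        return matched
    }

    func main() {
        classes := map[string]map[string]string{
            "vrc-rbd":    {storageIDLabel: "sid-1"},
            "vrc-cephfs": {storageIDLabel: "sid-2"},
        }
        fmt.Println(filterByStorageID(classes, "sid-1")) // [vrc-rbd]
    }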

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
It was added in a failed state (Status=False, Reason=PrerequisiteNotMet)
by default, which is wrong. The issue was hidden since we check the
condition only when deleting the VRG. We want to inspect the condition
when the VRG is live, so we can report the condition status in
the protected pvcs conditions.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Based on the messages added in
csi-addons/kubernetes-csi-addons#691, we want to
propagate the error messages to the protected pvcs conditions.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When a VR condition is not met, we set the protected PVC condition
message using the error message returned from isVRConditionMet(). When
using csi-addons > 0.10.0, we now use the message from the condition
instead of the default message.

Since the Validated condition is not reported by older versions of
csi-addons, and we must wait until the Validated condition status is
known when the VRG is deleted, isVRConditionMet() now also returns the
state of the condition, which can be:

- missing: condition not found
- stale: observed generation does not match object generation
- unknown: the special "Unknown" value
- known: status is True or False

When we validate the Validated condition we have these cases:

- Condition is missing: continue to next condition.

- Condition is met: continue to the next condition.

- Condition not met and its status is False. This VR will never
  complete and it is safe to delete since replication will never start.
  If the VRG is deleted, we return true since the VR reached the desired
  state. Otherwise we return false. In this case we update the
  protected pvc condition with the message from the VR condition.

- Condition is not met and is stale or unknown: we need to check again
  later. There is no point in checking the completed condition since a VR
  cannot complete without validation. In this case we update the
  protected pvc condition with the message generated by
  isVRConditionMet() for stale or unknown conditions.

Example protected pvc DataReady condition with propagated message when
VR validation failed:

    conditions:
      - lastTransitionTime: "2024-11-06T15:33:06Z"
        message: 'failed to meet prerequisite: rpc error: code = FailedPrecondition
          desc = system is not in a state required for the operation''s execution:
          failed to enable mirroring on image "replicapool/csi-vol-fe2ca7f8-713c-4c51-bf52-0d4b2c11d329":
          parent image "replicapool/csi-snap-e2114105-b451-469b-ad97-eb3cbe2af54e"
          is not enabled for mirroring'
        observedGeneration: 1
        reason: Error
        status: "False"
        type: DataReady
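
The sketch below illustrates the condition-state handling described above using the standard metav1.Condition type; the function name and return shape are illustrative, not the exact Ramen helpers:

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    type conditionState int

    const (
        stateMissing conditionState = iota // condition not found
        stateStale                         // observed generation does not match object generation
        stateUnknown                       // the special "Unknown" value
        stateKnown                         // status is True or False
    )

    // checkCondition mirrors the isVRConditionMet() behavior: it reports
    // whether the condition is met, in which state it was observed, and a
    // message suitable for the protected pvc condition.
    func checkCondition(conds []metav1.Condition, condType string, generation int64) (bool, conditionState, string) {
        cond := meta.FindStatusCondition(conds, condType)
        switch {
        case cond == nil:
            return false, stateMissing, fmt.Sprintf("condition %q missing", condType)
        case cond.ObservedGeneration != generation:
            return false, stateStale, fmt.Sprintf("condition %q is stale", condType)
        case cond.Status == metav1.ConditionUnknown:
            return false, stateUnknown, fmt.Sprintf("condition %q status is unknown", condType)
        default:
            return cond.Status == metav1.ConditionTrue, stateKnown, cond.Message
        }
    }

    func main() {
        conds := []metav1.Condition{{
            Type:               "Validated",
            Status:             metav1.ConditionFalse,
            ObservedGeneration: 1,
            Message:            "failed to meet prerequisite",
        }}

        // Known False: terminal for this VR, propagate the message.
        met, state, msg := checkCondition(conds, "Validated", 1)
        fmt.Println(met, state, msg)
    }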

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Verify that when VR Validated condition is False, the condition message
is propagated to the protected pvc DataReady condition message.

To make this easy to test, we have a new helper for waiting until
protected pvc condition status and message are updated to specified
values.
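
A rough Ginkgo/Gomega-style sketch of such a wait helper; the condition lookup function, timeout, and interval are assumptions, not the actual test code:

    package volsync_test

    import (
        "time"

        . "github.com/onsi/gomega"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    const (
        waitTimeout  = 10 * time.Second
        pollInterval = 100 * time.Millisecond
    )

    // waitForProtectedPVCCondition polls until the protected pvc condition has
    // the expected status and message. getCondition is a stand-in for however
    // the suite fetches the condition from the VRG under test.
    func waitForProtectedPVCCondition(
        getCondition func() *metav1.Condition,
        wantStatus metav1.ConditionStatus,
        wantMessage string,
    ) {
        Eventually(func() bool {
            cond := getCondition()
            return cond != nil && cond.Status == wantStatus && cond.Message == wantMessage
        }, waitTimeout, pollInterval).Should(BeTrue(), "waiting for protected pvc condition")
    }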

We propagate the same message to the DataProtected condition, but this
seems like unwanted behavior that should change and is not worth
testing.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Improve filtering of replicationClass

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Add StorageClass Labels, ReplicationClass Labels
and PeerClasses

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Add StorageClass Labels, VolumeSnapshotClass Labels
and PeerClasses

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Add tests with no peerClasses and no replicationID
in VRC

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Skip filtering using sid with replicationClass if peerClass
is not found

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Co-authored-by: Shyamsundar Ranganathan <srangana@redhat.com>
Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Updated documentation step "drenv setup" as it led to the following error:

    $ drenv setup
    usage: drenv setup [-h] [-v] [--name-prefix PREFIX] filename
    drenv setup: error: the following arguments are required: filename

Signed-off-by: pruthvitd <prd@redhat.com>
…resource

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
…cross relevant components

Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>
Update test/envs/vm.yaml with a CPU count of 2 to resolve the following error:
	$ drenv start envs/vm.yaml
	2024-11-11 01:35:05,809 INFO    [vm] Starting environment
	2024-11-11 01:35:05,943 INFO    [cluster] Starting minikube cluster
	2024-11-11 01:35:06,330 ERROR   Command failed
	:
	  File "/home/ramenuser/ramen/test/drenv/commands.py", line 207, in watch
	    raise Error(args, error, exitcode=p.returncode)
	drenv.commands.Error: Command failed:
	   command: ('minikube', 'start', '--profile', 'cluster', '--driver', 'kvm2', '--container-runtime', 'containerd', '--disk-size', '20g', '--nodes', '1', '--cni', 'auto', '--cpus', '1', '--memory', '2g', '--extra-config', 'kubelet.serialize-image-pulls=false', '--insecure-registry=host.minikube.internal:5000')
	   exitcode: 29
	   error:
	      X Exiting due to RSRC_INSUFFICIENT_CORES: Requested cpu count 1 is less than the minimum allowed of 2

Signed-off-by: pruthvitd <prd@redhat.com>
1. Add check hook implementation during backup
2. Add check hook implementation during restore
3. Add roles required for reading deployments and statefulsets
4. Add timeout operation for check hook (a minimal sketch follows this list)
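
A minimal sketch of a timeout-bounded check hook (item 4), using client-go; the readiness predicate and the deployment reference are illustrative assumptions:

    package hooks

    import (
        "context"
        "fmt"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // runCheckHook polls a deployment until it is ready or the hook timeout
    // expires. In the real implementation the condition to evaluate comes from
    // the hook spec; checking ReadyReplicas is just an example predicate.
    func runCheckHook(ctx context.Context, cs kubernetes.Interface, ns, name string, timeout time.Duration) error {
        ctx, cancel := context.WithTimeout(ctx, timeout)
        defer cancel()

        ticker := time.NewTicker(2 * time.Second)
        defer ticker.Stop()

        for {
            dep, err := cs.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
            if err == nil && dep.Spec.Replicas != nil && dep.Status.ReadyReplicas == *dep.Spec.Replicas {
                return nil // check hook passed
            }

            select {
            case <-ctx.Done():
                return fmt.Errorf("check hook timed out for %s/%s: %w", ns, name, ctx.Err())
            case <-ticker.C:
                // retry on the next tick
            }
        }
    }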

Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
Signed-off-by: Annaraya Narasagond <annarayanarasagond@gmail.com>
raghavendra-talur and others added 30 commits February 13, 2025 06:41
This is not the right place to protect the kube resources as one final
sync when we are relocating. It is too late as the user has already been
informed to clean up the resources, and some of the resources might have
been deleted already.

The right time to do it is before the application resource cleanup but
we cannot do it right now because of the difference in how volrep and
volsync behave.

We will fix it in a future release.

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Move/update info logs to debug in EnableProtection and DisableProtection for
managed and discovered apps

Signed-off-by: Parikshith <pbyregow@redhat.com>
Add debug logs for Create and delete ManagedClusterSetBinding,
createPlacementManagedByRamen, CreatePlacement and DeletePlacement

Signed-off-by: Parikshith <pbyregow@redhat.com>
Add debug logs in create, delete, generate and wait drpc funcs

Signed-off-by: Parikshith <pbyregow@redhat.com>
Replace zapcore.AddSync with zapcore.Lock for safe concurrent writes to logfile and console.
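
A hedged sketch of the resulting logger setup; encoders and levels are illustrative, the relevant change is wrapping each sink with zapcore.Lock:

    package main

    import (
        "os"

        "go.uber.org/zap"
        "go.uber.org/zap/zapcore"
    )

    // newLogger writes JSON to a logfile and console output to stderr.
    // zapcore.Lock wraps each WriteSyncer with a mutex, so concurrent writes
    // from multiple goroutines cannot interleave, which a bare zapcore.AddSync
    // does not guarantee.
    func newLogger(logFile *os.File) *zap.Logger {
        encCfg := zap.NewDevelopmentEncoderConfig()

        fileSink := zapcore.Lock(zapcore.AddSync(logFile))
        consoleSink := zapcore.Lock(os.Stderr)

        core := zapcore.NewTee(
            zapcore.NewCore(zapcore.NewJSONEncoder(encCfg), fileSink, zapcore.DebugLevel),
            zapcore.NewCore(zapcore.NewConsoleEncoder(encCfg), consoleSink, zapcore.InfoLevel),
        )

        return zap.New(core)
    }

    func main() {
        f, err := os.CreateTemp("", "e2e-*.log")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        log := newLogger(f)
        defer log.Sync()
        log.Info("logger configured with locked sinks")
    }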

Signed-off-by: Parikshith <pbyregow@redhat.com>
* The 'enforce-err-cuddling' property under 'linters-settings.wsl' is not allowed in the latest jsonschema for golangci-lint. Replace with 'force-err-cuddling'.
* Changing to 'force-err-cuddling' revealed existing lint errors wrt: if statements that check an error must be cuddled with the statement that assigned the error. Fixed all revealed lint issues.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Signed-off-by: Oded Viner <oviner@redhat.com>
- Moved functions responsible for retrieving the current cluster from dractions/retry.go to util/placement.go
- Moved getPlacement from dractions/crug.go to util/placement.go
- Refactored getTargetCluster to get drPolicy and also changed it to pass currentCluster as an argument.
- Made GetCurrentCluster (added doc) and GetPlacement functions public.
- Updated references to use the new public functions.

Signed-off-by: Parikshith <pbyregow@redhat.com>
In the referenced PRs [1] and [2], two StorageClasses were
created for RBD and two for CephFS. Both RBD and CephFS
StorageClasses had duplicate storageIDs. This commit resolves
the duplicate storageID issue.

References:
[1] RamenDR#1756
[2] RamenDR#1770

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
- Add debug log after namespace creation
- Add debug log after adding namespace annotations
  and a private constant for the volsync namespace annotation
- Update info to debug log in DeleteNamespace

Signed-off-by: Parikshith <pbyregow@redhat.com>
With this command, the test would fail without the fix:
cmd: # ginkgo -focus="When no running pod is mounting the PVC to be protected" -repeat=10

With this fix, the test consistently succeeds.
Here is the result of 100 attempts:

Ran 1 of 56 Specs in 12.390 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 55 Skipped
PASS

All tests passed...
This was attempt 100 of 101.
Running Suite: Volsync Suite - /root/DR/ut-fix/ramen/internal/controller/volsync
================================================================================
Random Seed: 1739983377

Will run 1 of 56 specs
SSSSSSS•SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

Ran 1 of 56 Specs in 11.616 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 55 Skipped
PASS

Ginkgo ran 1 suite in 19m32.890356929s
Test Suite Passed

Signed-off-by: pruthvitd <prd@redhat.com>
NetworkFence resources are created with
the prefix "network-fence-". This prefix
needs to be included in the NF-MCV name for
the lookup to be successful.
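
A tiny sketch of the naming detail; the suffix used after the prefix is an assumption here:

    package main

    import "fmt"

    // networkFencePrefix matches the prefix NetworkFence resources are created with.
    const networkFencePrefix = "network-fence-"

    // networkFenceName builds the name used for the NF ManagedClusterView
    // lookup; deriving the suffix from the cluster name is illustrative only.
    func networkFenceName(clusterName string) string {
        return networkFencePrefix + clusterName
    }

    func main() {
        fmt.Println(networkFenceName("dr1")) // network-fence-dr1
    }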

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Update info logs in deploy/undeploy functions for appset, subscription
and discovered apps in the deployers package to log the cluster name of the workload.

Signed-off-by: Parikshith <pbyregow@redhat.com>
This change adds a new Makefile target `lint-config-verify`,
which runs `golangci-lint config verify --config=./.golangci.yaml`
locally to detect issues in `.golangci.yaml`. This allows detecting
issues with the config file quickly without running the entire CI.
Additionally, `lint` now depends on `lint-config-verify` to ensure
the config is always valid before running lint checks.

Signed-off-by: Parikshith <pbyregow@redhat.com>
…ntrol Reconciler

Signed-off-by: Oded Viner <oviner@redhat.com>
Earlier, a failure in hook execution was wrongly exiting out of the for
loop and treating the backup process as a success.

We return and come back later to continue processing.
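
Schematically, the fix looks like the sketch below (hook types and names are illustrative): a failing hook returns the error so the reconciler retries later, instead of breaking out and reporting the backup as complete.

    package hooks

    import "fmt"

    // Hook is an illustrative stand-in for a captured workload hook.
    type Hook struct {
        Name string
        Run  func() error
    }

    // executeHooks runs the hooks in order. A failure returns immediately so
    // the caller requeues and comes back later; it no longer falls out of the
    // loop and treats the backup as a success.
    func executeHooks(hooks []Hook) error {
        for _, h := range hooks {
            if err := h.Run(); err != nil {
                return fmt.Errorf("hook %q failed: %w", h.Name, err)
            }
        }

        return nil
    }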

Signed-off-by: Raghavendra Talur <raghavendra.talur@gmail.com>
Convert Info logs to Debug logs for functions in the deployers
and workload packages. Add additional debug logs at the end of
resource creation and deletion, and remove logs before operations
when we log the operation in helper functions. Update debug logs
by adding more information, such as resource names, namespaces, etc.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Signed-off-by: Parikshith <pbyregow@redhat.com>
- Ignore Ginkgo test report file
- Ignore placeholder CRD file

Signed-off-by: rakeshgm <rakeshgm@redhat.com>
Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>
Signed-off-by: Elena Gershkovich <elenage@il.ibm.com>
This change enables logging cluster names in the next step, improving debugging
and log readability by clearly identifying the cluster in log messages for
different operations.

- Updated Cluster struct to include Name field.
- Added `ClusterConfig` struct for cluster properties: name and kubeconfigpath
  (a minimal sketch follows the generated config below).
- Added default cluster names constants
- Updated `NewContext` to ensure each cluster gets a default name if the `name`
  field is missing or empty in the config.
- Updated comment in config.yaml.sample & added name property for all clusters.
- Updated drenv/ramen.py to include cluster names along with kubeconfigpaths

  Generated drenv config:

      clusters:
        hub:
          name: hub
          kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/hub
        c1:
          name: dr1
          kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/dr1
        c2:
          name: dr2
          kubeconfigpath: /home/pari/.config/drenv/rdr/kubeconfigs/dr2
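
A hedged sketch of the Cluster/ClusterConfig shape and the default-name fallback; field and constant names are approximations of the e2e code, not guaranteed to match it:

    package util

    import "sigs.k8s.io/controller-runtime/pkg/client"

    // Default cluster names, used when the config omits the name field.
    const (
        DefaultHubName = "hub"
        DefaultC1Name  = "c1"
        DefaultC2Name  = "c2"
    )

    // ClusterConfig holds per-cluster properties read from config.yaml.
    type ClusterConfig struct {
        Name           string `yaml:"name"`
        KubeconfigPath string `yaml:"kubeconfigpath"`
    }

    // Cluster carries the resolved name alongside the client, so log messages
    // can identify which cluster an operation ran against.
    type Cluster struct {
        Name   string
        Client client.Client
    }

    // resolveName returns the configured name, or the default when the name
    // field is missing or empty.
    func resolveName(cfg ClusterConfig, defaultName string) string {
        if cfg.Name == "" {
            return defaultName
        }

        return cfg.Name
    }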

Signed-off-by: Parikshith <pbyregow@redhat.com>
Refactored functions in the utils package to replace the `client client.Client`
parameter with `cluster Cluster`. Updated client calls to `cluster.Client`.
Also modified caller functions to pass cluster instead of client.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Refactored functions in the dractions package to replace the `client client.Client`
parameter with `cluster Cluster`. Updated client calls to `cluster.Client`.
Also modified caller functions to pass cluster instead of client.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Updated `getDRClusterClient` to `getDRCluster` to return `util.Cluster`
instead of `client.Client`. Refactored functions in the deployers, types
and workloads package to replace the `client client.Client` parameter
with `cluster Cluster`. Updated client calls to `cluster.Client`. Also
modified caller functions to pass cluster instead of client.

Signed-off-by: Parikshith <pbyregow@redhat.com>
Signed-off-by: Benamar Mekhissi <bmekhiss@ibm.com>