liangxin1300 (Collaborator) commented Nov 25, 2025

Problem

When setting up qdevice via the init and join commands:

  • crm cluster init --qnetd-hostname -y
  • crm cluster join -c <init_node> -y

The name of the qdevice-related certificate directory under /etc/corosync/qdevice/net differs between nodes, which can cause problems when a node re-joins the cluster.

Solution

  • Refactor bootstrap.retrieve_all_config_files function

    • Rename it to retrieve_data for reusability
    • Combine cpio with find to support retrieving directories recursively
  • On the join node, retrieve qdevice data before starting qdevice (bsc#1254243)
    Remove the function start_qdevice_on_join_node; certificate setup for qdevice is no longer needed on the join node
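The find/cpio combination in the Solution list can be illustrated with the minimal sketch below. It is not the code from this PR: the helper names expand_like_find and build_retrieve_cmd are hypothetical. cpio -o archives exactly the file names it reads on stdin, so a directory name on its own stores only the directory entry. Piping find's output into cpio also supplies every path beneath that directory. expand_like_find approximates in pure Python what find contributes, and build_retrieve_cmd builds the shell pipeline that would run on the remote node.

```python
import os
import shlex
from typing import List, Sequence


def expand_like_find(paths: Sequence[str]) -> List[str]:
    """Approximate the names `find PATH...` would print (illustration only).

    A directory expands to itself plus every entry beneath it; any other
    path is kept as-is. The order (each directory, then its files, then its
    subdirectories, all sorted) is chosen for determinism and differs from
    find's unsorted output.
    """
    names: List[str] = []
    for top in paths:
        if not os.path.isdir(top):
            # Regular files (and, in this sketch, missing paths) pass through.
            names.append(top)
            continue
        for root, dirs, files in os.walk(top):
            dirs.sort()  # sorting in place controls os.walk's descent order
            names.append(root)
            names.extend(os.path.join(root, f) for f in sorted(files))
    return names


def build_retrieve_cmd(paths: Sequence[str]) -> str:
    """Build a shell pipeline that writes a cpio archive of `paths` to stdout.

    Hypothetical helper: `find` lists each path and, for directories, all
    entries below it, and `cpio -o` archives exactly the names it reads.
    """
    if not paths:
        # With no path argument, find would default to ".", so refuse instead.
        raise ValueError("no paths to retrieve")
    # Quote each path so spaces or shell metacharacters survive the remote shell.
    quoted = " ".join(shlex.quote(p) for p in paths)
    return f"find {quoted} | cpio -o"
```

For example, build_retrieve_cmd(["/etc/corosync/qdevice/net"]) yields find /etc/corosync/qdevice/net | cpio -o. On the receiving side, such a stream would typically be unpacked with cpio -iud (extract, create leading directories, overwrite unconditionally); how crmsh actually transfers and unpacks it is not shown here.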

codecov bot commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.65%. Comparing base (0c3bd0a) to head (8bb2c87).
⚠️ Report is 11 commits behind head on master.

Files with missing lines    Patch %    Lines
crmsh/bootstrap.py          90.90%     2 Missing ⚠️
Additional details and impacted files
Flag           Coverage Δ
integration    55.09% <90.90%> (-0.06%) ⬇️
unit           52.81% <9.09%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines    Coverage Δ
crmsh/qdevice.py            97.85% <ø> (-0.19%) ⬇️
crmsh/bootstrap.py          87.86% <90.90%> (+0.01%) ⬆️

... and 6 files with indirect coverage changes


liangxin1300 force-pushed the 20251125_qdevice_issue branch 3 times, most recently from fddae99 to 626d8ce, on November 26, 2025 06:20
liangxin1300 force-pushed the 20251125_qdevice_issue branch from 626d8ce to 332eca0 on November 26, 2025 06:54
liangxin1300 changed the title from "Fix: qdevice re-join issue" to "Fix: bootstrap: On join node, retrieve qdevice config files before starting qdevice (bsc#1254243)" on Nov 26, 2025
liangxin1300 force-pushed the 20251125_qdevice_issue branch from b2cb065 to ae5fe39 on November 26, 2025 09:04
liangxin1300 force-pushed the 20251125_qdevice_issue branch from ae5fe39 to e58f46d on November 26, 2025 11:08
liangxin1300 force-pushed the 20251125_qdevice_issue branch from e58f46d to 8bb2c87 on December 4, 2025 11:00
liangxin1300 changed the title from "Fix: bootstrap: On join node, retrieve qdevice config files before starting qdevice (bsc#1254243)" to "Fix: bootstrap: On join node, retrieve qdevice data before starting qdevice (bsc#1254243)" on Dec 4, 2025
liangxin1300 marked this pull request as ready for review on December 8, 2025 03:16
nicholasyang2022 (Collaborator) left a comment


The descriptions under "Problem" and "Solution" are not very clear.

> The name of the qdevice-related certification directory under /etc/corosync/qdevice/net is different between nodes, which might cause problems when the node re-joins the cluster.

How were they different? How did crmsh generate the directory name previously?

> no need to do certificate on join node for qdevice

Why?

cmd = 'cpio -o << EOF\n{}\nEOF\n'.format(
    '\n'.join((f for f in get_files_to_sync() if f != CSYNC2_KEY and f != CSYNC2_CFG))
)
def retrieve_data(from_node, data_list=None, data_type=None):

  1. It would be clearer to name it retrieve_files instead of retrieve_data, as the word "data" is too general.
  2. Likewise, rename data_list to file_list.
  3. data_type is only used for output messages and has no functional effect. Passing it makes the code harder to understand. It would be better to pass msg directly, or to use a clearer parameter name that reflects its usage.
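The naming suggested above can be sketched as follows. This is a hypothetical illustration, not crmsh code: retrieve_files, file_list and msg are the names proposed in the review, while the fetch callback and the logger name are assumptions standing in for crmsh's real remote-copy and logging helpers.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger("retrieve_files_sketch")


def retrieve_files(
    from_node: str,
    file_list: Sequence[str],
    fetch: Callable[[str, Sequence[str]], None],
    msg: Optional[str] = None,
) -> None:
    """Sketch of the signature suggested in review (hypothetical).

    `file_list` says exactly what is fetched, and `msg` is logged verbatim,
    so no separate `data_type` argument is needed just to build a message.
    `fetch` is an assumed stand-in for the real remote-copy step.
    """
    if msg is not None:
        # The caller owns the wording; nothing is derived from a type tag.
        logger.info(msg)
    fetch(from_node, list(file_list))
```

Keeping the wording at the call site lets each caller log exactly what it means, and the function no longer needs to translate a data_type value into text.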

