
Update "Install Slurm" documentation to leverage cloud-init#82

Open
lunamorrow wants to merge 24 commits into OpenCHAMI:main from lunamorrow:lunamorrow/cloud-init-compute-node-slurm-config

Conversation

@lunamorrow (Contributor) commented Mar 16, 2026

Pull Request Template

Thank you for your contribution! Please ensure the following before submitting:

Checklist

  • My code follows the style guidelines of this project
  • I have added/updated comments where needed
  • I have added tests that prove my fix is effective or my feature works
  • I have run make test (or equivalent) locally and all tests pass
  • DCO Sign-off: All commits are signed off (git commit -s) with my real name and email
  • REUSE Compliance:
    • Each new/modified source file has SPDX copyright and license headers
    • Any non-commentable files include a <filename>.license sidecar
    • All referenced licenses are present in the LICENSES/ directory

Description

Updating/extending the "Install Slurm" documentation guide to leverage OpenCHAMI's cloud-init to make compute node configuration persistent across nodes and on reboot. See discussion/comments on PR #72.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

For more info, see Contributing Guidelines.

…n will need some further updates to align better with the Tutorial (e.g. changing IP addresses, adjusting comments to support bare-metal and cloud setups, etc.) and to ensure the documented approach is sufficiently broad for general purpose.

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
… for creating some files from cat to copy-paste to prevent issues with bash command/variable processing

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
… - this should make this guide easy to follow on with after the tutorial

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…Next step will be expanding comments/explanations to provide more context to users, as well as providing more code blocks to show expected output of commands that produce output.

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…id. Changes include making it clearer when the pwgen password is used, correcting the file creation step for slurm.conf to prevent errors, removing instructions for aliasing the build command (and instead redirecting to the appropriate tutorial section), updating instructions in line with a recent PR to replace MinIO with Versity S3, and some minor typo fixes

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…ck from David.

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…Some reviews are still pending as I figure out the source of the problem and a solution, and I will address these in a later commit.

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
… to VM head nodes.

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…certain commands should behave and/or the output they should produce.

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…ecurity vulnerabilities with versions 0.5-0.5.17

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…ompute node. Additionally made some tweaks to the documentation to make the workflow more robust after repeating it on a fresh node.

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…in a few places

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…erence to the 'Install Slurm' guide

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…t and the image config to reduce the number of commands needing to be run on the compute node. We are waiting on feedback from David and Alex before potentially implementing a more persistent Slurm configuration on the compute node/s.

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…evon

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
… in the working directory '/opt/workdir' (as desired) and not the user's home directory

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…r' in the slurm-local.repo file

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…f slurm RPMs in '/opt/workdir'

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…ommand

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
… explanation that the SlurmctldHost must be 'head' instead of 'demo' when the head node is a VM

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…rrow/cloud-init-compute-node-slurm-config

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
…t so that compute node Slurm configuration is persistent across nodes and on reboot

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
@lunamorrow (Contributor Author)

I have made some changes to the documentation to use cloud-init instead of manually configuring the compute node. This process also sets up NFS to mount shared files (e.g. Slurm configuration files) used by both the compute node and head node. The current commit only adds a basic compute node configuration (similar to what was already there, only with cloud-init now), but I am able to push up a more complex configuration which sets up LDAP and mounts the compute node with more memory for a more "realistic" Slurm setup. That way anyone who follows the guide will finish with a more production-ready Slurm configuration. Let me know what you think @synackd @davidallendj @alexlovelltroy

The merge I performed on this branch pulled in quite a lot of old commits which has clogged up this PR a bit, sorry about that!
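
For readers following along, here is a minimal sketch of the kind of cloud-init payload under discussion: it mounts an NFS share from the head node for the shared Slurm configuration and starts the Slurm services on the compute node. The export path and the 172.16.0.254 head node address are illustrative assumptions, not values from the guide, written as a quoted `cat` heredoc in the same style the guide uses to avoid shell variable expansion:

```bash
# Hypothetical sketch only: the NFS export path and head node IP below are
# assumptions for illustration; substitute the values from the guide.
cat <<'EOF' > compute-slurm-config.yaml
#cloud-config
mounts:
  # Mount the head node's shared Slurm configuration over NFS.
  - ["172.16.0.254:/opt/slurm-shared", "/etc/slurm", "nfs", "defaults,_netdev", "0", "0"]
runcmd:
  # Start authentication first, then the compute node daemon.
  - systemctl enable --now munge
  - systemctl enable --now slurmd
EOF
```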


```bash
sudo systemctl start slurmdbd
sudo systemctl start slurmctld
```
Contributor


I'm getting an error when trying to start slurmctld. I try starting with both SlurmctldHost=demo and SlurmctldHost=head and both give me the error.

```
[rocky@openchami-testing workdir]$ sudo systemctl start slurmdbd
sudo systemctl start slurmctld
Job for slurmctld.service failed because the control process exited with error code.
See "systemctl status slurmctld.service" and "journalctl -xeu slurmctld.service" for details.
```

Contributor Author


That is odd. Is your head node a VM, or is it a bare-metal/cloud instance? Could you also please share the output of sudo systemctl status slurmctld and sudo journalctl -xeu slurmctld.service so I can identify the root cause?

Contributor


It's a cloud instance using JetStream 2. Here's what I have for journalctl -eu slurmctld.

```
Mar 17 19:38:35 openchami-testing.novalocal systemd[1]: Starting Slurm controller daemon...
Mar 17 19:38:35 openchami-testing.novalocal slurmctld[547181]: slurmctld: error: If using PrologFlags=Contain for pam_slurm_adopt, proctrack/cgroup is required. If not using pam_slurm_adopt, please ignore error.
Mar 17 19:38:35 openchami-testing.novalocal slurmctld[547181]: error: If using PrologFlags=Contain for pam_slurm_adopt, proctrack/cgroup is required. If not using pam_slurm_adopt, please ignore error.
Mar 17 19:38:35 openchami-testing.novalocal slurmctld[547181]: slurmctld: error: Configured MailProg is invalid
Mar 17 19:38:35 openchami-testing.novalocal slurmctld[547181]: slurmctld: slurmctld version 24.05.5 started on cluster demo
Mar 17 19:38:35 openchami-testing.novalocal slurmctld[547181]: slurmctld: error: This host (openchami-testing/openchami-testing.novalocal) not a valid controller
Mar 17 19:38:35 openchami-testing.novalocal systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Mar 17 19:38:35 openchami-testing.novalocal systemd[1]: slurmctld.service: Failed with result 'exit-code'.
Mar 17 19:38:35 openchami-testing.novalocal systemd[1]: Failed to start Slurm controller daemon.
Mar 17 19:40:04 openchami-testing.novalocal systemd[1]: Starting Slurm controller daemon...
Mar 17 19:40:04 openchami-testing.novalocal slurmctld[547264]: slurmctld: error: If using PrologFlags=Contain for pam_slurm_adopt, proctrack/cgroup is required. If not using pam_slurm_adopt, please ignore error.
Mar 17 19:40:04 openchami-testing.novalocal slurmctld[547264]: error: If using PrologFlags=Contain for pam_slurm_adopt, proctrack/cgroup is required. If not using pam_slurm_adopt, please ignore error.
Mar 17 19:40:04 openchami-testing.novalocal slurmctld[547264]: slurmctld: error: Configured MailProg is invalid
Mar 17 19:40:04 openchami-testing.novalocal slurmctld[547264]: slurmctld: slurmctld version 24.05.5 started on cluster demo
Mar 17 19:40:04 openchami-testing.novalocal slurmctld[547264]: slurmctld: error: This host (openchami-testing/openchami-testing.novalocal) not a valid controller
Mar 17 19:40:04 openchami-testing.novalocal systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Mar 17 19:40:04 openchami-testing.novalocal systemd[1]: slurmctld.service: Failed with result 'exit-code'.
Mar 17 19:40:04 openchami-testing.novalocal systemd[1]: Failed to start Slurm controller daemon.
```

@lunamorrow (Contributor Author) commented Mar 18, 2026

Alright, it looks like your head node has a different hostname than the one the tutorial defines. You'll just need to either update the head node hostname to demo.openchami.cluster or update SlurmctldHost in /etc/slurm/slurm.conf to your hostname, openchami-testing.
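
As a concrete sketch of the second option (the `sed` invocation is illustrative; editing the file by hand works just as well):

```bash
# Point SlurmctldHost at the actual head node hostname, then restart the
# controller. The hostname here matches this tester's instance.
sudo sed -i 's/^SlurmctldHost=.*/SlurmctldHost=openchami-testing/' /etc/slurm/slurm.conf
sudo systemctl restart slurmctld
```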

Contributor


I had to change two things to get this to work (see the sketch after this list):

  1. SlurmctldHost=demo -> SlurmctldHost=openchami-testing (using demo.openchami.cluster didn't work)
  2. ProctrackType=proctrack/linuxproc -> ProctrackType=proctrack/cgroup (produces another error in the logs journalctl -eu slurmctld)
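
For reference, this is how those two settings read in /etc/slurm/slurm.conf after the change (the hostname is specific to this instance):

```
SlurmctldHost=openchami-testing
ProctrackType=proctrack/cgroup
```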

Contributor Author


> 1. SlurmctldHost=demo -> SlurmctldHost=openchami-testing (using demo.openchami.cluster didn't work)

Do you mean setting demo.openchami.cluster as the head node hostname, or setting SlurmctldHost to demo.openchami.cluster? The fix you did sounds good based on your hostname though.

> 2. ProctrackType=proctrack/linuxproc -> ProctrackType=proctrack/cgroup (produces another error in the logs journalctl -eu slurmctld)

I'll make a note of this to prevent any users being confused, as this error is safe to ignore.

Contributor


Yes, that's exactly it.

…hown' command

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>

```bash
# Try to munge and unmunge to access the compute node
munge -n | ssh root@172.16.0.1 unmunge
```
Contributor


I got an error here. Was munged supposed to already be started on the head node?

```
[root@de01 ~]# # Try to munge and unmunge to access the compute node
munge -n | ssh root@172.16.0.1 unmunge
munge: Error: Failed to access "/run/munge/munge.socket.2": No such file or directory (Did you start munged?)
```

Contributor


Nevermind, I realized my mistake here. I tried running this on the compute node instead of the head node.

Contributor


Okay, tried again on the head node and same error. Did I miss a step somewhere??

Contributor Author


Yes, the munge service should already be running on the head node; see line 249 of the guide for when it is enabled/started.
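
For quick reference, the step being pointed to amounts to something like this on the head node (a sketch; see the guide for the exact wording):

```bash
# Enable and start munged so credentials can be minted and verified
# before testing cross-node authentication.
sudo systemctl enable --now munge
```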

Contributor


I double-checked this and munge is already running on the head node, but I'm still getting the same error.

```
[rocky@openchami-testing cloud-init]$ systemctl status munge
● munge.service - MUNGE authentication service
     Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; preset: disabled)
     Active: active (running) since Tue 2026-03-17 18:36:11 UTC; 6 days ago
       Docs: man:munged(8)
   Main PID: 534792 (munged)
      Tasks: 4 (limit: 189371)
     Memory: 1.7M (peak: 2.6M)
        CPU: 1.342s
     CGroup: /system.slice/munge.service
             └─534792 /usr/sbin/munged

Mar 17 18:36:11 openchami-testing.novalocal systemd[1]: Starting MUNGE authentication service...
Mar 17 18:36:11 openchami-testing.novalocal systemd[1]: Started MUNGE authentication service.
```

I can SSH into the node and run unmunge but it hangs. Not sure what's going on there.

Contributor Author


I am going to run over my documentation again from scratch and see if I can reproduce this issue. Can you please let me know if the slurmctld service on the head node is reporting any errors when checked with systemctl status slurmctld and journalctl -xeu slurmctld? Particularly issues with contacting slurmd on the compute node. Additionally, can you please paste the output of the following commands for me to see?

```bash
munge -n
munge -n | unmunge
remunge
sinfo
```

> I can SSH into the node and run unmunge but it hangs. Not sure what's going on there.

Are you saying that you ssh into the compute node first, and then run unmunge, or are you running this as a single command as above (like munge -n | ssh root@172.16.0.1 unmunge)? Also, when it hangs, is there any terminal output that comes out before it starts hanging? If so, can you please share it with me?

Contributor


It looks like slurmctld died and isn't restarting on its own.

```
[rocky@openchami-testing coresmd]$ systemctl status slurmctld
× slurmctld.service - Slurm controller daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; disabled; preset: disabled)
     Active: failed (Result: exit-code) since Fri 2026-03-27 17:05:33 UTC; 2min 8s ago
   Duration: 4ms
    Process: 1376516 ExecStart=/usr/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
   Main PID: 1376516 (code=exited, status=1/FAILURE)
        CPU: 26ms

Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: _open_persist_conn: failed to open persistent connection to host:localhost:6819: Connect>
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Sending PersistInit msg: Connection refused
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port>
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Sending PersistInit msg: Connection refused
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Sending PersistInit msg: Connection refused
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Association database appears down, reading from state file.
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Unable to get any information from the state file
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: fatal: slurmdbd and/or database must be up at slurmctld start time
Mar 27 17:05:33 openchami-testing.novalocal systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 17:05:33 openchami-testing.novalocal systemd[1]: slurmctld.service: Failed with result 'exit-code'.
```

Here are the journalctl -eu slurmctld logs:

```
Mar 23 20:10:21 openchami-testing.novalocal systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Mar 23 20:10:21 openchami-testing.novalocal systemd[1]: slurmctld.service: Failed with result 'exit-code'.
Mar 27 17:05:33 openchami-testing.novalocal systemd[1]: Starting Slurm controller daemon...
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Configured MailProg is invalid
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: slurmctld version 24.05.5 started on cluster demo
Mar 27 17:05:33 openchami-testing.novalocal systemd[1]: Started Slurm controller daemon.
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: _open_persist_conn: failed to open persistent connection to host:localhost:6819: Connect>
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Sending PersistInit msg: Connection refused
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port>
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Sending PersistInit msg: Connection refused
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Sending PersistInit msg: Connection refused
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Association database appears down, reading from state file.
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: error: Unable to get any information from the state file
Mar 27 17:05:33 openchami-testing.novalocal slurmctld[1376516]: slurmctld: fatal: slurmdbd and/or database must be up at slurmctld start time
Mar 27 17:05:33 openchami-testing.novalocal systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 17:05:33 openchami-testing.novalocal systemd[1]: slurmctld.service: Failed with result 'exit-code'.
```

I installed s-nail to get rid of the MailProg error, but I don't think that's really the issue here.

```bash
sudo dnf install s-nail
```

Then, I had to set DbdHost=openchami-testing in /etc/slurm/slurmdbd.conf, or else I got this error with slurmdbd:

```
Mar 23 20:05:56 openchami-testing.novalocal slurmdbd[1077476]: slurmdbd: pidfile not locked, assuming no running daemon
Mar 23 20:05:56 openchami-testing.novalocal slurmdbd[1077476]: slurmdbd: Not running as root. Can't drop supplementary groups
Mar 23 20:05:56 openchami-testing.novalocal slurmdbd[1077476]: slurmdbd: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 10.5.29-MariaDB
Mar 23 20:05:56 openchami-testing.novalocal slurmdbd[1077476]: slurmdbd: accounting_storage/as_mysql: init: Accounting storage MYSQL plugin loaded
Mar 23 20:05:56 openchami-testing.novalocal slurmdbd[1077476]: slurmdbd: fatal: This host not configured to run SlurmDBD ((openchami-testing or openchami-testing.novaloca>
Mar 23 20:05:56 openchami-testing.novalocal systemd[1]: slurmdbd.service: Main process exited, code=exited, status=1/FAILURE
```

The fatal error goes away after I restart slurmdbd. I restart slurmctld as well afterwards, and the first error clears up after that restart too.
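
For anyone hitting the same thing, the edit-and-restart sequence described above might look like this (a sketch; the hostname is specific to this instance, and slurmdbd must be up before slurmctld starts):

```bash
# Align DbdHost with the head node's hostname (instance-specific), then
# restart the daemons in dependency order: slurmdbd before slurmctld.
sudo sed -i 's/^DbdHost=.*/DbdHost=openchami-testing/' /etc/slurm/slurmdbd.conf
sudo systemctl restart slurmdbd
sudo systemctl restart slurmctld
```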

```
[rocky@openchami-testing coresmd]$ journalctl -eu slurmctld
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: Could not open node state file /var/spool/slurmctld/node_state: No such file or directory
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: NOTE: Trying backup state save file. Information may be lost!
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: No node state file (/var/spool/slurmctld/node_state.old) to recover
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: Could not open job state file /var/spool/slurmctld/job_state: No such file or directory
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: NOTE: Trying backup state save file. Jobs may be lost!
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: No job state file (/var/spool/slurmctld/job_state.old) to recover
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: Could not open reservation state file /var/spool/slurmctld/resv_state: No such file or d>
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: NOTE: Trying backup state save file. Reservations may be lost
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: No reservation state file (/var/spool/slurmctld/resv_state.old) to recover
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: Could not open trigger state file /var/spool/slurmctld/trigger_state: No such file or di>
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: NOTE: Trying backup state save file. Triggers may be lost!
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: No trigger state file (/var/spool/slurmctld/trigger_state.old) to recover
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: read_slurm_conf: backup_controller not specified
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: Reinitializing job accounting state
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: accounting_storage/slurmdbd: acct_storage_p_flush_jobs_on_cluster: Ending any jobs in accountin>
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: select/cons_tres: select_p_reconfigure: select/cons_tres: reconfigure
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: select/cons_tres: part_data_create_array: select/cons_tres: preparing for 1 partitions
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: Running as primary controller
Mar 27 18:24:42 openchami-testing.novalocal slurmctld[1495461]: slurmctld: error: No fed_mgr state file (/var/spool/slurmctld/fed_mgr_state) to recover
```

Finally, the results of the commands you requested. Everything here is a test environment, so no need to worry about keys and such.

```
[rocky@openchami-testing coresmd]$ munge -n
MUNGE:AwQFAAC4trECqJKHBxLV2dkH5CUlw4BhT665sxYwV2P9YUgBaD29gmgys+Q2rPR2BTKPGvC1m+gbroVo6ApOS7a0z5qP0ahqdznBx1AmjYbCpXSlhQA27dOcd2mRgkoshcMH2YA=:
[rocky@openchami-testing coresmd]$ munge -n | unmunge
STATUS:          Success (0)
ENCODE_HOST:     openchami-testing.js2local (10.3.211.59)
ENCODE_TIME:     2026-03-27 18:46:11 +0000 (1774637171)
DECODE_TIME:     2026-03-27 18:46:11 +0000 (1774637171)
TTL:             300
CIPHER:          aes128 (4)
MAC:             sha256 (5)
ZIP:             none (0)
UID:             rocky (1000)
GID:             rocky (1000)
LENGTH:          0
[rocky@openchami-testing coresmd]$ remunge
2026-03-27 18:47:46 Spawning 1 thread for encoding
2026-03-27 18:47:46 Processing credentials for 1 second
2026-03-27 18:47:47 Processed 10645 credentials in 1.000s (10642 creds/sec)
[rocky@openchami-testing coresmd]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*        up   infinite      1   unk* de01
```

I had to restart the openchami.target because everything died for some reason (probably certs expiring) and redo the ochami steps from section 1.5. I knew I had to do this because ochami cloud-init group render compute x1000c0s0b0n0 produced no output.
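
Roughly, the recovery steps referenced there (both commands are taken from this thread; the exact section 1.5 steps are in the tutorial):

```bash
# Restart the OpenCHAMI services, then confirm cloud-init group data renders
# for the compute node again; empty output was the symptom of the failure.
sudo systemctl restart openchami.target
ochami cloud-init group render compute x1000c0s0b0n0
```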

After going through all this again, I was able to do sinfo and get the following output:

```
[testuser@openchami-testing ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
main*        up   infinite      1   unk* de01
```

I made sure that munged was running before proceeding:

```
[rocky@openchami-testing cloud-init]$ sudo systemctl status munge
● munge.service - MUNGE authentication service
     Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; preset: disabled)
     Active: active (running) since Tue 2026-03-17 18:36:11 UTC; 1 week 3 days ago
       Docs: man:munged(8)
   Main PID: 534792 (munged)
      Tasks: 4 (limit: 189371)
     Memory: 1.8M (peak: 2.7M)
        CPU: 2.581s
     CGroup: /system.slice/munge.service
             └─534792 /usr/sbin/munged

Mar 17 18:36:11 openchami-testing.novalocal systemd[1]: Starting MUNGE authentication service...
Mar 17 18:36:11 openchami-testing.novalocal systemd[1]: Started MUNGE authentication service.
```

But now jobs are queued and it looks like they're hanging for some reason:

```
[testuser@openchami-testing ~]$ srun hostname
srun: Required node not available (down, drained or reserved)
srun: job 3 queued and waiting for resources
```

In summary, I think this particular issue was mostly caused by having DbdHost=demo instead of DbdHost=openchami-testing in /etc/slurm/slurmdbd.conf, matching the hostname fix already made in /etc/slurm/slurm.conf. At this point, I feel like it might be a good idea for me to start completely over and make sure I didn't miss something. There are some moving parts with using JetStream 2 that I want to make sure I'm not overlooking so we get this right.

@lunamorrow (Contributor Author) commented Mar 24, 2026

As an aside, has someone updated the documentation formatting? All of the inline code and code-block headings are black in the Tutorial, which makes some of the documentation impossible to read. It still appears as usual when I render it locally, but it has changed on https://openchami.org/docs/tutorial/

Locally rendered: (screenshot: Screenshot from 2026-03-24 11-58-18)

From OpenCHAMI website: (screenshot: Screenshot from 2026-03-24 11-58-28)

…nly fixing the name of the ACCESS and SECRET tokens for S3 and making a comment into a note to improve visibility

Signed-off-by: Luna Morrow <luna.morrow2@gmail.com>
@synackd (Contributor) commented Mar 24, 2026

I wonder if the rendering issues were caused by the updates in #88. @alexlovelltroy?

@synackd (Contributor) commented Mar 24, 2026

It might more likely be #81.

@davidallendj (Contributor)

Just a small nit-pick: line 1303 says "short-name": "nid" but it should be "short-name": "de" here.
