Collaborator

@gc00 gc00 commented Aug 15, 2025

@rajatpratapbisht , Please test this PR.
@karya0 , This is the issue that we described to you earlier. I think this is the least invasive fix.

The following problem can occur when using Slurm at HPC sites.

module load gcc-11.1.0
cd mana
./configure
make
# Then we logout, and then start a new session
module load gcc-8.1.0
cd mana
# User tries to run.
# Previously compiled dmtcp_coordinator has symbol with too high a version, in libstdc++.so
# This is because `ldd dmtcp/src/dmtcp_coordinator` uses LD_LIBRARY_PATH, which has now changed.
# So, the user re-configures and re-makes
./configure && make -j clean && make
# But `dmtcp/src/dmtcp_coordinator` and others still have the wrong symbol versions.  They were not re-compiled.

The solution here is to check the ${CC} version in the default make target and the clean target.
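As a rough illustration of the check described above, the following shell sketch tests whether the first line of the compiler's --version output appears in a given config.log. The function name cc_matches_config_log is hypothetical and not taken from the PR. The sketch also assumes that ${CC} is a single command name and that configure recorded the compiler's version line in config.log.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): succeeds only if the first line of
# "<compiler> --version" appears verbatim in the given config.log.
cc_matches_config_log() {
    cc_cmd=$1        # assumed to be a single command name or path
    config_log=$2

    # No config.log means DMTCP was never configured: treat as a mismatch.
    [ -e "$config_log" ] || return 1

    cc_version=$("$cc_cmd" --version 2>/dev/null | head -n 1)
    [ -n "$cc_version" ] || return 1

    # -F: match the version string as fixed text, not as a regex.
    # --: end of options, in case the string ever starts with '-'.
    # The quotes keep "gcc (GCC) 8.1.0" together as one pattern.
    grep -F -q -- "$cc_version" "$config_log"
}
```

A recipe could call it as `cc_matches_config_log "${CC}" dmtcp/config.log` and treat a non-zero exit status as the signal that DMTCP was built with a different compiler.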

When we install a new MANA and 'make', how does MANA do 'make' on DMTCP? If that's clear, then we can simplify the code in the 'default' target. I'm too lazy right now to check that out.

@rajatpratapbisht ,
Could you test out this PR, and see if it works on Discovery?
Please test with changing modules, and also please test with a new clone of MANA.
Finally, could you look into when MANA automatically does 'make' on DMTCP? I'm still not entirely sure if the 'default' target is needed and correct.
Thanks.

@gc00 gc00 added the bug label Aug 15, 2025
@xuyao0127
Collaborator

MANA uses the following rule to 'make' DMTCP:

dmtcp: dmtcp/configure
	cd ${DMTCP_ROOT} && $(MAKE)
	cp -rf ${DMTCP_ROOT}/bin .
	cp -rf ${DMTCP_ROOT}/lib .

Collaborator

@rajatpratapbisht rajatpratapbisht left a comment

Running 'make clean' failed, which led me to request some changes. :)

@gc00 gc00 force-pushed the fix-mana-for-modules branch from 25dfa27 to 6b2e6f9 Compare August 22, 2025 20:22
@gc00
Collaborator Author

gc00 commented Aug 22, 2025

@rajatpratapbisht , Thanks for the diagnosis and fixes. Sorry for my own delay, but I think I've fixed it now.

Could you please test this once again on Discovery (where the bug was originally found)?
Thanks!

@gc00
Collaborator Author

gc00 commented Aug 27, 2025

@rajatpratapbisht , Ping. :-)

Makefile.in Outdated
@ cc_version="$$(${CC} --version | head -1 | tr -d '\n')" ; \
if test ! -e dmtcp/bin/dmtcp_coordinator || \
test ! -e dmtcp/config.log || \
! grep --quiet $${cc_version} dmtcp/config.log; then \
Collaborator

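The variable $${cc_version} should be quoted: "$${cc_version}"
Without the double quotes, the shell splits a version string such as gcc (GCC) 8.1.0 into separate words, so grep takes the first word as the pattern and treats the remaining words as file names: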

ERROR LOG:

** Warning: Building MANA can take longer on a compute node.
** Please build MANA on a login node.
grep: (GCC): No such file or directory
grep: 11.1.0: No such file or directory
make mana
...
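
To illustrate the word splitting behind this error, here is a small shell sketch. The helper count_args is hypothetical and only reports how many arguments it receives; in a Makefile recipe the same expansion is written $${cc_version}.

```shell
#!/bin/sh
# Example version string, as produced by: ${CC} --version | head -1
cc_version='gcc (GCC) 8.1.0'

# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Unquoted: the shell splits the value on whitespace into three words.
# With grep, the first word would be the pattern and the rest file names.
count_args $cc_version      # prints 3

# Quoted: the whole string is passed as a single argument (one pattern).
count_args "$cc_version"    # prints 1
```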

Makefile.in Outdated
clean: tidy
@ cc_version="$$(${CC} --version | head -1 | tr -d '\n')" ; \
if test ! -e dmtcp/config.log || \
! grep --quiet $${cc_version} dmtcp/config.log; then \
Collaborator

Same as above:
$${cc_version} -> "$${cc_version}"

@gc00 gc00 force-pushed the fix-mana-for-modules branch 3 times, most recently from 346dbe9 to 43b0259 Compare August 31, 2025 16:58
Makefile.in Outdated

# MANA configure does configure DMTCP. But no re-make in DMTCP. Check ${CC}.
default: display-build-env add-git-hooks mana_prereqs
@ cc_version="$$(${CC} --version | head -1 | tr -d '\n')" ; \
Collaborator

This section is identical to the one in clean, so it could be factored out into a small internal helper target for better readability.

@gc00 gc00 force-pushed the fix-mana-for-modules branch 5 times, most recently from e2dfe19 to a948b13 Compare August 31, 2025 17:22
@gc00
Collaborator Author

gc00 commented Aug 31, 2025

@karya0,
We're still thinking about MANA and DMTCP in the case of an HPC cluster that can switch modules to a different compiler version.

Based on discussions with @rajatpratapbisht we still have one more issue with a cluster using modules for CC and CXX. The module may cause configure to define CC and CXX as a hard-coded absolute path. So, we configure first, and then switch modules. The user must then either re-configure everything (maybe that's the right solution) or else define CC=cc, CXX=c++ in the Makefile. (But if the user has environment variables CC or CXX, we always prefer that in the Makefile variable.) Let's think about the right solution.

However, note that on Discovery, gcc is in a path that is a link to a full pathname for a gcc compiler, and changing a module changes that executable. But /usr/bin/{gcc,cc} exists and is always gcc-4.8.0. The modules may or may not set the env vars CC and CXX (the gcc-11 module sets them, but the gcc-8 module does not).

Maybe the solution is that, in Makefile:default, we test whether the hard-coded CC variable in the Makefile matches the current env var CC (if set), or else the current gcc found on the PATH (not the absolute path). If anything is inconsistent, we tell the user to re-configure and exit the Makefile. Running readlink -f on the output of which cc (or similar) could be useful here.
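
One way the readlink -f idea might look in shell is sketched below. The function names resolve_compiler and same_compiler are hypothetical, and the sketch assumes GNU readlink and that both arguments name real executables (not shell functions or aliases).

```shell
#!/bin/sh
# Hypothetical helper: resolve a compiler name or path to the real file
# behind it, following symlinks. Returns 1 if the command is not found.
resolve_compiler() {
    resolved=$(command -v "$1" 2>/dev/null) || return 1
    readlink -f "$resolved"
}

# Hypothetical helper: succeed only if both names resolve to the same file.
#   $1: compiler recorded at configure time (e.g. a hard-coded absolute path)
#   $2: compiler visible now (e.g. "${CC:-gcc}")
same_compiler() {
    configured=$(resolve_compiler "$1") || return 1
    current=$(resolve_compiler "$2") || return 1
    [ "$configured" = "$current" ]
}
```

A Makefile recipe could then run something like `same_compiler "$configured_cc" "${CC:-gcc}"`, where configured_cc is an assumed name for the value saved by configure, and ask the user to re-configure when the check fails.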

On further reflection, MANA should decide what the current C/C++ compiler is and then set the env vars CC and CXX before configuring DMTCP. Then DMTCP will be forced to do the right thing. When switching modules, we will not only do 'make clean' in DMTCP, but also re-configure DMTCP with the env vars CC and CXX that we choose. And since MANA is used on clusters, setting CC=gcc (or first trying the CC env var and then cc) should be enough for MANA.

@rajatpratapbisht
Collaborator

rajatpratapbisht commented Aug 31, 2025

Since ./configure is a shell script, we can add something like this at the start:
CC ?= cc if cc defined
CXX ?= c++ if c++ defined
CC ?= gcc
CXX ?= g++
And then export these env vars before configuring DMTCP, or before doing 'make clean' in DMTCP.

Something like:

#!/bin/bash

: "${CC:=gcc}"   # set CC to gcc if not already defined

echo "$CC"
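
A slightly fuller sketch of the same idea, assuming a POSIX shell: keep CC and CXX if the environment already sets them, otherwise take the first of cc/gcc (and c++/g++) found on the PATH, then export both. The helper name first_available is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that exists as a command.
# Returns 1 (and prints nothing) if none of the candidates is found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Keep CC/CXX from the environment when already set and non-empty;
# otherwise prefer cc/c++, then fall back to gcc/g++.
: "${CC:=$(first_available cc gcc)}"
: "${CXX:=$(first_available c++ g++)}"

# Export so that a later DMTCP configure or 'make clean' sees the same values.
export CC CXX
```

If no candidate is found, CC or CXX ends up empty, so a real script would probably want to stop with an error at that point.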

@gc00 gc00 force-pushed the fix-mana-for-modules branch from a948b13 to 6aaa560 Compare September 4, 2025 05:16