@jhiemstrawisc
Collaborator

No description provided.

@jhiemstrawisc jhiemstrawisc requested a review from agitter June 10, 2025 17:37
Collaborator

@tristan-f-r tristan-f-r left a comment

Looks good! Almost every comment is a nitpick, except for the test comment - something to verify (not for correctness) that profiling doesn't suddenly error would be reassuring.

@tristan-f-r tristan-f-r added the performance issues related to runtime label Jun 17, 2025
@tristan-f-r
Collaborator

tristan-f-r commented Jun 20, 2025

[Apologies for the force pushes - I was trying to fix the merge conflict's behavior on run_container_and_log and added unnecessary fixes twice. Since PRs are merged, I didn't want this PR to be littered with commits of this.]

@tristan-f-r tristan-f-r force-pushed the profiling branch 2 times, most recently from 9dba53e to 2435903 Compare June 20, 2025 17:05
@ntalluri ntalluri added needed for benchmarking Priority PRs needed for the benchmarking paper labels Jun 25, 2025
@jhiemstrawisc
Collaborator Author

One additional note when it comes time to verify this unit of work in the CHTC pool -- the tester should add the following constraint to their submit file (if running everything on one EP):

requirements = versionGE(split(Target.CondorVersion)[1], "24.8.0") && (isenforcingdiskusage =!= true)

or this line under the default-resources section of their snakemake profile (if splitting work across EPs):

requirements: "'versionGE(split(Target.CondorVersion)[1], \"24.8.0\") && (isenforcingdiskusage =!= true)'"

This pins jobs to execution points that (a) run the minimum required version of HTCondor and (b) don't enable another feature (disk enforcement) that doesn't play well with what profiling does to cgroups.
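
For readers unfamiliar with ClassAd expressions, here is a rough Python analogy of what that `requirements` line checks. It is a sketch under stated assumptions, not HTCondor code: `version_ge`, `is_not_identical`, and `slot_matches` are hypothetical helper names, the sample version string is only illustrative of the layout where the version number is the second whitespace-separated token, and the version comparison is simplified to dotted integers. The point worth noticing is that `=!=` ("is not identical to") evaluates to true when the attribute is undefined, so execution points that never advertise `isenforcingdiskusage` still match.

```python
# Hypothetical stand-ins for the ClassAd pieces used in the requirements line.
UNDEFINED = object()  # models a ClassAd attribute the machine does not advertise


def version_ge(left: str, right: str) -> bool:
    """Simplified analogue of versionGE: compare dotted integer versions."""
    def parse(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))
    # Tuples compare numerically, so 24.10.1 >= 24.8.0 (a string compare would not).
    return parse(left) >= parse(right)


def is_not_identical(value, other) -> bool:
    """Analogue of `=!=`: true unless both sides have the same type and value."""
    return not (type(value) is type(other) and value == other)


def slot_matches(machine_ad: dict) -> bool:
    """Mirror the two halves of the requirements expression for one machine ad."""
    # split(Target.CondorVersion)[1] picks the second whitespace-separated token.
    version = machine_ad["CondorVersion"].split()[1]
    enforcing = machine_ad.get("IsEnforcingDiskUsage", UNDEFINED)
    return version_ge(version, "24.8.0") and is_not_identical(enforcing, True)
```

A machine ad with a new enough version and no `IsEnforcingDiskUsage` attribute matches; one that sets it to true, or that runs an older version, does not.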

@tristan-f-r
Collaborator

tristan-f-r commented Jun 26, 2025

A question out of curiosity: what exactly happens if isenforcingdiskusage is set to true? All I get from the readthedocs is:

A boolean value that when True identifies that the machine is setup to enforce disk usage limits for each job the machine executes.

How does the created cgroup sibling mess with the way that this isenforcingdiskusage flag works? My initial assumption was that isenforcingdiskusage is somehow using a parent cgroup to monitor disk usage, but I don't remember cgroups ever being capable of that.

@jhiemstrawisc
Collaborator Author

> A question out of curiosity: what exactly happens if isenforcingdiskusage is set to true? All I get from the readthedocs is:
>
> A boolean value that when True identifies that the machine is setup to enforce disk usage limits for each job the machine executes.
>
> How does the created cgroup sibling mess with the way that this isenforcingdiskusage flag works? My initial assumption was that isenforcingdiskusage is somehow using a parent cgroup to monitor disk usage, but I don't remember cgroups ever being capable of that.

I don't fully recall -- this is something the HTCondor developer who most heavily works on their cgroup management said would be needed. I suspect it has something to do with the logical volume mount in the outer container hiding the cgroup tree or making it unwritable.

@tristan-f-r
Collaborator

This created a bad merge conflict with #283. I would like to resolve it myself, but I don't know how to test the profiling changes here well enough to guarantee that I didn't miss anything.

@agitter
Collaborator

agitter commented Aug 1, 2025

> This created a bad merge conflict with #283.

@jhiemstrawisc are you able to check out this merge conflict at some point?

Collaborator

@agitter agitter left a comment

I took a first pass through the code. I would still like to try running it myself and will then comment again.

Since this initial pull request, we now have new pathway reconstruction algorithms to support.

jhiemstrawisc and others added 10 commits September 16, 2025 15:52
This commit adds the needed bits for the main Python process to create
a peer cgroup (linux only) such that when profiling is enabled, the PRM
containers are run under this cgroup with the `memory.peak` and `cpu.stat`
controllers enabled.

Unfortunately we can't just point Python at some PID, because the PRM
containers launch various processes without reporting the PIDs back to
the originating process. This prevents us from regular inline monitoring.
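
As a minimal sketch of how accounting through a peer cgroup can work once the containers have exited, the snippet below reads peak memory and CPU time from a cgroup v2 directory. It assumes the standard cgroup v2 file layout (`memory.peak` holds a single byte count, `cpu.stat` holds `key value` lines including `usage_usec`); the function names `parse_cpu_stat` and `read_peer_usage` are hypothetical and are not taken from this PR's `profiling.py`.

```python
from pathlib import Path


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse cgroup v2 cpu.stat contents ("key value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            stats[parts[0]] = int(parts[1])
    return stats


def read_peer_usage(cgroup_dir: Path) -> dict[str, int]:
    """Read peak memory (bytes) and total CPU time (microseconds) for a cgroup."""
    peak_bytes = int((cgroup_dir / "memory.peak").read_text().strip())
    cpu = parse_cpu_stat((cgroup_dir / "cpu.stat").read_text())
    return {"peak_memory_bytes": peak_bytes, "cpu_usage_usec": cpu["usage_usec"]}
```

Because the kernel aggregates usage for every process in the cgroup, this kind of read does not need the PIDs of whatever the container spawned, which is the limitation the commit message describes.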
@read-the-docs-community

read-the-docs-community bot commented Sep 18, 2025

Documentation build overview

📚 spras | 🛠️ Build #29760065 | 📁 Comparing b454fa4 against latest (d4cbe34)


🔍 Preview build

Show files changed (4 files in total): 📝 4 modified | ➕ 0 added | ➖ 0 deleted
File Status
genindex.html 📝 modified
install.html 📝 modified
fordevs/modules.html 📝 modified
fordevs/spras.html 📝 modified

@jhiemstrawisc
Collaborator Author

@agitter one last thing I'd like your input on is where to document what you brought up in the remaining unresolved comment.

On a side note, I ran another round of manual tests in CHTC's HTCondor pool to double check nothing broke after cleaning up all the merge conflicts. Looks like everything still works!

Collaborator

@tristan-f-r tristan-f-r left a comment

RWR and ST_RWR are missing the out_dir param in run_container_and_log, which is causing CI to fail. (I can fix and commit that change to this branch if that's okay.)

@agitter
Collaborator

agitter commented Sep 21, 2025

ResponseNet is also missing out_dir

@tristan-f-r did you already re-review after Justin resolved merge conflicts or should I?

@tristan-f-r
Collaborator

I'll do another pass 👍

Collaborator

@tristan-f-r tristan-f-r left a comment

This still works. The merge conflict resolution introduced a double print of the container stdout (see review comment below), but this otherwise still seems fine 👍

Comment on lines +23 to +28
mycgroup = os.path.join("/sys/fs/cgroup", cgroup_rel.lstrip("/"))
peer_cgroup = os.path.join(os.path.dirname(mycgroup), f"spras-peer-{os.getpid()}")

# Create the peer cgroup directory
try:
    os.makedirs(peer_cgroup, exist_ok=True)
Collaborator

Do we avoid pathlib here on purpose?

Collaborator

I see this unresolved comment but am likely to approve anyway to get this merged. We may not be consistent with pathlib throughout the code base even though I agree it would be better to be.
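
For reference, a pathlib version of the path construction in the excerpt above might look like the sketch below. This is an illustration of the reviewer's suggestion, not code from the PR: `peer_cgroup_path` and `CGROUP_ROOT` are hypothetical names, and the PID is passed in rather than read from `os.getpid()` so the function is easy to test.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")


def peer_cgroup_path(cgroup_rel: str, pid: int, root: Path = CGROUP_ROOT) -> Path:
    """Build the path of a sibling ("peer") cgroup next to the current one."""
    # Strip the leading slash so the relative cgroup path joins under the root.
    my_cgroup = root / cgroup_rel.lstrip("/")
    # The peer lives in the same parent directory as the current cgroup.
    return my_cgroup.parent / f"spras-peer-{pid}"
```

Creating the directory would then be `peer_cgroup_path(...).mkdir(parents=True, exist_ok=True)`, which mirrors `os.makedirs(..., exist_ok=True)`. One difference to be aware of: when `cgroup_rel` is just `/`, the `os.path` version keeps the peer under `/sys/fs/cgroup` (because `os.path.dirname` of a path ending in a slash only drops the slash), while this pathlib sketch would resolve the parent to `/sys/fs`, so a direct swap would need to handle the root cgroup case explicitly.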

@tristan-f-r tristan-f-r added the awaiting-author Author of the PR needs to fix something from a review / etc. label Sep 25, 2025
…RWR,STRWR

These were PRMs whose `run_container` arguments were missed when I was
updating everything to pass an output dir around.

The removed functions are in `profiling.py` and should have been removed
from `containers.py`.

This also restores a comment that was removed while fixing merge conflicts.
@jhiemstrawisc
Collaborator Author

I think I finally cleaned up the various sources of CI failures.

On a side note, something's up with pre-commit. I deleted and re-installed the hooks after I rebased against main and I'm still getting this error:

$ pre-commit clean
Cleaned /Users/jhiemstra/.cache/pre-commit.

$ rm -rf .git/hooks/pre-commit

$ pre-commit install
pre-commit installed at .git/hooks/pre-commit

$ git commit -m "foo"
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Initializing environment for https://github.com/charliermarsh/ruff-pre-commit.
[INFO] Initializing environment for https://github.com/google/yamlfmt.
[INFO] Initializing environment for https://github.com/crate-ci/typos.
An error has occurred: InvalidManifestError: 
==> File /Users/jhiemstra/.cache/pre-commit/repowrk_lwka/.pre-commit-hooks.yaml
==> At Hook(id='typos')
==> At key: stages
==> At index 0
=====> Expected one of commit, commit-msg, manual, merge-commit, post-checkout, post-commit, post-merge, post-rewrite, prepare-commit-msg, push but got: 'pre-commit'
Check the log at /Users/jhiemstra/.cache/pre-commit/pre-commit.log

This has been going on for a while, so I started authoring all commits with --no-verify, which is probably not a great long-term solution.

@tristan-f-r
Collaborator

pre-commit needs to be updated in your conda environment: pre-commit changed their configuration layout (either on our side or for hooks - I'm not quite sure). Regardless, a conda env update should fix that.

Collaborator

@tristan-f-r tristan-f-r left a comment

(I would still like to see pathlib usage in profiling.py, but this looks good! 👍)

@tristan-f-r tristan-f-r removed the awaiting-author Author of the PR needs to fix something from a review / etc. label Sep 30, 2025
Collaborator

@agitter agitter left a comment

I have not run this yet but all major comments have been addressed.

@agitter agitter merged commit 0af98f2 into Reed-CompBio:main Oct 3, 2025
18 checks passed

Labels

needed for benchmarking Priority PRs needed for the benchmarking paper performance issues related to runtime

4 participants