Profiling #260

jhiemstrawisc · 2025-06-09T21:30:35Z

No description provided.

tristan-f-r

Looks good! Almost every comment is a nitpick, except for the test comment: something that verifies (not for correctness) that profiling doesn't suddenly error would be reassuring.

spras/containers.py

tristan-f-r · 2025-06-20T16:15:36Z

[Apologies for the force pushes: I was trying to fix the merge conflict's behavior on run_container_and_log and added unnecessary fixes twice. Since PRs are merged, I didn't want this PR to be littered with those commits.]

config/config.yaml

jhiemstrawisc · 2025-06-26T18:44:18Z

One additional note for when it comes time to verify this unit of work in the CHTC pool: the tester should add the following constraint to their submit file (if running everything on one EP):

requirements = versionGE(split(Target.CondorVersion)[1], "24.8.0") && (isenforcingdiskusage =!= true)

or this line under the default-resources section of their snakemake profile (if splitting work across EPs):

requirements: "'versionGE(split(Target.CondorVersion)[1], \"24.8.0\") && (isenforcingdiskusage =!= true)'"

This pins jobs to execution points (EPs) that a) run the minimum required version of HTCondor and b) don't enable another feature (disk enforcement) that doesn't play well with what profiling does to cgroups.

tristan-f-r · 2025-06-26T18:54:55Z

A question out of curiosity: what exactly happens if isenforcingdiskusage is set to true? All I get from the readthedocs is:

A boolean value that when True identifies that the machine is setup to enforce disk usage limits for each job the machine executes.

How does the created cgroup sibling mess with the way that this isenforcingdiskusage flag works? My initial assumption was that isenforcingdiskusage is somehow using a parent cgroup to monitor disk usage, but I don't remember cgroups ever being capable of that.

jhiemstrawisc · 2025-06-27T15:05:57Z


I don't fully recall; this is something the HTCondor developer who works most heavily on their cgroup management said would be needed. I suspect it has something to do with the logical volume mount in the outer container hiding the cgroup tree or making it unwritable.

spras/profiling.py

tristan-f-r · 2025-07-18T17:00:47Z

This created a bad merge conflict with #283. I would like to resolve it myself, but I don't know how to test the profiling changes here well enough to guarantee that I didn't miss anything.

agitter · 2025-08-01T15:15:59Z


@jhiemstrawisc are you able to check out this merge conflict at some point?

agitter

I took a first pass through the code. I would still like to try running it myself and will then comment again.

Since this initial pull request, we now have new pathway reconstruction algorithms to support.

spras/cgroup_wrapper.sh

spras/config.py

spras/containers.py

spras/profiling.py

This commit adds the pieces needed for the main Python process to create a peer cgroup (Linux only) so that, when profiling is enabled, the PRM containers run under this cgroup with the `memory` and `cpu` controllers enabled, exposing the `memory.peak` and `cpu.stat` files. Unfortunately, we can't just point Python at a PID, because the PRM containers launch various processes without reporting their PIDs back to the originating process. This rules out ordinary inline monitoring.
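As a rough illustration of what reading those files involves, here is a minimal Python sketch. It assumes a cgroup v2 directory where `memory.peak` holds a single byte count (this file only exists on newer kernels) and `cpu.stat` holds `key value` lines such as `usage_usec 12345`. The helper name `read_cgroup_usage` is hypothetical and is not taken from `spras/profiling.py`.

```python
import tempfile
from pathlib import Path


def read_cgroup_usage(cgroup_dir: Path) -> dict[str, int]:
    """Hypothetical helper: collect peak memory and CPU counters from a cgroup v2 dir."""
    stats: dict[str, int] = {}
    # memory.peak: one integer, the memory high-water mark in bytes.
    stats["memory_peak_bytes"] = int((cgroup_dir / "memory.peak").read_text().strip())
    # cpu.stat: one "key value" pair per line, e.g. "usage_usec 12345".
    for line in (cgroup_dir / "cpu.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


if __name__ == "__main__":
    # Demo against a fake cgroup directory, so this runs on any OS.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp)
        (fake / "memory.peak").write_text("104857600\n")
        (fake / "cpu.stat").write_text("usage_usec 1500000\nuser_usec 1000000\nsystem_usec 500000\n")
        usage = read_cgroup_usage(fake)
        print(usage["memory_peak_bytes"], usage["usage_usec"] / 1e6)
```

Reading the files once, after the container exits, is enough for this kind of summary: `memory.peak` is a high-water mark and the `cpu.stat` counters are cumulative, so no process PIDs are needed.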

read-the-docs-community · 2025-09-18T15:23:50Z

Documentation build overview

📚 spras | 🛠️ Build #29760065 | 📁 Comparing b454fa4 against latest (d4cbe34)

🔍 Preview build

Show files changed (4 files in total): 📝 4 modified | ➕ 0 added | ➖ 0 deleted

File	Status
genindex.html	📝 modified
install.html	📝 modified
fordevs/modules.html	📝 modified
fordevs/spras.html	📝 modified

jhiemstrawisc · 2025-09-18T15:24:54Z

@agitter one last thing I'd like your input on is where to document what you brought up in the remaining unresolved comment.

On a side note, I ran another round of manual tests in CHTC's HTCondor pool to double-check that nothing broke after cleaning up all the merge conflicts. Looks like everything still works!

tristan-f-r

RWR and ST_RWR are missing the out_dir param in run_container_and_log, which is causing CI to fail. (I can fix and commit that change to this branch if that's okay.)

config/config.yaml

agitter · 2025-09-21T02:36:11Z

ResponseNet is also missing out_dir

@tristan-f-r did you already re-review after Justin resolved merge conflicts or should I?

tristan-f-r · 2025-09-21T18:54:17Z

I'll do another pass 👍

tristan-f-r

This still works. The merge conflict resolution introduced a double print of the container stdout (see review comment below), but this otherwise still seems fine 👍

spras/containers.py

tristan-f-r · 2025-09-21T19:03:39Z

spras/profiling.py

+    mycgroup = os.path.join("/sys/fs/cgroup", cgroup_rel.lstrip("/"))
+    peer_cgroup = os.path.join(os.path.dirname(mycgroup), f"spras-peer-{os.getpid()}")
+
+    # Create the peer cgroup directory
+    try:
+        os.makedirs(peer_cgroup, exist_ok=True)


Do we avoid pathlib here on purpose?

I see this unresolved comment but am likely to approve anyway to get this merged. We may not be consistent with pathlib throughout the code base even though I agree it would be better to be.
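For comparison with the `os.path` version in the diff above, a `pathlib` equivalent might look like the sketch below. `make_peer_cgroup` and `cgroup_rel_from_proc` are hypothetical names. The sketch assumes a cgroup v2 host, where `/proc/self/cgroup` contains a line of the form `0::/path/to/cgroup`, and takes `cgroup_rel` to be that path; the diff excerpt does not show where `cgroup_rel` actually comes from.

```python
import os
import tempfile
from pathlib import Path


def cgroup_rel_from_proc(proc_cgroup_text: str) -> str:
    """Hypothetical helper: return the cgroup v2 path from /proc/self/cgroup text."""
    for line in proc_cgroup_text.splitlines():
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        # The unified (v2) hierarchy has ID 0 and an empty controller list.
        if hierarchy_id == "0" and controllers == "":
            return path
    raise RuntimeError("no cgroup v2 entry found")


def make_peer_cgroup(cgroup_rel: str, root: Path = Path("/sys/fs/cgroup")) -> Path:
    """Hypothetical pathlib version of the peer-cgroup setup in the diff."""
    mycgroup = root / cgroup_rel.lstrip("/")
    # Sibling of our own cgroup, named after this process's PID.
    peer_cgroup = mycgroup.parent / f"spras-peer-{os.getpid()}"
    # Same effect as os.makedirs(peer_cgroup, exist_ok=True).
    peer_cgroup.mkdir(parents=True, exist_ok=True)
    return peer_cgroup


if __name__ == "__main__":
    # Use a temporary directory as a stand-in for /sys/fs/cgroup.
    with tempfile.TemporaryDirectory() as tmp:
        rel = cgroup_rel_from_proc("0::/user.slice/job.scope\n")
        print(make_peer_cgroup(rel, root=Path(tmp)))
```

`Path.parent` and `os.path.dirname` agree here only because cgroup paths from `/proc/self/cgroup` carry no trailing slash; with a trailing slash, `os.path.dirname` would return the cgroup itself rather than its parent.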

spras/containers.py

spras/profiling.py


jhiemstrawisc · 2025-09-29T19:20:04Z

I think I finally cleaned up the various sources of CI failures.

On a side note, something's up with pre-commit. I deleted and reinstalled the hooks after rebasing against main, and I'm still getting this error:

$ pre-commit clean
Cleaned /Users/jhiemstra/.cache/pre-commit.

$ rm -rf .git/hooks/pre-commit

$ pre-commit install
pre-commit installed at .git/hooks/pre-commit

$ git commit -m "foo"
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Initializing environment for https://github.com/charliermarsh/ruff-pre-commit.
[INFO] Initializing environment for https://github.com/google/yamlfmt.
[INFO] Initializing environment for https://github.com/crate-ci/typos.
An error has occurred: InvalidManifestError: 
==> File /Users/jhiemstra/.cache/pre-commit/repowrk_lwka/.pre-commit-hooks.yaml
==> At Hook(id='typos')
==> At key: stages
==> At index 0
=====> Expected one of commit, commit-msg, manual, merge-commit, post-checkout, post-commit, post-merge, post-rewrite, prepare-commit-msg, push but got: 'pre-commit'
Check the log at /Users/jhiemstra/.cache/pre-commit/pre-commit.log

This has been going on for a while, so I started authoring all commits with --no-verify, which is probably not a great long-term solution.

tristan-f-r · 2025-09-30T02:29:07Z

pre-commit needs to be updated in your conda environment: pre-commit changed its configuration layout (either on our side or for the hooks, I'm not quite sure). Regardless, a conda env update should fix that.

tristan-f-r

(I would still like to see pathlib usage in profiling.py, but this looks good! 👍)

agitter

I have not run this yet but all major comments have been addressed.

jhiemstrawisc requested a review from agitter June 10, 2025 17:37

tristan-f-r reviewed Jun 10, 2025

spras/containers.py (outdated)

spras/containers.py (outdated)

spras/containers.py

spras/containers.py

tristan-f-r added the performance label (issues related to runtime) Jun 17, 2025

tristan-f-r force-pushed the profiling branch from 13f1df8 to e28d973 June 20, 2025 16:13

tristan-f-r force-pushed the profiling branch 2 times, most recently from 9dba53e to 2435903 June 20, 2025 17:05

tristan-f-r reviewed Jun 23, 2025

config/config.yaml

ntalluri added needed for benchmarking Priority PRs needed for the benchmarking paper labels Jun 25, 2025

ntalluri reviewed Jul 16, 2025

spras/profiling.py

agitter reviewed Aug 1, 2025

spras/cgroup_wrapper.sh

spras/config.py

spras/containers.py (outdated)

spras/profiling.py

spras/profiling.py

jhiemstrawisc and others added 10 commits September 16, 2025 15:52

Allow 'apptainer' as singularity alias

9f1e7fa

Create knob that adds profiling flag to config object

115a6a1

Package new cgroup wrapper

b3bc8ff

Fix minor things

1598a43

Restore docker:// prefix for remote containers

ed92219

fix(run_container): support paths

11c8e63

Move apptainer profiling functions to separate file

79b6688

Restore some comments about apptainer image unpacking

be6adb1

Incorporate review feedback

29f0001

jhiemstrawisc force-pushed the profiling branch from bdcd4d5 to 29f0001 September 18, 2025 15:22

jhiemstrawisc requested a review from agitter September 18, 2025 15:25

tristan-f-r reviewed Sep 18, 2025


agitter reviewed Sep 21, 2025

config/config.yaml

tristan-f-r requested changes Sep 21, 2025


tristan-f-r added the awaiting-author label (Author of the PR needs to fix something from a review / etc.) Sep 25, 2025

jhiemstrawisc added 6 commits September 29, 2025 13:08

Add outdir and switch run_container-->run_container_and_log for RNet,RWR,STRWR

c5c3529

These were PRMs whose `run_container` arguments were missed when I was updating everything to pass an output dir around

Fixup artifacts from rebase

f186659

The removed functions are in `profiling.py` and should have been removed from `containers.py`. This also restores a comment that was removed while fixing merge conflicts.

Capture container stderr when profiling w/ Apptainer

e86f959

Add note about profiling/HTCondor version requirement

d0cad77

Add out_dir to bowtiebuilder

4e8a080

Fix switched work_dir/out_dir in DOMINO

b454fa4

jhiemstrawisc requested review from agitter and tristan-f-r September 29, 2025 19:20

tristan-f-r approved these changes Sep 30, 2025


tristan-f-r removed the awaiting-author label (Author of the PR needs to fix something from a review / etc.) Sep 30, 2025

agitter approved these changes Oct 3, 2025


agitter merged commit 0af98f2 into Reed-CompBio:main Oct 3, 2025
18 checks passed

tristan-f-r mentioned this pull request Nov 1, 2025

refactor: broaden container settings args #390

Merged

1 task
