
Modernize multiprocessing#181

Merged
gmloose merged 13 commits into master from issue-180_modernize-multiprocessing
Sep 5, 2025
Conversation

@gmloose
Collaborator

@gmloose gmloose commented Aug 12, 2025

This pull request implements the changes suggested in #180.

gmloose added 8 commits August 7, 2025 13:47
Sorted import statements, minor change in log message.
The wrong version of the updated PLOT operation was accidentally committed. This one should be OK.
Removed the now obsolete `multiprocManager` class. It has been replaced by `multiprocessing.Pool`.
@gmloose gmloose marked this pull request as draft August 12, 2025 13:26
@gmloose gmloose self-assigned this Aug 12, 2025
@gmloose gmloose requested a review from darafferty August 12, 2025 13:26
@github-actions

github-actions bot commented Aug 12, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|------:|--------:|---------:|----------:|:------:|
| 4361 | 765 | 18% | 0% | 🟢 |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
|------|---------:|:------:|
| losoto/lib_operations.py | 27% | 🟢 |
| losoto/operations/flag.py | 4% | 🟢 |
| losoto/operations/flagextend.py | 13% | 🟢 |
| losoto/operations/flagstation.py | 2% | 🟢 |
| losoto/operations/plot.py | 2% | 🟢 |
| losoto/operations/reweight.py | 13% | 🟢 |
| TOTAL | 10% | 🟢 |

updated for commit: e763aac by action🐍

Two scripts that help in debugging/comparing the output of recently changed operations against a reference set.
Both sets need to be generated. First run `benchmark_all.sh` on the reference (e.g. the current `master`). Next run it on the changed operations in a different directory. Finally run `compare.sh` to compare the contents of the two directories.
The parset files contain inputs for each of the LoSoTo operations under test.

NOTE: This directory can be removed once everything works as expected.
TODO: Investigate how these tests can somehow be converted into regression tests (run with `pytest`).
@gmloose
Collaborator Author

gmloose commented Aug 12, 2025

Note for the reviewer(s): the debug directory contains stuff that I used to verify that the new implementation produces the same result as the old one. It doesn't need to be reviewed, nor does it need to be merged, though I welcome suggestions about how these tests could be turned into CI tests.

The file `compare.out` contains the difference between two runs.
Note that the run times may vary by a few seconds between runs.
The differences are nothing to worry about, though for very short processes (e.g. in the PLOT operation) you'll notice that the new implementation has a bit more overhead.
@tmillenaar

> Note for the reviewer(s): the debug directory contains stuff that I used to verify that the new implementation produces the same result as the old one. It doesn't need to be reviewed, nor does it need to be merged, though I welcome suggestions about how these tests could be turned into CI tests.

For fun I tried to run debug/benchmark_all.sh for myself but I don't have the required bandpass.input.h5. Is this a file I can download from somewhere?

```python
    return multiprocessing.cpu_count()


class multiprocManager(object):
```


Quite the nice cleanup :)

```python
ncpu = ncpu if ncpu > 0 else nproc()  # use all available CPUs if ncpu is not set
logging.debug("Using %s CPU(s) for operation FLAG.", ncpu)
with multiprocessing.Pool(ncpu) as pool:
    results = pool.starmap(_flag, args)
```


You mentioned on Slack that the new pool implementation was a bit slower and wondered whether it was related to the overhead of running the `Pool`. While that can be the case, I wonder if the difference lies in the original queue returning results as soon as they were ready, whereas `pool.starmap` returns them in order once all are finished. We can test this by using `imap_unordered` instead of `starmap`.
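The ordering difference can be sketched as follows. This is a standalone toy, not the LoSoTo code: `work` and its delays are hypothetical stand-ins for the real operations, and `work_star` is a helper needed because `imap_unordered` takes a single-argument function, so the argument tuples have to be unpacked by a picklable top-level callable.

```python
import multiprocessing
import time


def work(delay, label):
    # hypothetical task; the delay stands in for a unit of real work
    time.sleep(delay)
    return label


def work_star(args):
    # top-level helper so the callable stays picklable for the pool
    return work(*args)


if __name__ == "__main__":
    tasks = [(0.2, "slow"), (0.0, "fast")]
    with multiprocessing.Pool(2) as pool:
        # starmap blocks until every task is done and preserves input order
        print(pool.starmap(work, tasks))  # ['slow', 'fast']
        # imap_unordered yields each result as soon as its worker finishes,
        # so 'fast' will typically come out before 'slow'
        for result in pool.imap_unordered(work_star, tasks):
            print(result)
```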

How big were the performance differences you observed?

Collaborator Author

@gmloose gmloose Aug 13, 2025


> You mentioned on Slack that the new pool implementation was a bit slower and wondered whether it was related to the overhead of running the `Pool`. While that can be the case, I wonder if the difference lies in the original queue returning results as soon as they were ready, whereas `pool.starmap` returns them in order once all are finished. We can test this by using `imap_unordered` instead of `starmap`.
>
> How big were the performance differences you observed?

Not that large; have a look at `debug/compare.out`. The timings can also differ by a few seconds between runs. But overall the `Pool` seems to be slightly slower. I like your suggestion to use `imap_unordered`; I will try that.

Collaborator Author


Just realized I cannot use `imap_unordered` as a drop-in replacement, because I need the `starmap` functionality. I would need something like `starmap_unordered`, which unfortunately doesn't exist. A work-around would be to write that wrapper myself.
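For what it's worth, such a wrapper is small. A minimal sketch (the names `_star` and `starmap_unordered` are made up here, not part of any library; `functools.partial` is used because a lambda would not be picklable for the worker processes):

```python
import multiprocessing
from functools import partial


def _star(func, args):
    # unpack an argument tuple, mimicking what starmap does internally
    return func(*args)


def starmap_unordered(pool, func, iterable):
    # yield results in completion order while still unpacking argument
    # tuples; partial(_star, func) is picklable as long as func is a
    # top-level function
    return pool.imap_unordered(partial(_star, func), iterable)


def add(a, b):
    return a + b


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(set(starmap_unordered(pool, add, [(1, 2), (3, 4)])))  # {3, 7}
```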


Fair, let's stick with `starmap`. The performance loss is small, and the change both simplifies the code and removes the potential for hanging queues, so it's still pretty good all things considered.

@gmloose
Collaborator Author

gmloose commented Aug 13, 2025

>> Note for the reviewer(s): the debug directory contains stuff that I used to verify that the new implementation produces the same result as the old one. It doesn't need to be reviewed, nor does it need to be merged, though I welcome suggestions about how these tests could be turned into CI tests.
>
> For fun I tried to run debug/benchmark_all.sh for myself but I don't have the required bandpass.input.h5. Is this a file I can download from somewhere?

The input file is almost 1 GB in size. I can put it somewhere where you can download it, if you want. But this is indeed the problematic part of reproducing the results.

@gmloose gmloose marked this pull request as ready for review August 13, 2025 15:49
@gmloose gmloose requested review from tammojan and removed request for darafferty August 21, 2025 08:22
@gmloose gmloose requested review from darafferty and removed request for tammojan September 4, 2025 13:14
@darafferty
Collaborator

Looks good to me! Out of curiosity, does the new implementation indeed solve the deadlock issues mentioned in #180? (I'm assuming it does, just wondering if you had a chance to check it.)

@gmloose
Collaborator Author

gmloose commented Sep 5, 2025

> Looks good to me! Out of curiosity, does the new implementation indeed solve the deadlock issues mentioned in #180? (I'm assuming it does, just wondering if you had a chance to check it.)

Yes, it solves the deadlock issue. You now get a clear error message from the child process if it dies due to an exception.
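This is behaviour that `multiprocessing.Pool` provides out of the box: an exception raised in a worker is pickled and re-raised in the parent when the result is collected, instead of the parent blocking forever on a queue fed by a dead child. A minimal standalone sketch (the `fragile` function is hypothetical, not part of LoSoTo):

```python
import multiprocessing


def fragile(x):
    # hypothetical worker that dies on one particular input
    if x == 2:
        raise ValueError("bad input: %d" % x)
    return x * x


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        try:
            pool.starmap(fragile, [(1,), (2,), (3,)])
        except ValueError as exc:
            # the worker's exception surfaces here in the parent process
            print("caught:", exc)  # caught: bad input: 2
```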

Removed the debug directory. It is no longer necessary and litters the project.
@gmloose gmloose merged commit 24b18b1 into master Sep 5, 2025
4 checks passed
