
Parallel Processing #1

Open
lsgos wants to merge 14 commits into scienceguyrob:master from lsgos:master


lsgos commented Feb 24, 2017

This PR adds parallel processing to the DataProcessor part of PulsarFeatureLab. Processing multiple files is embarrassingly parallel, so the change is small. Using multiprocessing means the code inside DataProcessor has to be refactored a little, but because all of the processing code is encapsulated in the candidate class, that class needs no changes.
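For readers unfamiliar with the pattern, here is a minimal sketch of mapping per-file work over a process pool. It is not the code in this PR: `process_file`, `process_all`, and the file names are hypothetical stand-ins, and the "feature" computed is a placeholder for the real candidate processing.

```python
import multiprocessing


def process_file(path):
    """Hypothetical stand-in for the per-file work done on one candidate."""
    # Placeholder "feature": the real code would read and score the file.
    # Defined at module level so the pool can pickle a reference to it.
    return path, len(path)


def process_all(paths, workers=None):
    """Apply process_file to every path using a pool of worker processes."""
    # workers=None lets multiprocessing use one process per available core.
    pool = multiprocessing.Pool(processes=workers)
    try:
        # map() distributes the list across workers and returns the
        # results in the same order as the inputs.
        return pool.map(process_file, paths)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    files = ["cand_%d.phcx" % i for i in range(8)]  # hypothetical file names
    for path, feature in process_all(files, workers=2):
        print("%s -> %s" % (path, feature))
```

The explicit `close()`/`join()` is used instead of a `with` block so the sketch also runs on Python 2.6 and 2.7, where `Pool` is not a context manager.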
On a multi-core machine the speedup is substantial: processing 11,000,000 HTRU2 data files on a 32-core machine took almost 16 hours with the current code, whereas this fork brings that down to about two and a half hours.
On the negative side, exception reporting is slightly worse as it stands: it reports the file in which the exception occurred and the exception type, but I haven't implemented a way to print a full stack trace as before.
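One possible way to recover the full trace, sketched here as an assumption rather than something this PR implements, is to format the traceback as a string inside the worker and return it alongside the result. Strings are picklable, so they can travel back to the parent process. `process_file` and `safe_process_file` are hypothetical names.

```python
import traceback


def process_file(path):
    """Hypothetical per-file work; fails on purpose for '.bad' inputs."""
    if path.endswith(".bad"):
        raise ValueError("cannot parse %s" % path)
    return len(path)


def safe_process_file(path):
    """Return (path, result, trace); trace is None when the call succeeds."""
    try:
        return path, process_file(path), None
    except Exception:
        # format_exc() renders the active exception's stack trace as text,
        # which can be sent back to the parent and printed there.
        return path, None, traceback.format_exc()


if __name__ == "__main__":
    for name in ["ok.phcx", "broken.bad"]:  # hypothetical file names
        path, result, trace = safe_process_file(name)
        if trace is None:
            print("%s: %s" % (path, result))
        else:
            print("%s failed:\n%s" % (path, trace))
```

A wrapper like `safe_process_file` could be the function handed to the pool, so one bad file does not abort the whole run.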
Also, since multiprocessing was added in Python 2.6, this raises the program's minimum requirement from Python 2.4 to 2.6. Given that 2.7 is standard on most distributions these days, this shouldn't be much of a problem.
Finally, some functions had to be moved outside the class because multiprocessing can only handle picklable objects. This is slightly less neat, but I don't think it hurts the clarity of the code.
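To illustrate the constraint, the sketch below checks whether a callable survives `pickle.dumps`. The helper `is_picklable` and the worker `module_level_worker` are hypothetical names used only for this example. A function defined at module level is pickled by reference to its name, while a lambda has no importable name and is rejected.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # The exact exception type varies between Python versions.
        return False


def module_level_worker(path):
    """Hypothetical worker; picklable because it lives at module level."""
    return path.upper()


if __name__ == "__main__":
    print(is_picklable(module_level_worker))
    print(is_picklable(lambda path: path.upper()))
```

On Python 2, which this PR targets, bound instance methods are rejected by pickle in the same way, which is why the pool's worker functions need to live outside the class.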
