new "parallel unix" backend for running jobs over multiple local processors #75
I wrote this backend to enable local dumbo jobs to leverage multiple processor cores.
Minimal usage example, which will run 4 mappers in parallel and then run 4 reducers:
```
dumbo -input [infiles] -output [outpath] -punix yes -tmpdir [tmppath] -nmappers 4 -nreducers 4
```
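For readers unfamiliar with the idea, the following is a minimal, self-contained sketch of what running a map/reduce job over several local processes involves. It is not the code in this PR and does not use Dumbo's API. The word-count `mapper`/`reducer`, the `run_job` helper, and the CRC32-based `partition` function are all hypothetical illustrations. It assumes a Unix "fork" start method and keeps intermediate data in memory, whereas a real backend would presumably write intermediate output to disk (for example under the directory passed via `-tmpdir`).

```python
import multiprocessing
import zlib
from itertools import groupby
from operator import itemgetter


def mapper(line):
    # Hypothetical example job: word count.
    for word in line.split():
        yield word, 1


def reducer(key, values):
    yield key, sum(values)


def partition(key, nreducers):
    # Stable hash (unlike str hash(), which is salted per interpreter run),
    # so every mapper process routes a given key to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % nreducers


def map_task(args):
    # Run the mapper over one input chunk and bucket output by reducer.
    lines, nreducers = args
    buckets = [[] for _ in range(nreducers)]
    for line in lines:
        for key, value in mapper(line):
            buckets[partition(key, nreducers)].append((key, value))
    return buckets


def reduce_task(pairs):
    # Sort one partition by key, then feed each key's values to the reducer.
    pairs.sort(key=itemgetter(0))
    out = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        out.extend(reducer(key, (value for _, value in group)))
    return out


def run_job(lines, nmappers=4, nreducers=4):
    # Assumes a Unix platform: "fork" lets child processes reuse the
    # functions defined in this module without re-importing it.
    ctx = multiprocessing.get_context("fork")
    chunks = [lines[i::nmappers] for i in range(nmappers)]
    with ctx.Pool(processes=max(nmappers, nreducers)) as pool:
        # Phase 1: all mappers run in parallel.
        mapped = pool.map(map_task, [(chunk, nreducers) for chunk in chunks])
        # Shuffle: gather each reducer's bucket from every mapper.
        partitions = [[] for _ in range(nreducers)]
        for buckets in mapped:
            for r, bucket in enumerate(buckets):
                partitions[r].extend(bucket)
        # Phase 2: all reducers run in parallel, only after every mapper is done.
        reduced = pool.map(reduce_task, partitions)
    return sorted(kv for part in reduced for kv in part)
```

For example, `run_job(["a b a", "c b", "a"], nmappers=2, nreducers=3)` counts the words using two mapper processes and three reducer processes. The stable partition function is the key design point: it guarantees that all values for one key end up in the same reducer, no matter which mapper emitted them.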
Along the way, I also added a few other options and features.
These new options are backend-agnostic:
These are specific to the new parallel unix backend: