Open
Labels: bug (Something isn't working)
Description
Observed error:
Traceback (most recent call last):
  File "/opt/laava/summarize_alignment.py", line 1032, in <module>
    main(args)
  File "/opt/laava/summarize_alignment.py", line 896, in main
    subset_sam_by_readname_list(
  File "/opt/laava/summarize_alignment.py", line 55, in subset_sam_by_readname_list
    for row in csv.DictReader(per_read_f, delimiter="\t"):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
gzip.BadGzipFile: Not a gzipped file (b're')
This happens because gzip compression is applied only when cpus > 1. With cpus=1 the code follows a different path that skips the aggregation and gzip steps, so the downstream reader ends up opening a plain-text file as gzip.
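The failure mode can be reproduced in isolation. The sketch below is not taken from `summarize_alignment.py`; the file name and the `read_id` column header are hypothetical, chosen only so that the first two bytes of the file are `re`, matching the message above. It writes a plain-text TSV and then reads it through `gzip.open`, the way the downstream step appears to.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical stand-in for the per-read table produced on the cpus=1 path:
# plain tab-separated text, even though the reader below expects gzip.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "per_read.tsv.gz")  # hypothetical file name
    with open(path, "w", newline="") as f:
        f.write("read_id\tvalue\n")  # hypothetical header; starts with "re"
        f.write("r1\t1\n")
    try:
        # gzip.open() does not read anything yet; the magic-number check
        # happens on the first read, i.e. when DictReader pulls the header.
        with gzip.open(path, "rt") as per_read_f:
            rows = list(csv.DictReader(per_read_f, delimiter="\t"))
    except gzip.BadGzipFile as err:
        message = str(err)

print(message)  # Not a gzipped file (b're')
```

The bytes shown in parentheses are the first two bytes of the file, which `gzip` compares against the gzip magic number `\x1f\x8b`.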
Potential solutions:
- Always use the multiprocessing path, even when cpus=1. (Least code, though inefficient.)
- Fix the downstream issue(s) individually by checking for .gz extensions. (Perpetuates the inconsistency.)
- Gzip the intermediate "chunks" as well, so that they are also valid .tsv.gz, and handle them correctly in the aggregation step when cpus>1. (Requires more code changes with little benefit.)
- Run gzip directly on the generated .tsv files when cpus=1. (Straightforward but requires more special-case code.)
The first option seems best: the chunking and iteration code deserves a rewrite anyway, and having less code to carry over makes that rewrite easier.
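For comparison, the second and fourth options might look roughly like the helpers below. This is only a sketch under assumptions: `open_text_maybe_gzip` and `gzip_file` are hypothetical names that do not exist in the codebase, and the demo file name and contents are made up.

```python
import gzip
import os
import shutil
import tempfile


def open_text_maybe_gzip(path):
    """Option 2 sketch (hypothetical helper): pick the opener from the extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def gzip_file(path):
    """Option 4 sketch (hypothetical helper): compress path to path + '.gz'
    and remove the uncompressed original."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


with tempfile.TemporaryDirectory() as tmpdir:
    tsv_path = os.path.join(tmpdir, "per_read.tsv")  # made-up file name
    with open(tsv_path, "w", newline="") as f:
        f.write("read_id\tvalue\nr1\t1\n")  # made-up contents

    # Option 2: the same reader handles the plain file...
    with open_text_maybe_gzip(tsv_path) as f:
        plain_text = f.read()

    # Option 4: ...or the cpus=1 path gzips its output so readers stay unchanged.
    gz_path = gzip_file(tsv_path)
    with open_text_maybe_gzip(gz_path) as f:
        gz_text = f.read()
    original_removed = not os.path.exists(tsv_path)
```

Either helper adds a special case that the first option avoids, which is the trade-off described above.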