Open
Conversation
…testing
{AH} import paths via TestUtils
…x-instantiate-records Ah faidx instantiate records
…linking-tests Ah add linking tests
…bix_add_line_skip
{AH} add tabix line_skip, fixes pysam-developers#487. Remove pileup preset.
…am-developers#513) ## Background `bcf_translate` works correctly on newly created or loaded BCF records, but it does not properly account for records which have been modified by having one or more INFO fields deleted. ## Nature of fix Prior to attempt to update the info value (either in place or after resizing), update the entry's key id (to keep all ids valid) and then check whether `line->d.info[i].vptr` is `NULL`. This indicates that the key and value have been deleted. Otherwise, `vptr` is incorrectly assumed to point to a valid value and a segmentation fault results. ## Other issues not addressed in this PR `bcf_translate` uses a data structure stored within the source BCF header data structure. This is a poor design decision, because it presumes that any header will be used to translate to no more than one destination header. Worse, there is no check to verify that the same destination header is used or that the source and destination have not changed since the translation table was initialized. Both requirements are unchecked and can cause crashes or write corrupted data. There is no good reason that the translation data structure is part of the header, except that it would require breaking ABI compatibility. This suggests adding a `bcf_translate2` function that takes an explicit translation table as a parameter with clear documentation on the assumptions and lifetime of the translation struct. ## Notes This fix was also submitted upstream to htslib. See [PR pysam-developers#568](samtools/htslib#568)
## Scope Running tests with nose would use the `tests` directory as CWD, so test file paths were all relative to the `tests` directory. In contrast, pytest runs from the `pysam` base directory as CWD, so all test file file paths must be updated. Prior changes were sufficient to ensure that existing test files were found in the correct locations, but many temporary files were created in CWD, rather than the `tests` directory. ## Limitations - There are still a few temp files not created in `tests`, including the 1kGenomes index file. - In future, all temporary files should probably be written to a dedicated directory that can be removed after the test suite is run.
Doing so exposed a bifurcation in terminology. AlignmentFile uses `reference` and `end`, where as VariantFile is careful to always use `contig` and `stop`. Although there is more code in the wild using AlignmentFile, I *strongly* prefer the latter nomenclature and updated all of the code to use that. However, extra keyword arguments were added to enable backward compatible use of `reference` and `end`. I understand this change may be a bit controversial and I am happy to discuss before merging this PR.
…r-parse-region
## Scope
* Update glossary entry for CIGAR
* Correct error return in pysam_bam_update
* Remove incorrect use of inline declarations in external Cython definitions
* Refactor region_parsing and related code to HTSFile base class.
Doing so exposed a bifurcation in terminology. `AlignmentFile` uses
`reference` and `end`, where as `VariantFile` is careful to always use
`contig` and `stop`. Although there is more code in the wild using
`AlignmentFile`, I *strongly* prefer the latter nomenclature and updated all
of the code to use that. However, extra keyword arguments were added to
enable backward compatible use of `reference` and `end`.
…_intron Speed up find intron
Adding a warning to count_coverage when an alignment has an empty QUAL field
* Allow setting number of threads for htslib
Internally this uses hts_set_threads. There is only a minor
speedup for reading files, but writing files can benefit well
from setting n_threads to more than 1:
```
In [1]: def benchmark_write(threads=1):
...: source = pysam.AlignmentFile('/Users/mvandenb/src/readtagger/tests/h4.bam', threads=threads)
...: out = pysam.AlignmentFile('out.bam', mode='wb', template=source, threads=threads)
...: for _ in range(1000000):
...: out.write(next(source))
...:
In [2]: %timeit benchmark_write(threads=1)
1 loop, best of 3: 9.21 s per loop
In [3]: %timeit benchmark_write(threads=2)
1 loop, best of 3: 4.68 s per loop
In [4]: %timeit benchmark_write(threads=3)
1 loop, best of 3: 3.17 s per loop
In [5]: %timeit benchmark_write(threads=4)
1 loop, best of 3: 2.66 s per loop
```
This had been discussed in pysam-developers#579 (comment).
* Avoid combining n_threads > 1 w/ ignore_truncation
Apparently htslib won't raise a separate error if
hitting a truncated read. Also subtracts 1 from `n_threads`,
since `hts_set_threads` will add additional threads.
This should be more intuitive to users.
Also moves `hts_set_threads` as the last action when cerating
new AlignmentFile instances.
* Rename `n_thread` to `thread`
* Move hts_set_threads to HTSFile class
* Implement threads for pysam.VariantFile
* Implement threads for pysam.TabixFile
Upgrade to Pysam 0.11.0
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Notices:
bgzf.candhfile_libcurl.cremain as originally implemented inpd3/htslib/crypto