Releases: streetslab/dimelo-toolkit
v0.2.1
v0.2.1 contains several small updates that bring the release in line with the state of dimelo-toolkit when the dimelo-toolkit manuscript was submitted, https://www.biorxiv.org/content/10.1101/2025.11.09.687458v1.full. The main updates from v0.2.0 are as follows:
environment.yml: change environment name to dimelo-toolkit
load_processed.read_vectors_from_hdf5: updates to support new single read browser plots that better show data
- Added span_full_window option. If true, only reads that start before the region_start and end after the region_end are returned, i.e. they must be at least as long as the region. If false, maintain old behavior: all reads that end after the region_start and start before region_end are returned.
- Added read_length field for sorting
- Added sorting in asc vs desc order for each of the sequential sorting operations
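The two read-selection behaviors and the sequential sorting described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the toolkit's implementation: the names Read, select_reads, and sort_reads, the record layout, and the (field, order) format of sort_by are all hypothetical, and whether the boundary comparisons are strict or inclusive is a guess. Only the span_full_window semantics, the read_length field, and the asc/desc idea come from the notes.

```python
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical record layout, for illustration only.
    name: str
    start: int
    end: int

    @property
    def read_length(self) -> int:
        return self.end - self.start


def select_reads(reads, region_start, region_end, span_full_window=False):
    """Pick reads for a region using one of the two behaviors."""
    if span_full_window:
        # Read must cover the whole region (boundary inclusivity is assumed).
        return [r for r in reads if r.start <= region_start and r.end >= region_end]
    # Old behavior: any overlap with the region is enough.
    return [r for r in reads if r.end > region_start and r.start < region_end]


def sort_reads(reads, sort_by):
    """Apply sequential sorts; sort_by is a list of (field, 'asc' | 'desc')."""
    result = list(reads)
    # Python's sort is stable, so applying keys in reverse order makes
    # the first entry in sort_by the primary key.
    for field, order in reversed(sort_by):
        result.sort(key=lambda r: getattr(r, field), reverse=(order == "desc"))
    return result
```

For example, select_reads(reads, 100, 200, span_full_window=True) would keep only reads that extend across both 100 and 200, and sort_reads(reads, [("read_length", "desc")]) would put the longest reads first.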
parse_bam::read_by_base_txt_to_hdf5: updates to support single reads with long gaps, e.g. RNA with splicing
- When finding reads in the .txt file, instead of calculating read end position once when the read is first encountered (based on current position in genome + current position in read + read length), the read end position is re-calculated each time a new base is encountered for that read. Because the txt is ordered ascending along the genome, the last base encountered for a read will be the one closest to the end of that read. With e.g. megalodon bam files this always gives identical results to the old read end calculation method.
- The new method still misses any read gaps that show up after the last potential modification site: this information will only be available with the modkit upgrade described here: nanoporetech/modkit#270
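The difference between computing the read end once and re-computing it at every base can be sketched as below. This is an illustration only: the function estimate_read_ends, the record layout (read name, genome position, position in read, read length), and the exact arithmetic are assumptions, and it only considers forward-strand reads. The toolkit's actual parsing of the .txt file may differ.

```python
def estimate_read_ends(records, recalculate=True):
    """Estimate each read's end coordinate on the genome.

    records: iterable of (read_name, ref_pos, read_pos, read_length),
    assumed to be ordered ascending along the genome.
    """
    ends = {}
    for read_name, ref_pos, read_pos, read_length in records:
        # Old behavior: only the first record for a read sets its end.
        # New behavior: every record overwrites it, so the last one wins.
        if recalculate or read_name not in ends:
            # Bases remaining in the read, projected onto the genome.
            ends[read_name] = ref_pos + (read_length - read_pos)
    return ends


# A 100-base read starting at genome position 1000, with a 1000-base gap
# (e.g. a spliced intron) after read position 40.
gapped = [("r1", 1010, 10, 100), ("r1", 2050, 50, 100)]
# The same read without a gap.
ungapped = [("r2", 1010, 10, 100), ("r2", 1050, 50, 100)]
```

With the gapped records, the first-encounter estimate stops at 1100 while the re-calculated estimate reaches 2100. With the ungapped records both approaches agree, matching the note that gap-free files give identical results.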
plot_enrichment_profile::plot_enrichment_profile: update to support profile plots with absolute rather than relative coordinates
- plot_enrichment_profile methods can take in relative flag; True means x-axis is centered around region centers, False means x-axis is absolute genome positions
- make_enrichment_profile_plot can take offset_center, which gives a position offset to apply to the plot x-axis (e.g., when plotting absolute genome positions)
plot_read_browser::plot_read_browser: update to allow new plot customizations and make it work a bit better with gapped reads, e.g. RNA
- markers/lines/color palette settings
- pass down the span_full_window parameter to the loader
- read_extent_df now keeps the longest version of each read name, because until we have the new modkit version there are different entries for each mod type for each read, and those each have lengths calculated by the method above which are not guaranteed to be the same between mods
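The "keep the longest version of each read name" step can be sketched as below. This is a plain-Python illustration under assumptions: the function keep_longest_per_read and the keys read_name, read_start, read_end, and mod are hypothetical, and the toolkit itself operates on read_extent_df rather than a list of dictionaries.

```python
def keep_longest_per_read(entries):
    """Keep one entry per read name: the one spanning the most bases."""
    longest = {}
    for entry in entries:
        name = entry["read_name"]
        length = entry["read_end"] - entry["read_start"]
        best = longest.get(name)
        # Replace the stored entry only if this one is strictly longer.
        if best is None or length > best["read_end"] - best["read_start"]:
            longest[name] = entry
    return list(longest.values())


# One read reported twice, once per mod type, with different end estimates.
entries = [
    {"read_name": "r1", "read_start": 1000, "read_end": 1100, "mod": "a"},
    {"read_name": "r1", "read_start": 1000, "read_end": 2100, "mod": "m"},
    {"read_name": "r2", "read_start": 500, "read_end": 900, "mod": "a"},
]
```

Here keep_longest_per_read(entries) would return the r1 entry ending at 2100 and the single r2 entry.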
utils::DEFAULT_COLORS: add new ones
utils::ParsedMotif: fix mod code handling for mod codes more than one character long. Given a string, the set() function returns a set of the characters in that string, whereas in this case we want a set containing the whole string, e.g. "17802" for pseudouridine
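The set() behavior behind this fix is standard Python and can be shown directly. The helper as_code_set below is hypothetical and only illustrates one way to keep a multi-character code intact; it is not taken from ParsedMotif.

```python
def as_code_set(mod_codes):
    """Return a set of whole mod codes from one string or an iterable of strings."""
    if isinstance(mod_codes, str):
        # Wrap a single code so it is not split into characters.
        return {mod_codes}
    return set(mod_codes)


split_into_chars = set("17802")     # five one-character strings
kept_whole = as_code_set("17802")   # a single five-character string
```

Single-character codes are unaffected either way, which is presumably why the problem only appeared with longer codes such as "17802".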
v0.2.0
v0.2.0 is a major overhaul compared to v0.1.0. It supports the same core pileup and single read extraction operations as the original dimelo v0.1.0 package, but focuses on a number of new objectives:
- Support multicolor data / any base modification context (GpC, CpC, etc)
- Vector extraction for all data types
- Enhanced speed and reliability, enabling e.g. whole genome processing
- Maintainability -> using a small number of standard dependencies, outsourcing as much as possible to well-maintained third-party packages (e.g. modkit, pysam, h5py, and a few others)
- Modularity in both architecture and operation
- Ease of use, especially for multiplatform installation
- More powerful plotting e.g. bam files from different basecallers, single read sorting, rapid iteration