Pre submission updates #118

Merged
OberonDixon merged 18 commits into main from pre-submission on Nov 25, 2025

@OberonDixon OberonDixon commented Nov 19, 2025

Miscellaneous updates in preparation for journal submission. Existing pytest tests all currently pass without changing any pass/fail criteria.

environment.yml: change environment name to dimelo-toolkit

load_processed.read_vectors_from_hdf5: updates to support new single read browser plots that better show data

  • Added span_full_window option. If true, only reads that start before the region_start and end after the region_end are returned, i.e. they must be at least as long as the region. If false, the old behavior is maintained: all reads that end after the region_start and start before the region_end are returned.
  • Added read_length field for sorting
  • Added a choice of ascending or descending order for each of the sequential sorting operations
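
For illustration, here is a minimal sketch of the two selection rules. It assumes reads are plain (name, start, end) tuples and that the spanning comparison is inclusive at the boundaries. `select_reads` is a hypothetical helper, not the signature of read_vectors_from_hdf5.

```python
def select_reads(reads, region_start, region_end, span_full_window=False):
    """Return names of reads that pass the window filter.

    reads: iterable of (name, read_start, read_end) tuples (assumed layout).
    """
    selected = []
    for name, read_start, read_end in reads:
        if span_full_window:
            # Read must cover the whole region, so it is at least as long as it.
            keep = read_start <= region_start and read_end >= region_end
        else:
            # Old behavior: any overlap with the region is enough.
            keep = read_end > region_start and read_start < region_end
        if keep:
            selected.append(name)
    return selected


reads = [("a", 50, 250), ("b", 120, 180), ("c", 10, 110), ("d", 300, 400)]
spanning = select_reads(reads, 100, 200, span_full_window=True)
overlapping = select_reads(reads, 100, 200, span_full_window=False)
```

With the region 100 to 200, only read "a" spans the full window, while "a", "b" and "c" all overlap it.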

parse_bam::read_by_base_txt_to_hdf5: updates to support single reads with long gaps, e.g. RNA with splicing

  • When finding reads in the .txt file, the read end position is now re-calculated each time a new base is encountered for that read, instead of being calculated once when the read is first encountered (based on current position in genome + current position in read + read length). Because the txt is ordered ascending along the genome, the last base encountered for a read will be the one closest to the end of that read. With e.g. megalodon bam files this always gives identical results to the old read end calculation method.

The new method still misses any read gaps that show up after the last potential modification site: this information will only be available with the modkit upgrade described here: nanoporetech/modkit#270
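
The difference between the old and new read end estimates can be sketched roughly as follows. The row layout (read name, genome position, position in read, read length), the forward-strand orientation and the exact end formula are all assumptions made for illustration; they are not taken from read_by_base_txt_to_hdf5.

```python
def first_seen_ends(rows):
    """Old approach: estimate the end once, at the first row seen for each read."""
    ends = {}
    for read_name, ref_pos, read_pos, read_len in rows:
        # setdefault keeps the first estimate and ignores later rows.
        ends.setdefault(read_name, ref_pos + (read_len - read_pos))
    return ends


def last_seen_ends(rows):
    """New approach: re-estimate the end at every row; the last row wins."""
    ends = {}
    for read_name, ref_pos, read_pos, read_len in rows:
        ends[read_name] = ref_pos + (read_len - read_pos)
    return ends


# Rows are sorted ascending by genome position (assumed, as in the txt file).
# read1 has a 500 bp gap (e.g. a spliced intron) between its two rows.
rows = [
    ("read1", 1000, 10, 100),
    ("read2", 1020, 5, 50),
    ("read1", 1580, 90, 100),
]
old_ends = first_seen_ends(rows)
new_ends = last_seen_ends(rows)
```

For the ungapped read2 both estimates agree, whereas for the gapped read1 only the re-calculated estimate accounts for the gap. A gap after the last row for a read is still invisible to both approaches.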

plot_enrichment_profile::plot_enrichment_profile: update to support profile plots with absolute rather than relative coordinates

  • plot_enrichment_profile methods can take a relative flag; True means the x-axis is centered around region centers, False means the x-axis shows absolute genome positions
  • make_enrichment_profile_plot can take offset_center, which gives a position offset to apply to the plot x-axis (e.g., when plotting absolute genome positions)
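
As a rough sketch of how an offset can turn centered coordinates into absolute ones: `x_axis_positions` and `half_width` are hypothetical names, and only the idea of an offset_center shift comes from the description above.

```python
def x_axis_positions(half_width, offset_center=0):
    """Return x coordinates for a profile of width 2 * half_width.

    With offset_center=0 the axis is relative to the region center;
    passing the center's genome coordinate yields absolute positions.
    """
    return [offset_center + i for i in range(-half_width, half_width)]


relative_x = x_axis_positions(2)
absolute_x = x_axis_positions(2, offset_center=1000)
```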

plot_read_browser::plot_read_browser: update to allow new plot customizations and make it work a bit better with gapped reads, e.g. RNA

  • markers/lines/color palette settings
  • pass down the span_full_window parameter to the loader
  • read_extent_df now keeps the longest version of each read name. Until we have the new modkit version, there is a separate entry for each mod type for each read, and each entry has a length calculated by the method above, which is not guaranteed to be the same between mods
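
The "keep the longest version of each read name" step might look like the sketch below. It uses plain tuples rather than a DataFrame to stay dependency-free; the entry layout (read name, mod type, start, end) and the helper name are assumptions.

```python
def longest_extent_per_read(entries):
    """Collapse per-mod entries to one (start, end) per read, keeping the longest."""
    longest = {}
    for read_name, _mod_type, start, end in entries:
        current = longest.get(read_name)
        if current is None or (end - start) > (current[1] - current[0]):
            longest[read_name] = (start, end)
    return longest


# The same read appears once per mod type, with slightly different extents.
entries = [
    ("read1", "a", 1000, 1500),
    ("read1", "m", 1000, 1590),
    ("read2", "a", 1020, 1065),
]
extents = longest_extent_per_read(entries)
```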

utils::DEFAULT_COLORS: add new ones

utils::ParsedMotif: fix mod code handling for mod codes more than one character long. Given a string, the set() function returns a set of the characters in the string, whereas in this case we want a set containing the whole string, e.g. "17802" for pseudouridine
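
The set() pitfall is easy to reproduce. In the sketch below, `as_code_set` is a hypothetical helper showing one way to avoid it, not necessarily how ParsedMotif handles it.

```python
def as_code_set(mod_codes):
    """Return a set of mod codes, treating a bare string as a single code."""
    if isinstance(mod_codes, str):
        return {mod_codes}
    return set(mod_codes)


# set() iterates over a string, so a multi-character code is split apart.
split_codes = set("17802")
whole_code = as_code_set("17802")
several_codes = as_code_set(["a", "17802"])
```

Here `split_codes` contains the five single characters, while `whole_code` contains the one string "17802".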

@thekugelmeister thekugelmeister left a comment

Looks good across the board! Some small comments.

@OberonDixon OberonDixon merged commit 97f7a5d into main Nov 25, 2025
3 of 4 checks passed
@OberonDixon OberonDixon deleted the pre-submission branch November 26, 2025 00:08