Feature/dm 215,dm 113,dm 90 genome browser exports, test phase 2, read depth qc by OberonDixon · Pull Request #15 · streetslab/dimelo_v2

OberonDixon · 2025-01-08T02:40:22Z

Added bigwig export in export module.

Updated test framework to include export and plot_read_browser, as well as a new test case. Fixed some error reporting and some code that was generating warnings.

Revamped test target generator code to be more modular and maintainable.

Created read depth profile and read depth histogram plotting modules, with corresponding supporting functions and test coverage.

…cases. Added error handling for utils.parse_region_string as this is called directly by plot_read_browser rather than routing through utils.add_region_to_dict. This may want to be consolidated later by re-ordering the logic in add_region_to_dict, tbd.

thekugelmeister

I stopped this review halfway through after spending an hour on the read depth plotting. It appears to me that there are 2 or 3 different things being combined into one PR, and I started losing the thread of what has been done here. In general, PRs should be constructed in more compartmentalized ways.

That being said, I covered a lot in the plotting sections, which is a start, at least.

Can you add comments to the PR describing the changes you made to testing, so that I have some context for what to review when I return to it?

thekugelmeister · 2025-01-12T00:06:46Z

dimelo/export.py

+        # note: the tqdm progress bar slows things down by about 33%, which was deemed better at the time of writing this than
+        # 90 seconds without any status updates
+        rows_count, last_row = list(
+            tail(


I understand part of this comment, but I am concerned or confused about other parts.

hard agree that progress bars are good

tqdm is very performant. Usually, if adding a progress bar is making things slow, that means that something else is going wrong in the calculation of progress

fetching the last item in an iterator is supposed to be slow

unless I misunderstand something, constructing a deque just to get the last item seems wasteful?

TL;DR: naively, it seems like the call to tail should be what makes it slow? But I might need to put more thought into this.
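To make the trade-off concrete, here is a minimal sketch of counting rows and keeping the last one in a single pass, without building a deque. `count_and_last` is a hypothetical helper, not part of dimelo, and the sketch assumes the goal of the `tail` call is only a row count plus the final row. Either way the iterator has to be consumed in full, so the cost stays linear in the number of rows.

```python
from typing import Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def count_and_last(rows: Iterable[T]) -> Tuple[int, Optional[T]]:
    """Hypothetical helper: consume `rows` once, returning (count, last item).

    Returns (0, None) for an empty iterable.
    """
    count = 0
    last: Optional[T] = None
    for last in rows:  # the loop variable keeps the most recent item
        count += 1
    return count, last


# Illustrative rows only; real bedmethyl rows have more columns.
example_rows = iter(["chr1\t0\t1", "chr1\t1\t2", "chr2\t0\t1"])
rows_count, last_row = count_and_last(example_rows)
```

If a progress indicator is wanted during this pass, the iterable handed to the helper could be wrapped in `tqdm(...)`, which accepts any iterable; whether that removes the slowdown mentioned in the code comment is untested here.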

thekugelmeister · 2025-01-12T00:14:37Z

dimelo/export.py

+    output_file_path = (
+        bigwig_file
+        if bigwig_file is not None
+        else Path(bedmethyl_file).parent / "pileup.fractions.bigwig"


Path(bedmethyl_file) is constructed multiple times throughout this method; should be declared once and referenced later.
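A minimal sketch of that suggestion, assuming the default output location shown in the diff. `resolve_bigwig_path` is a hypothetical name used only for illustration; the real function in `export.py` may be organized differently.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_bigwig_path(
    bedmethyl_file: PathLike, bigwig_file: Optional[PathLike] = None
) -> Path:
    """Hypothetical helper: convert the input path once and reuse it."""
    bedmethyl_path = Path(bedmethyl_file)  # single conversion, referenced below
    if bigwig_file is not None:
        return Path(bigwig_file)
    # Default mirrors the diff: write next to the bedmethyl file.
    return bedmethyl_path.parent / "pileup.fractions.bigwig"
```

Any later use of the input file inside the same function would then read from `bedmethyl_path` instead of calling `Path(bedmethyl_file)` again.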

thekugelmeister · 2025-01-12T00:15:37Z

dimelo/export.py

+    )
+    os.makedirs(output_file_path.parent, exist_ok=True)
+
+    # Because we need to set up the bigwig header for we start writing data to it, we need to pre-index the length of each contig


Typo: "before"

thekugelmeister · 2025-01-12T00:47:20Z

dimelo/plot_depth_histogram.py

+        window_size: half-size of the desired window to plot; how far the window stretches on either side of the center point
+        single_strand: True means we only grab counts from reads from the same strand as
+            the region of interest, False means we always grab both strands within the regions
+        average_within_region: if True, each region will only report a single depth value, averaging across all non-zero depths


What happens if it's false? It has to perform some sort of aggregation for each region, right?
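One way to frame the question is the sketch below. The `True` branch follows the docstring (a single value per region, averaged across non-zero depths). The `False` branch is purely an assumption made for illustration, namely that every non-zero position contributes its own value to the histogram; the PR does not confirm this. `region_depth_values` is a hypothetical name.

```python
from statistics import mean
from typing import List, Sequence


def region_depth_values(
    depths: Sequence[float], average_within_region: bool
) -> List[float]:
    """Hypothetical illustration of the two aggregation modes for one region."""
    nonzero = [d for d in depths if d != 0]
    if average_within_region:
        # Documented behavior: one value per region, mean of non-zero depths.
        return [mean(nonzero)] if nonzero else []
    # ASSUMPTION: without averaging, each non-zero position is reported as-is.
    return nonzero
```

Under this reading a region with depths `[0, 2, 4, 0]` would add one value to the histogram when averaging and two values otherwise, which is the kind of difference the docstring could spell out.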

thekugelmeister · 2025-01-12T00:52:27Z

dimelo/plot_depth_histogram.py

+    **kwargs,
+) -> Axes:
+    """
+    Plot depth histograms, overlaying the resulting traces on top of each other.


I think there's nuance in this method that could require some more explanation.

If I understand this method correctly, I propose this expansion of the docstring introduction:

Plot a histogram of the read depths observed for each provided region.

If this is correct, I'm not sure I understand the options regarding averaging, and what value is actually reported for each region. I think that needs to be documented more clearly as well.

Also might need some clarification about the single-stranded-ness, and the difference between true read depth and what is reported here.

If this is all explained somewhere else already, a reference for where to look for details would suffice.

thekugelmeister · 2025-01-12T00:54:40Z

dimelo/plot_depth_histogram.py

+    axes = utils.hist_plot(
+        value_vectors=depth_vectors,
+        value_names=sample_names,
+        x_label="per strand read\ndepth in region"


I know this is pythonic and sensible enough, but I found it weird to look at now that ruff has reformatted it. Can the x_label be assigned above so that it's easier to read?

thekugelmeister · 2025-01-12T00:55:21Z

dimelo/plot_depth_profile.py

+    **kwargs,
+) -> Axes:
+    """
+    Plot depth profiles, overlaying the resulting traces on top of each other.


I think a lot of the same commentary from the histogram methods applies to this file.

thekugelmeister · 2025-01-12T00:56:56Z

dimelo/plot_read_browser.py

+            region=region, window_size=None
+        )
+    except ValueError as err:
+        raise ValueError(


Is this just tidying up error reporting, or is this solving a problem you encountered?

Some of the test cases deliberately trip this, so I wanted to handle it in a way that lets them check that the error message is right.
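The pattern described here can be sketched as follows. Both functions are simplified, hypothetical stand-ins (the real `utils.parse_region_string` and `plot_read_browser` have different signatures and messages); the point is only that re-raising with a specific message gives a test something stable to assert on.

```python
from typing import Tuple


def parse_region_string(region: str) -> Tuple[str, int, int]:
    """Hypothetical stand-in parser for 'chrom:start-end' strings."""
    chrom, coords = region.split(":")  # ValueError if there is no single ':'
    start, end = coords.split("-")  # ValueError if there is no single '-'
    return chrom, int(start), int(end)  # ValueError if not integers


def load_browser_region(region: str) -> Tuple[str, int, int]:
    """Hypothetical caller that wraps parse errors with a clearer message."""
    try:
        return parse_region_string(region)
    except ValueError as err:
        # Chain the original error so the traceback keeps the root cause.
        raise ValueError(
            f"Invalid region {region!r}: expected the form 'chrom:start-end'."
        ) from err


def error_message_for(region: str) -> str:
    """Return the message raised for a bad region, or '' if parsing succeeds."""
    try:
        load_browser_region(region)
    except ValueError as err:
        return str(err)
    return ""
```

A test case that deliberately passes a malformed region can then compare `error_message_for(...)` against the expected text.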

thekugelmeister · 2025-01-14T00:16:51Z

dimelo/export.py

+    you can specify the regions at parsing time, rather than re-implementing the subset handling logic here.
+
+    Args:
+        bedmethyl_file: Path to the input bedmethyl file


*tabix-indexed gzipped files

thekugelmeister · 2025-01-14T00:33:58Z

dimelo/test/cases.py

+            "title": "megalodon_peaks_190",
+        },
+        # outputs dict function:values
+        {},  # populated in subsequent cells


Change "cells" to "generate_targets.py"

thekugelmeister · 2025-01-14T00:42:29Z

dimelo/plot_depth_histogram.py

+    regions_list: list[str | Path | list[str | Path]],
+    motifs: list[str],
+    sample_names: list[str],
+    window_size: int,


Oberon says you're wrong

dimelo/test/README.md

thekugelmeister · 2025-01-17T22:27:55Z

dimelo/utils.py

+    Takes arbitrarily many counts vectors and plots on same histogram.
+
+    Args:
+        value_vectors: parallel with each entry in vectors


Replace vectors with value_names?

thekugelmeister · 2025-01-17T22:41:10Z

dimelo/utils.py

+        x=x_label,
+        hue=y_label,
+        multiple="dodge",
+        **{**kwargs, **({"bins": bins} if bins is not None else {})},


I understand what this is doing, but looking at it makes me cringe 😅 python is dumb

I think it's clearer if the bins entry in the dictionary is set in an if statement earlier. Assuming we're keeping the integer_values argument and the defaults, etc., here's how I might organize it for clarity:

    if integer_values:
        # Warn user that passed bins are being overwritten
        if "bins" in kwargs:
            print("Warning: bin settings overwritten by defaults")
        kwargs["bins"] = np.arange(data_table[x_label].min() - 0.5, data_table[x_label].max() + 1.5, 1)
    # plot histogram
    ax = sns.histplot(
        data=data_table,
        x=x_label,
        hue=y_label,
        multiple="dodge",
        **kwargs,
    )

Note that I'm also warning the user that we might be overwriting their manual settings; that's kind of unrelated but also came up while drafting this code snippet.
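The bin logic in this suggestion can be exercised on its own, without seaborn. The sketch below is a standard-library approximation: `integer_bin_edges` reproduces the edges that `np.arange(min - 0.5, max + 1.5, 1)` would give for integer data, and `resolve_hist_kwargs` uses `warnings.warn` in place of `print`. Both names are hypothetical, and swapping `print` for `warnings.warn` is a choice made for this example, not something the PR specifies.

```python
import warnings
from typing import Any, Dict, List, Sequence


def integer_bin_edges(values: Sequence[int]) -> List[float]:
    """Edges at half-integers so each integer value sits in its own bin."""
    lo, hi = min(values), max(values)
    # Same edges as np.arange(lo - 0.5, hi + 1.5, 1) for integer lo and hi.
    return [lo - 0.5 + i for i in range(hi - lo + 2)]


def resolve_hist_kwargs(
    values: Sequence[int], integer_values: bool, **kwargs: Any
) -> Dict[str, Any]:
    """Hypothetical helper: set `bins` up front instead of inline unpacking."""
    if integer_values:
        if "bins" in kwargs:
            # Tell the caller that their manual setting is being replaced.
            warnings.warn("bin settings overwritten by integer-centered defaults")
        kwargs["bins"] = integer_bin_edges(values)
    return kwargs


depths = [1, 3, 2, 3]
hist_kwargs = resolve_hist_kwargs(depths, integer_values=True)
```

The resulting dictionary could then be unpacked into the plotting call as `**hist_kwargs`, which keeps the call site free of conditional dictionary merging.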

thekugelmeister · 2025-01-17T22:43:31Z

dimelo/test/cases.py

+test_data_dir = Path("./data")
+output_dir = test_data_dir / "test_targets"
+
+region = "chr1:114357437-114359753"  # 'chr1:9167177-9169177'


Can you document where this region came from, even if it's totally random or a historical anecdote? Also, what is the commented out region?

thekugelmeister · 2025-01-17T22:43:57Z

dimelo/test/cases.py

+
+# Paths to input files
+ctcf_bam_file = test_data_dir / "ctcf_demo.sorted.bam"
+# ctcf_guppy_bam_file = test_data_dir / 'winnowmap_guppy_merge_subset.updated.bam'


Is this necessary? If so, can you document why it's here?

thekugelmeister · 2025-01-17T22:45:32Z

dimelo/test/dimelo_test.py

    outputs section of dimelo/test/generate_test_targets.ipynb.
    """

+    def test_unit__regions_list_list(


I can't tell what this is testing from the method name. Can it be clarified, either in name or documentation?

Also, does this method test for values being correct? On first glance it does not appear to. Based on my rough understanding and the docstring for the class, I would have assumed it would do so.

thekugelmeister · 2025-01-17T22:48:29Z

dimelo/test/dimelo_test.py

+    Tests file export functionality in export module.
+
+    This test currently simply checks that we can make the appropriate output files without raising errors.
+    The values stored in the files are not verified.


Is there a reason for this other than not really needing to right now?

dimelo/test/dimelo_test.py

dimelo/test/generate_targets.py

Oberon Dixon-Luinenburg and others added 19 commits January 5, 2025 14:21

DM-216 Added basic bigiwig export, and adjusted TODOs for load_proces…

ff1bdcf

…sed functions to cover future plans to collapse redundant code

DM-217 Added tests for export. Fixed some small problems in the expor…

cce26b0

…t bigwig code. Added checkpoints to gitignore.

Removed checkpoint files

de004ae

DM-218 Added a new test case to cover plot_read_browser error throwin…

23c90c4

…g. Adjustments to test routine and error throwing for plot_read_browser. Re-generated all test targets, after verifying that dimelo passes both old and new targets.

DM-226 Pileup counts window_size logic for loading and plotting.

8430821

DM-221 Adjust test targets pickle with new counts values given windowing

bd6a5bf

DM-220 Updates to test target generator structure and test README. Te…

a704977

…st target values should be unchanged.

Adjusted generate_targets to run from any directory by tweaking path …

bed5caa

…management. Confirmed that generation and test run successfully on macos.

DM-227 Bigwig export cell in Advanced Use Cases section of tutorial

c52acc5

ruff format adjustments

6a2450b

DM-93 Depth profile plots, including tutorial example to show that it…

91dcba4

… works

DM-230 Test coverage for plot_depth_profile; same tests as plot_enric…

97bab8a

…hment_profile.

DM-222,DM-225 Added coverage histogram module with corresponding new …

502be34

…functions in load_processed (load region-by-region) and utils (hist plotter).

DM-222,DM-225 Added progress bars for region-by-region loading. Added…

7358ad4

… read depth histogram to __init__.

DM-222,DM-225 Parallelization for a very substantial speedup in load_…

3d4e128

…processed.regions_to_list, and thus a speedup for plot_depth_histogram functions. Added a useful little cores_available utility for use by other functions.

DM-225,DM-192 Stubs to support future parallelization of load_process…

42935f9

…ed core functions.

DM-230 Adding tests for plot_depth_histogram and load_processed.regio…

1e39d74

…ns_to_list.

DM-93 Enable non-windowed profiles

3aced55

thekugelmeister requested changes Jan 12, 2025

thekugelmeister reviewed Jan 14, 2025

OberonDixon changed the title ~~Feature/dm 90 read depth qc~~ Feature/dm 215,dm 113,dm 90 genome browser exports, test phase 2, read depth qc Jan 14, 2025

thekugelmeister requested changes Jan 17, 2025

Oberon Dixon-Luinenburg added 5 commits January 24, 2025 15:29

Added ref_genome fasta option for export contig definition. Passes te…

5c3a3a3

…sts, added to tutorial.

Adjusted Path conversion for filepaths and added some comments.

bc41313

Fixed typo

cdf9a56

Improved docstrings and streamlined code for regions_to_list

864d61d

Documented more stuff for depth histograms.

802f160

Oberon Dixon-Luinenburg and others added 7 commits January 24, 2025 16:22

Added some words

40c7d28

A bunch of little documentation fixes for test infra

9126acc

Further depth tweaks per PR comments

a6a36f9

Update utils.py

c19c5ea

Move path arg sanitation to utils

1c2a85b

Refactored path sanitization

da30804

Fixed plotly error preventing testing

e441564

thekugelmeister merged commit 821a2c2 into main Jan 29, 2025
3 of 4 checks passed

thekugelmeister deleted the feature/DM-90-read-depth-qc branch January 29, 2025 07:19
