Merged

29 commits
b2b7689
DM-229 Consolidating parsing logic for load_processed functions. Stil…
Jan 8, 2025
3f3ec85
DM-231 Removed regions=None case for pileup_counts_from_bedmethyl bec…
Jan 8, 2025
91fb5e4
DM-229 Adjusted export.pileup_to_bigwig to use new load_processed.pro…
Jan 8, 2025
53d039b
DM-192 Implemented parallelization core including shared memory and c…
Jan 9, 2025
caf9dcd
DM-232 Added strand information to regions .bed files. Re-generated r…
Jan 9, 2025
64a9f65
DM-232 Added single_strand and regions_5to3prime to cases.py. Adjuste…
Jan 9, 2025
bb526bf
Merge pull request #16 from streetslab/feature/DM-191-refactor-load-p…
OberonDixon Jan 9, 2025
caa3e59
DM-192 Fixed logic to properly handle single_strand case within proce…
Jan 9, 2025
ac3657e
DM-192 Further refactor to simplify process_pileup_row and keep regio…
Jan 9, 2025
a9648df
DM-192 Added chunk_size for load_processes parallelization to cases.p…
Jan 9, 2025
c830abf
Merge pull request #17 from streetslab/feature/DM-191-refactor-load-p…
OberonDixon Jan 9, 2025
7301d3d
DM-192,DM-236,DM-233,DM-235 Parallelized pileup_counts_from_bedmethyl…
Jan 9, 2025
56e6333
DM-192,DM-236 Properly close and delink shared memory when appropriat…
Jan 10, 2025
97efec1
DM-191 Re-arrange contents of load_processed for better clarity. Adde…
Jan 10, 2025
a8ed1ae
DM-238 Add quiet and cores parameters and corresponding documentation…
Jan 17, 2025
f1b0758
DM-239 Added test case coverage for 1-4 cores. GitHub jobs for Ubuntu…
Jan 17, 2025
0b68892
DM-239 Added cases.py support for cores=None. Tweaked dimelo_test.py …
Jan 17, 2025
1da3d3c
DM-239 Adjusted generate_targets module to properly handle cores argu…
Jan 18, 2025
fcf116f
Ruff format fixes
Jan 18, 2025
5dad63b
Merge pull request #18 from streetslab/feature/DM-191-refactor-load-p…
OberonDixon Jan 27, 2025
f0f027b
Merge branch 'main' into feature/DM-191-refactor-load-processed
OberonDixon Feb 21, 2025
bd94e78
Fixed small mistakes from merge
Feb 21, 2025
0a9feb0
Merge branch 'main' into feature/DM-191-refactor-load-processed
OberonDixon Mar 13, 2025
4454f67
Added check for chrom in tabixfile.
Mar 18, 2025
bfdbff6
DM-191 Adjustments in response to PR comments from thekugelmeister
Mar 23, 2025
d3cc71c
Merge branch 'feature/DM-191-refactor-load-processed' of https://gith…
Mar 23, 2025
7f50782
Adjust naming for regions_to_list region-splitting parallelization pa…
May 30, 2025
3ce0e8f
attempting to fix ruff versioning
thekugelmeister May 31, 2025
2972c34
ruff version fix attempt 2
thekugelmeister May 31, 2025
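Several of the DM-192/DM-236 commits above concern creating, sharing, and then properly closing and delinking shared memory for the parallelized counting. A minimal sketch of that create/attach/close/unlink lifecycle (run serially here for clarity; all names are illustrative, not the package's actual API):

```python
from multiprocessing import shared_memory

def worker_add(shm_name: str, start: int, end: int) -> None:
    """What each worker would do: attach to the existing block by name,
    write its own disjoint slice, then close its handle (never unlink)."""
    shm = shared_memory.SharedMemory(name=shm_name)
    for i in range(start, end):
        shm.buf[i] += 1  # stand-in for real per-region pileup counting
    shm.close()  # release this handle; the block itself stays alive

# The owner allocates one byte of "counts" per position (illustrative sizing).
shm = shared_memory.SharedMemory(create=True, size=10)
for i in range(10):
    shm.buf[i] = 0

# Regions are disjoint, so workers never write the same slot.
for start, end in [(0, 5), (5, 10)]:
    worker_add(shm.name, start, end)

total = sum(shm.buf[i] for i in range(10))
shm.close()   # every holder closes its own handle...
shm.unlink()  # ...and exactly one owner unlinks, freeing the block
```

In the real code each `worker_add` call would run in a separate process (e.g. via `multiprocessing.Pool`), which is why the close-in-worker versus unlink-in-owner split that the DM-236 commit message describes matters: unlinking from a worker would free the block out from under its siblings.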
3 changes: 3 additions & 0 deletions .github/workflows/formatting.yml
@@ -6,11 +6,14 @@ jobs:
     steps:
       - uses: actions/checkout@v4
       - uses: chartboost/ruff-action@v1
+        with:
+          version: 0.6.8
   # TODO: Is it really necessary for these to be separate jobs? This seems redundant.
   ruff-format-check:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
       - uses: chartboost/ruff-action@v1
         with:
+          version: 0.6.8
           args: 'format --check'
73 changes: 25 additions & 48 deletions dimelo/export.py
@@ -6,7 +6,7 @@
 import pysam
 from tqdm.auto import tqdm
 
-from . import utils
+from . import load_processed, utils
 
 """
 This module contains code to export indexed and compressed parse output files to other formats that may be helpful for downstream analysis.
@@ -134,54 +134,31 @@ def pileup_to_bigwig(
             total=lines_by_contig[contig],
             leave=False,
         ):
-            # TODO: This code is copied from load_processed.pileup_counts_from_bedmethyl and should probably be consolidated at some point
-            tabix_fields = row.split("\t")
-            pileup_basemod = tabix_fields[3]
-            pileup_strand = tabix_fields[5]
-            keep_basemod = False
-            if (strand != ".") and (pileup_strand != strand):
-                # This entry is on the wrong strand - skip it
-                continue
-            elif len(pileup_basemod.split(",")) == 3:
-                pileup_modname, pileup_motif, pileup_mod_coord = (
-                    pileup_basemod.split(",")
-                )
-                if (
-                    pileup_motif == parsed_motif.motif_seq
-                    and int(pileup_mod_coord) == parsed_motif.modified_pos
-                    and pileup_modname in parsed_motif.mod_codes
-                ):
-                    keep_basemod = True
-            elif len(pileup_basemod.split(",")) == 1:
-                if pileup_basemod in parsed_motif.mod_codes:
-                    keep_basemod = True
-            else:
-                raise ValueError(
-                    f"Unexpected format in bedmethyl file: {row} contains {pileup_basemod} which cannot be parsed."
-                )
-            # TODO: consolidate the above into a function; just do adding outside
-            if keep_basemod:
-                pileup_info = tabix_fields[9].split(" ")
-                valid_base_counts = int(pileup_info[0])
-                modified_base_counts = int(pileup_info[2])
-                if valid_base_counts > 0:
-                    genomic_coord = int(tabix_fields[1])
-                    contig_list.append(contig)
-                    start_list.append(genomic_coord)
-                    end_list.append(genomic_coord + 1)
-                    values_list.append(modified_base_counts / valid_base_counts)
-
-                    if len(values_list) > chunk_size:
-                        bw.addEntries(
-                            contig_list,  # Contig names
-                            start_list,  # Start positions
-                            ends=end_list,  # End positions
-                            values=values_list,  # Corresponding values
-                        )
-                        contig_list = []
-                        start_list = []
-                        end_list = []
-                        values_list = []
+            keep_basemod, genomic_coord, modified_in_row, valid_in_row = (
+                load_processed.process_pileup_row(
+                    row=row,
+                    parsed_motif=parsed_motif,
+                    region_strand=strand,
+                    single_strand=(strand != "."),
+                )
+            )
+            if keep_basemod and valid_in_row > 0:
+                contig_list.append(contig)
+                start_list.append(genomic_coord)
+                end_list.append(genomic_coord + 1)
+                values_list.append(modified_in_row / valid_in_row)
+
+                if len(values_list) > chunk_size:
+                    bw.addEntries(
+                        contig_list,  # Contig names
+                        start_list,  # Start positions
+                        ends=end_list,  # End positions
+                        values=values_list,  # Corresponding values
+                    )
+                    contig_list = []
+                    start_list = []
+                    end_list = []
+                    values_list = []
         bw.addEntries(
             contig_list,  # Contig names
             start_list,  # Start positions

A reviewer commented on the consolidated `process_pileup_row` call: "Very much appreciate the simplification!"
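The consolidated helper `load_processed.process_pileup_row` itself is not shown in this view. Reconstructed from the inline logic the diff removes, its behavior is approximately the following (the function body, the `ParsedMotif` stand-in, and the exact return contract are assumptions inferred from the diff, not the package's actual code):

```python
from typing import NamedTuple, Set, Tuple

class ParsedMotif(NamedTuple):
    """Stand-in for the parsed-motif object referenced in the diff."""
    motif_seq: str
    modified_pos: int
    mod_codes: Set[str]

def process_pileup_row_sketch(
    row: str, parsed_motif: ParsedMotif, region_strand: str, single_strand: bool
) -> Tuple[bool, int, int, int]:
    """Approximates load_processed.process_pileup_row: returns
    (keep_basemod, genomic_coord, modified_in_row, valid_in_row)."""
    fields = row.split("\t")
    genomic_coord = int(fields[1])
    pileup_info = fields[9].split(" ")
    valid_in_row = int(pileup_info[0])      # valid (covered) base count
    modified_in_row = int(pileup_info[2])   # modified base count
    if single_strand and fields[5] != region_strand:
        # Entry is on the wrong strand for a single-strand region: drop it.
        return False, genomic_coord, modified_in_row, valid_in_row
    parts = fields[3].split(",")
    if len(parts) == 3:
        # Fully qualified base modification: mod code, motif, offset in motif.
        modname, motif, mod_coord = parts
        keep = (
            motif == parsed_motif.motif_seq
            and int(mod_coord) == parsed_motif.modified_pos
            and modname in parsed_motif.mod_codes
        )
    elif len(parts) == 1:
        # Bare modification code.
        keep = fields[3] in parsed_motif.mod_codes
    else:
        raise ValueError(f"Unexpected format in bedmethyl file: {row}")
    return keep, genomic_coord, modified_in_row, valid_in_row
```

Under this contract, the caller in `pileup_to_bigwig` only appends `modified_in_row / valid_in_row` when `keep_basemod` is true and `valid_in_row > 0`, which is exactly the shape of the new lines in the diff above.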