
Feature/update modkit version #22

Closed
OberonDixon wants to merge 25 commits into main from feature/update-modkit-version

Conversation

@OberonDixon

No description provided.

Oberon Dixon-Luinenburg and others added 25 commits January 8, 2025 09:18
…l need to apply to export functions, and need to check that performance is unimpacted.
…ause it will complicate parallelization and is not clearly useful.
…cess_pileup_row and renamed/configured a few variables to support this.
…hunk generation, currently only for load_processed.pileup_vectors_from_bedmethyl. pileup_counts_from_bedmethyl is a simpler case. Passes pre-existing tests including when chunk_size<region size, but further infrastructure is required to assess whether regions_5to3prime still works right (logic gets a bit complicated). Speed with a single core and a small number of small regions is slightly slower due to overhead necessary to support parallelization. Need to test that we get a speedup with multi-core long regions or many regions.
…ead_vectors_from_hdf5 test targets to pass tests; other test targets are the same for now.
…d generate_targets.py to properly handle copying over new kwargs from cases.py when re-running some but not all targets: if all targets are re-run, then the old test_matrix.pkl is discarded. If only some are re-run, we need the old test_matrix.pkl, but still need to copy over all the kwargs for all cases that won't be re-run and still need to copy over old results, which will be overridden for re-run targets.
…rocessed-tests-baseline

Feature/dm 191 refactor load processed tests baseline
…ns_5to3prime logic within the pileup_vectors_from_bedmethyl function, which is the only one that actually needs it. Runs faster, still passes all tests.
…y, re-ran targets for pileup_counts_from_bedmethyl and pileup_vectors_from_bedmethyl.
…rocessed-tests-baseline

DM-192 Added chunk_size for load_processes parallelization to cases.p…
…. Added progress bars and quiet toggles for pileup loaders. Set default chunk size to 1 MB after whole-chromosome speed testing. Added option to parallelize within rather than between regions with regions_to_list.
…e to avoid intermittent crashes. Adjust tabix fetch to avoid negative start values. Tweak wording on progress bars.
… to all pileup plotters. A few other small tweaks.
… and macOS ostensibly have 4 cores. This should not change any outputs because for extract the test is hardcoded to use 1 core, and that is the only one with any stochasticity in parallelization.
…to avoid ever sending the cores arg twice, and added a comment to explain the reason why extract is tested unparallelized, unlike other functions.
…ment for extract. Updated test_matrix.pickle by re-running generate_targets. No changes to pileup or extract files, but should pass all tests.
…rocessed-tests-baseline

New test_matrix and cases to cover 1, 2, 3, 4, and None for cores
…r versions but must meet minimum requirements.
…f 0.4.0 and needs to be examined. For now, pin it to a tight range.
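Several commits above describe splitting pileup regions into chunks of `chunk_size` (default set to 1 MB), parallelizing over those chunks with a `cores` argument (tested with 1, 2, 3, 4 and None), and adjusting the tabix fetch so it never receives a negative start. A minimal sketch of that pattern, assuming half-open, 0-based intervals, is shown below. The names `region_chunks`, `map_chunks`, `clamp_start` and `DEFAULT_CHUNK_SIZE` are hypothetical and are not the PR's actual code. Threads are used only to keep the demo self-contained; the PR's real parallelization mechanism is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Chunk = Tuple[str, int, int]

# Hypothetical default mirroring the "1 MB" chunk size mentioned in the commits.
DEFAULT_CHUNK_SIZE = 1_000_000


def clamp_start(start: int) -> int:
    """Keep a 0-based start coordinate non-negative before a tabix-style fetch."""
    return max(0, start)


def region_chunks(chrom: str, start: int, end: int,
                  chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[Chunk]:
    """Yield (chrom, chunk_start, chunk_end) half-open intervals covering [start, end)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = clamp_start(start)
    for chunk_start in range(start, end, chunk_size):
        # The last chunk is truncated so it never runs past the region end.
        yield chrom, chunk_start, min(chunk_start + chunk_size, end)


def map_chunks(worker: Callable[[Chunk], T], chunks: Iterable[Chunk],
               cores: Optional[int] = None) -> List[T]:
    """Apply worker to every chunk; results are returned in chunk order."""
    if cores == 1:
        # Serial path, analogous to testing extract unparallelized.
        return [worker(chunk) for chunk in chunks]
    # cores=None lets the executor pick its default worker count.
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(worker, chunks))
```

For example, `map_chunks(lambda c: c[2] - c[1], region_chunks("chr1", -50, 2_500_000), cores=2)` clamps the start to 0 and returns the chunk lengths `[1000000, 1000000, 500000]`. Because `Executor.map` preserves input order, per-chunk results can be concatenated back into region order regardless of which worker finished first.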
OberonDixon closed this on Jun 5, 2025
1 participant