Precompute A/B Counts and Biallelic Summary Func #14

lkirk · 2025-09-15T01:31:18Z

Precompute A/B counts for each sample set. We were previously computing them redundantly each for each site pair in our results matrix. The precomputation happens in a function called get_mutation_sample_sets, which takes our list of sets (tsk_bitset_t) for each mutation and intersects the samples with a particular mutation with the sample sets passed in by the user. The result is an expanded list of sets with one set per mutation per sample set. During this operation, we compute the number of samples containing the given allele for each mutation, avoiding the need to perform redundant count operations on the data.

In addition to precomputation, we add a non-normalized version of compute_general_two_site_stat_result for situations where we're computing stats from biallelic loci. We dispatch the computation of the result based on the number of alleles in the two loci we're comparing. If the number of alleles in both loci is 2, then we simply perform an LD computation on the derived alleles for the two loci. As a result, we remove the need to compute a matrix of LD values, then take a weighted sum. This is much more efficient and means that we only run the full multiallelic LD routine on sites that are multiallelic.

Precompute A/B counts for each sample set. We were previously computing them redundantly each for each site pair in our results matrix. The precomputation happens in a function called `get_mutation_sample_sets`, which takes our list of sets (`tsk_bitset_t`) for each mutation and intersects the samples with a particular mutation with the sample sets passed in by the user. The result is an expanded list of sets with one set per mutation per sample set. During this operation, we compute the number of samples containing the given allele for each mutation, avoiding the need to perform redundant count operations on the data. In addition to precomputation, we add a non-normalized version of `compute_general_two_site_stat_result` for situations where we're computing stats from biallelic loci. We dispatch the computation of the result based on the number of alleles in the two loci we're comparing. If the number of alleles in both loci is 2, then we simply perform an LD computation on the derived alleles for the two loci. As a result, we remove the need to compute a matrix of LD values, then take a weighted sum. This is much more efficient and means that we only run the full multiallelic LD routine on sites that are multiallelic.

lkirk force-pushed the two-locus-precompute-counts branch 2 times, most recently from 97db517 to 1b7c523 Compare September 15, 2025 21:11

lkirk force-pushed the two-locus-precompute-counts branch from 1b7c523 to eb56840 Compare September 15, 2025 21:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Precompute A/B Counts and Biallelic Summary Func #14

Precompute A/B Counts and Biallelic Summary Func #14

Uh oh!

lkirk commented Sep 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Precompute A/B Counts and Biallelic Summary Func #14

Are you sure you want to change the base?

Precompute A/B Counts and Biallelic Summary Func #14

Uh oh!

Conversation

lkirk commented Sep 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants