@bruno-ariano (Collaborator) commented Sep 3, 2025

This branch adds the option of running Locusbreaker from TileDB. To do so, I had to add a new module, while trying to leave the downstream fine-mapping and the existing munging_locusbreaker option as unchanged as possible.

Note that the TileDB data were already munged and QCed.

I compared the fine-mapping and coloc results when running Locusbreaker from TileDB and from the original Flanders, and the results are exactly the same.

@bruno-ariano bruno-ariano marked this pull request as ready for review September 3, 2025 12:29
@bruno-ariano bruno-ariano linked an issue Sep 5, 2025 that may be closed by this pull request
@ariannalandini (Collaborator)

Thank you very much @bruno-ariano!!! I think it makes sense to keep the "summary statistics" and "TileDB + summary statistics" versions separate for the time being, while we test the TileDB version more extensively.

Would it be possible to avoid publishing the gwas_and_loci_tables/dummy_index file?

ariannalandini and others added 13 commits September 11, 2025 11:17
Add credible set expansion, logsum and wrapping up susie reformatting in a function
…nsion

Revert "Add credible set expansion, logsum and wrapping up susie reformatting in a function"
* Removing susie.cs.ht() because it should be taken directly from the flanders R package

* Have customizable post susie QC parameters

* Nevermind - function copied and pasted from gitlab, temporary solution until merge with github flanders r repo

* Back to sourcing function rather than calling it from flanders R package (temporary solution) and fixing parameters

* Forgot to remove susie_qc_cs_lbf_thr parameter from here

* Added susie QC parameters to the nextflow schema

* Rearranging post susie QC, so that loci disappearing because of QC are re-finemapped with L=1. And overall L=1 loci do not go through QC

* Adding locusbreaker

* Adding locus size parameter

* Add report collecting loci that were re-finemapped with L=1 after being wiped out by post susie QC

* Wrapped all code to go from susie output to rds list of dataframes object in a function

* Adding credible set expansion and using function from susie output to rds format

* Remove some parameters from list of those mandatory - if not specified, default is fine

* Do not hardcode L=10, but rather use the assigned variable

* Ok nevermind, reverting previous commit and adding also post susie QC parameters to required list

* Removing hardcoding of L=10 for easier maintenance

* Forgot to close parenthesis

* Temporarily copy and paste functions from gitlab R package version

* Computing also cs logsum - and adding it to the anndata obs

* Fixed unmatching parenthesis

* tile_lb_input parameters defined but not used. Assigning correct filename to tiledb_bfile parameter in test

* Removing tuple since it's only one element

* Revert "Merge branch 'tiledb_locusbreaker' into cs_expansion"

This reverts commit fc15e3b, reversing
changes made to 2059698.

---------

Co-authored-by: arianna.landini <arianna.landini@external.fht.org>
Co-authored-by: bruno-ariano <bruno.ariano.87@gmail.com>
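
The post susie QC fallback described in the commit messages above (loci wiped out by QC are re-finemapped with L=1, L=1 results skip QC, and re-finemapped loci are collected for a report) can be sketched roughly as follows. This is a minimal Python illustration, not the pipeline's R code: `finemap_with_fallback`, `run_finemap`, `passes_qc` and the `CredibleSet` shape are all hypothetical names introduced here, and the reading that L=1 results bypass QC entirely is an assumption based on the commit text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shape: summary metrics for one credible set.
CredibleSet = Dict[str, float]


def finemap_with_fallback(
    locus_id: str,
    run_finemap: Callable[[str, int], List[CredibleSet]],  # hypothetical fine-mapper
    passes_qc: Callable[[CredibleSet], bool],              # hypothetical QC predicate
    max_l: int = 10,
) -> Tuple[List[CredibleSet], bool]:
    """Return (credible_sets, refinemapped_with_l1) for one locus."""
    if max_l == 1:
        # Assumption: loci fine-mapped with L=1 skip post fine-mapping QC.
        return run_finemap(locus_id, 1), False

    # Fine-map with the configured L, then keep only credible sets passing QC.
    kept = [cs for cs in run_finemap(locus_id, max_l) if passes_qc(cs)]
    if kept:
        return kept, False

    # QC removed every credible set: rerun with L=1 and do not QC the result.
    # The True flag lets the caller list this locus in a report.
    return run_finemap(locus_id, 1), True
```

A caller could gather the loci whose flag is `True` to build the report of loci that were re-finemapped with L=1.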
* Replacing quit with next if the susie LD matrix is empty - avoid breaking the fine-mapping loop

* Checking for KL length in loci with less than L SNPs
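
The reason for swapping `quit` for `next` can be shown with a small analogue. The pipeline code is in R; in the Python sketch below `continue` plays the role of R's `next`, and the function name, the input shape and the placeholder for the fine-mapping call are assumptions made only for illustration.

```python
from typing import Dict, List, Sequence


def finemap_all(ld_by_locus: Dict[str, Sequence[Sequence[float]]]) -> List[str]:
    """Return the IDs of loci that were actually processed."""
    processed: List[str] = []
    for locus_id, ld_matrix in ld_by_locus.items():
        if len(ld_matrix) == 0:
            # Skip only this locus. Exiting the script here would also drop
            # every locus that comes after it in the loop.
            continue
        processed.append(locus_id)  # placeholder for the real fine-mapping call
    return processed
```

With an empty LD matrix in the middle of the input, the loci before and after it are still processed.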
…to annData (#108)

* Replacing functions - now going from susie output directly to anndata rather than to .rds

* No longer saving .rds files but anndata (one per fine-mapping job)

* Renaming logsum_lABF to logsum.logABF

* Opt argument is batch, not batch_index

* updating metadata tiledb

* Replacing with grch38 and adding path, replace txt to csv extension in metadata

* Adding input batching logic to tiledb metadata

* Replacing txt with csv metadata extension, adding pgen version of grch38 ld

* Adding batch size in tiledb test, using grch37 pgen version for ld

* Adding resources for multi-cpus process

* Use params rather than hardcoded batch size. Remove all views

* Add batch index to input, set optional output, add batch-name argument

* Update to TileDB fragment and metadata - gwas specific pvalues thresholds

* Pval thresholds now taken from metadata

* Removing view of channels

* Not removing study_id and phenotype_id columns - needed later for coloc - and adding chr to chromosome number

* Adapting to updated column names in locus breaker

* Making concat_anndata more flexible - can be used to concatenate also output finemapping anndata, removing no longer needed rds output

* Specifying running coloc in tiledb test

* Removing no longer needed .rds output, adding reticulate to have anndata working

* Still keep coloc master table output

* Taking pheno variance calculated from TileDB

---------

Co-authored-by: arianna.landini <arianna.landini@external.fht.org>
Co-authored-by: bruno.ariano <bruno.ariano.87@gmail.com>
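
For readers unfamiliar with the credible set logsum mentioned in the commits above (stored as `logsum.logABF`), here is a minimal sketch. It assumes that "logsum" means the log of the summed approximate Bayes factors, computed from per-SNP log ABF values with the usual log-sum-exp trick; the function name and input are hypothetical, and the R implementation in the pipeline may differ.

```python
import math
from typing import Iterable


def logsum(log_abf: Iterable[float]) -> float:
    """Numerically stable log(sum(exp(x))) over log ABF values."""
    values = list(log_abf)
    if not values:
        raise ValueError("need at least one log ABF value")
    # Subtract the maximum before exponentiating so exp() cannot overflow.
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))
```

Subtracting the maximum first is what keeps the result finite for very large log ABF values, where exponentiating each value directly would overflow.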
…ta to QCed finemapping only after checking the object isn't null. Now computing conditional statistics inside from_susie_to_anndata()
Development

Successfully merging this pull request may close these issues.

Add TileDB locusbreaker option

2 participants