dashnowlab/STRchive


STRchive

Short Tandem Repeats (STRs) are a type of genetic variation that is associated with many rare diseases. Information about pathogenic STRs is often out of date and scattered across different databases, making it difficult to find and interpret STR variants. STRchive ("ess tee archive") aims to solve this problem by providing a central community resource.

⭐️ View the data at strchive.org ⭐️

If you use STRchive in your research, please cite: Hiatt, L., Weisburd, B., Dolzhenko, E., Rubinetti, V., Avvaru, A.K., VanNoy, G.E., Kurtas, N.E., Rehm, H.L., Quinlan, A. and Dashnow, H.✉, 2025. STRchive: a dynamic resource detailing population-level and locus-specific insights at tandem repeat disease loci. Genome Medicine. doi: https://doi.org/10.1186/s13073-025-01454-4.

STRchive by Harriet Dashnow is licensed under CC BY 4.0

Contributors

  • Harriet Dashnow
  • Laurel Hiatt
  • Akshay Avvaru
  • Vincent Rubinetti
  • Macayla Weiner

Contributing

If you notice an error, omission, or update, feel free to leave a comment or create a pull request.

To make a change to the STRchive data itself, please edit data/STRchive-loci.json.

Then run the "linting" script and fix any errors:
python scripts/check-loci.py data/STRchive-loci.json
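If the linting script stops on a parse error, it can help to first confirm that the edited file is syntactically valid JSON. The sketch below uses only Python's standard `json` module and is not part of the repository; it checks syntax only and does not replace `scripts/check-loci.py`, which performs the STRchive-specific checks. The helper name `json_syntax_error` is hypothetical.

```python
import json
import sys


def json_syntax_error(text):
    """Return None if text parses as JSON, else 'line X, column Y: message'.

    Hypothetical helper for illustration: it checks JSON syntax only, not the
    STRchive locus schema (that is the job of scripts/check-loci.py).
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError reports 1-based line and column numbers.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Only runs when a path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, encoding="utf-8") as handle:
        problem = json_syntax_error(handle.read())
    if problem:
        print(f"{path}: {problem}")
        sys.exit(1)
    print(f"{path}: valid JSON")
```

For example, save it under any name (say `check_json.py`) and run `python check_json.py data/STRchive-loci.json`; a missing comma between two locus entries is then reported with the line and column where the parser stopped.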

Development

Run all scripts to update STRchive

From the root directory, run:
snakemake

Or, to skip the retrieve and manubot stages (which speeds things up substantially):
snakemake --config stages="skip-refs"

Making/updating genotyper catalogs

See workflow/Snakefile for example commands

Install dependencies

New install:

conda env create --file scripts/environment.yml
conda activate strchive

Update existing installation:

conda activate strchive
conda env update --file scripts/environment.yml --prune
conda activate strchive

Note: biomaRt does not install cleanly with conda, so it is installed within the R script that uses it.

Using STRchive catalogs

LongTR

A sample command using LongTR to genotype the STRchive catalog in Oxford Nanopore data. The alignment parameters were suggested in gymrek-lab/LongTR#21. The genotyping accuracy has not been assessed.

module load gcc     # or otherwise satisfy this dependency
# --max-tr-len: the largest locus in STRchive is currently ~4000 bp
LongTR \
    --max-tr-len 10000 \
    --alignment-params -1.0,-0.458675,-1.0,-0.458675,-0.00005800168,-1,-1 \
    --fasta human_GRCh38_no_alt_analysis_set.fasta \
    --regions STRchive-disease-loci.hg38.longTR.bed \
    --bams sample.bam \
    --tr-vcf sample.longTR.vcf.gz
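`--max-tr-len` needs to be at least as long as the longest region in the catalog, since LongTR is expected to skip longer regions. The sketch below checks this against your own copy of the catalog BED. It assumes only that the first three tab-separated columns are chromosome, start and end, as in standard BED, and computes length as end minus start (the 0-based, half-open BED convention); if the LongTR regions file uses a different coordinate convention, the result may be off by one base. The function name `longest_region` is hypothetical.

```python
import sys


def longest_region(lines):
    """Return (region, length) for the longest region in BED-like lines.

    Assumes tab-separated columns with chrom, start, end first (standard BED).
    Blank lines and lines starting with '#', 'track' or 'browser' are skipped.
    Length is end - start (0-based, half-open BED convention).
    """
    best_region, best_len = None, -1
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        length = end - start
        if length > best_len:
            best_region, best_len = f"{chrom}:{start}-{end}", length
    return best_region, best_len


# Only runs when a BED path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    max_tr_len = 10000  # value passed to --max-tr-len in the command above
    with open(sys.argv[1], encoding="utf-8") as bed:
        region, length = longest_region(bed)
    print(f"longest region: {region} ({length} bp)")
    if length > max_tr_len:
        print(f"warning: longer than --max-tr-len {max_tr_len}")
```

For example, `python longest_region.py STRchive-disease-loci.hg38.longTR.bed` (the script name is arbitrary) prints the longest region so that the `--max-tr-len` value can be compared against it.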

About

Short Tandem Repeat disease loci resource
